Re: Reducing unnecessary string copying



On Mon, Feb 20, 2012 at 7:08 AM, Enrico Weigelt <weigelt metux de> wrote:
>
> Hi folks,
>
>
> I've observed that many C applications copy compile-time defined
> data, such as strings, unnecessarily.
>
> For example, if you pass names to certain objects (e.g. opening a
> file, creating a window, etc.) that will never change or disappear
> during the process' lifetime (e.g. compiled-in), those strings
> don't need to be copied.
>
> Now the big question becomes: how to decide this ?
>
> I'm currently experimenting with a new string reference datatype
> that has an extra bit which tells whether the string is static.
> Essentially a struct consisting of the char* pointer and an
> extra byte for the flag. That struct then will be used instead
> of the const char* pointers. A few inline functions handle the
> conversion from/to normal const char* and on-demand copying.
>
> Just some theoretical example:
>
> Some function
>
>    FOO* create_foo(const char* name)
>
> now becomes
>
>    FOO* create_foo(GCStr name)
>
> and the call now would be
>
>    create_foo(G_CSTR_STATIC("hello world"));
>
> In case we don't have a "static" string, but something with
> limited lifetime, it could look like this:
>
>    create_foo(G_CSTR_DYNAMIC(bar));
>
> Inside create_foo() we'll then replace g_strdup() by some
> G_CSTR_COPY() and g_free() by G_CSTR_FREE(). These functions
> will know whether the string has to be copied/free'd.
>
>
> Let's just leave the question of API-compatibility aside,
> what do you think about this idea ?
>
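
For readers following along, the proposal quoted above could be sketched
roughly as below. This is only a guess at what was meant: GCStr and the
G_CSTR_* names come from the quoted mail, but the struct layout, the
lowercase helper functions and the use of plain malloc()/free() instead
of g_strdup()/g_free() are assumptions made for illustration. None of
this is existing GLib API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string reference: the pointer plus a "static" flag. */
typedef struct {
    const char *str;
    bool is_static;   /* true: storage lives for the whole process */
} GCStr;

/* Wrap a string literal or other data that never goes away. */
static inline GCStr g_cstr_static(const char *s)
{
    assert(s != NULL);
    GCStr r = { s, true };
    return r;
}

/* Wrap a string with limited lifetime; whoever keeps it must copy it. */
static inline GCStr g_cstr_dynamic(const char *s)
{
    GCStr r = { s, false };
    return r;
}

/* Return a reference the callee may keep: static strings are shared
   as-is, dynamic ones are duplicated on the heap (str is NULL if the
   allocation fails). */
static inline GCStr g_cstr_copy(GCStr s)
{
    if (s.is_static || s.str == NULL)
        return s;
    size_t n = strlen(s.str) + 1;
    char *dup = malloc(n);
    if (dup != NULL)
        memcpy(dup, s.str, n);
    GCStr r = { dup, false };
    return r;
}

/* Release a reference obtained from g_cstr_copy(); a no-op for static
   strings. Must not be called on a g_cstr_dynamic() wrapper itself,
   because that memory still belongs to the caller. */
static inline void g_cstr_free(GCStr s)
{
    if (!s.is_static)
        free((void *) s.str);
}

/* Macro spellings used in the quoted mail. */
#define G_CSTR_STATIC(s)  g_cstr_static(s)
#define G_CSTR_DYNAMIC(s) g_cstr_dynamic(s)
#define G_CSTR_COPY(s)    g_cstr_copy(s)
#define G_CSTR_FREE(s)    g_cstr_free(s)
```

One weakness of this particular layout is that a single flag cannot tell
a borrowed dynamic string from an owned heap copy, so the rule about when
G_CSTR_FREE() may be called has to be enforced by convention.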

We already have a facility for something like this in GLib: string
interning (GQuark), which inserts the string into a hash table and
uses the key any time you wish to refer to the string (or, in the case
of the g_intern_(static_)string() functions, uses the value stored in
the hash table).
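
As a rough illustration of the interning idea, here is a minimal sketch
in plain C. It is not GLib's implementation: intern_string(),
intern_hash(), InternEntry and the fixed bucket count are made-up names
and choices for this example, and the table is neither thread-safe nor
ever freed. In real code you would call g_intern_string(),
g_intern_static_string() or g_quark_from_string() instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_BUCKETS 256   /* fixed table size, chosen arbitrarily */

typedef struct InternEntry {
    char *str;                   /* the single canonical copy */
    struct InternEntry *next;    /* chain for strings in the same bucket */
} InternEntry;

static InternEntry *intern_table[INTERN_BUCKETS];

/* djb2 string hash, reduced to a bucket index. */
static unsigned intern_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char) *s++;
    return (unsigned) (h % INTERN_BUCKETS);
}

/* Return the canonical pointer for s, storing a copy the first time
   the string is seen. Returns NULL only if allocation fails. */
static const char *intern_string(const char *s)
{
    assert(s != NULL);
    unsigned b = intern_hash(s);

    for (InternEntry *e = intern_table[b]; e != NULL; e = e->next)
        if (strcmp(e->str, s) == 0)
            return e->str;       /* already interned: no new copy */

    InternEntry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    size_t n = strlen(s) + 1;
    e->str = malloc(n);
    if (e->str == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->str, s, n);
    e->next = intern_table[b];
    intern_table[b] = e;
    return e->str;
}
```

Two calls with equal contents return the same pointer, so each distinct
string is stored once and comparisons can be done on the pointer alone.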

We could just start passing the quarks around everywhere (e.g.
gtk_label_set_text_quark (GtkWidget* widget, GQuark quark);) ... but
unless your application has millions of strings, you're probably not
going to see any noticeable improvement (maybe a few hundred kilobytes
of allocations saved over the lifetime of the running application; even
with an application the size of Nautilus, I wouldn't expect the savings
to be significant).

My advice, if you're really interested in this, is to try it on a small
scale. Write a patch to GtkLabel to make it support taking a GQuark
argument, then port your favorite application to use it, and see how
big a difference this actually makes with Massif, Valgrind's heap
profiling tool. If it makes a big enough difference, open a bug
and post your results.

-A. Walton


