Re: glib: gmain, perf, etc...



>>>>> Tim Janik <timj gtk org> writes:

>>  - Memchunks appear to be really slow.  They actually show up in the
> memchunks are scheduled for removal/replacement for this and a
> couple of else reasons. i unfortunately didn't get a chance to
> tackle this for 2.0 though ;(

>>    I see the trashstack, but it's not clear why the trashstack logic
> right, i needed something faster than memchunks to maintain free
> lists of structures where caching made lots of sense. that's when
> trash stacks went in.

So my own pooled allocator works thusly:

 - Allocators are #defined into existence on a per-type basis (ie,
   templates).  I use macros; inlines would work as well.

 - The _alloc() function returns the top one off the list (allocating
   another block's worth and list-ifying them if needed).
   (1/blocksize function calls)

 - The _free() function just puts the thing back on the list.  (Zero
   function calls)

 - The list is a trivial singly-linked list, using the first 4 bytes
   of each item as a next pointer.  There is also a realloc'd
   nonshrinking array of block pointers.

 - Periodically, the program calls the GC function, which picks a set
   of randomly selected contiguous block pointer values and does the
   GC:

    * The freelist is iterated once, and every item in any of the
      blocks of interest is pulled out and put in a temporary
      per-block array.  Each temp array has a counter.

    * Any of the temp arrays that end up full have their block free'd.
      The array of block pointers is fiddled to be contiguous again.

    * All other blocks have their items put back into the list.

This turns out to give delightful high load performance for wide
ranges of the tunable parameters (alloc block size, GC block set size,
GC frequency).  Allocation/deallocation overhead is near-zero except
for the occasional block allocation, and the GC load is well under
0.5% in mcount-adjusted profile, worst case.

My high-load scenarios are in the ballpark of millions of allocations
and frees per second; about 1/4 of them remain allocated for several
seconds, while others are freed immediately.

I'd offer my code, but it's tainted, so I contribute this description
of a working design for whatever it's worth.  The important thing is
that for at least my application (which I'll guarantee is a lot busier
than most gtk apps), any effort to keep the set of free items sorted,
indexed, or contiguous is not useful.  Simple linear iteration on a
timer with manual computation of item-block correspondance is a
cheaper way to find and free things, even in high-volume cases.

The main flaw, of course, is that applications with high-frequency
memory utilization spikes would tend to use lots of memory always, as
the GC has the buffering effect of a capacitor.  Some people may
prefer an optimally space-efficient skiplist or tree based affair
capable of freeing blocks ASAP.  It should even be possible for such
an implementation to roughly approach the time performance of a simple
list-based affair in many uses, although that would require judicous
avoidance of most existing glib facilities (the trees, lists, etc).

> we had some patching efford going on in the past, both, to disable
> freed-struct caches, and to zero-them out at free-time. you might
> want to take a look at glib again.

Ah, good.  This lets you debuach yourself, etc ;)

>>    state machine scheduling.  I added a g_source_detach, which I use
> i don't remember the details, but owen had good reasons for not
> offering source detachment (at least at this point).

It was trivial to implement, but gmain is not really set up for
frequent context changes, so performance gets pretty awful if you
often detach things.  Either you start the loop over, or you rummage
all over the place trying to adjust context state to where it won't
run that source anymore in the existing iteration.  Both options
sucked.

GMain seems to concentrate on generality, which IMHO isn't as useful
as specifically implementing most of the actually existing types of
event sources (fds, timers, signals, children, toolkits, etc) in a
more tightly coupled way.  Extensibility should be offered in the form
of hooks into various points in the loop, not via the ability to add
further items to a very inefficient generic list of event objects.

Ask your users to raise their hands if they have implemented a
GSource.  I guarantee you can count them on a hand or two.  The
generality is not useful, and impairs the speed of the most frequently
run code in every GTK program.  Even small overheads matter here; all
the gratuitous function calls for figuring out source readiness etc
are measurable when they really shouldn't even exist.

>>  - It would be useful to have in-place constructors for some of the
> this comes up now and again (i've been requesting this myself for some time).
> however, the problem is that we'd have to essentially duplicate lots of
> our APIs, resulting in doubled maintenance, and doubling the learning curve
> for newbies. compared to that, the resulting benefits become negligible.

Oh, no, it's not that painful at all.  You'd add a new API call for
every object, sure, but it'd be the same pattern everywhere, so the
user learning curve would be the least of the problems.

The maintenance is also small: you add an inline for the existing
allocating call in your header, and replace the current foo_new() with
a nonallocating foo_new_inplace() call that the inline foo_new() or
the user calls.  You'd need an "inplace" bit in most objects to avoid
calling free on inplace things, but that's about it.

Really, the problem is less the objects themselves but all the random
pointer nodes in most of the glib structures.  You'll need
implementation changes in lists, trees, etc to fix this ;(

> eventhough owen and me cooked up the newly added GQueue

I'm a little unclear why the GQueue is separate from the GList.  By
definition, you can push and pop any which way from a doubly linked
list.  Why aren't there just some "queue" functions on the lists?

>>  - A skiplist would be very handy.  I know that Havoc turned down one

> right, it takes a good and well implemented patch to include this though
> (or someone willing to work on such code based on comments given on this
> list untill it gets "good and well implemented" ;)

Yes, well, there is that.  Alas, the skiplists here aren't even
company property, let alone mine.  Perhaps I'll make a clean one in a
year or two if it's still needed then.  Sigh.

Skiplists have the added advantage that concurrent access to them is
much easier to do efficiently than for trees.  If (as it seems) you're
really keen on threads, this is another fine thing.

OTOH, threads really suck, so this is not much of a real advantage. ;)

>>  - The g_log subsystem is frightfully slow, even for things that are

> g_log is still on my list of things to change the internals of. the
> current code is about 3 times more complex than it has to be, just
> because the memory subsystem wants to report pathologic situations
> through it as well ;(

God, that's hopeless.  You'd have to have preallocated everything all
the way up through the user, and guarantee not to push the stack over
a new page boundary, as well.  You can check the stack without too
much trouble, but you'll never be able to say what goes on in an Xlib,
or GDI, for that matter.

Speaking of memory, g_malloc is a pain.  It breaks the caller
information part of the malloc hooks in glibc, so glib users must use
a stack-tracing tool like debauch or purify to leak-check their
software.  Does gmalloc really add anything besides stats and minor
semantic tweaks?

--
Grant Taylor - gtaylor picante com - http://www.picante.com/~gtaylor/
    Linux Printing Website and HOWTO:  http://www.linuxprinting.org/



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]