Re: memory allocations.
- From: Alan Cox <alan@redhat.com>
- To: drepper@redhat.com (Ulrich Drepper)
- Cc: alan@redhat.com (Alan Cox), chabotc@reviewboard.com (Chris Chabot), hp@redhat.com (Havoc Pennington), iain@ximian.com (iain), gnome-hackers@gnome.org
- Subject: Re: memory allocations.
- Date: Wed, 27 Feb 2002 20:17:14 -0500 (EST)
> You cannot have one pool per thread. That would cause too much
> fragmentation and is even impractical with thousands of threads.
> There is a pool of memory pools and access obviously has to be
> controlled.
You can have one pool per thread group and hash - which for low thread
counts cuts pool sharing close to zero. Thread-specific pools are a huge
win in many studies I've seen:
- You increase cache performance (especially on things like
ARM, which have untagged virtual caches)
- You reduce locking massively
I have my head in kernel space so I'm probably oblivious to some rather
important C library things here.
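A rough sketch of the "pool of pools" idea (names and numbers are mine; nothing here is real glibc or glib code): hash each thread to one of a small set of pools, so with low thread counts almost nobody shares a pool or its lock.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical "pool of pools": hash the calling thread to one of a
 * small number of pools.  With few threads, almost every thread ends
 * up with a pool (and therefore a lock) to itself. */
#define NPOOLS 16

struct pool {
    pthread_mutex_t lock;           /* taken only by the few threads
                                       that hash to this pool */
    /* ... free lists, chunks, etc. would live here ... */
};

struct pool pools[NPOOLS];

struct pool *pool_for_thread(void)
{
    /* pthread_t is opaque; treating its bits as an integer is a
     * sketch that happens to work on Linux, not portable code. */
    uintptr_t id = (uintptr_t)pthread_self();
    return &pools[(id >> 4) % NPOOLS];
}
```

The same thread always lands on the same pool, so its allocations stay warm in that pool's structures.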
> The whole thing would be mostly void if we'd have the fast locking
> with kernel help. If there is no contention the overhead is minimal.
> Plus the one or other optimization in this area which we already
> talked about.
Locking is only one tiny piece of this
> I don't understand what you want to do. Keeping per-thread pools?
> This is unlikely to be better. You'd have to create and destroy them
> with each new thread. You have to keep track of thread creation etc
Everything I know from kernel space says that you want to keep feeding
a thread back the same memory. You also want to minimise mmap use in
a threaded application (e.g. it's good to use mmap for I/O single-threaded,
but the same code threaded on SMP frequently goes down the toilet).
The costs you have to consider are:
- If you avoid a lock you avoid a locked operation and a cache
line bounce on SMP (100+ clocks even with kernel help for the
case where another cpu used it last)
- If you draw from the same pool you just used on that CPU you
are in L1 cache (60+ times faster than main memory) or worst
case L2 (8 times faster)
- If you do an mmap on a threaded app you have to do a cross
processor TLB shootdown (1000 clocks+ on x86)
So for small allocations it's difficult to see how it's a lose. For large
allocations I understand the fragmentation argument, and large allocs
are unusual, so I wouldn't argue if you said those should be done in one
pool. What I am talking about is carving up something like 64K or smaller
chunks for small allocs without locks. Now that can be done in glib
by giving glib a kickass slab implementation, or even by pulling the
kernel slab code into an optional GPL library for free gnome
apps to use - but it could also be done to help everyone.
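Something like this is what I have in mind for the lock-free small-alloc path (a toy sketch, not a real allocator - it never reclaims old chunks): each thread carves allocations out of its own 64K chunk, so the fast path touches only thread-local state.

```c
#include <stdlib.h>
#include <stdint.h>

/* Toy per-thread bump allocator: each thread carves small allocations
 * out of its own 64K chunk.  The fast path is a pointer bump on
 * thread-local state - no locked operation, no cache line bounce,
 * and the returned memory was just touched by this CPU. */
#define CHUNK_SIZE (64 * 1024)

static __thread char  *chunk;       /* this thread's current chunk */
static __thread size_t used;        /* bytes already handed out */

void *small_alloc(size_t size)
{
    size = (size + 7) & ~(size_t)7;             /* 8-byte alignment */
    if (chunk == NULL || used + size > CHUNK_SIZE) {
        chunk = malloc(CHUNK_SIZE);             /* slow path: new chunk */
        if (chunk == NULL)
            return NULL;
        used = 0;
    }
    void *p = chunk + used;
    used += size;
    return p;
}
```

A real version would keep the old chunks around so freed objects can be returned and reused; the point here is only that the common path needs no lock.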
> What an application with thousands of memory requests should do (and
> not at the g_malloc level) is to keep memory pool for object of one
> type. These could eventually be kept in per-thread. If applications
> are allocation (and not freeing) thousands of little blocks this
> should be handled by analyzing the object's lifetime and putting them
> in one struct.
Lots of the glib ones are small but variable size. For fixed-size objects
you can use a slab allocator with per-CPU front caches. I've not seen
much to beat it - so I agree in principle about big objects.
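As a sketch of the front-cache idea (again my own toy names, not the kernel slab API): frees push onto a thread-local list, allocs pop from it, and only a cache miss falls back to the shared, locked pool (stubbed here as plain malloc).

```c
#include <stdlib.h>

/* Toy per-thread front cache for fixed-size objects: a freed object
 * goes onto the freeing thread's own list, and the next allocation
 * pops it straight back - lock-free, and the thread keeps getting
 * fed the same (cache-hot) memory. */
#define OBJ_SIZE 64                 /* must hold at least one pointer */

struct obj { struct obj *next; };

static __thread struct obj *front_cache;

void *obj_alloc(void)
{
    if (front_cache != NULL) {      /* fast path: no lock taken */
        struct obj *o = front_cache;
        front_cache = o->next;
        return o;
    }
    /* Slow path: in a real slab this would refill from the shared,
     * locked backing pool; plain malloc stands in for that here. */
    return malloc(OBJ_SIZE);
}

void obj_free(void *p)
{
    struct obj *o = p;
    o->next = front_cache;          /* recycle to this thread */
    front_cache = o;
}
```

Alloc after free hands the thread back exactly the object it just released, which is the "keep feeding a thread the same memory" property above.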
> This is all common programming knowledge. Nobody seems to have done
> any work here and instead now tries to blame other parts of the
> system. First clean up your yard before complaining about your
> neighbor.
It's more a question of - should we clean up lots of yards, or can we
invent a better type of yard that self-cleans?
Alan