Re: memory allocations.
- From: Alan Cox <alan@redhat.com>
- To: drepper@redhat.com (Ulrich Drepper)
- Cc: alan@redhat.com (Alan Cox), chabotc@reviewboard.com (Chris Chabot), hp@redhat.com (Havoc Pennington), iain@ximian.com (iain), gnome-hackers@gnome.org
- Subject: Re: memory allocations.
- Date: Wed, 27 Feb 2002 20:17:14 -0500 (EST)
> You cannot have one pool per thread. That would cause too much
> fragmentation and is even impractical with thousands of threads.
> There is a pool of memory pools and access obviously has to be
> controlled.
You can have one pool per thread group and hash - which for low thread
counts cuts pool sharing close to zero. Thread-specific pools are a huge
win in many studies I've seen:
- You increase cache performance (especially on things like
ARM, which have untagged virtual caches)
- You reduce locking massively
I have my head in kernel space so I'm probably oblivious to some rather
important C library things here.
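A rough sketch of the "pool of pools" idea (names and numbers are mine; nothing here is real glibc or glib code): hash each thread to one of a small set of pools, so with low thread counts almost nobody shares a pool or its lock.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical "pool of pools": hash the calling thread to one of a
 * small number of pools.  With few threads, almost every thread ends
 * up with a pool (and therefore a lock) to itself. */
#define NPOOLS 16

struct pool {
    pthread_mutex_t lock;           /* taken only by the few threads
                                       that hash to this pool */
    /* ... free lists, chunks, etc. would live here ... */
};

struct pool pools[NPOOLS];

struct pool *pool_for_thread(void)
{
    /* pthread_t is opaque; treating its bits as an integer is a
     * sketch that happens to work on Linux, not portable code. */
    uintptr_t id = (uintptr_t)pthread_self();
    return &pools[(id >> 4) % NPOOLS];
}
```

The same thread always lands on the same pool, so its allocations stay warm in that pool's structures.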
> The whole thing would be mostly void if we'd have the fast locking
> with kernel help. If there is no contention the overhead is minimal.
> Plus the one or other optimization in this area which we already
> talked about.
Locking is only one tiny piece of this
> I don't understand what you want to do. Keeping per-thread pools?
> This is unlikely to be better. You'd have to create and destroy them
> with each new thread. You have to keep track of thread creation etc
Everything I know from kernel space says that you want to keep feeding
a thread back the same memory. You also want to minimise mmap use in
a threaded application (e.g. it's good to use mmap for I/O single-threaded,
but the same code threaded on SMP frequently goes down the toilet).
The costs you have to consider are:
- If you avoid a lock you avoid a locked operation and a cache
line bounce on SMP (100+ clocks even with kernel help for the
case where another cpu used it last)
- If you draw from the same pool you just used on that CPU you
are in L1 cache (60+ times faster than main memory) or worst
case L2 (8 times faster)
- If you do an mmap on a threaded app you have to do a cross
processor TLB shootdown (1000 clocks+ on x86)
So for small allocations it's difficult to see how it's a lose. For large
allocations I understand the fragmentation argument, and large allocs
are unusual, so I wouldn't argue if you said those should be done in one
pool. What I am talking about is carving up something like 64K or smaller
chunks for small allocs without locks. Now that can be done in glib
by giving glib a kickass slab implementation, or even by pulling the
kernel slab code into an optional GPL library for free gnome
apps to use - but it could also be done to help everyone.
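Something like this is what I have in mind for the lock-free small-alloc path (a toy sketch, not a real allocator - it never reclaims old chunks): each thread carves allocations out of its own 64K chunk, so the fast path touches only thread-local state.

```c
#include <stdlib.h>
#include <stdint.h>

/* Toy per-thread bump allocator: each thread carves small allocations
 * out of its own 64K chunk.  The fast path is a pointer bump on
 * thread-local state - no locked operation, no cache line bounce,
 * and the returned memory was just touched by this CPU. */
#define CHUNK_SIZE (64 * 1024)

static __thread char  *chunk;       /* this thread's current chunk */
static __thread size_t used;        /* bytes already handed out */

void *small_alloc(size_t size)
{
    size = (size + 7) & ~(size_t)7;             /* 8-byte alignment */
    if (chunk == NULL || used + size > CHUNK_SIZE) {
        chunk = malloc(CHUNK_SIZE);             /* slow path: new chunk */
        if (chunk == NULL)
            return NULL;
        used = 0;
    }
    void *p = chunk + used;
    used += size;
    return p;
}
```

A real version would keep the old chunks around so freed objects can be returned and reused; the point here is only that the common path needs no lock.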
> What an application with thousands of memory requests should do (and
> not at the g_malloc level) is to keep memory pool for object of one
> type. These could eventually be kept in per-thread. If applications
> are allocation (and not freeing) thousands of little blocks this
> should be handled by analyzing the object's lifetime and putting them
> in one struct.
Lots of the glib ones are small but variable size. For fixed-size objects
you can use a slab allocator with per-CPU front caches. I've not seen
much to beat it - so I agree in principle about big objects.
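As a sketch of the front-cache idea (again my own toy names, not the kernel slab API): frees push onto a thread-local list, allocs pop from it, and only a cache miss falls back to the shared, locked pool (stubbed here as plain malloc).

```c
#include <stdlib.h>

/* Toy per-thread front cache for fixed-size objects: a freed object
 * goes onto the freeing thread's own list, and the next allocation
 * pops it straight back - lock-free, and the thread keeps getting
 * fed the same (cache-hot) memory. */
#define OBJ_SIZE 64                 /* must hold at least one pointer */

struct obj { struct obj *next; };

static __thread struct obj *front_cache;

void *obj_alloc(void)
{
    if (front_cache != NULL) {      /* fast path: no lock taken */
        struct obj *o = front_cache;
        front_cache = o->next;
        return o;
    }
    /* Slow path: in a real slab this would refill from the shared,
     * locked backing pool; plain malloc stands in for that here. */
    return malloc(OBJ_SIZE);
}

void obj_free(void *p)
{
    struct obj *o = p;
    o->next = front_cache;          /* recycle to this thread */
    front_cache = o;
}
```

Alloc after free hands the thread back exactly the object it just released, which is the "keep feeding a thread the same memory" property above.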
> This is all common programming knowledge. Nobody seems to have done
> any work here and instead now tries to blame other parts of the
> system. First clean up your yard before complaining about your
> neighbor.
It's more a question of - should we clean up lots of yards, or can we
invent a better type of yard that self-cleans?
Alan