Re: Some initial thoughts about 2.4



Daniel Egger <degger fhm edu> writes:

> On Mon, 2002-12-30 at 20:02, Owen Taylor wrote:
> 
> > The idea of object file reordering is to put infrequently used functions
> > into different pages than frequently used functions. GTK+-2.0 has
> > a lot of code:
> 
> Interesting, this is a new idea to me. 

There is quite a bit of precedent for it ... Microsoft has done
it for a long time. SGI had a utility called 'cord' that did
this. Nat Friedman did some experimentation under the 'grope' name
some years ago, which showed promise (something like halving the
load time for GCC) though nothing that useful came out of it.

> However given that a page is
> traditionally 4 or 8k I'm wondering how many functions one can group
> together considering that the whole library has 2.1M (from your
> statistics). 

That 2.1M is 7000+ functions, so we are talking something on
the order of 15 functions on a 4k page.
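
As a rough check of that arithmetic, here is a minimal sketch. It assumes
"2.1M" means 2.1 MiB and that all of it is function code, neither of which
is stated above, so treat the result as an order-of-magnitude estimate only.

```python
# Back-of-the-envelope estimate of how many functions fit on one page.
# Assumptions (not from the original statistics): "M" is MiB, and the
# whole 2.1M is function code with no padding or data mixed in.

def functions_per_page(library_bytes, function_count, page_bytes):
    """Average number of functions that fit on a single page."""
    average_function_bytes = library_bytes / function_count
    return page_bytes / average_function_bytes


LIBRARY_BYTES = 2.1 * 1024 * 1024  # "2.1M"
FUNCTION_COUNT = 7000              # "7000+", so this is a lower bound

for page_bytes in (4096, 8192):
    estimate = functions_per_page(LIBRARY_BYTES, FUNCTION_COUNT, page_bytes)
    print(f"{page_bytes} byte pages: about {estimate:.0f} functions per page")
```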

> Which effects would we want to trigger? Better utilisation
> of I-Cache? Prevent paging? If paging, which sort of paging?

Deciding what to optimize for is the hard part, certainly.
I-Cache behavior is relatively difficult to instrument and
to figure out right; the easy things to try to optimize are:

 - Number of pages paged in during startup
 - Working set of resident pages after start up

A first approximation would be to group together:

 - Functions needed only during startup
 - Commonly used functions
 - Rarely used functions
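
The grouping above can be sketched as a small simulation that counts how
many pages the startup functions touch before and after reordering. The
function names, sizes, and the startup/common sets below are made up for
illustration; real data would have to come from profiling. The sketch also
ignores functions that are needed both at startup and later.

```python
import random

PAGE_BYTES = 4096


def pages_touched(layout, sizes, used, page_bytes=PAGE_BYTES):
    """Count distinct pages holding at least one byte of a used function.

    layout: function names in link order
    sizes:  dict mapping name -> size in bytes
    used:   set of names that are actually called
    """
    touched = set()
    offset = 0
    for name in layout:
        size = sizes[name]
        if name in used and size > 0:
            first_page = offset // page_bytes
            last_page = (offset + size - 1) // page_bytes
            touched.update(range(first_page, last_page + 1))
        offset += size
    return len(touched)


def grouped_layout(names, startup, common):
    """Order functions as: startup-only, commonly used, rarely used."""
    def rank(name):
        if name in startup:
            return 0
        if name in common:
            return 1
        return 2
    # sorted() is stable, so the original order is kept within each group.
    return sorted(names, key=rank)


# Hypothetical library: 7000 functions of random size, 5% used at startup.
rng = random.Random(0)
names = [f"func_{i}" for i in range(7000)]
sizes = {name: rng.randint(50, 550) for name in names}
startup = set(rng.sample(names, 350))
common = set(rng.sample(names, 1400)) - startup

print("original order:", pages_touched(names, sizes, startup), "pages")
print("grouped order: ",
      pages_touched(grouped_layout(names, startup, common), sizes, startup),
      "pages")
```

Placing the startup group first packs it into the fewest pages it can
occupy, which is the effect on "pages paged in during startup" that the
reordering is after.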

> > By reordering the functions in the executable, I think it would
> > be possible to reduce the number of pages that have to be loaded off
> > the disk for the first app that uses GTK+ by a significant
> > fraction.
> 
> Okay, say the library is mmapped in and the OS is configured to not
> do readahead but instead page in the missing functions in fractions of
> whole pages as we walk through the application, how much gain would you 
> estimate by improving locality? 

An OS that doesn't read in whole pages is really a bit too far
from my experience to make any guesses at.

> And how would we figure out which
> functions to put together considering that different applications surely
> have a completely different footprint?

Apps are different, but not *that* different. That is, every
app uses gtk_widget_show_all(), nothing will use
gtk_progress_bar_set_discrete_blocks().
 
> > (though prelinking did make a 10-20% difference for gnome-terminal
> > in some timings I did)
> 
> This is interesting. Dynamic linking in the C case is bog simple and
> really hard to speed up. How did you achieve that?

libgtk has some 2000 relocations in it that have to be processed
at startup. And remember that any page with a relocation has to 
be copied and can't be shared between apps.
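
To illustrate why the placement of relocations matters as much as their
number, here is a minimal sketch that counts how many pages contain at
least one relocation. The offsets are invented for the example; real ones
could be read from the output of `readelf -r`. The 4-byte slot size and the
2.1 MiB span are assumptions as well.

```python
PAGE_BYTES = 4096


def pages_with_relocations(reloc_offsets, page_bytes=PAGE_BYTES):
    """Number of distinct pages that contain at least one relocation.

    Each such page has to be written at load time, so it becomes a
    private copy instead of staying shared between processes.
    """
    return len({offset // page_bytes for offset in reloc_offsets})


RELOC_COUNT = 2000                   # "some 2000 relocations"
LIBRARY_BYTES = 2 * 1024 * 1024 + 102400  # roughly 2.1 MiB, an assumption
SLOT_BYTES = 4                       # assumed 32-bit pointer slots

# Hypothetical worst case: relocations spread evenly over the library.
stride = LIBRARY_BYTES // RELOC_COUNT
scattered = [i * stride for i in range(RELOC_COUNT)]

# Hypothetical best case: all relocated slots packed next to each other.
clustered = [i * SLOT_BYTES for i in range(RELOC_COUNT)]

print("scattered:", pages_with_relocations(scattered), "private pages")
print("clustered:", pages_with_relocations(clustered), "private pages")
```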
 
> Do you have any pointers or papers? This looks like an interesting
> area for some research.

I thought there was a whitepaper on Jakub Jelinek's prelinking stuff,
but I don't see it in a quick search. There may be useful docs in
the prelink tarball:

 ftp://people.redhat.com/jakub/prelink

For object file reordering, I don't have a reference off-hand but
it shouldn't be that hard to dig something up.

Regards, 
                                        Owen


