Re: Disk seeks article [Re: New Optimization Section on d.g.o]



On Mon, 2004-09-27 at 08:59 +0100, Mark McLoughlin wrote:

Hi, Mark.

> 	In articles like this, I think it'd be nice to have details to allow
> people to diagnose/simulate the problem and quantify the improvements
> from any changes. That's because you'll often need to make tradeoffs
> based on cold, hard data.

Consider it a living document, like my will -- constantly changing due
to Keith, worthless excuse for a son, why won't he call, never visits
his mother, probably cracked out somewhere, well, he won't receive a
dime from me!

Or, like the US Constitution, which allows amendments to craft the core
document to whatever the current religious^W needs are. 

Anyhow, seriously, please feel free to make changes.

> 	 e.g. GConf has lots of little files spread across the disk. We have
> code to consolidate all those little files into one big file or a small
> number of big files. Code isn't the problem. What we need is some way of
> making a decision based on real world measurements of the seek times
> with lots of little files vs. the parsing time and memory usage of a big
> file.

I actually was not thinking of GConf - more icons, themes, .desktop
files, and fonts.  But depending on how many files GConf consistently
touches, GConf is a culprit as well.  Which I guess gets to your
point: how do we tell?

When we see this, we know we have a problem:

        $ strace gedit 2>&1 | wc -l
        12130
        $ strace gedit 2>&1 | grep ^read | wc -l
        1771
        $ strace gedit 2>&1 | grep ^open | wc -l
        1044

12k system calls and 1k open files?  Ugh.

And gedit does not do much, so you know most of those calls and files
are in GNOME libraries.
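
One rough way to dig a bit deeper is to see how many of those open()s
even succeed, and how many unique files actually get read - typically a
fair chunk of them are failed probes down icon, theme, and font search
paths.  Something like this (exact counts will of course vary from
system to system):

        $ strace -e trace=open gedit 2>&1 | grep ENOENT | wc -l
        $ strace -e trace=open gedit 2>&1 | grep -v ENOENT | \
                sed 's/.*open("\([^"]*\)".*/\1/' | sort -u | wc -l

The first number is wasted path searching; the second is closer to the
real working set of files on disk.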

> 	I guess there's a few things I'd love to be able to do:
> 
>   1) Simulate initial startup time by completely clearing the disk cache
>      before taking measurements

Everyone keeps asking me for this. ;-)

I should do a jettison_everything() system call.
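
In the meantime, the closest approximation I know of is to sync and then
drop the page cache by hand - assuming a kernel that exposes the
drop_caches knob; failing that, unmounting and remounting the filesystem
in question throws away its cached pages too:

        $ sync                                # flush dirty pages first
        # echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries, and inodes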

>   2) Simulate the files-scattered-across-the-disk problem
> 
>   3) Profile the disk accesses made by an application - disk seeks, 
>      cache misses, read times etc. etc.
> 
>   4) Profile the disk accesses happening at login - i.e. with multiple 
>      applications all doing lots of disk accesses, who are the ones
>      doing all the reading and who are the ones getting hammered because
>      they have to wait for their turn?

I am not very familiar with oprofile - I wonder how much of the above it
can help us with.

There are not many great tools on Linux for obtaining I/O statistics,
but we can get measurements of I/O activity, I/O wait time, and, of
course, aggregate elapsed real time.
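
For example, something along these lines gets you most of the way (a
rough sketch - exact columns and options depend on which procps and
sysstat versions you have installed):

        $ vmstat 1                 # "bi"/"bo" show block I/O, "wa" shows time stuck waiting on I/O
        $ iostat -x 1              # per-device throughput and average wait, if sysstat is installed
        $ /usr/bin/time -v gedit   # elapsed wall-clock time plus major (I/O-requiring) page faults

None of that tells you which files an application is seeking over, but
it at least quantifies the pain.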

BTW, to anyone wondering, the document was meant to be a bit tongue in
cheek.  We are all guilty. ;-)

	Robert Love




