Re: Suggestion for file type detection approach

yes, the directory scan was already cached - but pretending to test
first-run is bullshit because you can't know whether something else has
already scanned the directory and thus pre-cached it.

unless someone knows of a way to be sure that there's no cache, it's an
impossible-to-do-fairly task.

anyways... about gnomevfs-ls:

- seems gnomevfs-ls stat()'s some of the mime-info files multiple times
even after it has opened/read them in.

- read()'s 4k from each file. 4k is a pretty efficient size to read, but
if we don't *need* 4k, it might be overkill - file seems to read much
smaller chunks (32 bytes, and more only if it needs it). I guess what
should be done is to decide what the max amount we ever need is and
simply read in that amount. if there are just a handful of types that
need more, or need a chunk later in the file, perhaps encode this
somehow in the mime-info.
also keep in mind when comparing gnomevfs-ls vs ls: gnomevfs-ls has
to load/parse/etc several mime-info files. normally an application
using gnome-vfs only has to load those mime-info files once for the
entire session, so other than the first time nautilus loads a directory
(which is basically just startup), it will never need to load/parse
those files again. in an ls vs gnomevfs-ls comparison, though, that
one-time cost is paid on every run and is a significant overhead.
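
To illustrate the load-once-per-session point, here is a minimal sketch
that memoizes the parse step for the lifetime of the process. The
two-column "extension mime-type" file format and the load_mime_info()
name are assumptions for the example, not the real mime-info format or
the gnome-vfs API.

```python
import functools


@functools.lru_cache(maxsize=None)
def load_mime_info(path):
    """Parse one mime-info file; the result is reused for the whole process."""
    rules = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            ext, mime = line.split(None, 1)  # hypothetical "ext mime" layout
            rules[ext] = mime
    return rules
```

A long-running process pays the parse cost on the first call only, while
a one-shot command-line tool pays it on every invocation, which is what
skews the timing comparison.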

anyways, because pre-caching makes such a huge impact on sniffing, it
does suggest that the process is i/o bound (duh, makes sense). and so
caching these sniff results in EAs (extended attributes) or in some
other file might well be the way to go.
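
A minimal sketch of what caching sniff results could look like, assuming
a result stays valid as long as the file's mtime and size are unchanged.
The SniffCache class is hypothetical and keeps everything in memory; a
real version would persist the entries (in extended attributes or a
separate file) so they survive across sessions.

```python
import os


class SniffCache:
    """Cache of sniff results, invalidated when a file's mtime or size changes."""

    def __init__(self, sniff_func):
        self._sniff = sniff_func  # callable taking a path, returning a MIME type
        self._entries = {}        # (device, inode) -> ((mtime_ns, size), mime)
        self.misses = 0

    def lookup(self, path):
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        stamp = (st.st_mtime_ns, st.st_size)
        hit = self._entries.get(key)
        if hit is not None and hit[0] == stamp:
            return hit[1]  # unchanged since last sniff: no read() needed
        self.misses += 1
        mime = self._sniff(path)
        self._entries[key] = (stamp, mime)
        return mime
```

On a cache hit only the stat() is needed, and a directory listing has to
do that anyway, so the extra read() per file goes away.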


On Sun, 2004-01-04 at 10:28, Arvind Narayanan wrote:
> On Sun, Jan 04, 2004 at 10:30:47AM -0500, Jeffrey Stedfast wrote:
> > 
> > just because the bulk of the time spent will be in i/o, does not mean
> > that profiling will be useless.
> > 
> > anyways, just out of curiosity, I timed 'ls -l' and 'file' in /usr/bin
> > and here are the results I get:
> I tried this too. You're getting such quick times because the disk 
> blocks are cached already. It would have been very slow the first time
> you tried it. Reboot and try file again if you like.
> This is what I get:
> $ time file /usr/bin/* > /dev/null  # first time
> real    0m16.017s
> user    0m0.310s
> sys     0m0.480s
> $ time file /usr/bin/* > /dev/null
> real    0m0.455s
> user    0m0.290s
> sys     0m0.160s
> $ time file /usr/bin/* > /dev/null
> real    0m0.450s
> user    0m0.300s
> sys     0m0.150s
> > 
> > `time /bin/ls -l /usr/bin`
> > real    0m2.783s
> > user    0m0.060s
> > sys     0m0.010s
> > 
> > `time file /usr/bin/*`
> > real    0m2.788s
> > user    0m0.060s
> > sys     0m0.120s
> > 
> > looks to me like "sniffing will suck performance-wise" is a hunk of
> > crap, at least assuming file-type sniffing is done efficiently.
> But you can't do it efficiently unless you can magically predict
> which directory the user is going to click on next. And you can't
> pre-cache the entire disk, can you?
> Arvind
