Re: Suggestion for file type detection approach



>Given the performance bottleneck imposed by sniffing, I suggest that it
>is not used anymore in directory listing routines. It should be used
>when the user tries to open an unknown file. Let's imagine this case:

I don't think this is a good way of thinking about things.  The questions are: 
1. Is sniffing a good idea?  
2. If so, does it work correctly?
3. If (1) and (2), is it performing fast enough?

I think others have argued persuasively that sniffing is a good idea,
since Unix doesn't generally give file name suffixes to files (even
though GNOME does), and files often have incorrect suffixes.
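
To make the distinction concrete, here is a minimal sketch in Python
comparing a suffix-based guess with a content-based one.  The names
SIGNATURES, sniff_bytes and guess_from_name are made up for this
illustration, and the signature table is deliberately tiny; this is not
how Nautilus or gnome-vfs actually implements sniffing.

```python
import mimetypes

# Hypothetical, deliberately tiny signature table; a real sniffer
# would know far more formats than these three.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"ID3", "audio/mpeg"),
]

def sniff_bytes(head):
    """Guess a type from the first bytes of a file's content."""
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    return None

def guess_from_name(name):
    """Guess a type from the file name suffix alone."""
    return mimetypes.guess_type(name)[0]

# A PNG image that somebody saved as "holiday.jpg".
png_head = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
print(guess_from_name("holiday.jpg"), sniff_bytes(png_head))

# An mp3 (with an ID3 tag) that has no suffix at all.
print(guess_from_name("README"), sniff_bytes(b"ID3\x03\x00"))
```

For the first file the two guesses disagree, and for the second only
the content-based guess has anything to say, which are exactly the two
cases mentioned above.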

A number of people have said that there are instances where sniffing is
not correctly determining the file type.  If true, this is an excellent
argument for fixing those cases, but not at all an argument for throwing
away sniffing altogether.

From your benchmarking it is clear that (3) is a problem, and sniffing
is taking too long.  This is not an argument for getting rid of it
though, just an argument for speeding it up.  Someone suggested running
it through a profiler, but I doubt that will be worthwhile--the problem
is almost certainly the multiple disk accesses (your disk is having to
seek for each file).  As others have pointed out, a two-pass technique
based on extension and then sniffing is also a bad idea, since icons,
etc. would change in the case of a discrepancy.

Sniffing is slow because it opens every file and reads some of it every
time you open a given directory. If you want to make this fast, cache
filetypes; now opening the huge mp3 folder is just a matter of reading a
single cache file and sniffing those files with a modification time
later than that of the cache file.  Naturally this would only need to be
done for really huge directories; it would probably be a waste for
directories with only a hundred files or fewer.
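
The caching scheme described above could look roughly like the
following Python sketch.  Everything named here is an assumption made
for illustration: the cache file name, the JSON format, the list_types
and sniff helpers, and the min_files cutoff are all hypothetical, and
the sniffer is a toy that knows three signatures.

```python
import json
import os

CACHE_NAME = ".filetype-cache.json"  # hypothetical cache file name

def sniff(path):
    """Toy sniffer: classify a file by its first few bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if head.startswith(b"ID3"):
        return "audio/mpeg"
    return "application/octet-stream"

def list_types(directory, min_files=100):
    """Map each regular file in `directory` to its sniffed type."""
    names = sorted(
        n for n in os.listdir(directory)
        if n != CACHE_NAME and os.path.isfile(os.path.join(directory, n))
    )
    # Small directories: sniff everything, do not bother with a cache.
    if len(names) <= min_files:
        return {n: sniff(os.path.join(directory, n)) for n in names}

    cache_path = os.path.join(directory, CACHE_NAME)
    try:
        cache_mtime = os.path.getmtime(cache_path)
        with open(cache_path) as f:
            cached = json.load(f)
    except (OSError, ValueError):
        # Missing or unreadable cache: fall back to sniffing everything.
        cache_mtime, cached = None, {}

    types = {}
    dirty = False
    for n in names:
        path = os.path.join(directory, n)
        # Trust the cache only for files strictly older than it.
        if (n in cached and cache_mtime is not None
                and os.path.getmtime(path) < cache_mtime):
            types[n] = cached[n]
        else:
            types[n] = sniff(path)
            dirty = True

    # Rewrite the cache if anything was re-sniffed or a file vanished.
    if dirty or set(cached) != set(types):
        with open(cache_path, "w") as f:
            json.dump(types, f)
    return types
```

Note that this still does one stat per file to compare modification
times, but it avoids opening and reading every file, which is where I
would expect most of the seeking to come from.  A real implementation
would also have to decide where to keep the cache when the directory is
not writable.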

I don't think we need to worry about which approach is ultimately going
to perform faster. For a program like Nautilus either there is or is not
a human-noticeable lag time; improving performance when there is no lag
time is totally pointless.

A number of people have used Windows as an example of why we don't need
to sniff files; though there are a number of features in Windows worth
copying, this is definitely not one of them.  I have worked on some
commercial software and the number one frivolous bug report or unfixable
user issue occurs when the user attempts to open a file that has an
incorrect filename extension.  People on this list have suggested that
this is a user problem not a software problem (i.e. that the user was
stupid and beyond help), but I can assure you that however obvious the
connection between the hidden Windows filename extension and the error
message that our program gave is to me and you, it was not obvious to a
large number of otherwise very intelligent people who just weren't as
knowledgeable about computers.  People always think that the icon for a
file is somehow part of the file (it makes sense if you don't think
about it too hard), and so if a file has, say, a jpeg icon it doesn't
occur to them that it is not a jpeg.

-Jay



