Re: Suggestion for file type detection approach

Wouldn't it make some sense to make a first pass and attempt to assign
an icon and a set of actions based on the extension, and then go back and
do a file sniff and if the detection is different than the first pass, add an
emblem and add the list of actions for the second type to the list of
actions for the first type?
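
A rough sketch of how that two-pass idea could look, in Python rather than anything from gnome-vfs. The lookup tables, the "mismatch" emblem name and the function names are all made up for illustration; a real implementation would consult the desktop's MIME database instead.

```python
import os

# Hypothetical tables for illustration; not taken from gnome-vfs or Nautilus.
EXTENSION_TYPES = {".mp3": "audio/mpeg", ".png": "image/png", ".txt": "text/plain"}
MAGIC_PREFIXES = [(b"\x89PNG\r\n\x1a\n", "image/png"), (b"ID3", "audio/mpeg")]
ACTIONS = {
    "audio/mpeg": ["Play", "Add to playlist"],
    "image/png": ["View", "Edit image"],
    "text/plain": ["Edit text"],
}


def type_from_extension(name):
    """First pass: cheap guess from the suffix alone (no file I/O)."""
    _, ext = os.path.splitext(name)
    return EXTENSION_TYPES.get(ext.lower())


def type_from_content(header):
    """Second pass: compare the first bytes against known magic numbers."""
    for prefix, mime in MAGIC_PREFIXES:
        if header.startswith(prefix):
            return mime
    return None


def describe(name, header):
    """Combine both passes; on disagreement add an emblem and merge the actions."""
    by_ext = type_from_extension(name)
    by_content = type_from_content(header)
    primary = by_ext or by_content
    actions = list(ACTIONS.get(primary, []))
    mismatch = by_ext is not None and by_content is not None and by_ext != by_content
    if mismatch:
        # Append the second type's actions after the first type's actions.
        for action in ACTIONS.get(by_content, []):
            if action not in actions:
                actions.append(action)
    return {"icon": primary, "emblem": "mismatch" if mismatch else None, "actions": actions}
```

With these tables, a PNG image named song.mp3 keeps the audio icon, gains the emblem, and offers the audio actions followed by the image actions.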

If there are two default actions because the file is detected two different
ways, isn't it trivial to prompt the user for their choice of the two actions they


Dave Benson wrote:

It's trivial to calculate, assuming that it requires one seek per file.
My HD has a seek time in the 5 ms range, so 1k files take at
least 5 seconds; there's also some time required to actually do the
reading.  No optimization can really help, except reorganizing the FS,
and caching.  On NFS the situation is probably worse.
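
The estimate above is just files times seek time. As a small Python sketch (the function name is made up, and it assumes exactly one seek per file and ignores read time, as the paragraph does):

```python
def min_sniff_seconds(file_count, seek_ms):
    """Lower bound for sniffing: one seek per file, read time ignored."""
    return file_count * seek_ms / 1000.0


print(min_sniff_seconds(1000, 5))  # 5.0
```

This is only a floor for a cold cache. Once the directory's blocks are cached the seeks largely disappear.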

Other measurements have been cited earlier in this thread...

- dave

On Sat, Jan 03, 2004 at 10:28:45AM -0500, Jeffrey Stedfast wrote:
Has anyone actually done any profiling, or even tested the sniffer
itself to see how fast it could detect all this? I seriously doubt it is
as slow as people are making it out to be.
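
One way to get a rough number is to time the part that costs the most: opening each file and reading its first bytes. The sketch below is plain Python, not the gnome-vfs sniffer. The 256-byte header size and the function name are assumptions, and it measures only the I/O, not the pattern matching.

```python
import os
import time


def time_sniff(directory, header_bytes=256):
    """Open every regular file in `directory`, read its first bytes, and time it.

    Returns (files_read, elapsed_seconds).
    """
    start = time.perf_counter()
    count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            try:
                with open(entry.path, "rb") as f:
                    f.read(header_bytes)
            except OSError:
                continue  # unreadable file: skip it rather than abort the run
            count += 1
    return count, time.perf_counter() - start
```

The result depends heavily on whether the directory is already in the OS cache, so the first run after a fresh mount or reboot is the number that matters for this discussion.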


On Sat, 2004-01-03 at 10:21, Fabio Gomes wrote:
[ I am replying only to gnome-devel-list to reduce traffic ]

On Fri, 2004-01-02 at 19:46, Edward Jay Kreps wrote:

Sniffing is slow because it opens every file and reads some of it every
time you open a given directory. If you want to make this fast, cache
filetypes; now opening the huge mp3 folder is just a matter of reading a
single cache file and sniffing those files with a modification time
later than that of the cache file.  Naturally this would only need to be
done for those really huge directories, it would probably be a waste for
directories with only a hundred files or fewer.

This would be great. The cache could reside inside the directory
metadata, since this would allow users to modify it to fix misidentified
files. And this directory metadata API already exists. :)
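
A minimal sketch of the caching scheme described above: read one cache file, then sniff only the files whose modification time is later than the cache's. It is written in Python, the cache file name and JSON format are made up for illustration, and it does not use the existing directory metadata API.

```python
import json
import os

CACHE_NAME = ".filetype-cache.json"  # hypothetical name, not a real Nautilus file


def load_types(directory, sniff):
    """Return {filename: type}, sniffing only files newer than the cache.

    `sniff` is any callable that takes a path and returns a type string.
    """
    cache_path = os.path.join(directory, CACHE_NAME)
    try:
        cache_mtime = os.path.getmtime(cache_path)
        with open(cache_path) as f:
            cached = json.load(f)
    except (OSError, ValueError):
        cache_mtime, cached = None, {}
    if not isinstance(cached, dict):
        cached = {}  # ignore a cache file with unexpected contents

    types = {}
    dirty = False
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name == CACHE_NAME or not entry.is_file():
                continue
            # Caveat: a file changed in the same timestamp tick as the
            # cache write still counts as fresh here.
            fresh = (
                cache_mtime is not None
                and entry.name in cached
                and entry.stat().st_mtime <= cache_mtime
            )
            if fresh:
                types[entry.name] = cached[entry.name]
            else:
                types[entry.name] = sniff(entry.path)
                dirty = True
    # Rewrite the cache if anything was sniffed or a cached file disappeared.
    if dirty or set(cached) != set(types):
        with open(cache_path, "w") as f:
            json.dump(types, f)
    return types
```

Because the cache is a plain file in the directory, a user could edit an entry to fix a misidentified file, which is the point made above. The sketch always caches, so a real version would skip this for small directories.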

I don't think we need to worry about which approach is ultimately going
to perform faster. For a program like Nautilus either there is or is not
a human-noticeable lag time; improving performance when there is no lag
time is totally pointless.

Don't forget about the waste of traffic and server disk I/O that can be
generated if content sniffing is used across the network by lots of
users. :)

A number of people have used Windows as an example of why we don't need
to sniff files; though there are a number of features in Windows worth
copying this is definitely not one of them.  I have worked on some
commercial software and the number one frivolous bug report or unfixable
user issue occurs when the user attempts to open a file that has an
incorrect filename extension.

Hmm. What would you do if your customers had the same problem, but with
content sniffing misidentification instead of wrong suffix? :)

People on this list have suggested that
this is a user problem not a software problem (i.e. that the user was
stupid and beyond help), but I can assure you that however obvious the
connection between the hidden Windows filename extension and the error
message that our program gave is to me and you, it was not obvious to a
large number of otherwise very intelligent people who just weren't as
knowledgeable about computers.

It is possible to educate users about the fact that files must have
suffixes to be properly identified by the system and that it is possible
to fix wrong suffixes by right-clicking and running the filetype
detection tool to rename the file properly. The system can also
intelligently warn users when they are about to save or rename a
file with a wrong suffix.
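
The save/rename warning could be as simple as comparing the new name's suffix with the suffixes expected for the detected type. A hedged Python sketch follows; the table, the function name and the message wording are all hypothetical, and the detected type is assumed to come from whatever detection tool is in use.

```python
import os

# Hypothetical mapping for illustration only.
EXPECTED_SUFFIXES = {
    "image/png": {".png"},
    "image/jpeg": {".jpg", ".jpeg"},
    "text/plain": {".txt"},
}


def suffix_warning(new_name, detected_type):
    """Return a warning if the suffix does not fit the detected type, else None."""
    expected = EXPECTED_SUFFIXES.get(detected_type)
    if not expected:
        return None  # unknown type: nothing sensible to suggest
    base, ext = os.path.splitext(new_name)
    if ext.lower() in expected:
        return None
    suggestion = base + sorted(expected)[0]
    return f'"{new_name}" looks like {detected_type}; consider naming it "{suggestion}".'
```

The check only advises and never blocks the save, so the user stays in control when detection gets it wrong.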

Besides performance, a big part of the discussion is about manageability.
Content sniffing is not manageable by users or sysadmins. It can only be
managed by programmers. IMHO, we should not impose on users (in the way
it is currently implemented) a feature that, in some cases, gets in their
way and that they cannot even call technical support to work around.

Imagine that you have a company with 100 machines running the GNOME
Desktop and content-sniffing misidentifies some file type that is
crucial to the work of this company. Some people suggested that this is
merely a bug and should be reported accordingly. OK, let's think like
this, so the 100 users must change the way they work (i.e., stop opening
files from Nautilus) until 1) the bug is reported, 2) the bug is fixed,
3) a new stable version of gnome-vfs is released, and 4) the system
administrator (or the outsourcing company) upgrades every station.

In the example above, will users return to Nautilus after everything is

I imagine that content sniffing is really useful for home users, who
download multimedia stuff from P2P. These files really come with wrong
suffixes all the time. But the most common ones are video or audio files
that end up being opened with the same program, just as if they had the
right suffixes.
But is content sniffing really useful at work? Corporate Linux desktop
networks often consist of NFS mounts with lots of folders and
internally-created documents. When a user cannot open a file on the
network, she will probably ask technical support. If the problem is
with a file that she has received by mail, she will probably ask the

The current implementation of content sniffing is not 100% accurate. This
reminds me of voice recognition, language translators and Optical
Character Recognition (OCR). These technologies are great, but computers
still have keyboards and mice. It is not possible to enforce input
technologies that are not 100% accurate/reliable. They cannot be used as
a mandatory input source in applications. Instead, they are used as
tools to reduce costs and make users' lives easier. The same should apply to
content sniffing: it is not 100% accurate, yet it is currently being
enforced for file type detection in Nautilus.

Happy new year!
