Re: Suggestion for file type detection approach



It's trivial to calculate, assuming that it requires one seek per file.
My HD has a seek time in the 5 ms range, so 1k files take at least
5 seconds; there's also some time required to actually do the reading.
No optimization can really help, except reorganizing the FS and
caching.  On NFS the situation is probably worse.
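
For what it's worth, the estimate above can be written down as a few
lines of Python. This is only a back-of-the-envelope sketch: the
function name and the one-seek-per-file default are assumptions made
for illustration, and it deliberately ignores read time, caching and
on-disk layout.

```python
def min_sniff_seconds(num_files: int, seek_ms: float, seeks_per_file: int = 1) -> float:
    """Lower bound on the time to sniff num_files, counting disk seeks only.

    Assumes every file costs `seeks_per_file` uncached seeks of `seek_ms`
    milliseconds each; time spent actually reading is not included.
    """
    return num_files * seeks_per_file * seek_ms / 1000.0


if __name__ == "__main__":
    # 5 ms is the seek time quoted in this mail; the directory sizes are examples.
    for n in (100, 1_000, 10_000):
        print(f"{n:>6} files: at least {min_sniff_seconds(n, seek_ms=5.0):.1f} s")
```

The point is only that the cost grows linearly with the number of
files, so a directory that is ten times bigger takes at least ten
times longer on a cold cache.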

Other measurements have been cited earlier in this thread...

- dave

On Sat, Jan 03, 2004 at 10:28:45AM -0500, Jeffrey Stedfast wrote:
> 
> has anyone actually done any profiling? or even tested the sniffer
> itself to see how fast it could detect all this? I seriously doubt it is
> as slow as people are making it out to be.
> 
> Jeff
> 
> On Sat, 2004-01-03 at 10:21, Fabio Gomes wrote:
> > [ I am replying only to gnome-devel-list to reduce traffic ]
> > 
> > On Fri, 2004-01-02 at 19:46, Edward Jay Kreps wrote:
> > 
> > 
> > > Sniffing is slow because it opens every file and reads some of it every
> > > time you open a given directory. If you want to make this fast, cache
> > > filetypes; now opening the huge mp3 folder is just a matter of reading a
> > > single cache file and sniffing those files with a modification time
> > > later than that of the cache file.  Naturally this would only need to be
> > > done for those really huge directories, it would probably be a waste for
> > > directories with only a hundred files or fewer.
> > 
> > This would be great. The cache could reside inside the directory
> > metadata, since this would allow users to modify it to fix misidentified
> > files. And this directory metadata API already exists. :)
> > 
> > > I don't think we need to worry about which approach is ultimately going
> > > to perform faster. For a program like Nautilus either there is or is not
> > > a human-noticeable lag time; improving performance when there is no lag
> > > time is totally pointless.
> > 
> > Don't forget about the waste of traffic and server disk I/O that can be
> > generated if content sniffing is used across the network by lots of
> > users. :)
> > 
> > > 
> > > A number of people have used Windows as an example of why we don't need
> > > to sniff files; though there are a number of features in Windows worth
> > > copying this is definitely not one of them.  I have worked on some
> > > commercial software and the number one frivolous bug report or unfixable
> > > user issue occurs when the user attempts to open a file that has an
> > > incorrect filename extension.  
> > 
> > Hmm. What would you do if your customers had the same problem, but with
> > content sniffing misidentification instead of a wrong suffix? :)
> > 
> > > People on this list have suggested that
> > > this is a user problem not a software problem (i.e. that the user was
> > > stupid and beyond help), but I can assure you that however obvious the
> > > connection between the hidden Windows filename extension and the error
> > > message that our program gave is to me and you, it was not obvious to a
> > > large number of otherwise very intelligent people who just weren't as
> > > knowledgeable about computers.
> > 
> > It is possible to educate users about the fact that files must have
> > suffixes to be properly identified by the system and that it is possible
> > to fix wrong suffixes by right-clicking and running the filetype
> > detection tool to rename the file properly. The system can also
> > intelligently warn users when they are about to save or rename a
> > file with a wrong suffix.
> > 
> > Besides performance, a big part of the discussion is about
> > manageability. Content sniffing is not manageable by users or
> > sysadmins. It can only be managed by programmers. IMHO, we should not
> > impose on users (in the way it is currently implemented) a feature
> > that, in some cases, gets in their way and that they cannot even call
> > technical support to work around.
> > 
> > Imagine that you have a company with 100 machines running the GNOME
> > Desktop and content-sniffing misidentifies some file type that is
> > crucial to the work of this company. Some people suggested that this is
> > merely a bug and should be reported accordingly. OK, let's think like
> > this, so the 100 users must change the way they work (ie, stop opening
> > files from Nautilus) until 1) the bug is reported, 2) the bug is fixed,
> > 3) a new stable version of gnome-vfs is released, and 4) the system
> > administrator (or the outsourcing company) upgrades every station.
> > 
> > In the example above, will users return to Nautilus after everything is
> > fixed?
> > 
> > I imagine that content sniffing is really useful for home users, who
> > download multimedia stuff from P2P. These files really come with wrong
> > suffixes all the time. But the most common ones are video or audio files
> > that end up being opened with the same program, just as if they had the
> > right suffixes.
> > 
> > But is content sniffing really useful at work? Corporate Linux desktop
> > networks often consist of NFS mounts with lots of folders and
> > internally-created documents. When a user cannot open a file on the
> > network, she will probably ask technical support. If the problem is
> > with a file that she has received by mail, she will probably ask the
> > sender.
> > 
> > The current implementation of content sniffing is not 100% accurate.
> > This reminds me of voice recognition, language translators and Optical
> > Character Recognition (OCR). These technologies are great but computers
> > still have keyboards and mice. It is not possible to enforce input
> > technologies that are not 100% accurate/reliable. They cannot be used as
> > a mandatory input source in applications. Instead, they are used as
> > tools to reduce costs and ease users' lives. The same should apply to
> > content sniffing: it is not 100% accurate but it is currently being
> > enforced for file type detection in Nautilus.
> > 
> > Happy new year!
> 
