Re: Suggestion for file type detection approach



[ I am replying only to gnome-devel-list to reduce traffic ]

On Fri, 2004-01-02 at 19:46, Edward Jay Kreps wrote:


> Sniffing is slow because it opens every file and reads some of it every
> time you open a given directory. If you want to make this fast, cache
> filetypes; now opening the huge mp3 folder is just a matter of reading a
> single cache file and sniffing those files with a modification time
> later than that of the cache file.  Naturally this would only need to be
> done for those really huge directories, it would probably be a waste for
> directories with only a hundred files or fewer.

This would be great. The cache could live in the directory metadata,
which would also let users edit it to fix misidentified files. And the
directory metadata API already exists. :)
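The cache suggested in the quoted paragraph could look roughly like the
sketch below. It is plain Python, not gnome-vfs code. The cache file
name `.filetype-cache.json` and the `sniff_type()` helper, which checks
only three magic numbers, are hypothetical stand-ins. A file is opened
only if it is missing from the cache or its modification time is later
than the cache file's.

```python
import json
import os

# Hypothetical cache file name; not a real gnome-vfs or Nautilus file.
CACHE_NAME = ".filetype-cache.json"

# Tiny illustrative magic-number table (an assumption, far from complete).
MAGIC = [
    (b"ID3", "audio/mpeg"),
    (b"%PDF", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]


def sniff_type(path):
    """Read the first bytes of a file and guess its type (stand-in for real sniffing)."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, mime in MAGIC:
        if head.startswith(magic):
            return mime
    return "application/octet-stream"


def directory_types(directory):
    """Return {filename: mime}, sniffing only files newer than the cache file."""
    cache_path = os.path.join(directory, CACHE_NAME)
    cached, cache_mtime = {}, -1.0
    if os.path.exists(cache_path):
        cache_mtime = os.path.getmtime(cache_path)
        with open(cache_path) as f:
            cached = json.load(f)

    result = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if name == CACHE_NAME or not os.path.isfile(path):
            continue
        if name in cached and os.path.getmtime(path) <= cache_mtime:
            result[name] = cached[name]  # cache hit: the file is not opened
        else:
            result[name] = sniff_type(path)  # new or modified file: sniff it

    # Rewrite the cache so the next directory listing is cheap.
    with open(cache_path, "w") as f:
        json.dump(result, f)
    return result
```

Because a hand edit makes the cache newer than the files it describes,
a user's correction stays in effect until the file itself changes. On
filesystems with coarse timestamps, a file modified in the same tick as
the cache write could be missed; a real implementation would need to
handle that.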

> I don't think we need to worry about which approach is ultimately going
> to perform faster. For a program like Nautilus either there is or is not
> a human-noticeable lag time; improving performance when there is no lag
> time is totally pointless.

Don't forget the network traffic and server disk I/O that are wasted
when lots of users sniff file contents across the network. :)

> 
> A number of people have used Windows as an example of why we don't need
> to sniff files; though there are a number of features in Windows worth
> copying this is definitely not one of them.  I have worked on some
> commercial software and the number one frivolous bug report or unfixable
> user issue occurs when the user attempts to open a file that has an
> incorrect filename extension.  

Hmm. What would you do if your customers had the same problem, but
caused by content sniffing misidentifying a file instead of by a wrong
suffix? :)

> People on this list have suggested that
> this is a user problem not a software problem (i.e. that the user was
> stupid and beyond help), but I can assure you that however obvious the
> connection between the hidden Windows filename extension and the error
> message that our program gave is to me and you, it was not obvious to a
> large number of otherwise very intelligent people who just weren't as
> knowledgeable about computers.

Users can be taught that files need the right suffix to be properly
identified by the system, and that a wrong suffix can be fixed by
right-clicking the file and running a file type detection tool that
renames it properly. The system can also intelligently warn users when
they are about to save or rename a file with the wrong suffix.
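A save/rename warning of that kind could work roughly as follows. This
is a minimal Python sketch: the suffix and magic-number tables and the
function names are hypothetical, chosen only for illustration. Sniffing
here produces only a suggestion; the file keeps its name unless the
user accepts the rename.

```python
import os

# Hypothetical tables for illustration only.
SUFFIX_TO_MIME = {
    ".mp3": "audio/mpeg",
    ".pdf": "application/pdf",
    ".png": "image/png",
}
MAGIC_TO_MIME = [
    (b"ID3", "audio/mpeg"),
    (b"%PDF", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]


def type_from_suffix(filename):
    """Type implied by the file name's suffix, or None if the suffix is unknown."""
    _, ext = os.path.splitext(filename)
    return SUFFIX_TO_MIME.get(ext.lower())


def type_from_content(data):
    """Type guessed from the leading bytes, or None if nothing matches."""
    for magic, mime in MAGIC_TO_MIME:
        if data.startswith(magic):
            return mime
    return None


def suffix_warning(filename, data):
    """Return a warning if the suffix contradicts the content, else None.

    Nothing is renamed automatically; the user decides.
    """
    claimed = type_from_suffix(filename)
    sniffed = type_from_content(data)
    if sniffed is None or claimed == sniffed:
        return None  # unknown content, or the suffix already agrees
    # Every sniffable type in this sketch has a suffix in SUFFIX_TO_MIME.
    suggestion = next(ext for ext, mime in SUFFIX_TO_MIME.items() if mime == sniffed)
    base, _ = os.path.splitext(filename)
    return f"'{filename}' looks like {sniffed}; rename to '{base}{suggestion}'?"
```

For example, saving MP3 data as `song.txt` would prompt the user to
rename it to `song.mp3`, while `report.pdf` with PDF content passes
silently.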

Besides performance, a big part of the discussion is about
manageability. Content sniffing cannot be managed by users or sysadmins,
only by programmers. IMHO, we should not impose on users (in the way it
is currently implemented) a feature that sometimes gets in their way and
that they cannot even call technical support to work around.

Imagine a company with 100 machines running the GNOME Desktop, where
content sniffing misidentifies some file type that is crucial to the
company's work. Some people have suggested that this is merely a bug and
should be reported as such. Fine, but then the 100 users must change the
way they work (i.e., stop opening files from Nautilus) until 1) the bug
is reported, 2) the bug is fixed, 3) a new stable version of gnome-vfs
is released, and 4) the system administrator (or the outsourcing
company) upgrades every workstation.

In the example above, will users return to Nautilus after everything is
fixed?

I imagine that content sniffing is really useful for home users who
download multimedia files from P2P networks. These files arrive with
wrong suffixes all the time. But most of them are video or audio files,
which end up being opened by the same program just as if they had the
right suffixes.

But is content sniffing really useful at work? Corporate Linux desktop
networks often consist of NFS mounts with lots of folders and
internally created documents. When a user cannot open a file on the
network, she will probably ask technical support. If the problem is
with a file she has received by e-mail, she will probably ask the
sender.

The current implementation of content sniffing is not 100% accurate.
This reminds me of voice recognition, machine translation and optical
character recognition (OCR). These technologies are great, but
computers still have keyboards and mice. Input technologies that are
not 100% accurate and reliable cannot be enforced; they cannot be the
mandatory input source in applications. Instead, they are used as tools
to reduce costs and make users' lives easier. The same should apply to
content sniffing: it is not 100% accurate, yet Nautilus currently
enforces it for file type detection.
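To make the argument concrete, here is one hedged sketch of detection in
which sniffing is only a fallback tool rather than the final word, and
in which a sysadmin can correct a misidentification locally. The
override file format (one `<suffix> <mime-type>` pair per line), the
tables and the function names are all hypothetical; none of them are
part of gnome-vfs.

```python
import os

# Hypothetical built-in tables; a real desktop would use its MIME database.
DEFAULT_SUFFIXES = {".mp3": "audio/mpeg", ".pdf": "application/pdf", ".txt": "text/plain"}
MAGIC = [(b"ID3", "audio/mpeg"), (b"%PDF", "application/pdf")]


def parse_overrides(text):
    """Parse a hypothetical admin-editable file: '<suffix> <mime-type>' per line, '#' comments."""
    overrides = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        suffix, mime = line.split()
        overrides[suffix.lower()] = mime
    return overrides


def detect_type(filename, data, overrides):
    """Return (mime, source). Site overrides and suffixes are authoritative;
    sniffing is consulted only when the suffix is unknown, and is labeled a guess."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in overrides:
        return overrides[ext], "site override"
    if ext in DEFAULT_SUFFIXES:
        return DEFAULT_SUFFIXES[ext], "suffix"
    for magic, mime in MAGIC:
        if data.startswith(magic):
            return mime, "sniffed (guess)"
    return "application/octet-stream", "unknown"
```

In this sketch, fixing a misidentified in-house format on 100
workstations means distributing one edited configuration line (for
example, a hypothetical `.dat application/x-acme-ledger`), not waiting
for a new gnome-vfs release.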

Happy new year!

-- 
Fabio Gomes de Souza <fabio gs2 com br> (+55 81 9127-0597)

.- GS2 TECNOLOGIA DA INFORMACAO LTDA :: www.gs2.com.br
|- IT Infrastructure :: Security :: Embedded systems :: Linux
`- Olinda, Brazil - +55 81 3492-7777 - negocios gs2 com br




