Re: Suggestion for file type detection approach



"John (J5) Palmieri" <johnp martianrock com> writes:

> Then how would you handle .rpm? If we rely on three letters to
> define every media type that will ever be available then we will run
> out of room very fast.

There is nothing that stops you from using more than three letters
(unless you use MS-DOS, of course). Anyway, I don't consider encoding
the filetype in the filename a good idea either; having the filetype
as a property that exists beside the filename would be much
preferred. However, file extensions are currently the only portable
way to encode the filetype.

> 95% is really good, where it doesn't work we need to fix.

95% is very, very bad: that means a handful of unfixable wrong files
in every second directory view. If autodetection is to stay the
default, it must be 100% accurate or provide a very easy way to
circumvent it. And it should especially be transparent. Currently I
have a bunch of files that get detected wrongly, while very similar
files get detected correctly; this is pretty damn confusing and
irritating.

> This I agree is bad.  Perhaps first guess should be extension with
> second pass being content sniffing.

No, this will just result in 'jumping' filetypes. Nothing is more
annoying than a file browser that changes behaviour in unexpected,
random ways while loading incrementally. A second pass should only
provide more details (thumbnails, image resolution and such), not
modify the filetype itself.
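
As a rough sketch of that split (the suffix table, the Entry class and
the function names are made up for illustration and are not taken from
any file manager's code): the first pass fixes the filetype from the
suffix alone, and the second pass may only attach details to it.

```python
import os
from dataclasses import dataclass, field

# Hypothetical suffix table; a real one would be far larger.
SUFFIX_TYPES = {
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".txt": "text/plain",
    ".avi": "video/x-msvideo",
}

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


@dataclass
class Entry:
    name: str
    filetype: str
    details: dict = field(default_factory=dict)


def first_pass(name):
    """Decide the filetype from the suffix only; cheap and stable."""
    suffix = os.path.splitext(name)[1].lower()
    return Entry(name, SUFFIX_TYPES.get(suffix, "application/octet-stream"))


def second_pass(entry, header):
    """Add details from the file content; never changes entry.filetype."""
    if (entry.filetype == "image/png"
            and header[:8] == PNG_SIGNATURE
            and len(header) >= 24):
        # In a PNG, width and height follow the IHDR chunk tag.
        width = int.from_bytes(header[16:20], "big")
        height = int.from_bytes(header[20:24], "big")
        entry.details["resolution"] = (width, height)
    return entry
```

Because second_pass() only writes into details, an entry can gain a
resolution while the directory loads, but it cannot jump to another
filetype.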

> I like sniffing because sometimes I have a .avi file that is really
> divx. Some AVI files I can read, some I can't so seeing the preview
> is nice because I know I can view it regardless of the extension.

With AVI the situation becomes more complicated. After all, it's a
pretty generic container format, so the content (i.e. the codec that
was used) often matters much more than the fact that it is packed in
an AVI file. Content sniffing might be helpful here to show more
detail on the specific codec, resolution and such.
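
For illustration, here is a minimal sketch of such a detail lookup. It
assumes the usual RIFF layout, where a 'strh' stream header chunk
carries the stream type followed by a four-character handler code. The
function name is invented, and scanning for the chunk tag instead of
walking the chunk tree is a shortcut, so treat the result as a hint
only.

```python
def avi_video_codec(header):
    """Return the handler FourCC of the first video stream, or None.

    `header` is the first few kilobytes of the file. This is a detail
    lookup only: it says nothing about whether the file is "an AVI" as
    far as the file manager is concerned.
    """
    # A RIFF/AVI file starts with "RIFF", a 4-byte size, then "AVI ".
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"AVI ":
        return None
    pos = 0
    while True:
        pos = header.find(b"strh", pos)
        if pos < 0 or pos + 16 > len(header):
            return None
        # Chunk layout: tag (4), size (4), stream type (4), handler (4).
        stream_type = header[pos + 8:pos + 12]
        handler = header[pos + 12:pos + 16]
        if stream_type == b"vids":
            return handler.decode("ascii", "replace")
        pos += 4
```

The returned code (for example a DivX-style handler name) could be
shown next to the file, while the filetype itself stays "AVI".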

>> And last but not least, all other operating systems currently rely
>> more or less on the file suffix being correct, since so far it's the
>> only portable way to transport the filetype.
>
> Is this true for the Mac?  I don't think so.

Mac OS X shows a little message box if I change a .jpg to a .txt,
pretty much like Windows. Under the hood it keeps track of the
filetype in additional ways, but I am not sure of the details.

> Besides why should we always follow what the other guy is doing?

Compatibility, not confusing users, etc. File exchange is pretty much
daily business; if GNOME users start to strip all their file suffixes
and depend on automatic detection magic, they won't make many friends.

> Exactly, the suffix gives a HINT but is not authoritative.  If the bugs
> could be worked out of the content sniffer (speed, correctness, etc.) it
> is a much better solution.

The suffix should be more authoritative than the content, at least in
some cases, i.e. when I name an .html or .xml file or whatever .txt, I
want it to be handled as plain text, not launch Mozilla once I click
on it. The problem is that there is nothing in the content that gives
a 100% correct answer on what the file is meant to be; only the
filetype does, and that is nowhere encoded in the content.
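
A small sketch of that precedence rule, with an invented suffix table
and a deliberately crude sniffer. Falling back to sniffing when there
is no known suffix is an assumption made for the example, not
something any existing tool is claimed to do.

```python
import os

# Hypothetical suffix table for the example.
SUFFIX_TYPES = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".xml": "application/xml",
}


def sniff(content):
    """Crude content guess, used only when the name gives no answer."""
    head = content[:256].lstrip().lower()
    if head.startswith(b"<?xml"):
        return "application/xml"
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"
    return "application/octet-stream"


def detect(name, content):
    """A known suffix always wins; the content is not even looked at."""
    suffix = os.path.splitext(name)[1].lower()
    if suffix in SUFFIX_TYPES:
        return SUFFIX_TYPES[suffix]
    return sniff(content)
```

With this, an HTML page saved as page.txt stays plain text, because
the name says so, while a file with no suffix at all still gets a
best-effort guess.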

> I personally think that (especially with XML and the hundreds if not
> thousands of file types defined by it) file extensions are a step
> back.

I prefer file extensions for their simplicity. Having an additional
second 'detail' pass over a file might be helpful, but ignoring the
suffix and replacing it with a guessing game is really not a good way
to solve the problem of having no way to encode the MIME type.

-- 
WWW:      http://pingus.seul.org/~grumbel/ 
JabberID: grumbel jabber org 
ICQ:      59461927


