Re: Suggestion for file type detection approach

On Fri, 2003-12-26 at 17:53, Ingo Ruhnke wrote:
> Rodney Dawes <dobey free fr> writes:
> > This is obviously incorrect. Having to have the user rename their
> > files so that they get opened with the right application, is
> > inappropriate.
> If the file-suffix is incorrect the file is broken and needs to be
> fixed, ie the suffix needs to be changed to the correct one. 

Then how would you handle .rpm?  If we rely on three letters to define
every media type that will ever be available then we will run out of
room very fast.

> Having an
> "open with" as a quick workaround is ok and good, but relying on
> guessed filetypes as default is really not good, since it works only
> in 95% of the cases, 

95% is really good; where it doesn't work, we need to fix it.

> is slow 

This I agree is bad.  Perhaps the first guess should be by extension, with
a second pass doing content sniffing.  I like sniffing because sometimes I
have a .avi file that is really divx.  Some AVI files I can read and some I
can't, so seeing the preview is nice because then I know I can view the
file regardless of the extension.
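
As a rough sketch of that two-pass idea (the function names and the tiny
signature table are made up for illustration and are not taken from any
existing library), the extension is tried first because it is cheap, and
the file is only opened when the name tells us nothing:

```python
import mimetypes

# Illustrative signature table; a real sniffer would carry many more entries.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"\xed\xab\xee\xdb", "application/x-rpm"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]

def sniff(path):
    """Second pass: compare the first bytes of the file with known signatures."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, mime in MAGIC:
        if head.startswith(magic):
            return mime
    return None

def detect_type(path):
    """Return (mime_type, how), trying the extension before the content."""
    by_name, _encoding = mimetypes.guess_type(path)
    if by_name is not None:
        return by_name, "extension"
    by_content = sniff(path)
    if by_content is not None:
        return by_content, "sniffed"
    return None, "unknown"
```

This ordering keeps the common case fast, at the price of trusting a wrong
suffix; swapping the two passes would trade speed for correctness.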

> and can lead to pretty much unexpected
> behaviour and worst of all, is unfixable by the user. 

I've fixed all my problems, such as html being opened in a web page, by
editing the preferred applications pref dialog.  I agree that it isn't
very HIG'y or easy to use, but that is just something that needs fixing.

> And last not
> least, all other operating systems currently rely more or less on
> the file-suffix to be correct, since so far it's the only portable way
> to transport the filetype.

Is this true for the Mac?  I don't think so.  Besides, why should we
always follow what the other guy is doing?  We just can't win, I guess.
One side wants to do what is safe, the other what is innovative, and in
the end there is always disagreement, which isn't always bad.  Too bad we
can't all get together and decide on one format to embed mime-types.

> > Sniffing is always appropriate. Content type should be determined by
> > content, not by name.
> It's not by name, it's by the filetype that was given the file by the
> user or application. And I certainly know a whole lot better how my
> files should be handled than some magic not user-modifiable
> auto-detection thingy. 

I think files should be typed by their content, but since we have an
imperfect system the user should be able to modify the mime-type,
overriding the sniffed content type.  The problem is that this is not
portable, but perhaps that is good.  It should allow the content sniffer
to become more robust.
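
To make the override idea concrete, here is a minimal sketch.  The class
name, the JSON store and its layout are all assumptions of mine, not an
existing GNOME mechanism.  The user's choice is kept in a side table keyed
by path and always wins over whatever the sniffer says:

```python
import json
import os

class TypeOverrides:
    """User-chosen MIME types that take precedence over any sniffed guess."""

    def __init__(self, store_path):
        self.store_path = store_path
        self.table = {}
        if os.path.exists(store_path):
            with open(store_path, encoding="utf-8") as f:
                self.table = json.load(f)

    def set(self, path, mime):
        # Keyed by absolute path, so the override does not travel with the file.
        self.table[os.path.abspath(path)] = mime
        with open(self.store_path, "w", encoding="utf-8") as f:
            json.dump(self.table, f)

    def resolve(self, path, sniffed):
        """Return the user's choice if one exists, otherwise the sniffed type."""
        return self.table.get(os.path.abspath(path), sniffed)
```

Because the table lives outside the file, copying the file to another
machine loses the override, which is exactly the portability problem.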

> The file-suffix gives a hint on how the data is
> meant to be handled, guessing can just suggest some additional ways
> how the data might be handled, but that might be correct or far off the
> target, depending on the content and the quality of the guesser.

Exactly, the suffix gives a HINT but is not authoritative.  If the bugs
could be worked out of the content sniffer (speed, correctness, etc.) it
would be a much better solution.  Again, the best solution would be for the
file to say, "hey I am this type", in a portable and precise manner (ie
embedded mime types), but we live in an imperfect world.  I personally
think that (especially with XML and the hundreds if not thousands of file
types defined by it) file extensions are a step back.  As a
backup/quick-guess method of figuring out file types they work well.
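
The XML case shows why: a ".xml" suffix says nothing about which of those
formats the file holds, while the root element does.  A small sketch,
assuming a hand-written lookup table (the function name and the table are
illustrative; only the two namespace URIs and their types are real):

```python
import io
import xml.etree.ElementTree as ET

# Illustrative table only: namespace of the root element -> MIME type.
ROOT_NAMESPACES = {
    "http://www.w3.org/2000/svg": "image/svg+xml",
    "http://www.w3.org/1999/xhtml": "application/xhtml+xml",
}

def sniff_xml(data):
    """Return a MIME type for XML bytes, or None if the data is not XML."""
    try:
        # Stop at the first start tag, so only the root element is examined.
        for _event, root in ET.iterparse(io.BytesIO(data), events=("start",)):
            break
        else:
            return None
    except ET.ParseError:
        return None
    # ElementTree spells a namespaced tag as "{namespace}localname".
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    return ROOT_NAMESPACES.get(namespace, "application/xml")
```

Anything with an unrecognised root falls back to plain application/xml,
which is no worse than what the suffix alone would have told us.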

