Re: Why file content sniffing sucks



the whole idea of detecting on "special" filenames is a bit weak. there
are potentially a huge number of them, and it is an unclean hack to have
to test against them.

also, why wouldn't I want my bash or perl scripts to have more than a
simple "exec" icon? they are not ELF executables after all, but if you
just make it so that any file with the +x bit set is automagically
'detected' as an executable mime-type (whatever that may be), then the
user loses out on a lot of potentially useful info.

I personally have always found explorer.exe's biggest weakness to be
its unreliable "detection" of file types.

this whole argument has actually somewhat reminded me of Outlook and
some other Windows mailers that like to send mail labelled
charset=iso-8859-1 when they are really sending out windows-1252 (or
sometimes something else completely unrelated to iso-8859-1).

this has led several clients down the road of sniffing charsets.
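
to illustrate the kind of charset sniffing meant here, a minimal sketch
in Python. the function name effective_charset is made up for this
example and is not taken from any mailer; the only assumption is that
bytes 0x80-0x9F are C1 control characters in iso-8859-1 (so they almost
never appear in real mail), while windows-1252 uses most of that range
for printable characters such as curly quotes:

```python
def effective_charset(declared: str, body: bytes) -> str:
    """Guess the charset to decode with, given the declared one.

    Hypothetical helper: if the message claims iso-8859-1 but contains
    bytes in the 0x80-0x9F range, assume it is really windows-1252.
    """
    if declared.lower() == "iso-8859-1" and any(0x80 <= b <= 0x9F for b in body):
        return "windows-1252"
    return declared


# Bytes 0x93 and 0x94 are curly double quotes in windows-1252.
body = b"\x93hello\x94"
text = body.decode(effective_charset("iso-8859-1", body))
```

a message that really is iso-8859-1 (accented letters only, nothing in
0x80-0x9F) keeps its declared charset.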

in the same spirit as auto-charset sniffing foo, I propose a solution:

let the extension override the content-sniffed mime-type IF AND ONLY IF
the sniffed content can fit within the mime-type that the extension
suggests.

e.g.:

if we have file.txt that contains HTML, show the mime-type as text/plain
since text/html will gladly fit within text/plain. yes?
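
the override rule above could be sketched roughly like this in Python.
the FITS_WITHIN table and the resolve function are hypothetical names
invented for the example, and the table contents are only illustrative,
not taken from any real MIME database:

```python
# Hypothetical containment table: for each sniffed type, the broader
# types it can safely be shown as. Illustrative entries only.
FITS_WITHIN = {
    "text/html": {"text/plain"},
    "text/x-perl": {"text/plain"},
}


def resolve(sniffed, by_extension):
    """Return the mime-type to display.

    The extension wins if and only if the sniffed type fits within the
    type the extension suggests; otherwise the sniffed type is kept.
    """
    if by_extension is None:
        return sniffed
    if by_extension == sniffed:
        return by_extension
    if by_extension in FITS_WITHIN.get(sniffed, set()):
        return by_extension
    return sniffed
```

so file.txt containing HTML comes out as text/plain, while a PNG image
renamed to file.txt still comes out as image/png, because image/png does
not fit within text/plain.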

actually... come to think of it, content sniffing probably doesn't ever
get anything wrong other than text types anyway... (ok, and .dat which
is often detected as mp3, but hah! same as explorer.exe)

this makes it real easy. if, after that 8-byte read (or whatever is
needed to cover all the common mime-types-with-magic-strings), the file
looks like text, we can fall back to the extension if one exists or, if
not, continue with the sniffing (since detecting text types accurately
likely requires a bit more sniffing than most binary types).
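
a rough Python sketch of that flow, under stated assumptions: the magic
table is a tiny sample of well-known signatures, the EXTENSIONS map and
the function names (detect, looks_like_text, sniff_text) are made up for
the example, and "looks like text" is reduced to a crude no-NUL-bytes
check. it is not how any existing mime library does it:

```python
import os

# A few well-known magic numbers; a real table would be far longer.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\x7fELF", "application/x-executable"),
    (b"GIF8", "image/gif"),
    (b"%PDF", "application/pdf"),
]

# Hypothetical extension map, for illustration only.
EXTENSIONS = {".txt": "text/plain", ".html": "text/html", ".pl": "text/x-perl"}


def looks_like_text(chunk: bytes) -> bool:
    # Crude assumption: anything without NUL bytes is treated as text.
    return b"\x00" not in chunk


def sniff_text(data: bytes) -> str:
    # Deeper sniffing for text types; only HTML is recognised here.
    head = data[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "text/html"
    return "text/plain"


def detect(filename: str, data: bytes) -> str:
    head = data[:8]  # the short initial read
    for magic, mime in MAGIC:
        if head.startswith(magic):
            return mime  # binary magic matched: trust the content
    if looks_like_text(head):
        ext = os.path.splitext(filename)[1].lower()
        if ext in EXTENSIONS:
            return EXTENSIONS[ext]  # text: fall back to the extension
        return sniff_text(data)  # no usable extension: keep sniffing
    return "application/octet-stream"
```

with this, detect("file.txt", ...) on HTML content gives text/plain, the
same content in a file with no extension gives text/html, and a
suffixless "wine" shopping list gives text/plain.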

Jeff

On Sat, 2003-12-27 at 15:09, Colin Walters wrote:
> On Sat, 2003-12-27 at 05:57, iain wrote:
> 
> > What happens if README has its executable bit set.
> 
> Then display it as an executable.  Someone went to the trouble of
> setting that bit, so it should be respected.
> 
> > Or wine doesn't have its executable bit set for my user?
> 
> Then display it as an executable, but with a special icon that says you
> can't execute it.  Similar to the read-only icon.
> 
> > Or what happens if wine was simply a list of wines I wanted to buy for
> > the new years party, and actually has mimetype text/plain?
> 
> In that case, when we don't have any other information about the file
> (no suffix, no executable bit, isn't a "special" filename like README),
> detection by content is appropriate.
> 



