Re: GLib file magics



Tim Janik <timj@gtk.org> writes:

> On 27 Jul 2000, Maciej Stachowiak wrote:
> 
> ok, first, when i pondered a suitable solution for BSE, gnome-mime didn't
> handle magics (and was sitting in gnome-libs, iirc).

Yes, it used to be in gnome-libs, but it has handled magic-style
matching for an extremely long time, a time which I am quite certain
predates BSE. You must not have looked very hard.

> so now, i had a look at gnome-vfs-mime-magic.c, and to not require the
> rest of the gtk-devel audience to compare literal code as well, i've
> compiled a rough pro/con list for both approaches:
> 
> gnome-vfs-mime-magic.c:
> -       closely tied to file io (even so for the mime specification)

This is lame, and we are planning to add versions of the interfaces
that can work on in-memory buffers (and tell you how much of a prefix
you need to do all the magic tests, or have some kind of callback setup).
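
To make the buffer idea concrete, here is a minimal sketch of what such an interface might look like. Nothing here is actual gnome-vfs API: BufferRule, buffer_rules_prefix_needed() and buffer_rules_match() are hypothetical names, and the rules are limited to literal byte strings at fixed offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: a literal byte string expected at a fixed offset. */
typedef struct {
    size_t      offset;     /* where in the data the pattern must start */
    const char *pattern;    /* bytes to compare against */
    size_t      length;     /* number of bytes in pattern */
    const char *mime_type;  /* result reported on a match */
} BufferRule;

/* How many leading bytes a caller must read so every rule can be tested. */
size_t
buffer_rules_prefix_needed (const BufferRule *rules, size_t n_rules)
{
    size_t needed = 0;

    assert (rules != NULL || n_rules == 0);
    for (size_t i = 0; i < n_rules; i++) {
        size_t end = rules[i].offset + rules[i].length;
        if (end > needed)
            needed = end;
    }
    return needed;
}

/* Return the mime type of the first matching rule, or NULL if none match. */
const char *
buffer_rules_match (const BufferRule *rules, size_t n_rules,
                    const unsigned char *data, size_t data_len)
{
    assert (data != NULL || data_len == 0);
    for (size_t i = 0; i < n_rules; i++) {
        const BufferRule *r = &rules[i];

        if (r->offset + r->length > data_len)
            continue;   /* buffer too short to test this rule */
        if (memcmp (data + r->offset, r->pattern, r->length) == 0)
            return r->mime_type;
    }
    return NULL;
}
```

A caller would ask for the prefix length once, read that many bytes from wherever the data lives, and hand the buffer to the matcher, so no file io is involved in the matching itself.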

> -       misses 'u' prefix for types
> -       doesn't handle any of the >, <, x, &, ^ test checks

I have no idea what these are; I assume they are additional kinds
of tests. We have not missed these in defining any types so far, as
far as I know, so I am not sure why that's especially bad.
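
For reference, the magic man page describes these as operators on numeric values: < and > compare, & requires all bits of the test value to be set in the file value, ^ requires at least one of them to be clear, x matches anything, and a 'u' prefix on the type makes ordered comparisons unsigned. The following is only a sketch of those semantics, not code from GMagic or gnome-vfs; NumericTest and numeric_test() are made-up names, and everything is treated as a 32-bit value for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical enumeration of the numeric test operators. */
typedef enum {
    TEST_EQUAL,       /* '=' (the default when no operator is given) */
    TEST_LESS,        /* '<' */
    TEST_GREATER,     /* '>' */
    TEST_BITS_SET,    /* '&': all bits of test_value set in the file value */
    TEST_BITS_CLEAR,  /* '^': at least one bit of test_value is clear */
    TEST_ANY          /* 'x': always matches */
} NumericTest;

/* Apply the type's mask (type&MASK syntax) and then the test operator.
 * Pass 0xffffffff as mask when the rule has none. */
bool
numeric_test (NumericTest test, bool is_unsigned, uint32_t mask,
              uint32_t file_value, uint32_t test_value)
{
    uint32_t v = file_value & mask;

    switch (test) {
    case TEST_EQUAL:
        return v == test_value;
    case TEST_LESS:
        return is_unsigned ? v < test_value
                           : (int32_t) v < (int32_t) test_value;
    case TEST_GREATER:
        return is_unsigned ? v > test_value
                           : (int32_t) v > (int32_t) test_value;
    case TEST_BITS_SET:
        return (v & test_value) == test_value;
    case TEST_BITS_CLEAR:
        return (v & test_value) != test_value;
    case TEST_ANY:
        return true;
    }
    return false;
}
```

With a rule like "leshort&0xffaa >0xaa", the value read from the file would be passed as file_value, 0xffaa as mask and 0xaa as test_value.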

> +       features offset ranges (non magic(5)), with NUM:NUM
> +       can dump mime table
> +       handles date, ledate types
> 
> GMagic:
> +       implements the important subset of mime(5) including
>         comprehensive numerical tests

When you say mime(5), do you mean magic(4) or do I have a wildly
different set of man pages from you?

> +       provides size type extension (required by gimp)
> +       accounts for match collisions with priorities

The gnome-vfs-mime interface does too, I think, although priority is
implicit in the ordering of the list of magic rules.
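
The difference between the two ways of resolving collisions can be sketched roughly as follows. This is illustrative only: PrefixRule and both functions are hypothetical, the rules only compare a prefix at offset 0, and "image/x-example-gif89a" is a made-up type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: a byte prefix expected at offset 0. */
typedef struct {
    const char *prefix;     /* bytes expected at the start of the data */
    int         priority;   /* only consulted by match_by_priority() */
    const char *mime_type;
} PrefixRule;

static int
rule_matches (const PrefixRule *r, const char *data, size_t len)
{
    size_t n = strlen (r->prefix);
    return n <= len && memcmp (data, r->prefix, n) == 0;
}

/* Implicit priority: the first matching rule in list order wins. */
const char *
match_by_order (const PrefixRule *rules, size_t n_rules,
                const char *data, size_t len)
{
    assert (data != NULL);
    for (size_t i = 0; i < n_rules; i++)
        if (rule_matches (&rules[i], data, len))
            return rules[i].mime_type;
    return NULL;
}

/* Explicit priority: every rule is tried, the highest priority match wins
 * (the earlier rule wins on a tie). */
const char *
match_by_priority (const PrefixRule *rules, size_t n_rules,
                   const char *data, size_t len)
{
    const PrefixRule *best = NULL;

    assert (data != NULL);
    for (size_t i = 0; i < n_rules; i++)
        if (rule_matches (&rules[i], data, len)
            && (best == NULL || rules[i].priority > best->priority))
            best = &rules[i];
    return best ? best->mime_type : NULL;
}
```

With ordering, whoever registers rules has to put the more specific ones first; with explicit priorities, registration order stops mattering, at the cost of testing every rule.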

> +       provides a generic interface (no global list, return
>         values are user-defined)

I claimed earlier that this is not particularly an advantage. Maybe I
am wrong, but I'm not sure why.

> +       does match attempts on byte streams
> -       doesn't know about mime-types
> 
> for gnome-mime, i have to say that being unable to do >/< checks
> on the numerical values is a very big minus, you're unable to
> e.g. cover version ranges of file formats that way.
> 
> as for the + in handling dates, take a look at
> /usr/share/misc/magic which only uses dates to improve
> verbosity of the output, or ./data/mime/gnome-vfs-mime-magic
> that doesn't use date/ledate for file checks either, so
> the effective benefit of that radically approaches zero.
> in any case, it'd be a 10 second thing to alias that to
> long/lelong (+belong) for the GMagic code to allow for
> numeric matches there as well.
> 
> as for the offset NUM:NUM extension, that seems to enable the
> code to look for a given string in a specified range of the
> file, so there can be magics like:
> 0:64    string          \<!DOCTYPE\ HTML                        text/html
> 0:64    string          \<HTML                                  text/html
> 0:32    string          \#include                               text/x-c
> 0:32    string          \#ifndef                                text/x-c
> 0:32    string          \#ifdef                                 text/x-c
> that's definitely an interesting feature (though the gnome-mime
> implementation looks awfully slow, with range_end-range_start number
> of reattempts of full sub-pattern matches). i may consider something
> like that for GMagic as well, but then i'd probably base it on
> the wildcard pattern matching algorithm that we use for widget
> path matches in rcfiles, and i'd use a new type name for that,
> since that's a completely non-mime(5) thing ;)
> 
> oh btw, the match masks that can be appended to the types by using '&'
> (e.g. "0 leshort&0xffaa >0xaa # test every second bit in lower byte")
> have to be appended _without_ extra white spaces, the extra
> eat_white_space() calls in gnome-vfs-mime-magic.c should prolly be
> removed (that's to maintain field order).
> 
> > The current code does assume a
> > central list of magic info and filename/extension patterns, and always
> > returns a mime type rather than user-defined data, but I see no
> > particular reason to allow either the match set or the return value to
> > be user-supplied.
> 
> all in all, i'll certainly not move that code into glib, it'd
> not be suitable for gimp (doesn't feature normal mime(5) numeric
> matches), pixbuf (and can't operate on byte streams) or bse (doesn't
> provide the magic table in a file, gnome-mime relies on ordered
> magic registration since it doesn't account for collisions).
> it more looks like you'd get a huge win out of basing the
> mime magic matching backend on GMagic and simply use GMagic's
> data pointer for mime type specs. that is, once GMagic
> features ranges, though i'll probably not add those if i don't
> see backend reuse intent from the gnome-mime side.

We could only do this if it were possible to implement all the
gnome-mime features based on it. 
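
The NUM:NUM offset ranges quoted above are one such feature. As a rough illustration of what a backend would have to support, here is a naive sketch that retries the comparison at each offset in the range. range_string_match() is a hypothetical name, not the gnome-vfs function, and treating the range end as inclusive is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if pattern starts at any offset in [range_start, range_end].
 * The inclusive upper bound is an assumption made for this sketch. */
bool
range_string_match (const char *data, size_t data_len,
                    size_t range_start, size_t range_end,
                    const char *pattern)
{
    size_t n = strlen (pattern);

    assert (data != NULL && pattern != NULL);
    for (size_t off = range_start; off <= range_end; off++) {
        if (off + n > data_len)
            break;      /* pattern no longer fits in the data */
        if (memcmp (data + off, pattern, n) == 0)
            return true;
    }
    return false;
}
```

This is the straightforward one-attempt-per-offset approach, so its cost grows with the width of the range.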
 
> as for why a user-supplied gpointer data; member is useful,
> well, gnome-mime can stuff its mime-type string/struct there,
> BSE can store a procedure type of a loader function there,
> gimp can use that for a PDB proc identifier, and gmagictest
> can use it for storing messages, i do see a use there ;)

It seems more logical to me to map a mime type to a loader after
determining mime type, than to require everything to construct its
own magic table mapping to loaders. One benefit of this is that even
if you don't know how to load a file type, the centralized file type
database knows what it is, and you can use that info to give a nice
error message, like "The GIMP cannot load files of type
image/x-proprietary-patented-microsoft-format" instead of just bombing
generically. In fact, gnome-vfs also provides a layer for translating
the mime type into a human-readable string.
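
A minimal sketch of that two-step arrangement, assuming the mime type has already been determined by the central database. LoaderEntry, loader_lookup(), load_by_mime_type() and the stub loader are hypothetical names for illustration, not existing GIMP or gnome-vfs API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical loader signature: returns 0 on success. */
typedef int (*LoaderFunc) (const char *filename);

/* Per-application table mapping mime types to loaders. */
typedef struct {
    const char *mime_type;
    LoaderFunc  load;
} LoaderEntry;

/* Placeholder loader so the sketch is complete; a real one would parse
 * the file. */
static int
load_png_stub (const char *filename)
{
    (void) filename;
    return 0;
}

/* Find the loader registered for mime_type, or NULL if there is none. */
LoaderFunc
loader_lookup (const LoaderEntry *table, size_t n_entries,
               const char *mime_type)
{
    for (size_t i = 0; i < n_entries; i++)
        if (strcmp (table[i].mime_type, mime_type) == 0)
            return table[i].load;
    return NULL;
}

/* Run the loader for an already determined mime type. If the application
 * has no loader for it, write a specific message into error and return -1. */
int
load_by_mime_type (const LoaderEntry *table, size_t n_entries,
                   const char *app_name, const char *filename,
                   const char *mime_type, char *error, size_t error_len)
{
    LoaderFunc load;

    assert (mime_type != NULL && error != NULL);
    load = loader_lookup (table, n_entries, mime_type);
    if (load == NULL) {
        snprintf (error, error_len, "%s cannot load files of type %s",
                  app_name, mime_type);
        return -1;
    }
    return load (filename);
}
```

The application only keeps a small table keyed by mime type, and the unknown-loader case still gets a useful message because the type itself is known.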

Anyway, I was hoping to reduce the amount of wheel reinvention going
on here but I guess I have failed. I'm looking forward to the "which
API should I use to determine file types" questions from developers.
:-)

 - Maciej



