Re: new mime detection approach



On Wed, 2004-01-21 at 00:27, Manuel Amador (Rudd-O) wrote:
> On Wed, 14-01-2004 at 12:54, Alexander Larsson wrote:
> 
> > How would such a metadata system work? If it uses a file per file, or
> > EAs then it would instantly be as slow as sniffing.
> 
> This assumes a certain, flawed or biased, implementation of EA or XML
> files, biased against them.  The ideal file-based implementation would
> use one XML file per directory and write locks (making large directory
> reads for metadata REALLY fast).  The ideal EA-based implementation
> would leave the optimization to the file system (EA-supporting FS
> implementations already are taking care of this issue) perhaps being
> even faster than the XML option.  I recognize that reading file
> extensions would be marginally faster (not 400 times) than reading MIME
> types from an EA store, but the usefulness of the EA store suddenly
> makes it possible to store lots of things for which you previously had to
> look on other parts of the disk, actually enhancing your possibilities
> as a developer while keeping the speed to yourself =).

On the other hand, you assume an ideal EA implementation. In reality,
current EA solutions are slow, since you have to follow a link from the
file inode to the EA block, which costs an additional seek and the long
time that takes. Reading this EA block is essentially equivalent to
reading the first block of the file (which is what we do for sniffing).
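
To make the comparison concrete, here is a minimal Python sketch of the
two lookups. It is only an illustration: the attribute name
"user.mime_type", the tiny magic table and the function names are
assumptions made up for this example, not what Nautilus or gnome-vfs
actually does. Either path touches one block beyond the inode, which is
the point being made above.

```python
import os

# Hypothetical attribute name, chosen only for this illustration.
MIME_XATTR = "user.mime_type"

# A deliberately tiny magic table; a real sniffer knows far more patterns.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"GIF8", "image/gif"),
]


def mime_from_xattr(path):
    """Return the MIME type stored in an EA, or None if there is none."""
    getxattr = getattr(os, "getxattr", None)  # only present on Linux
    if getxattr is None:
        return None
    try:
        return getxattr(path, MIME_XATTR).decode("utf-8", "replace")
    except OSError:
        # No such attribute, or the file system does not support EAs.
        return None


def mime_from_sniffing(path, block_size=4096):
    """Guess the MIME type from the first block of the file."""
    with open(path, "rb") as f:
        head = f.read(block_size)
    for magic, mime in MAGIC:
        if head.startswith(magic):
            return mime
    return "application/octet-stream"


def detect_mime(path):
    """Prefer the EA when present, otherwise fall back to sniffing."""
    return mime_from_xattr(path) or mime_from_sniffing(path)
```

Calling detect_mime() on a file without the attribute simply ends up
sniffing, so the fallback is needed anyway.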

Take a look at this table for some actual measurements:
http://www.suse.de/~agruen/acl/linux-acls/online/#tab:cold-cache

The table does suggest that small EA lookups can be fast if you waste
space in every inode. However, no normal Linux filesystem today is set
up to do that, and the gains do not come for free: larger inodes mean
lower general performance and wasted space.

It's possible that in the future we'll get your "ideal EA
implementation". However, given what I am told by the kernel people I
talk to, I wouldn't bet on it.

> >  If it used a
> > metadata system with e.g. one file per directory it could be fast, but
> > that adds a *lot* of complexity with locking and consistency handling,
> 
> The only thing you need to do regarding locking is:
> 
> lock(metadatafile)
> do your thing
> unlock(metadatafile)
> 
> that's it.

That's very much not it. First of all, actual, real, non-theoretical use
of file locking in GConfd shows us that we can't reliably use it due to
NFS homedirs, AFS and various other issues. Secondly, for metadata to be
practically useful you have to cache it in the app using it, and just
locking around file access doesn't tell other apps that the cache they
have is invalid and needs to be blown away. This introduces races where
one app can overwrite metadata written by another app.
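
The lost-update race can be sketched in a few lines of Python. This is
a toy model under stated assumptions, not code from any real metadata
system: the MetadataFile and App classes and the key names are invented
for the example, and an in-memory dict stands in for the per-directory
metadata file. Every access is properly locked, and an update is still
lost, because each app writes back its own cached copy.

```python
import threading


class MetadataFile:
    """Stands in for the per-directory metadata file, guarded by one lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.contents = {}

    def read(self):
        with self.lock:
            return dict(self.contents)

    def write(self, entries):
        with self.lock:
            self.contents = dict(entries)


class App:
    """An application that caches the metadata file after reading it once."""

    def __init__(self, store):
        self.store = store
        self.cache = store.read()

    def set(self, key, value):
        # Update the private cache, then write the whole cache back.
        self.cache[key] = value
        self.store.write(self.cache)


store = MetadataFile()
app_a = App(store)  # both apps load their caches up front
app_b = App(store)

app_a.set("report.txt:mime", "text/plain")
app_b.set("report.txt:comment", "draft")  # written from a stale cache

print(store.read())  # {'report.txt:comment': 'draft'}
```

The MIME type written by app_a is gone, even though no write ever
happened outside the lock. Avoiding that takes some form of cache
invalidation between processes, which is where the extra complexity
comes from.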

> You can still use EAs. EAs will also be supported by the GNU file utils
> and tar and the like, and I sincerely don't think a metadata file would
> ever be supported by them (other than collaterally).
> 
> The important issue is:
> 
> * you can use EAs where they are available, and use file type detection
> where they aren't - you can leave the details for other people *
> 
> > file permissions,
> 
> This issue has been cleverly solved by others.  Of course, I know
> there's no solution for the case of a file-based metadata
> implementation, but then in those cases you could choose to store only
> "unclassified" information, such as the MIME type.

How is this solved? You can only change the metadata for a file if you
have write permissions to the file. You don't actually have this for
many locations, such as system directories or shared project
directories. Of course, you could just use EAs for your homedir (if it's
not on NFS), but that sounds pretty restrictive.

> >  and possible out-of-process communication. There has
> > been thoughts about a common metadata system, but they are far from near
> > implementation,
> 
> The Mac has a working, fine metadata implementation.  Windows XP and
> Longhorn both have metadata implementations in the form of streams
> (Longhorn even has WinFS).  The fact that you think no people from the
> FLOSS camp have tried to implement it doesn't mean it's impossible to do
> (in fact, people like Hans Reiser - ReiserFS - and Seth Nickell -
> Storage - have done great advances in the field).

To get reliable metadata you either do it in the kernel (EAs in
ReiserFS, 685 times slower than pure file lookup according to the table
above) or through a daemon (won't be a speed demon). This isn't rocket
science, but that doesn't mean that storing the MIME type as metadata is
an ideal solution.

> > Additionally a metadata-based system would mean any non-nautilus
> > operation on the file could make it drop the metadata,
> 
> I agree on this if you used an XML implementation.  This can't happen on
> an EA file system.

When we have pervasive support for EAs in things like cp and tar it'll be
a bit better. But you'll still lose stuff when you copy to NFS, copy to
an SMB server, burn to a CD, and in various other cases. In contrast, the
sniffing/extension model works (to the extent it works; it's not perfect)
even if you email the file to someone.

> > or make the metadata stale.
> 
> See previous sentence.

This might well happen on EAs if a file is e.g. overwritten with new
contents.
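
A small Python sketch of that staleness problem follows. It rests on
assumptions: a plain dict stands in for the EA store (real EAs stay
attached to the inode when a file is truncated and rewritten in place),
and the two-type sniff() helper is made up for the example.

```python
import os
import tempfile

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff(path):
    """Toy sniffer: knows PNG, treats everything else as plain text."""
    with open(path, "rb") as f:
        head = f.read(len(PNG_MAGIC))
    return "image/png" if head == PNG_MAGIC else "text/plain"


# Stand-in for an EA store: metadata keyed by path, untouched by writes.
stored_mime = {}

fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "wb") as f:
    f.write(PNG_MAGIC)
stored_mime[path] = sniff(path)  # type recorded once, at creation

with open(path, "wb") as f:
    f.write(b"just some notes\n")  # same file, new contents

stale = stored_mime[path]  # what the metadata still claims
fresh = sniff(path)        # what the file actually is now
os.unlink(path)

print(stale, fresh)  # image/png text/plain
```

Nothing tells the metadata that the contents changed, so whoever
overwrites the file would also have to know to update or clear the
stored type.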

> >  It would also have problems
> > with read-only media and probably other complications.
> 
> Why would it have problems with read-only media?  If the media is
> read-only, any updates to the metadata store HAVE to fail silently. 
> What's the problem.

If metadata writes fail (say on a CD-ROM) you can't use the metadata, so
you'd have to fall back to some other means. (And there won't be any
metadata already on the CD-ROM.)

> This sounds like FUD.  Sooner or later it will happen: a metadata system
> will have to emerge, and while it hasn't, it would be wise to take the
> lead instead of ignoring it.  Being leaders ensures continuous attention
> from the mainstream, helps you become the one that sets the standard,
> and has other advantages.  It's in our hands to be leaders or
> followers.  And it doesn't sound like we're being leaders (see
> Longhorn).  We're still on a time frame to beat Longhorn on a workable,
> fundamentally correct, functional, fast implementation of a metadata
> system (by putting at least Storage and EAs together).
>
> (I feel that the role of Storage as an additional layer above the file
> system is a little bit heavy.  I'd rather have Storage *cache* the
> metadata for which the file system is actually the primary source, and
> update the file system whenever a new file comes into "acquaintance"
> with Storage.  Storage should also have the role to provide APIs to read
> and write in a standardized way to the metadata store, which ultimately
> would be the file system, NOT Storage's database.  Storing things like
> file comments, links to people objects, links to mail messages, stuff
> like that, stuff that will enrich the desktop experience much, much more
> than wondering whether Nautilus should detect files by content or by
> type).

This sounds like you're eager to implement such a system, maintain it
and push it through into a well-working, much-used standard. It might be
doable, it might be right. Who knows, only the future can tell. This is
the beauty of open source: you can convince me, the skeptical doubter
maintaining the current system, that your shiny new idea works. By doing
this you can also further the open source desktop. And if you're wrong,
you're only spending your own time playing with something you think is
interesting.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                            Red Hat, Inc 
                   alexl redhat com    alla lysator liu se 
He's a superhumanly strong zombie jungle king possessed of the uncanny powers 
of an insect. She's a mentally unstable antique-collecting Hell's Angel from 
the wrong side of the tracks. They fight crime! 



