Re: Making metadata storage SQL-driven



Manuel Amador wrote:
Hi there, everyone participating on this list.

I really don't like the direction this thread is going.

I see you guys getting all wired up on tech and performance and I bet no
one in this thread has actually sat down and fleshed out all
requirements in a detailed fashion.


I have, to some degree. I expect writing the requirements to be an iterative process.

Let's visit a quite simple example of the things we should be thinking
of, which is simply *infeasible* with this cobbled-up embedded MySQL
solution.

There are two kinds of metadata, which need to be handled differently:
- per-user metadata: song ratings, file emblems, notes
- per-system metadata: document author, song tempo, file MIME type, file thumbnail preview

The MySQL embedded / DBUS solution everyone is drumming up here is quite
appropriate for per-user metadata.  But it will make our lives extremely
hard for per-system metadata.

Per-system metadata is out of scope for this. You would need a root-based solution for it (a bad idea), which would compromise data privacy and security (cf. the flak Google Desktop Search got when it was initially released system-wide). I don't want others seeing what docs I have in my home folder (be it through thumbnails or by author)!


The fact that users may also want to set "per-user metadata" system-wide
(say, an admin setting emblems on company folders for all their users)
is also something that needs to be accounted for.

I hope you guys won't take this longish remark personally.

Getting back to the embedded MySQL thingie.  I've seen many of you bash
EAs as not being "performant enough", as if that was the only thing that
mattered, or as if it were more of a show stopper than a SIGSEGV.  Even
if that was true:

You can't use EAs exclusively, because they are only supported on Linux and on certain filesystems, and you can only set them if you have write permission on the file. The other issue is that EAs are limited to key/value pairs, so you cannot *efficiently* use them for relational data such as contextual stuff. (They are also limited to strings, so no blobs like thumbnails, and the maximum size of all EAs on a file is 64KB.)
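
To make the key/value point concrete, here is a minimal sketch using Python's os module on Linux. The helper names try_set_ea and read_eas and the attribute name user.rating are made up for illustration; the only real calls are os.setxattr, os.getxattr and os.listxattr, which take the value as a byte string. The set can fail when the filesystem has no user EA support or you lack write permission, which is the case the sketch guards against:

```python
import os
import tempfile


def try_set_ea(path, name, value):
    """Try to store one key/value pair as a user extended attribute.

    Returns True on success, False if the platform, filesystem or
    file permissions refuse the write.
    """
    if not hasattr(os, "setxattr"):  # Python only exposes this on Linux
        return False
    try:
        os.setxattr(path, name, value)
        return True
    except OSError:
        return False


def read_eas(path):
    """Return all user.* EAs on a file as a flat dict (empty if unsupported)."""
    if not hasattr(os, "listxattr"):
        return {}
    try:
        names = os.listxattr(path)
        return {n: os.getxattr(path, n) for n in names if n.startswith("user.")}
    except OSError:
        return {}


if __name__ == "__main__":
    # Hypothetical usage: tag a scratch file with a rating.
    with tempfile.NamedTemporaryFile() as f:
        stored = try_set_ea(f.name, "user.rating", b"5")
        print("stored:", stored, "attributes:", read_eas(f.name))
```

All you get back is a flat dict per file; anything relational has to be built on top of that by the caller.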



* Performance is not the biggest concern here.  It's functionality and
coherence!  I really, really propose per-system metadata should be
stored in an *extended attribute*: it's a POSIX standard, and it will
make cooperation between projects (I'm thinking KDE and console-based
utilities here) much, much, much, much easier and less troublesome.  If
current implementations have bugs, it's time to start putting pressure
on implementors to fix their fucked-up things and get on with it.

* Even if EAs turned out to be horribly slow (which they are not, on
most filesystems) and inefficient, you still need a common ground for
many apps to get metadata.  Command-line apps and the like.  KDE.  Do
you honestly expect they'll link to your metadata libraries?  The KDE
guys haven't linked to glib ever, so I say, hell no way that's gonna
happen!  You *need* to store and read them from extended attributes.
I'm sure you'll then build a cache or something, or even start using
Beagle to cache and query metadata.


That's right, EAs are not centrally indexed AFAIK. So you would still need a DB or indexer to store all the EA values centrally if you wanted to search efficiently (otherwise you would have to visit every single file on disk and retrieve its EAs during a search, which is clearly unacceptable performance-wise). Note that KDE4 is using the Postgres RDBMS for its metadata framework (Tenor), and AFAIK they are not using EAs (or at least I haven't seen that in their plans, so correct me if I'm wrong here).
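
As a rough sketch of what "store the values centrally" means, assuming SQLite from Python's standard library purely for illustration (it is not the embedded MySQL setup discussed in this thread, and the table layout, function names and sample paths are all made up):

```python
import sqlite3


def build_index(files):
    """Build a central index from a dict of path -> {key: value} metadata.

    In a real indexer the dict would be filled by reading each file's
    metadata once, at index time, instead of at search time.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE metadata (path TEXT, key TEXT, value TEXT)")
    # The index is what lets a search avoid touching every row (or file).
    db.execute("CREATE INDEX idx_key_value ON metadata (key, value)")
    rows = [(p, k, v) for p, attrs in files.items() for k, v in attrs.items()]
    db.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
    db.commit()
    return db


def search(db, key, value):
    """Return the sorted paths whose metadata has key == value."""
    cur = db.execute(
        "SELECT path FROM metadata WHERE key = ? AND value = ? ORDER BY path",
        (key, value),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    # Hypothetical sample data standing in for values harvested from files.
    db = build_index({
        "/home/user/song.ogg": {"rating": "5", "tempo": "120"},
        "/home/user/notes.txt": {"author": "someone"},
    })
    print(search(db, "rating", "5"))
```

The search is a single indexed lookup in one place, rather than a walk over the whole disk.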

Start lifting use cases from competing operating systems (Mac comes to
mind), and come up with a few use cases yourselves.  Examples:

Both Mac and Beagle (and KDE4 and Vista) store them in their own databases! They do this for speed and superior queryability. Mac and Windows can use EAs and index them too, as they control their own platforms, but we don't, and that's why neither KDE, GNOME nor Beagle (which only uses them to tell whether it has indexed a file) can rely on them.

(Do we really need to continue this thread? If Alex is happy to accept a D-Bus interface for a metadata server in Nautilus, then we don't need to worry whether it's text-based, EA-based or DB-based. Nautilus already has such a search interface, which Beagle can use, so I intend to make use of that interface too.)


--
Mr Jamie McCracken
http://www.advogato.org/person/jamiemcc/


