Re: [Tracker] Storing Metadata with files



On 05/25/2010 03:00 AM, Martyn Russell wrote:
What I would like to do is to make it possible to have a tighter
coupling between the indexed files and additional metadata about them.

In what way?

Basically I do not want metadata to be removed just because the indexed
file apparently does not exist at the same place anymore.

Ideally, tracker should e.g. detect if a file has just been renamed and
migrate the existing metadata.


Secondly, I don't feel
comfortable with metadata and document to be separated that much. So far
I have the impression that tracker considers metadata to be mostly
transient in the sense that it can always be recovered from the file
itself, and in my case this would no longer be true. 

That's not how we consider it. User metadata is also important. It does
have certain restrictions with our current model though (such as > 1
application writing the same data about the same resource means the data
is overwritten each time - for example).

I don't have a
particular scenario in mind, but I feel that it's basically asking for
trouble if the simple act of renaming a file (or the indexed directory,
or temporarily changing the tracker index settings) would permanently
destroy all the associated metadata (even though no one is supposed to
do anything like that).

Hmm, what causes this for you? That's not expected.

Well, I thought tracker was deliberately designed to do that. Imagine
this situation:

 - A file "letter_to_company" is added to the tracker archive folder
 - Additional metadata about "letter_to_company" is added to the
   tracker database with the dbus API
 - Someone wants to "clean up" the archive folder and moves
   "letter_to_company" into a "letters" subdirectory
 - Tracker indexes the "new" "letters/letter_to_company" with the
   metadata that's available in the file
 - Tracker removes all the metadata about the original
   "letter_to_company", because that file no longer exists
 - The metadata added via DBus is now lost irrevocably

Isn't that what's going to happen?
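
To make the scenario concrete, here is a minimal Python sketch of the
behaviour I would hope for. It is a toy in-memory index, not Tracker's
code or data model: ToyIndex, file_id and rescan are names I made up, and
identifying a file by its (device, inode) pair is just one assumed way to
recognize a move on a POSIX filesystem.

```python
import os
import tempfile


class ToyIndex:
    """Hypothetical in-memory index; NOT Tracker's actual data model."""

    def __init__(self):
        # path -> (file_id, user metadata dict)
        self.by_path = {}

    @staticmethod
    def file_id(path):
        # Assumption: (device, inode) stays stable across a rename
        # within the same filesystem.
        st = os.stat(path)
        return (st.st_dev, st.st_ino)

    def add(self, path, metadata=None):
        self.by_path[path] = (self.file_id(path), dict(metadata or {}))

    def rescan(self, root):
        """Re-walk root; carry metadata over to moved files."""
        old_by_id = {fid: meta for fid, meta in self.by_path.values()}
        new_by_path = {}
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                fid = self.file_id(path)
                # A known file id means the file was moved, not replaced.
                new_by_path[path] = (fid, old_by_id.get(fid, {}))
        self.by_path = new_by_path


def demo():
    """Replay the "letter_to_company" scenario in a temporary directory."""
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "letter_to_company")
        with open(old, "w") as f:
            f.write("Dear Sirs, ...")
        index = ToyIndex()
        index.add(old, {"recipient": "ACME"})  # metadata added by the user
        os.mkdir(os.path.join(root, "letters"))
        new = os.path.join(root, "letters", "letter_to_company")
        os.rename(old, new)                    # the "clean up" step
        index.rescan(root)
        return index.by_path[new][1]
```

Here demo() returns the user metadata under the new path instead of
losing it. The sketch ignores inode reuse and moves across filesystems,
where a content hash would probably be needed instead.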

Therefore I like the idea of storing the metadata in a separate file.

This is far from trivial and we have moved away from separate databases
since 0.6 (for the same type of data) for a number of reasons:

 * Speed
 * Maintainability
 * ...

I agree, conceptually, it makes sense, but the reality is this is much
harder to do. We do use a journal to back up all the data; this would
include your user metadata.

Hm. Interesting. So there is a permanent record of every file that has
ever been added to tracker? Or is this journal expired from time to time?

How do I access this journal to e.g. obtain the metadata from a deleted
file?

So far it seems to me that the best approach to get this to work with
tracker would be to either extend the XMP sidecars extractor to extract
more information, or to add an entirely new extractor that reads a
tracker-specific separate metadata file. But maybe there is also an
entirely different way to achieve what I want?

Hmm, a new extractor won't work here. To catch ALL files, you would need
to write a generic one and generic extractors are fallbacks for specific
ones at this point.

I was thinking about a specific extractor just for .xmp files which adds
the extracted metadata to the "real" file. Wouldn't that be possible?
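
Roughly, I imagine something like the following Python sketch. It is not
based on Tracker's extractor API: sidecar_target and read_dc_titles are
hypothetical helpers, the "name.ext.xmp describes name.ext" naming
convention is an assumption, and only dc:title is read to keep it short.

```python
import os
import xml.etree.ElementTree as ET

# Standard namespace URIs used in XMP packets.
DC_NS = "http://purl.org/dc/elements/1.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def sidecar_target(sidecar_path):
    """Return the "real" file a sidecar describes, or None.

    Assumed convention: 'photo.jpg.xmp' describes 'photo.jpg'.
    """
    base, ext = os.path.splitext(sidecar_path)
    if ext.lower() != ".xmp":
        return None
    return base if os.path.isfile(base) else None


def read_dc_titles(xmp_text):
    """Collect the dc:title strings from the text of an XMP packet."""
    root = ET.fromstring(xmp_text)
    titles = []
    for title in root.iter("{%s}title" % DC_NS):
        # dc:title is a language alternative: rdf:Alt holding rdf:li items.
        for li in title.iter("{%s}li" % RDF_NS):
            if li.text:
                titles.append(li.text.strip())
    return titles
```

The extractor would then store whatever read_dc_titles() finds against
the path returned by sidecar_target() rather than against the .xmp file
itself.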

Also, the extractor only gets the metadata for that file format, it
doesn't extract or insert the file metadata (size, name, etc).

I don't quite understand.

This would mean much data duplication in
user space and that should be avoided where possible.

I guess that means that I'll have to start coding in C again *sigh*.
What are your thoughts about accepting a patch for such functionality
into the official code base?

We would definitely accept a patch to fix this :)

This sounds a little bit ambiguous :-). To be avoided but you'd accept a
patch?


Best,

   -Nikolaus

-- 
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C


