Re: [gnome-love] Meta Information in GNOME

From: Ryan Muldoon <rpmuldoon students wisc edu>
To: Jonathan Bartlett <johnnyb eskimo com>
Cc: gnome-love gnome org
Subject: Re: [gnome-love] Meta Information in GNOME
Date: 16 May 2002 23:54:20 -0500

On Thu, 2002-05-16 at 23:04, Jonathan Bartlett wrote:

software community are interested in something like this.  But
unfortunately, to do this well, it needs to be a LOT lower down than
gnome.  I've come to the conclusion that it needs to be an OS feature,
so filesystems, system utilities, etc can take advantage of (or perhaps
more accurately, not break) metadata.  Nautilus and other GNOME things
would be great clients of the OS's metadata service, but they shouldn't
be the service providers.


Actually, I disagree.  I've thought about this a lot, and it seems that as
an OS feature, it would really suck, especially for multiuser systems
(each user needs their own copy of the meta-data, especially when they
disagree).  Also, for the OS to be involved with doing callbacks to create
cached metadata like thumbnails would be insane.

Except that you then need to make all tools that deal with files of any
sort aware of metadata.  Which is an enormous undertaking.  What happens
if I move a file with CLI tools instead of Nautilus?  If the metadata
isn't attached to the file, it gets lost.  Same with emailing someone
the file, etc.  Or, if I have metadata on another user's files, how can
I track where it goes?  These are really basic problems that need to be
solved.  Remember also that things like thumbnails are essentially an
implementation detail.  No one said that the OS needs to come with
thumbnail generation.  But it should be aware of metadata, otherwise
your metadata database is going to break pretty fast.  There are
certainly userland parts to this problem, but there are definitely OS
level parts.
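
To make the "move it with CLI tools" failure concrete, here is a minimal
sketch, assuming a purely userland store keyed by file path.  The names
metadata_db, tag, lookup and demo are made up for illustration and are not
taken from Nautilus, Medusa, or any real API.

```python
import os
import tempfile

# Hypothetical userland metadata store: a plain dict keyed by absolute path.
metadata_db = {}

def tag(path, **fields):
    """Attach metadata fields to a path in the userland store."""
    metadata_db.setdefault(os.path.abspath(path), {}).update(fields)

def lookup(path):
    """Return the metadata recorded for a path, or None if there is none."""
    return metadata_db.get(os.path.abspath(path))

def demo():
    with tempfile.TemporaryDirectory() as d:
        old = os.path.join(d, "scan.jpg")
        new = os.path.join(d, "book-page.jpg")
        with open(old, "wb") as f:
            f.write(b"placeholder bytes, not a real image")

        tag(old, category="book scan")
        before = lookup(old)

        # This is what a metadata-unaware tool such as mv does: the file
        # moves, but nothing tells the store about it.
        os.rename(old, new)

        after = lookup(new)   # the file at its new name has no metadata
        stale = lookup(old)   # the store still describes a path that is gone
        return before, after, stale
```

After the rename the store has an entry for a path that no longer exists and
nothing for the file that does, which is the breakage described above.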

    If you're interested in doing more research on this, I would suggest
looking at the Semantic Web work that the W3C is doing, as well as
efforts like Dublin Core and what the Library of Congress is doing.  One


I've looked into this a little, and I think they are good pointers to how
the internals might work.

I wouldn't say that they are good for internals....but they are useful
for understanding the issues, and how to approach them.  They bring out
a lot of the annoying little problems that are not necessarily obvious
at first glance.

tricky thing you end up with is where the MIME type doesn't really map
to the actual "type" of the document.  Like a JPEG that is a scan of a
book. That cuts down on what you can do programmatically.  The Open
Knowledge Initiative, mainly spearheaded by MIT and Stanford, is also
working on a meta-information management API....as those apis become
published you may want to take a look.


Where does one find this project?

web.mit.edu/oki/
The metadata apis are not anywhere close to being published....probably
not until this fall at best.  We're still banging away on the apis that
metadata requires.  But metadata is a big part of the planned oki
functionality.

Of course, the other two huge
hurdles to something like this are how to deal with the network (how
does one transmit all that metadata across the wire?  especially with
non-metadata-aware OSes, and filetypes that don't natively support
metadata), and how to make the algorithms for generating accurate
metadata not all that intensive computationally.


The issue for individuals managing their own metadata is inherently much
simpler.  It doesn't even require support from external entities
(like websites) at all.  For example, as a user, I could mark categories
on my email, web page bookmarks, and documents.  Then, when I use Nautilus
to browse by category, it pulls up links to the relevant information.
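
As a rough sketch of that idea, assuming a per-user index that maps category
names to resource URIs (the class CategoryIndex and the example URIs are
hypothetical, not an existing GNOME interface):

```python
from collections import defaultdict

class CategoryIndex:
    """Hypothetical per-user index from category name to resource URIs."""

    def __init__(self):
        self._by_category = defaultdict(set)

    def mark(self, uri, *categories):
        """Record that a resource belongs to each of the given categories."""
        for category in categories:
            self._by_category[category].add(uri)

    def browse(self, category):
        """Return the URIs marked with a category, in sorted order."""
        return sorted(self._by_category.get(category, ()))

# Made-up resources of three different kinds, tagged with one project.
index = CategoryIndex()
index.mark("imap://inbox/42", "project-x")
index.mark("http://www.gnome.org/", "project-x", "reference")
index.mark("file:///home/jon/report.txt", "project-x")
```

Browsing "project-x" then returns the email, the bookmark and the document
together, regardless of which application each one lives in.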

Yes, but if I have my mp3 collection all nice and metadata-rich, and I
give it to you via webdav or ftp, don't you want to have all the
(objective) metadata, instead of having to re-create it all?  Same with
documents, presentations, etc in a workplace.  Unless you get something
for your effort, you're not going to add any metadata in the first
place.

What you are describing (just doing simple category-based searches on
data) is basically what I was suggesting you work on with Medusa and
emblems.  I'm sure lots of people would be happy if you picked that
project up and did some work on it.

Metadata is useless
unless most of it is generated automagically, because users won't be
bothered to add it in.


Although this might be true for home users, heavy users of information
will probably think differently, like those who have to manage a thousand
projects.  If they can tag all of their relevant resources - emails, web
pages, documents, with each project's tag, and also mark priorities, it
becomes very useful.

Home users are becoming heavy users of information. ;-)
However, you have to think about what the user burden of adding all
of this metadata is, and what the programmer burden is if it can't be
automatically taken care of by system calls.  I have seen many projects
that aim to make information management easier fail because they end up
just creating more work for people.  So people just stick to the old way
because it is less of a pain in the ass (even if it is only in the short
term).  Anything that takes constant effort is probably not going to
work too well.  That is why generated metadata is a good idea.

Then you get into issues of "degrees of
accuracy" with how well the computer can guess what the correct metadata
is.  So you lose the normal bivalence of computation, and have to deal
with vagueness in logical operations (because most metadata stuff boils
down to testing a logical proof's validity).
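
One way to picture losing bivalence is a sketch in which each guessed
property carries a degree between 0.0 and 1.0 instead of true or false.
The min/max/complement operators below are the common Zadeh-style fuzzy
connectives, chosen here as an assumption; the thread does not say which
logic is meant, and the property names and scores are invented.

```python
def f_and(a, b):
    """Fuzzy conjunction: a claim is only as strong as its weakest part."""
    return min(a, b)

def f_or(a, b):
    """Fuzzy disjunction: take the strongest alternative."""
    return max(a, b)

def f_not(a):
    """Fuzzy negation: complement of the degree."""
    return 1.0 - a

# Invented confidence scores a metadata guesser might assign to one JPEG.
guesses = {"is_book_scan": 0.7, "is_photo": 0.2}

# Query: "a book scan that is not a photo".  The answer is a degree,
# not a yes or no.
score = f_and(guesses["is_book_scan"], f_not(guesses["is_photo"]))
```

Any client using such a score still has to pick a threshold for what counts
as a match, which is where the vagueness becomes a design problem.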


I think this is where people start becoming architectural astronauts, and
trying to make a self-aware being through metadata.  Real usage of
metadata is much simpler.

In very limited contexts, yes.  But there is nothing here about sentient
computers. It is just about realizing what the actual problem scope is. 
But maybe I'm biased, as my day job is an IT architect, and I do
research in logic and vagueness.  ;-)

There are two ways to think about the problem - you are essentially
trying to create a userland database with a controlled language for
categories, and the constraint that only certain clients can touch any
files on your computer.  While it is the faster way to go, I think it
lacks robustness.  But as I say, the Medusa thing would be a great way
to get a lot of what you want, without all that much work.  Something
definitely feasible for gnome 2.x.

Anyway, would you happen to know who on the GNOME team would be most
interested in this?

Joakim Ziegler has expressed interest in things like this, but I haven't
seen much public gnome activity from him in a little while.  Rebecca
Schulman (I think that's her last name, I don't quite remember) wrote
most of Medusa, and so might be a good person to talk to.  Someone is
also working on a thumbnailing standard, so that might be a good person
to talk to.  I have a feeling that others might get on board once you
have some demonstrable code, but most of the core gnome people are
pretty damn busy for the foreseeable future, it seems.

        --Ryan

Jon

On Thu, 2002-05-16 at 21:25, Jonathan Bartlett wrote:

For anyone who's interested, I am writing a document describing how GNOME
could assist users in keeping track of all their information, better than
any other system I'm aware of.  Please take a read at

http://www.eskimo.com/~johnnyb/computers/MetaInformation.html

and let me know what you think.  I know now is probably not the
appropriate time for major infrastructure changes, but it's an idea for
something to do post-2.0.

Jonathan Bartlett

