labeling interface idea



Hey, I'm an undergraduate CS major currently participating in Google's SoC for Inkscape.  There was a project I almost submitted to Gnome, but I ultimately decided it was too large to be doable in Google's timeframe.  It's still something I'm interested in doing, though, and I was hoping to find out whether it's something Gnome could support in a similar manner (minus the Google funds, naturally).

Briefly, I'd like to add something similar to the labeling functionality in GMail to the Linux desktop; I've attached a (much!) more detailed proposal.  I'm sure that the Gnome community would be happy to see me work on this, but I'm actually looking for something more along the lines of a straight yes/no as to whether it's something the developers could provisionally endorse and provide mentoring for.  If so, I'd work on it for credit as independent study through my university (Gnome would have no interaction with my school; I just wouldn't want to take something like this on without community assistance).  Regardless, any feedback about the proposal is welcome.  Thanks,

Greg
Gnome Labeling Interface

OVERVIEW

Since the rise of Unix, operating systems have presented the
filesystem as a tree, and obliged users and developers to either
organize their information in trees or create userspace data structures
that are inaccessible to the operating system.  Many alternatives have
been proposed and developed over the years, often under the name of
"database filesystems", but have never caught on, possibly, the author
feels, because the conceptual difference between the relational
database model and the tree model is too large, and no system that did
not offer backward compatibility could be accepted in the real world.
Very recently though, a new metaphor has started to gain traction.
Although it appears under different names, such as "labels" or
"virtual folders" or simple boolean keyword searching, all these
fundamentally work by organizing information in sets, not trees.  I
propose to add a powerful set metaphor for organizing files in Linux
to Gnome, while taking advantage of the smaller conceptual distance
between trees and sets to implement its backend in a way that
preserves intuitive access to files through the traditional filesystem
interface.

PROBLEM

The limitations of the hierarchical filesystem model can be seen in a
simple example.  Suppose that you're downloading pictures of your
cousin Alice's wedding.  You currently store your pictures in one
folder, but you're starting to have enough pictures for this to become
unwieldy.  Should you create an "Alice" folder, a "Wedding" folder,
an "Alice's Wedding" folder, an "Alice" folder containing a "Wedding"
folder, or a "Weddings" folder containing an "Alice" folder?  Or
suppose that you've organized your music by genre, with Willie Nelson
filed under "Country", then you buy Nelson's new reggae album.  This
category of organizational problem is currently solved in userspace by
music and photo management applications.  Neurotic users can even
develop partial solutions in the filesystem by using links.  But in
the year 2005, enough application domains have encountered these
issues that the creation of a general solution has become worthwhile. 

PROPOSED SOLUTION

I propose to add "labeling" functionality to the Gnome desktop.  My
implementation would place high importance on making it usable by as
many existing applications as possible, by preserving backwards
compatibility with the regular filesystem.  I'll discuss the
data model it would provide first, then the way it would
(initially) be implemented.

Data Model

In brief, this project would implement something often called a
"document store."  It would be expected to be used to store the kinds
of files usually considered to be "documents", though it's not limited
to that.  The document store is intended to implement, from the user's
perspective, labeling functionality similar to that found in GMail.

At its simplest, the store would contain two kinds of objects: labels and
files. Labels are unique strings representing sets of files.  Files
are bitstreams, essentially identical to traditional files (including
traditional metadata like Unix permissions, modification times, etc.),
with one exception: they don't have filenames.  In the context of the
document store, files are instead accessed purely by querying for some
label or combination of labels, and selecting one of the search
results.  Processes themselves would access file contents either
through the system's own API, or, more likely, by being given a working
traditional filename for the file contents.
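
As a rough illustration of this two-object model, here is a minimal
in-memory sketch in Python.  It is not the proposed implementation;
the class name DocumentStore and its methods are assumptions made up
for this example, and files are identified by a simple integer id
rather than by any name.

```python
from collections import defaultdict
from itertools import count


class DocumentStore:
    """Toy model: files have ids but no filenames; labels are sets of ids."""

    def __init__(self):
        self._next_id = count(1)
        self._contents = {}               # file id -> bytes
        self._labels = defaultdict(set)   # label string -> set of file ids

    def add_file(self, data, labels=()):
        """Store a bitstream and apply zero or more labels to it."""
        file_id = next(self._next_id)
        self._contents[file_id] = data
        for label in labels:
            self._labels[label].add(file_id)
        return file_id

    def apply_label(self, file_id, label):
        """Add an existing file to the set named by `label`."""
        if file_id not in self._contents:
            raise KeyError(file_id)
        self._labels[label].add(file_id)

    def query(self, *labels):
        """Return the ids of files that carry every one of the given labels."""
        if not labels:
            return set(self._contents)
        sets = [self._labels.get(label, set()) for label in labels]
        return set.intersection(*sets)


store = DocumentStore()
photo = store.add_file(b"...", labels=["alice", "wedding"])
other = store.add_file(b"...", labels=["wedding"])
matches = store.query("alice", "wedding")   # only the first file
```

Note that the wedding photo from the PROBLEM section no longer needs
a single "right" folder: it simply belongs to both sets.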

The system described above, while useful, doesn't really add
functionality compelling enough to make it preferable to one of the
already existing solutions.  However, this proposal asserts that much
more power becomes available when you add a third kind of object to
the store: namespaces.  The addition of namespaces preserves the
simple GMail-like labeling and organizing functionality described
above while providing a powerful hook for Gnome and other applications
to add commonly-understandable metadata to files in a manner that is
both easy to develop for and powerful enough to solve many of the
data structuring problems traditionally dealt with through databases.

Namespaces are ways to group collections of labels.  They are NOT sets
in the sense that labels are, because a label must belong to one and
only one namespace, while files can belong to any number of labels
(including labels from different namespaces).  Every user would be
given their own personal namespace, almost analogous to their home
directory, in which they could create and delete whatever labels they
liked.  This is the namespace that would be used for the GMail-like
functionality described above.  However, by the creation of additional
standardized namespaces, say, for freedb categories, it
becomes possible, using only the label metaphor, to allow users to
perform arbitrarily interesting queries, such as "find all jazz songs
that I've marked as 'favorites'" in an application independent manner.
And, by using the XML technique of allowing mnemonics to stand in for
full namespaces, it even becomes possible to do this in a reasonably
intuitive command-line format, such as "freedb:jazz and my:favorites".
Other examples of possible standardized namespaces are mime-types,
programming languages, content language (e.g. English, Spanish...),
etc.
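
A minimal sketch of how a query like "freedb:jazz and my:favorites"
might be resolved, assuming a table that maps each mnemonic to a full
namespace identifier.  The identifiers below (under example.org) and
the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical mnemonic -> full namespace table (identifiers are made up).
PREFIXES = {
    "my": "http://example.org/ns/users/greg",
    "freedb": "http://example.org/ns/freedb",
}


def resolve(qualified):
    """Turn 'prefix:label' into a (full namespace, label) pair."""
    prefix, sep, label = qualified.partition(":")
    if not sep or not label:
        raise ValueError("expected prefix:label, got %r" % qualified)
    if prefix not in PREFIXES:
        raise ValueError("unknown namespace mnemonic %r" % prefix)
    return (PREFIXES[prefix], label)


def find_all(assignments, *qualified_labels):
    """Intersect the file sets of every qualified label.

    `assignments` maps (namespace, label) -> set of file ids.
    """
    wanted = [assignments.get(resolve(q), set()) for q in qualified_labels]
    return set.intersection(*wanted) if wanted else set()


assignments = {
    ("http://example.org/ns/freedb", "jazz"): {1, 2, 3},
    ("http://example.org/ns/users/greg", "favorites"): {2, 3, 4},
}
favorite_jazz = find_all(assignments, "freedb:jazz", "my:favorites")
```

Because each label is keyed by its namespace, a user's own "jazz"
label and the standardized freedb "jazz" label never collide.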

Implementation

This functionality, while cool, would still probably not be successful
if it required programs that wanted to access information in the store
to be recoded to use the store API.  But it's possible to implement
this so that all files in the store remain accessible as regular
filesystem files too.  In particular, it's possible to view the
traditional filesystem itself as simply a collection of fundamental
objects (ultimately represented by inodes) that can be referred to by
one or more names (hard links).  Since directories are ultimately sets
of files, the above data model can be implemented by creating a
directory for every label, and putting a hard link to the file in the
directory for every file that has a certain label applied to it.  If
label directories, in turn, are contained in directories representing
namespaces, the whole model becomes accessible through the regular
filesystem interface relatively intuitively.  The whole store can be
represented as a single root directory that contains a subdirectory
for every namespace, each of which contains a subdirectory for every
one of its labels, each of which contains a file hard link for every
file that has that label applied to it.  By using this implementation,
you get the Unix permission scheme for free, and get reasonable file
access performance (as opposed to backing the store with a relational
database).  Searching will have to be done through a separate index,
probably initially backed by SQLite, and the index will have to remain
synced by preventing direct file creation and deletion within the
store (i.e. non-root users must not have write access to the label
directories).  The store API is accessed through a service that is
normally started with the Gnome desktop (and that is run WITH write
permission to the store directories).
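
The directory layout described above can be sketched with ordinary
hard links.  This is only an illustration under stated assumptions:
the function names are invented, the reserved namespace is called
"system" here, and the separate search index and the permission setup
are left out entirely.

```python
import os
import tempfile


def add_file(root, name, data):
    """Create the canonical copy of a file under <root>/system/all/<name>."""
    all_dir = os.path.join(root, "system", "all")
    os.makedirs(all_dir, exist_ok=True)
    path = os.path.join(all_dir, name)
    with open(path, "xb") as handle:   # "x": fail if the name is taken
        handle.write(data)
    return path


def link_label(root, namespace, label, name):
    """Apply a label by hard-linking the file into <root>/<namespace>/<label>/."""
    canonical = os.path.join(root, "system", "all", name)
    label_dir = os.path.join(root, namespace, label)
    os.makedirs(label_dir, exist_ok=True)
    target = os.path.join(label_dir, name)
    if not os.path.exists(target):
        os.link(canonical, target)     # same inode, second name
    return target


def labels_of(root, name):
    """List the (namespace, label) pairs whose directory contains `name`."""
    found = []
    for namespace in sorted(os.listdir(root)):
        ns_dir = os.path.join(root, namespace)
        for label in sorted(os.listdir(ns_dir)):
            if os.path.exists(os.path.join(ns_dir, label, name)):
                found.append((namespace, label))
    return found


def demo():
    """Build a throwaway store in a temp directory and label one file."""
    with tempfile.TemporaryDirectory() as root:
        canonical = add_file(root, "a1b2", b"wedding photo bytes")
        link = link_label(root, "my", "favorites", "a1b2")
        same_inode = os.stat(canonical).st_ino == os.stat(link).st_ino
        return same_inode, labels_of(root, "a1b2")
```

Removing a label is then just unlinking one name; the file contents
survive as long as any other link to the inode remains.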

The real filesystem names of the files should be essentially random;
this is a feature, not a bug.  Assuming the use of the 26 lowercase
characters and the 10 digits, no filename should need to be more than
4 characters long.  The details can be worked through later, but
different links referring to the same inode would be given the same
name in all label directories, and the canonical filename would be
under, say, an "all" label under a reserved system namespace that
contains links to every file object stored by the store.
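
For a sense of scale, 36 symbols and 4 positions give 36^4 =
1,679,616 distinct names.  A minimal sketch of picking an unused
name, assuming the store can supply the set of names already taken
(the function name and the retry strategy are assumptions, not part
of the design):

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits   # 26 letters + 10 digits
CAPACITY = len(ALPHABET) ** 4                        # names available at length 4


def new_name(taken, length=4):
    """Return a random name of `length` symbols that is not in `taken`."""
    if len(taken) >= len(ALPHABET) ** length:
        raise RuntimeError("no free names of length %d left" % length)
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in taken:
            return candidate


name = new_name(taken={"a1b2", "zzzz"})
```

Retrying on collision is cheap while the store is mostly empty; a
store approaching the capacity above would want longer names instead.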

I'd like to implement this in Mono, but could be convinced otherwise.
I also intend to put significant syntax restrictions on label names,
such as banning whitespace, but could be convinced otherwise.

USAGE

To use this system in the real world, users would start some small
program, maybe even a panel applet, that provides a search interface,
and which allows users to see the backend filename of files, or
copy it to the clipboard.  The details of this interface are
important; we want to wean users from thinking in terms of file paths,
but still need to make that info available (perhaps search results
will only be identified by a Nautilus-like preview image, but the
currently selected result icon's path is visible in a separate pane).
In real use, users will likely perform boolean label searches (e.g. A
and B or C not D), with the ability to sort results by size, last
accessed, etc.  
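
A minimal sketch of evaluating a query such as "A and B or C not D".
The proposal does not fix an operator precedence, so this example
simply assumes strict left-to-right evaluation, with "not" meaning
"and not".  It also relies on label names containing no whitespace.
The function name and the in-memory index are illustrative only.

```python
def search(index, query):
    """Evaluate 'A and B or C not D' left to right over label sets.

    `index` maps label -> set of file ids.
    """
    tokens = query.split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    rest = tokens[1:]
    if len(rest) % 2:
        raise ValueError("query must alternate labels and operators")
    for op, label in zip(rest[0::2], rest[1::2]):
        operand = index.get(label, set())
        if op == "and":
            result &= operand
        elif op == "or":
            result |= operand
        elif op == "not":
            result -= operand
        else:
            raise ValueError("unknown operator %r" % op)
    return result


index = {"A": {1, 2, 3}, "B": {2, 3}, "C": {4}, "D": {3}}
sizes = {1: 900, 2: 120, 3: 50, 4: 4000}          # made-up file sizes in bytes
hits = search(index, "A and B or C not D")
by_size = sorted(hits, key=sizes.get)             # smallest first
```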

In the distant future, if something like this were successful, the
Gnome filechooser widget could be modified to include a store
interface.

NAMESPACE VERIFICATION

This probably belongs under "DATA MODEL", but one feature I'm
considering including, eventually, is something analogous to XML
schema modification.  While brainstorming this, I thought it might be
useful to create 2 kinds of labels: normal ones, and ones that could
only be applied to one object at a time, and thus could be used more
like filenames.  I then worried that this was an arbitrary decision;
what if some developers wanted labels that could only be applied to
two objects at once, or labels that are mutually exclusive (for
example, an object should only have one mime-type label at a time).
Ultimately, it seems that the needs of developers would be too
diverse, and the only truly general solution would be to allow
namespace creators to define these limitations in code.  As such, I'm
considering forcing namespaces to be real URLs that point to actual
code of some sort that can be used to verify the validity of a
namespace; if such code were provided, it could be run whenever a
change to a namespace's label assignments occurred (i.e. whenever a
label under a given namespace is removed or applied to an object).
The performance implications would have to be studied, of course, but
if it were possible to guarantee that the code runs in a sandbox with
no access to libraries, and that the only input it operates on is the
label assignments (as opposed to file contents themselves),
performance could probably be made acceptable, at least if the store
were used primarily as a store for documents, and not a general
purpose filesystem.  Because of these controlled execution
requirements, as well as validator code portability needs, I'd like to
implement this system in Mono.
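
To make the idea concrete, here is a minimal sketch of per-namespace
validation that sees only label assignments, never file contents.
It uses plain in-process Python functions rather than sandboxed code
fetched from a namespace URL, and the namespace names "mime" and
"name" as well as all function names are assumptions for
illustration.

```python
import copy


def mutually_exclusive(assignments):
    """Valid if no object carries more than one label from the namespace."""
    seen = set()
    for objects in assignments.values():
        if seen & objects:
            return False
        seen |= objects
    return True


def unique_per_label(assignments):
    """Valid if each label is applied to at most one object (filename-like)."""
    return all(len(objects) <= 1 for objects in assignments.values())


# Hypothetical registry: namespace -> validator over its label assignments.
VALIDATORS = {"mime": mutually_exclusive, "name": unique_per_label}


def try_apply_label(store, namespace, label, object_id):
    """Apply a label only if the namespace's validator accepts the result.

    `store` maps namespace -> {label -> set of object ids}.
    """
    proposed = copy.deepcopy(store.get(namespace, {}))
    proposed.setdefault(label, set()).add(object_id)
    validator = VALIDATORS.get(namespace)
    if validator is not None and not validator(proposed):
        return False                   # rejected; the store is unchanged
    store[namespace] = proposed
    return True


store = {}
first = try_apply_label(store, "mime", "text/plain", 1)    # accepted
second = try_apply_label(store, "mime", "image/png", 1)    # rejected
```

Namespaces without a registered validator, such as a user's personal
namespace, accept any assignment.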

COMPARISONS TO OTHER PROJECTS

The most exciting OSS product currently targeting a similar problem
domain appears to be Beagle.  Because this system would be backed by
the normal filesystem, it and Beagle would be able to exist
side-by-side.  This system attacks a similar problem from a different
angle; while Beagle requires basically zero changes in how users use
the filesystem, this system attempts to provide greater searching and
organizational power by requiring a little bit more learning from
users and developers (though still as little as possible).  

The product that initially convinced me that something like this was
needed was Gnome Storage, whose development, sadly, appears to be on
hiatus.  However, I feel that Storage was probably too ambitious.  As
I understand it, it requires large changes in how applications and
users work, and I think it would have a hard time becoming accepted
even if development were active.  That said, the brainstorming of this
system began with the author asking himself, "Storage rocks.  How
could I try to solve the problems it does, but in a less ambitious
way?"  This system also took significant conceptual inspiration from
the KDE Database Filesystem project.

Apple's Spotlight and Microsoft's WinFS also target this problem
domain, of course, and regardless of whether this proposal has a
future, I hope that the OSS community is able to rally around some
comparable solution over the next 2 years.

