Re: Persistent Binary Search Trees



On Mon, Oct 5, 2009 at 3:05 PM, Dana Jansens <dana cg scs carleton ca> wrote:
> Hello,
>
> I am interested in contributing code for persistent binary search
> trees to the GLib project.  This would actually count as credit
> towards a grad level data structures class for me, but I have a lot of
> experience working with GLib in the past, and developing C-based open
> source projects.  First some background on what persistence means and
> why it would be nice to have available in an open source data
> structures library.
>
> A persistent binary search tree is simply a BST that is versioned over
> changes, such that you can do a lookup in any previous version of the
> tree.  An efficient implementation of this data structure requires
> O(1) additional space per insertion into/deletion from the tree.  That
> is, after n insertions, creating n different versions of the tree, the
> total space required would be roughly 2*(space for a single version),
> while allowing you to search in any version of the tree.  Searching in
> any version takes the same number of comparisons and pointer
> redirections as a tree with only a single version.
>
> Persistent BSTs are used for operations such as a Plane Sweep [1] in
> Computational Geometry.  At the same time, the code behind the
> persistence can be generalized to provide an additional data structure
> that performs Fractional Cascading [2] for no extra cost to the BST
> implementation.
>
> There are currently no readily available open source implementations
> of a persistent BST.  This would be a great tool for students,
> academic researchers, and other developers.
>
> I have noticed that GLib has a Binary Search Tree data structure,
> which currently uses an AVL tree.  AVL trees can be made persistent,
> but the space requirement becomes O(log n) per insert/delete instead
> of O(1).
>
> There are 2 other BSTs that would work effectively to provide a
> persistent BST, while also maintaining a very efficient implementation
> for when persistence is not used.  They are:
>  - Treaps
>  - Red-Black Trees
>
> Treaps can be expected to outperform Red-Black Trees for any sized
> tree.  The constant in the search cost for a Treap is Approximately
> equal to AVL trees (and slightly less for large trees), and is always
> smaller than Red-Black Trees. (It's a small constant regardless.)
> Treaps are also much simpler to implement, requiring less complicated
> code. However Treaps are a randomized data structure, which some
> people try to stay away from.
>
> My proposed changes to the GTree API are the addition of the following
> functions:
>
> // returns the latest version of the tree available (starts from 0)
> guint    g_tree_current_version             (GTree *tree);
>
> // creates a new version of the tree, changes to the tree will no longer
> // affect the previous versions. returns the new current version
> guint    g_tree_next_version                (GTree *tree);
>
> For each lookup/traversal function, a new function which takes
> arguments (GTree *tree, guint version, ...) would be added, with
> existing functions becoming macros pointing to the new ones.
>
> This API allows a version to contain more than a single insert/delete,
> if desired, making them more of a logical entity.
>
> So my questions are this:
>
> 1) Is there interest in having a persistent binary search tree in the
> GLib library ?
> 2) If so, would there be resistance to using a random data strucuture,
> i.e. a Treap to implement them ?
>
>
> Thanks very much,
> Dana Jansens
>
> p.s. I sent this before subscribing, but its been stuck in the
> moderator queue for more than a week now, so I'm sending again.
> Apologies if you see this twice.
>
> [1] http://en.wikipedia.org/wiki/Sweep_line_algorithm
> [2] http://en.wikipedia.org/wiki/Fractional_cascading
> _______________________________________________
> gtk-devel-list mailing list
> gtk-devel-list gnome org
> http://mail.gnome.org/mailman/listinfo/gtk-devel-list
>

Hi,

I'm just a glib user as well, not a dev, but let me get this right.
For example lets take a directory with files in it and all those file
names are stored in the tree.
Now i rename a file. would that mean a new tree gets created with the
new filename and the rest remains the same?
kinda like version control stuff works.

Example:

--Somedir
-- - file 1
-- - file 2

< rename file 1 to file 01 >
-- Somedir
-- - file 01
-- - file 2

And how i understand it this is what would be in the tree:
-- Somedir
-- - file 01 -- (older tree version) file 1
-- - file 2

So, resulting in a tree history per file..?
I can imagine this being efficient for something like mac's time
machine only then for linux ^_^

Or am i completely wrong here?


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]