Re: Persistent Binary Search Trees



2009/10/5 Dana Jansens <dana cg scs carleton ca>:
> Hello,
>
> I am interested in contributing code for persistent binary search
> trees to the GLib project.  This would actually count as credit
> towards a grad level data structures class for me, but I have a lot of
> experience working with GLib in the past, and developing C-based open
> source projects.  First some background on what persistence means and
> why it would be nice to have available in an open source data
> structures library.
>
> A persistent binary search tree is simply a BST that is versioned over
> changes, such that you can do a lookup in any previous version of the
> tree.  An efficient implementation of this data structure requires
> O(1) additional space per insertion into/deletion from the tree.  That
> is, after n insertions, creating n different versions of the tree, the
> total space required would be roughly 2*(space for a single version),
> while allowing you to search in any version of the tree.  Searching in
> any version takes the same number of comparisons and pointer
> redirections as a tree with only a single version.
>
> Persistent BSTs are used for operations such as a Plane Sweep [1] in
> Computational Geometry.  At the same time, the code behind the
> persistence can be generalized to provide an additional data structure
> that performs Fractional Cascading [2] for no extra cost to the BST
> implementation.
>
> There are currently no readily available open source implementations
> of a persistent BST.  This would be a great tool for students,
> academic researchers, and other developers.
>
> I have noticed that GLib has a Binary Search Tree data structure,
> which currently uses an AVL tree.  AVL trees can be made persistent,
> but the space requirement becomes O(log n) per insert/delete instead
> of O(1).
>
> There are 2 other BSTs that would work effectively to provide a
> persistent BST, while also maintaining a very efficient implementation
> for when persistence is not used.  They are:
>  - Treaps
>  - Red-Black Trees
>
> Treaps can be expected to outperform Red-Black Trees for any sized
> tree.  The constant in the search cost for a Treap is Approximately
> equal to AVL trees (and slightly less for large trees), and is always
> smaller than Red-Black Trees. (It's a small constant regardless.)
> Treaps are also much simpler to implement, requiring less complicated
> code. However Treaps are a randomized data structure, which some
> people try to stay away from.
>
> My proposed changes to the GTree API are the addition of the following
> functions:
>
> // returns the latest version of the tree available (starts from 0)
> guint    g_tree_current_version             (GTree *tree);
>
> // creates a new version of the tree, changes to the tree will no longer
> // affect the previous versions. returns the new current version
> guint    g_tree_next_version                (GTree *tree);
>
> For each lookup/traversal function, a new function which takes
> arguments (GTree *tree, guint version, ...) would be added, with
> existing functions becoming macros pointing to the new ones.
>
> This API allows a version to contain more than a single insert/delete,
> if desired, making them more of a logical entity.
>
> So my questions are this:
>
> 1) Is there interest in having a persistent binary search tree in the
> GLib library ?

Don't let that hold you back. As far as I can imagine I don't have any
use case ready, but that is probably because I've never had a
persistent BST at hand :-)

If I where you I'd probably just develop on out-of-tree library for
the datastructure, keeping the API aligned with glib, and making an
effort to make it easy to merge to glib should the desire arise.

Then do some lobbying for the lib in some areas where you can see use
cases. If you can get tracktion from some real world apps, the chances
of getting into glib are higher. As an alternative you could develop
the datastructure for the collections library used by Vala; libgee.
They have a much more rich collections API and are also still less
frozen API-wise.

I also have a few questions:
What is the relation to the copy-on-write B-trees like used in Btrfs?
Can I version sub-trees or is it always the entire tree?

> 2) If so, would there be resistance to using a random data strucuture,
> i.e. a Treap to implement them ?

I don't really have any opinions here - but then again I am not a GLib
developer.

-- 
Cheers,
Mikkel


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]