Fwd: Persistent Binary Search Trees
- From: Dana Jansens <dana cg scs carleton ca>
- To: gtk-devel-list gnome org
- Subject: Fwd: Persistent Binary Search Trees
- Date: Mon, 5 Oct 2009 09:05:56 -0400
Hello,
I am interested in contributing code for persistent binary search
trees to the GLib project. This would actually count as credit
towards a grad level data structures class for me, but I have a lot of
experience working with GLib in the past, and developing C-based open
source projects. First some background on what persistence means and
why it would be nice to have available in an open source data
structures library.
A persistent binary search tree is simply a BST that is versioned over
changes, such that you can do a lookup in any previous version of the
tree. An efficient implementation of this data structure requires
O(1) additional space per insertion into/deletion from the tree. That
is, after n insertions, creating n different versions of the tree, the
total space required would be roughly 2*(space for a single version),
while allowing you to search in any version of the tree. Searching in
any version takes the same number of comparisons and pointer
redirections as a tree with only a single version.
Persistent BSTs are used for operations such as a Plane Sweep [1] in
Computational Geometry. At the same time, the code behind the
persistence can be generalized to provide an additional data structure
that performs Fractional Cascading [2] for no extra cost to the BST
implementation.
There are currently no readily available open source implementations
of a persistent BST. This would be a great tool for students,
academic researchers, and other developers.
I have noticed that GLib has a Binary Search Tree data structure,
which currently uses an AVL tree. AVL trees can be made persistent,
but the space requirement becomes O(log n) per insert/delete instead
of O(1).
There are 2 other BSTs that would work effectively to provide a
persistent BST, while also maintaining a very efficient implementation
for when persistence is not used. They are:
- Treaps
- Red-Black Trees
Treaps can be expected to outperform Red-Black Trees for any sized
tree. The constant in the search cost for a Treap is Approximately
equal to AVL trees (and slightly less for large trees), and is always
smaller than Red-Black Trees. (It's a small constant regardless.)
Treaps are also much simpler to implement, requiring less complicated
code. However Treaps are a randomized data structure, which some
people try to stay away from.
My proposed changes to the GTree API are the addition of the following
functions:
// returns the latest version of the tree available (starts from 0)
guint g_tree_current_version (GTree *tree);
// creates a new version of the tree, changes to the tree will no longer
// affect the previous versions. returns the new current version
guint g_tree_next_version (GTree *tree);
For each lookup/traversal function, a new function which takes
arguments (GTree *tree, guint version, ...) would be added, with
existing functions becoming macros pointing to the new ones.
This API allows a version to contain more than a single insert/delete,
if desired, making them more of a logical entity.
So my questions are this:
1) Is there interest in having a persistent binary search tree in the
GLib library ?
2) If so, would there be resistance to using a random data strucuture,
i.e. a Treap to implement them ?
Thanks very much,
Dana Jansens
p.s. I sent this before subscribing, but its been stuck in the
moderator queue for more than a week now, so I'm sending again.
Apologies if you see this twice.
[1] http://en.wikipedia.org/wiki/Sweep_line_algorithm
[2] http://en.wikipedia.org/wiki/Fractional_cascading
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]