Re: Proposal for a new ghash table



On Fri, 26 Nov 1999, Emmanuel DELOGET wrote:

> Date: Fri, 26 Nov 1999 12:54:12 +0100
> From: Emmanuel DELOGET <logout@free.fr>
> Reply-To: gtk-devel-list@redhat.com
> To: gtk-devel-list@redhat.com
> Subject: Proposal for a new ghash table
> Resent-Date: 26 Nov 1999 11:49:44 -0000
> Resent-From: gtk-devel-list@redhat.com
> Resent-cc: recipient list not shown: ;
> 
> Hi everybody.
> 
> I've just looked at the code of the current ghash
> (I was wondering what the underlying code for the
> hash table resizing was). Here is another approach
> for a dynamic hash table, based upon an article by
> Per-Ake (Paul) Larson (I can't find the reference
> for this article). I will code it in the next few
> days and post the results here.
> 
> Larson's algorithm does not introduce any
> freeze/thaw statements. It increases the size
> of the hash table in order to limit the length
> of each hash node list. Basically, the table
> starts with a size of 2^n. When the number of
> hash nodes grows beyond the current size, the
> size of the hash table grows to 2^(n+1).

I have an implementation of this in Kazlib, minus the refinement described
below:

    http://users.footprints.net/~kaz/kazlib.html

The license is BSD-like, meaning that it can readily be used in GPLed code.
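
For concreteness, here is a rough sketch of that doubling rule as my
library does it, synchronously (hypothetical types and names of my own,
not Kazlib's actual API; the threshold of one node per chain is taken
from the description quoted above):

#include <stdlib.h>

typedef struct hnode {
    struct hnode *next;
    unsigned long hval;         /* full hash value, kept for rehashing */
    void *key;
    void *data;
} hnode_t;

typedef struct htab {
    hnode_t **chain;            /* 2^n chain heads */
    unsigned long size;         /* number of chains, a power of two */
    unsigned long count;        /* number of nodes stored */
} htab_t;

static void grow(htab_t *t)
{
    unsigned long newsize = t->size * 2;
    hnode_t **newchain = calloc(newsize, sizeof *newchain);
    unsigned long i;

    if (newchain == NULL)
        return;                 /* growing is an optimization; fail softly */

    for (i = 0; i < t->size; i++) {
        hnode_t *node, *next;
        for (node = t->chain[i]; node != NULL; node = next) {
            unsigned long slot = node->hval & (newsize - 1);
            next = node->next;
            node->next = newchain[slot];
            newchain[slot] = node;
        }
    }
    free(t->chain);
    t->chain = newchain;
    t->size = newsize;
}

static void insert(htab_t *t, hnode_t *node)
{
    unsigned long slot;

    if (t->count + 1 > t->size) /* more nodes than chains: double */
        grow(t);
    slot = node->hval & (t->size - 1);
    node->next = t->chain[slot];
    t->chain[slot] = node;
    t->count++;
}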

> To find a particular node, you'll have to
>     1) compute the current hash value V from the
>        key K (store it as V1)
>     2) if the node is found at position V, return it.
>     3) if not, find the greatest n value defined by
>        2^n < V
>     4) compute the new V value
>     5) go to step 2
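
In this scheme, a grown table leaves existing nodes in their old chains,
and the lookup probes the slots the key would have hashed to at each
smaller table size. My reading of that probe loop, as a sketch reusing
the types from the fragment above (step 4 is not spelled out, so I am
assuming "compute the new V" means clearing the highest set bit of V,
i.e. that the inequality in step 3 is meant as 2^n <= V; key_equal() is
a hypothetical comparison function):

hnode_t *lookup(htab_t *t, void *key, unsigned long hval)
{
    unsigned long v = hval & (t->size - 1);     /* step 1: this is V1 */

    for (;;) {
        hnode_t *node;
        unsigned long bit;

        /* step 2: search the chain at slot V */
        for (node = t->chain[v]; node != NULL; node = node->next)
            if (node->hval == hval && key_equal(node->key, key))
                return node;

        if (v == 0)
            return NULL;        /* no smaller slot left to probe */

        /* steps 3-5: find the greatest power of two not exceeding V,
           subtract it (yielding V's slot in a smaller table), and
           search again */
        for (bit = 1; bit * 2 <= v; bit *= 2)
            ;
        v -= bit;
    }
}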


> Before returning, a good thing would be to move
> the node to position V1 if V != V1. The next time
> you want to access the node, the lookup
> will be faster.

This is an interesting refinement that I do not have in the aforementioned
library (but which could easily be added); there, growth is done
synchronously rather than distributed in this fine-grained manner.
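
Grafted onto the lookup sketch above, the refinement might look like
this (again hypothetical code of mine, not something Kazlib currently
does):

hnode_t *lookup_promote(htab_t *t, void *key, unsigned long hval)
{
    unsigned long v1 = hval & (t->size - 1);    /* the "home" slot V1 */
    unsigned long v = v1;

    for (;;) {
        hnode_t **pp;
        unsigned long bit;

        for (pp = &t->chain[v]; *pp != NULL; pp = &(*pp)->next) {
            hnode_t *node = *pp;
            if (node->hval == hval && key_equal(node->key, key)) {
                if (v != v1) {
                    /* unlink from the deep chain and relink at the
                       home slot, so the next lookup takes one probe */
                    *pp = node->next;
                    node->next = t->chain[v1];
                    t->chain[v1] = node;
                }
                return node;
            }
        }
        if (v == 0)
            return NULL;
        for (bit = 1; bit * 2 <= v; bit *= 2)
            ;
        v -= bit;
    }
}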

This is obviously better when wide variations in response time are
undesirable; that is, when no single insertion should eat the sudden cost of
a resize.

The only problem I see, the way I understand it, is that unsuccessful lookups
are now O(log n): each reduction of V clears one bit, so a miss probes up to
log2(size) + 1 chains. This neat algorithm is thus paid for by an order
increase in one of the hash operations.

Thus I'd feel ambivalent about adding this to a general-purpose, reusable
hash implementation, for which I can't anticipate the frequency of
unsuccessful lookups.


