Proposal for a new ghash table



Hi everibody.
 
I've just looked at the code of the current ghash
(i was wondering what was the underlying code
for the hash table resizing). Here is another
approach for a dynamic hash table, based upon
an article or Per-Ake (Paul) Larson (can't find the
reference of this article). I will code it in the
next days, and will post results here.
 
The Larson's algorithm does not introduce any
freeze/thaw statements. It increases the size
of the hash table in order to limit the size of the
hash node list. Basically, it starts with a 2^n
size. When the number of hash node is bigger
than its current size, the size of the hash table
grows to 2^(n+1). This happens on the insertion
of a new hash node. then it computes the new
hash key and insert the node at the right place.
 
The key problem is the hash table lookup.
Since the size of the table grows and the
since the nodes are not moved in the table
(well... you can move the nodes too, see later),
it is possible to not find the node on the very
first look up. Consider a table with a base size
of 4 that has gown to have have a size of 32.
The hash val of the first element you have added
was 123456. In a table of 4 nodes, the resulting
hash val is 0. In a table of 32 nodes, the resulting
hash val is 8. In order to find the correct key for
a particular node, you'll have to
    1) compute the current hash value V from the
        key K (store it as V1)
    2) if the node is found, return it.
    2) if not, find the greater n value defined by
       2^n < V
    4) compute the new K value
    5) go to step 2
Before returning, a good thing should be to move
the node to position V1 if V != V1. The next time
you'll want to access to the node, the look up
will be faster.
 
Of course, an extra thaw function can be provided
to reorganize the entire hash table for faster
look up when all the elements have been added
to the table...
 
The table is not collapsed when the number of
node decreases.
 
[the following assumes that the base size is 1 -
this is only assumed to show the algorithmes]
 
In order to minimize the allocations, the table is
not entirely reallocated when its size changes.
you simply use an array of 32 tables (looks like)
 
g_hash_node **hash_tables[32];
 
The insertion code looks like (pseudo-code):
 
node = create node(key, value);
if (nbnodes > current size) {
  current_array++;
  hash_tables[current_array] =
    allocate new table(current_size);
  current_size *= 2;
}
hash_val = compute_val(node, key) % current_size;
cur_size = current_size / 2;
cur_array = current_array;
while (cur_size > hash_val) {
  cur_array--;
  cur_size /= 2;
}
add_node_to_array(node,
                            hash_tables[cur_array],
                            hash_val - cur_size);
 
The look up code looks like :
 
node = NULL;
cur_loop = 0;
cur_size = curent_size;
V = V1 = compute_val(node, key) % current_size;
while (cur_loop <= current_array) {
  if (node is found at pos V) break;
  n = (int)log2(V);
  cur_size = 2^n;
  V = compute_val(node, key) % cur_size;
  cur_loop++;
}
if (node) {
  if (V1 != V) move_node_from_to(node, V, V1);
}
return (node);
 
To be found at position V in the whole table, a
node should be at pos (V - cur_size) in table
hash_tables[log2(cur_size)].
 
Hope this is of interest.
 
Yours,
 
Emmanuel


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]