Hi everibody.
I've just looked at the code of the current
ghash
(i was wondering what was the underlying
code
for the hash table resizing). Here is
another
approach for a dynamic hash table, based
upon
an article or Per-Ake (Paul) Larson (can't find the
reference of this article). I will code it in
the
next days, and will post results here.
The Larson's algorithm does not introduce
any
freeze/thaw statements. It increases the
size
of the hash table in order to limit the size of
the
hash node list. Basically, it starts with a
2^n
size. When the number of hash node is
bigger
than its current size, the size of the hash
table
grows to 2^(n+1). This happens on the
insertion
of a new hash node. then it computes the new
hash key and insert the node at the right
place.
The key problem is the hash table
lookup.
Since the size of the table grows and
the
since the nodes are not moved in the
table
(well... you can move the nodes too, see later),
it is possible to not find the node on the
very
first look up. Consider a table with a base
size
of 4 that has gown to have have a size of
32.
The hash val of the first element you have
added
was 123456. In a table of 4 nodes, the
resulting
hash val is 0. In a table of 32 nodes, the
resulting
hash val is 8. In order to find the correct key
for
a particular node, you'll have to
1) compute the current hash
value V from the
key K
(store it as V1)
2) if the node is found, return
it.
2) if not, find the greater n
value defined by
2^n
< V
4) compute the new K
value
5) go to step 2
Before returning, a good thing should be to
move
the node to position V1 if V != V1. The next
time
you'll want to access to the node, the look
up
will be faster.
Of course, an extra thaw function can be
provided
to reorganize the entire hash table for
faster
look up when all the elements have been
added
to the table...
The table is not collapsed when the number
of
node decreases.
[the following assumes that the base size is 1
-
this is only assumed to show the
algorithmes]
In order to minimize the allocations, the table
is
not entirely reallocated when its size
changes.
you simply use an array of 32 tables (looks
like)
g_hash_node **hash_tables[32];
The insertion code looks like
(pseudo-code):
node = create node(key, value);
if (nbnodes > current size) {
current_array++;
hash_tables[current_array] =
allocate new
table(current_size);
current_size *= 2;
}
hash_val = compute_val(node, key) %
current_size;
cur_size = current_size / 2;
cur_array = current_array;
while (cur_size > hash_val) {
cur_array--;
cur_size /= 2;
}
add_node_to_array(node,
hash_tables[cur_array],
hash_val
- cur_size);
The look up code looks like :
node = NULL;
cur_loop = 0;
cur_size = curent_size;
V = V1 = compute_val(node, key) %
current_size;
while (cur_loop <= current_array) {
if (node is found at pos
V) break;
n = (int)log2(V);
cur_size = 2^n;
V = compute_val(node, key) %
cur_size;
cur_loop++;
}
if (node)
{
if (V1 != V) move_node_from_to(node, V,
V1);
}
return (node);
To be found at position V in the whole table,
a
node should be at pos (V - cur_size) in
table
hash_tables[log2(cur_size)].
Hope this is of interest.
Yours,
Emmanuel
|