Re: glib hash memory footprint



On Sun, 2004-08-29 at 16:46, Davi de Castro Reis wrote:
> Hi all,
> 
> I am working on a project where we create some big string->uint hash 
> tables (> 4 million elements). One thing I've noticed is the large 
> memory overhead of glib's hash table.
> 
> glib's hash table uses up to 200MB to store the data, while a straight 
> linear probing hash implementation, with the same hashing function, 
> takes about 90MB, with the same performance. As far as I can see, the 
> main reason behind this is that glib's hash implementation uses linked 
> lists. So my first question is:
> 
> 1. Is there some reason to use linked lists instead of linear probing 
> (putting collided elements in the next bucket)?

One problem with linear probing is that the number of hash buckets needs
to be greater than the number of elements. How does one know, in the
general case, how many buckets will be necessary? In your case, it would
be some 4 million+, while someone else may need much less, or perhaps
more!

Also, linear probing has a hideous worst case. Nominal-case collisions
can be mitigated somewhat by using quadratic probing or double hashing
instead, which avoid primary clustering, but the worst case remains
very bad.
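
Roughly, the difference between the schemes is only the probe sequence.
Here is a sketch (not glib code), where h1/h2 are the key's hash
values, i is the attempt number and size is the number of buckets:

    /* linear probing: adjacent slots, so collisions pile up in runs */
    static guint
    linear_probe (guint h1, guint i, guint size)
    {
        return (h1 + i) % size;
    }

    /* quadratic probing: spreads the probes, avoiding primary clustering */
    static guint
    quadratic_probe (guint h1, guint i, guint size)
    {
        return (h1 + i * i) % size;
    }

    /* double hashing: per-key step; h2 must be nonzero (ideally odd
     * or coprime with size) */
    static guint
    double_hash (guint h1, guint h2, guint i, guint size)
    {
        return (h1 + i * h2) % size;
    }

All three still degrade toward scanning the whole table as it fills up,
which is the worst case I'm referring to.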

If table size is really a problem (and I agree that it can be, as in
your case), wouldn't switching from linked lists to dynamic arrays be a
better solution in the general case?
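
To make that concrete, here's a rough sketch of what I mean (using
GArray, and not a patch against ghash.c): each bucket owns one growable
array of key/value pairs instead of a chain of individually malloc'd
nodes.

    #include <glib.h>

    /* One growable array of key/value pairs per bucket. */
    typedef struct {
        gpointer key;
        gpointer value;
    } Entry;

    typedef struct {
        GArray *entries;   /* NULL until the bucket is first used */
    } Bucket;

    static void
    bucket_insert (Bucket *b, gpointer key, gpointer value)
    {
        Entry e = { key, value };

        if (b->entries == NULL)
            b->entries = g_array_new (FALSE, FALSE, sizeof (Entry));
        g_array_append_val (b->entries, e);
    }

    static gpointer
    bucket_lookup (Bucket *b, gconstpointer key, GEqualFunc equal)
    {
        guint i;

        if (b->entries == NULL)
            return NULL;
        for (i = 0; i < b->entries->len; i++) {
            Entry *e = &g_array_index (b->entries, Entry, i);
            if (equal (e->key, key))
                return e->value;
        }
        return NULL;
    }

You still pay to grow the per-bucket arrays, but for millions of keys
that's far fewer allocations, and no per-node next pointer, compared to
one list node per key.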

> 2. Would the glib/gtk team be interested in a linear/custom probing 
> hash implementation?
> 
> One more thing I would like to have in the API is the ability to 
> arbitrarily change the equality function of a hash table (between calls 
> to g_hash_table_lookup). There are some dirty tricks that can be done 
> with this, like looking up keys in the hash table with or without 
> case sensitivity, without the need to keep two different hash tables. 
> Do you think that a g_hash_table_set_equal_func() function could be 
> part of the API?
> 
> Should I create a bugzilla entry for these? Another advantage of the 
> linear probing implementation is that g_hash_table_destroy() could be 
> made faster (the current implementation is quite slow).
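
For what it's worth, the case-sensitivity trick you describe might look
roughly like this with the proposed g_hash_table_set_equal_func(). This
is only a sketch, since that function doesn't exist in glib today, and
the table's hash function would have to fold case so that both
spellings land in the same bucket:

    #include <glib.h>

    /* Case-folding hash so "Foo" and "foo" hash to the same bucket. */
    static guint
    ci_str_hash (gconstpointer key)
    {
        const gchar *p = key;
        guint h = 5381;

        for (; *p != '\0'; p++)
            h = (h << 5) + h + (guchar) g_ascii_tolower (*p);
        return h;
    }

    static gboolean
    ci_str_equal (gconstpointer a, gconstpointer b)
    {
        return g_ascii_strcasecmp (a, b) == 0;
    }

    /* table was created with g_hash_table_new (ci_str_hash, g_str_equal) */
    static gpointer
    lookup_any_case (GHashTable *table)
    {
        /* exact-case lookup first */
        gpointer value = g_hash_table_lookup (table, "Foo");

        /* hypothetical call: the proposed API, not current glib */
        g_hash_table_set_equal_func (table, ci_str_equal);
        return value ? value : g_hash_table_lookup (table, "foo");
    }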

-- 
Alan M. Evans <ame1 extratech com>



