glib hash memory footprint



Hi all,

I am working on a project where we create some big string->uint hash tables (> 4 million elements). One thing I've noticed is the large memory overhead of glib's hash table.

glib's hash table uses up to 200MB to store the data, while a straight linear probing hash implementation, with the same hashing function, takes about 90MB with the same performance. As far as I can see, the main reason behind this is that glib's hash implementation uses linked lists for collision resolution. So my first question is:

1. Is there a specific reason to use linked lists instead of linear probing (putting collided elements in the next bucket; see the sketch after these questions)?

2. Would the glib/gtk team be interested in a linear/custom probing hash implementation?
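
To make question 1 concrete, here is a minimal lookup sketch of the kind of open addressing I mean. The struct and function names are just placeholders for illustration, not a patch against glib:

#include <glib.h>

/* Sketch only: one flat array of slots, no per-node allocations.
 * Colliding keys go into the next bucket instead of a linked list. */
typedef struct {
    const gchar *key;     /* NULL marks an empty slot */
    guint        value;
} Slot;

typedef struct {
    Slot  *slots;
    guint  n_slots;       /* kept a power of two so we can mask instead of mod */
} LinearHash;

static gboolean
linear_hash_lookup (const LinearHash *h, const gchar *key, guint *value)
{
    guint probes = 0;
    guint i = g_str_hash (key) & (h->n_slots - 1);

    /* Walk forward from the home bucket until we hit the key or an
     * empty slot (deletion would additionally need tombstones). */
    while (h->slots[i].key != NULL && probes++ < h->n_slots) {
        if (g_str_equal (h->slots[i].key, key)) {
            *value = h->slots[i].value;
            return TRUE;
        }
        i = (i + 1) & (h->n_slots - 1);
    }
    return FALSE;
}

Since the table is a single contiguous array, there is no per-entry node allocation, which is where the memory savings come from.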

One more thing I would like to have in the API is the ability to change the equality function of a hash table between calls to g_hash_table_lookup(). There are some dirty tricks that can be done with this, like looking up keys in the same table with or without case sensitivity, without the need to keep two different hash tables. Do you think a g_hash_table_set_equal_func() function could be part of the API? A sketch of the intended usage follows.
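
Roughly what I have in mind is below. Note that g_hash_table_set_equal_func() is only the proposed call, it does not exist in the current API; also, the trick only works if the hash function itself is already case-insensitive, so that both spellings land in the same bucket and only the equality test decides whether they match:

#include <glib.h>

/* Proposed addition (prototype only, not in glib today): */
void g_hash_table_set_equal_func (GHashTable *hash_table, GEqualFunc equal_func);

/* Case-insensitive hash: hash the lowercased key. */
static guint
ci_str_hash (gconstpointer key)
{
    gchar *down = g_ascii_strdown (key, -1);
    guint  h    = g_str_hash (down);
    g_free (down);
    return h;
}

static gboolean
ci_str_equal (gconstpointer a, gconstpointer b)
{
    return g_ascii_strcasecmp (a, b) == 0;
}

static void
example (void)
{
    GHashTable *ht = g_hash_table_new (ci_str_hash, g_str_equal);

    g_hash_table_insert (ht, "Foo", GUINT_TO_POINTER (42));

    /* Case-sensitive lookup: "foo" does not match "Foo". */
    g_hash_table_lookup (ht, "foo");                 /* -> NULL */

    /* Proposed: switch to a case-insensitive comparison ... */
    g_hash_table_set_equal_func (ht, ci_str_equal);
    g_hash_table_lookup (ht, "foo");                 /* -> 42 */

    /* ... and back, without keeping two tables around. */
    g_hash_table_set_equal_func (ht, g_str_equal);

    g_hash_table_destroy (ht);
}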

Should I create a bugzilla entry for these? Another advantage of the linear probing implementation is that g_hash_table_destroy() could be made faster (the current implementation is quite slow).

Thanks,
Davi de Castro Reis
