Re: glib hash memory footprint



Hi all,

First of all, thank you for all the feedback. I think most of you can go on for a long time about the advantages and disadvantages of different hash implementations. But probably everybody agrees that there is not a single approach that is good for every case.

Let me discuss some of the options and give my opinion on how we could get this done. There are several options:

1. Current linked list implementation: this is simple, readable and gives good performance, but has a huge memory penalty

2. Dynamic arrays: probably better than linked list, but I don't think it is worth it. First, you still have a memory penalty for the size of the arrays of for a array terminator. Furthermore, allocating memory blocks at will might result in a very fragmented and slow to free memory.

3. Linear probing: this is what I want. It is simpler than double hashing and could be easily changed for quadric probing. Memory overhead is very small, and if you can tune your hash function and expose the g_hash_resize() call to "close" the hash, you can get very good results. By "close" I mean doing mostly (exclusively) lookup operations.

4. Double hashing: another good option that someone might want to give a try.

Since there are too many options and too many tradeoffs/scenarios, my suggestion would be to go with a polymorphic implementation using a union. So you create your hash using g_hash_table_new (linked list), g_hash_table_linear_probing_new(), g_hash_double_hashing_new(), g_hash_array_new() and so on. Then you simply feed the returned pointers to the common methods (ex: g_hash_lookup()) and let the first byte of the union demultiplex the correct call.

What I want to implement is the linear probing approach, but I can leave the hooks for those that want others approches. The advantage is that running code will not be affected at all. Do you like this idea?

Thanks,
Davi de Castro Reis

ps. I think that the problem of reserving UNUSED/UNITIALIZED values for the linear probing approach, letting NULL->NULL entries can easily be solved using and if for those reserved values.



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]