Re: glib hash memory footprint



 --- Davi de Castro Reis <davicastro terra com br> wrote: 
> Hi all,
> 
> First of all, thank you for all the feedback. I think most of you can go 
> on for a long time about the advantages and disadvantages of different 
> hash implementations. But probably everybody agrees that there is not a 
> single approach that is good for every case.
> 

Yes. There never will be either. 

> Let me discuss some of the options and give my opinion on how we could 
> get this done. There are several options:
> 
> 1. Current linked list implementation: this is simple, readable and 
> gives good performance, but has a huge memory penalty
> 

s/good/reasonable/ - hence its good for a generic implementation esp one from
which users don't demand much. Last time the hash implementation came up it 
was because of security, the decison was that for security sensitive etc things
everybody should be using something else anyways.

> 2. Dynamic arrays: probably better than linked list, but I don't think 
> it is worth it. First, you still have a memory penalty for the size of 
> the arrays of for a array terminator. Furthermore, allocating memory 
> blocks at will might result in a very fragmented and slow to free memory.
> 

Dynamic arrays are not the only way to speed up chains. And one would need to 
establish that there is a problem worth tackling first (which can be way harder
than it might seem).

> 3. Linear probing: this is what I want. It is simpler than double 
> hashing and could be easily changed for quadric probing. Memory overhead 
> is very small, and if you can tune your hash function and expose the 
> g_hash_resize() call to "close" the hash, you can get very good results. 
> By "close" I mean doing mostly (exclusively) lookup operations.
> 

Yes, and you can simply write an implementation that would be easy to drop 
in to replace the gtk one and use in your own app.

> 4. Double hashing: another good option that someone might want to give a 
> try.
> 
> Since there are too many options and too many tradeoffs/scenarios, my 
> suggestion would be to go with a polymorphic implementation using a 
> union. So you create your hash using g_hash_table_new (linked list), 
> g_hash_table_linear_probing_new(), g_hash_double_hashing_new(), 
> g_hash_array_new() and so on. Then you simply feed the returned pointers 
> to the common methods (ex: g_hash_lookup()) and let the first byte of 
> the union demultiplex the correct call.
> 

The question is one of - does it (for some suitable definition of it) belong in
glib? One can argue both ways but its not something thats immediately obvious.
Simply including lots of hashing functions in glib just because somebody might 
need them is probably the wrong thing though.

> What I want to implement is the linear probing approach, but I can leave 
> the hooks for those that want others approches. The advantage is that 
> running code will not be affected at all. Do you like this idea?
> 
> Thanks,
> Davi de Castro Reis
> 
> ps. I think that the problem of reserving UNUSED/UNITIALIZED values for 
> the linear probing approach, letting NULL->NULL entries can easily be 
> solved using and if for those reserved values.


=====
Open Source - the religion of doing it right


	
	
		
___________________________________________________________ALL-NEW Yahoo! Messenger - all new features - even more fun!  http://uk.messenger.yahoo.com



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]