Re: g_str_hash

From: Owen Taylor <otaylor redhat com>
To: gtk-devel-list redhat com
Subject: Re: g_str_hash
Date: 11 Feb 2000 11:00:00 -0500

Tim Janik <timj@gtk.org> writes:

> On 11 Feb 2000, Owen Taylor wrote:
> 
> > Tim Janik <timj@gtk.org> writes:
> 
> > > guint
> > > g_str_hash (gconstpointer string)
> > > {
> > >   register const gchar *p = string;
> > >   register guint h = *p;
> > > 
> > >   for (p = string + 1; *p; p++)
> > >     h += (h << 4) + *p;
> > > 
> > >   return h;
> > > }
> > > 
> > > and for high-bit feedback:
> 
> > I'm not sure this is a good idea: one point of glib is that people can
> > use it without having to understand things to the level of detail
> > of having to understand tradeoffs between different hash functions.
> 
> that's merely a matter of documenting the intended use of the hash functions,
> e.g. we recommend using g_str_hash_long() if the average string of your
> hash table is longer than >=10 characters and tends to have a higher
> distribution in the string head than the tail.

I would hate to make that (complicated) recommendation unless I had
a good example of a collections of strings where g_str_hash_long() 
produced less collisions the g_str_hash().

I think having a bunch of strings that are the same at the end
but differ only at the beginning is just a real marginal case.

And furthermore, I've seen no evidence that the Perl/Tcls function are
any worse on these strnigs; trying out some very long strings that
differ only at the beginning, the Perl/Tcl hashes give you hash
results that are widely different, while the ASU function gives you
results that differ only in a few bits.

> > The perl version effectively uses '<< 5' instead of '<< 4',
> > and also differs by having 
> 
> thanks for catching that, g_str_hash() should have read h += (h << 5) + *p;
> 
> >  return (h >> 5) + h;
> > 
> > instead of 'return h'; something I don't completely understand.
> 
> hm, odd indeed, unless they use powers of 2 instead of prime numbers
> for their hash table sizes ;) i don't really see how that would make
> a difference.
> 
> > It would take a bit of experimentation to see how the Tcl/Perl
> > versions perform on long strings that differ only in their tail, but I
> > suspect that
> > 
> >  a) Inserting such strings into a hash table is rare (and
> >     slow anyways)
> 
> well that depends on what you mean with "such" (long) strings,
> the ASU version starts feedback after the 7th character.

But the feedback is not particularly relevant except as a way of
preventing information-loss - since we are using prime-sized bucket
arrays, all bits of the hash result affect the hash bucket selection.
And the += nature of the Perl/Tcl hash functions should do reasonably well
at preventing the loss of information from the start of the string.

Regards,
                                        Owen
References:
- Re: g_str_hash
  - From: Tim Janik
[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]