Re: g_str_hash
- From: Owen Taylor <otaylor redhat com>
- To: gtk-devel-list redhat com
- Subject: Re: g_str_hash
- Date: 11 Feb 2000 11:00:00 -0500
Tim Janik <timj@gtk.org> writes:
> On 11 Feb 2000, Owen Taylor wrote:
>
> > Tim Janik <timj@gtk.org> writes:
>
> > > guint
> > > g_str_hash (gconstpointer string)
> > > {
> > > register const gchar *p = string;
> > > register guint h = *p;
> > >
> > > for (p = string + 1; *p; p++)
> > > h += (h << 4) + *p;
> > >
> > > return h;
> > > }
> > >
> > > and for high-bit feedback:
>
> > I'm not sure this is a good idea: one point of glib is that people can
> > use it without having to understand things to the level of detail
> > of having to understand tradeoffs between different hash functions.
>
> that's merely a matter of documenting the intended use of the hash functions,
> e.g. we recommend using g_str_hash_long() if the average string of your
> hash table is longer than >=10 characters and tends to have a higher
> distribution in the string head than the tail.
I would hate to make that (complicated) recommendation unless I had
a good example of a collections of strings where g_str_hash_long()
produced less collisions the g_str_hash().
I think having a bunch of strings that are the same at the end
but differ only at the beginning is just a real marginal case.
And furthermore, I've seen no evidence that the Perl/Tcls function are
any worse on these strnigs; trying out some very long strings that
differ only at the beginning, the Perl/Tcl hashes give you hash
results that are widely different, while the ASU function gives you
results that differ only in a few bits.
> > The perl version effectively uses '<< 5' instead of '<< 4',
> > and also differs by having
>
> thanks for catching that, g_str_hash() should have read h += (h << 5) + *p;
>
> > return (h >> 5) + h;
> >
> > instead of 'return h'; something I don't completely understand.
>
> hm, odd indeed, unless they use powers of 2 instead of prime numbers
> for their hash table sizes ;) i don't really see how that would make
> a difference.
>
> > It would take a bit of experimentation to see how the Tcl/Perl
> > versions perform on long strings that differ only in their tail, but I
> > suspect that
> >
> > a) Inserting such strings into a hash table is rare (and
> > slow anyways)
>
> well that depends on what you mean with "such" (long) strings,
> the ASU version starts feedback after the 7th character.
But the feedback is not particularly relevant except as a way of
preventing information-loss - since we are using prime-sized bucket
arrays, all bits of the hash result affect the hash bucket selection.
And the += nature of the Perl/Tcl hash functions should do reasonably well
at preventing the loss of information from the start of the string.
Regards,
Owen
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]