Re: g_str_hash




Karl Nelson <kenelson@ece.ucdavis.edu> writes:

> > 
> > Hi,
> > 
> > g_str_hash is really slow. The Tcl string hash function is twice as
> > fast, literally. Here's a test program, and it also shows you the Tcl
> > function. Any objections to the Tcl version?
> 
> Instead of switching how about just optimizing the one we have a
> bit?
> 
> guint
> copy_of_g_str_hash (gconstpointer v)
> {
>   const char *s = (char*)v;
>   const char *p;
>   guint h=0, g;
> 
>   for(p = s; *p != '\0'; p += 1) {
>     h = ( h << 4 ) + *p;
>     if ( ( g = h & 0xf0000000 ) ) {
>       h = h ^ (g >> 24);
>       h = h ^ g;
>     }
>   }
> 
>   return h /* % M */;
> }
> 
> =>
> 
> guint
> copy_of_g_str_hash (gconstpointer v)
> {
>   const char *s = (char*)v;
>   const char *p;
>   guint32 h=0;
> 
>   for(p = s; *p != '\0' && !(h&0xf0000000); p += 1) {
>     h = ( h << 4 ) + *p;
>   }
>   for(; *p != '\0'; p += 1) {
>     h = h ^ ((h >> 24)&0xf0);
>     h = ( h << 4 ) + *p;
>   }
> 
>   return h /* % M */;  
> }
> 
> This is the exact same hash except operations have been moved to
> enhance speed.  The difference being the new one keeps the upper bits 
> and thus has less collisions and operates somewhere between the TCL and 
> GLib on.  

I suspect the appeal of the ASU one is solely that it is nice and
easily analyzable. I won't believe that it creates less collisions
until somebody shows me a set of (plausible) strings where that is the case.

> Also I suggest the << 4 go to << 3 so that you don't create the case
> annomally where Aaaaaaaaaaaaa and aaaaaaAaaaaaa are equivent.  (The
> shift should be prime to the representation system.
> 
>                    -O0  -O3
> Original system:   0.54 0.31 
> Proposed system:   0.48 0.22 
> TCL      system:   0.31 0.12
> 
> Is that enough improvement?

Well, if the Tcl (or the Perl one with the << 5) are as good, why 
waste the time? And your optimization doesn't help for long strings,
which take the most time.

Regards,
                                        Owen



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]