Re: UTF-8 Functions



Darin Adler <darin bentspoon com> writes:

> On Monday, July 2, 2001, at 04:18  PM, Owen Taylor wrote:
> 
> > The question, I guess is whether it is worth adding:
> >
> > g_utf8_collate_key_casefold (), which is currently
> >
> >  g_utf8_collate_key (g_utf8_casefold (string, -1), -1);
> >
> > But might eventually be implemented as:
> >
> >  g_utf8_collate_key_extended (string,
> >                               G_COLLATE_SECONDARY,
> >                               G_NORMALIZE_ALL_COMPOSE);
> >
> > [ There are issues of correctness here as well as efficiency ]
> >
> > It's certainly easy enough to do ... just a few lines of code.  My
> > main hesitation is whether we know yet whether that is the right part
> > of the parameter space to give a special name.
> 
> Clear analysis as usual.
> 
> I think perhaps I want to retract my previous comment/request. If I
> understand correctly, g_utf8_collate_key (without g_utf8_casefold)
> will still typically sort strings in a way that is not unduly
> sensitive to case.
> In other words, we get this kind of order:
> 
>      A, a, B, b
> 
> not this kind:
> 
>      A, B, a, b
> 
> If that's so, then I think it's not particularly important to add the
> case folding version. It's only needed when you want to "partly
> collate" things and put a bunch of identical items into the same
> bucket. That's not the usual case, I don't think.
> 
> It might be common to case fold and normalize and then use the
> resulting string as a key. But I can't think of a case where you'd
> want to case fold and normalize and then still want to collate in a
> locale-specific way.
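The composition quoted above, which computes a collation key from a case-folded copy of the string, can be sketched in standard C. This is a minimal sketch under stated assumptions: GLib's UTF-8-aware g_utf8_casefold() and g_utf8_collate_key() are swapped for per-byte tolower() and strxfrm(), so it only folds ASCII case, and the helper name collate_key_casefold is hypothetical, not a GLib function.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: an ASCII-only stand-in for
 *   g_utf8_collate_key (g_utf8_casefold (str, -1), -1)
 * Step 1 folds case with tolower(); step 2 turns the folded string into
 * a collation key with strxfrm() for the current LC_COLLATE locale.
 * Keys are compared with plain strcmp(). The caller frees the result. */
char *collate_key_casefold(const char *str)
{
    size_t len = strlen(str);
    char *folded = malloc(len + 1);
    if (folded == NULL)
        return NULL;

    /* i <= len so the terminating NUL is copied as well. */
    for (size_t i = 0; i <= len; i++)
        folded[i] = (char) tolower((unsigned char) str[i]);

    /* First call only measures the key length; second call fills it in. */
    size_t key_len = strxfrm(NULL, folded, 0);
    char *key = malloc(key_len + 1);
    if (key != NULL)
        strxfrm(key, folded, key_len + 1);

    free(folded);
    return key;
}
```

Strings that differ only in ASCII case get identical keys, so they fall into the same bucket, which is the "partly collate" use described above; ordering between different strings still follows the locale's collation.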

Hadn't thought of this argument, and as an argument not to
increase the amount of API, I'm receptive to it. :-)

I certainly agree that case-insensitive collation doesn't make much
sense if you are sorting on a single key.

But, actually, it's not quite useless. I think the common case where
you'd want it is when you have another key that should act as a
secondary sort key behind the primary one. That is, you might want:

   Location      Date          
   ========      ============
   France         5 June 1999
   usa           10 July 1999
   USA           21 Aug  1999

Not:

   Location      Date          
   ========      ============
   France         5 June 1999
   USA           21 Aug  1999
   usa           10 July 1999
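The first table can be produced by a comparator that compares locations case-insensitively and consults the date only when the locations tie. A minimal sketch in standard C, assuming ASCII-only location names and dates stored as YYYYMMDD integers; struct row, compare_folded and sort_rows are hypothetical names, and the case-insensitive comparison stands in for a case-folded collation key.

```c
#include <ctype.h>
#include <stdlib.h>

/* One row of the example table. The date is stored as YYYYMMDD so that
 * integer comparison gives chronological order. */
struct row {
    const char *location;
    int date;
};

/* ASCII case-insensitive comparison, standing in for comparing
 * case-folded collation keys. */
static int compare_folded(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);
        if (ca != cb || ca == '\0')
            return ca - cb;
    }
}

/* qsort comparator: location (ignoring case) is the primary key,
 * date is the secondary key used only when locations compare equal. */
static int compare_rows(const void *pa, const void *pb)
{
    const struct row *a = pa;
    const struct row *b = pb;
    int cmp = compare_folded(a->location, b->location);
    if (cmp != 0)
        return cmp;
    return (a->date > b->date) - (a->date < b->date);
}

void sort_rows(struct row *rows, size_t n)
{
    qsort(rows, n, sizeof *rows, compare_rows);
}
```

With a case-sensitive byte comparison instead, "USA" and "usa" never tie, so the date is never consulted and "USA" sorts before "usa" regardless of date, which gives the second table.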

Of course, at this level of sophistication, you've probably exceeded
what can be expressed conveniently in a GUI for sorting a table...

Regards,
                                        Owen



