Re: UTF-8 Functions



Steve Underwood <steveu coppice org> writes:

> > /**
> >  * g_utf8_collate:
> >  * @str1: a UTF-8 encoded string
> >  * @str2: a UTF-8 encoded string
> >  *
> >  * Compares two strings for ordering using the linguistically
> >  * correct rules for the current locale. When sorting a large
> >  * number of strings, it will be significantly faster to
> >  * obtain collation keys with g_utf8_collate_key() and
> >  * compare the keys with strcmp() when sorting instead of
> >  * sorting the original strings.
> >  *
> >  * Return value: -1 if str1 compares before str2, 0 if they
> >  *   compare equal, 1 if str1 compares after str2.
> >  **/
> > gint g_utf8_collate (const gchar *str1, const gchar *str2);
> > 
> > /**
> >  * g_utf8_collate_key:
> >  * @str: a UTF-8 encoded string.
> >  *
> >  * Converts a string into a collation key that can be compared
> >  * with other collation keys using strcmp(). The results of
> >  * comparing the collation keys of two strings with strcmp()
> >  * will always be the same as comparing the two original
> >  * strings with g_utf8_collate().
> >  *
> >  * Return value: a newly allocated string. This string should
> >  *   be freed with g_free when you are done with it.
> >  **/
> > gchar *g_utf8_collate_key (const gchar *str);
> 
> Hi Owen,
> 
> Would it be practical to allow an extra parameter in the collate related
> calls to choose between optional collate sequences (of course the
> parameter is practical, but is all the material that goes behind it
> available)? In East Asian languages many collate sequences exist -
> phonetic, stroke, radical, etc. It doesn't seem very practical to
> switch these by locale, as you often want to switch on the fly when
> someone hits the "sort by radical/phonetics/strokes" button.

We aren't going to have this sort of facility implemented in the
short term. Right now g_utf8_collate() and g_utf8_collate_key() are
implemented in terms of wcscoll()/strcoll() and wcsxfrm()/strxfrm().
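
To make that concrete, here is a rough sketch of the strcoll() /
strxfrm() approach in plain C. It is not the GLib code: the names
sketch_collate_key() and sketch_collate() are made up for
illustration, and it assumes the input is already in the locale's
encoding (a real version would have to convert from UTF-8 first).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the GLib implementation: build a collation
 * key for `str` with strxfrm(). Returns a malloc()ed string, or NULL
 * on allocation failure. The caller releases it with free(). */
char *sketch_collate_key(const char *str)
{
    assert(str != NULL);

    /* With a zero-length buffer, strxfrm() only reports the length
     * the transformed string needs (excluding the terminator). */
    size_t len = strxfrm(NULL, str, 0);

    char *key = malloc(len + 1);
    if (key == NULL)
        return NULL;

    strxfrm(key, str, len + 1);
    return key;
}

/* Hypothetical helper: compare two strings with strcoll() and
 * normalise the result to -1, 0 or 1. */
int sketch_collate(const char *str1, const char *str2)
{
    assert(str1 != NULL && str2 != NULL);

    int r = strcoll(str1, str2);
    return (r > 0) - (r < 0);
}
```

Comparing two keys with strcmp() should give the same sign as
calling sketch_collate() on the original strings, which is the
property the g_utf8_collate_key() documentation promises.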

Until we actually have options, we don't need an API to control
them, so I think it is best to hold off on adding such an API
until we have a better idea of what kind of options we'll have.

(I think that if you have options, a "Collator" object like the
one in ICU/Java probably makes sense, since it allows the set of
options to be expanded in the future in a compatible fashion.)

Also, for g_utf8_collate(), I think it is nice to have a
two-argument function that can be used as the comparison callback
for g_list_sort(), or for qsort() via a thin wrapper.
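
A minimal sketch of the sorting-callback idea with qsort(), again
in plain C. strcoll() stands in for g_utf8_collate() here, and the
names sketch_compare_for_qsort() and sketch_sort_strings() are
hypothetical. Note that qsort() hands the callback pointers to the
array elements, so sorting an array of strings needs one extra
dereference before calling the two-argument comparison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Adapter for qsort(): each argument points at an array element,
 * i.e. at a `const char *`. strcoll() stands in for a two-argument
 * collation function such as g_utf8_collate(). */
static int sketch_compare_for_qsort(const void *a, const void *b)
{
    const char *s1 = *(const char *const *)a;
    const char *s2 = *(const char *const *)b;

    return strcoll(s1, s2);
}

/* Hypothetical helper: sort an array of `n` strings in place using
 * the current locale's collation order. */
void sketch_sort_strings(const char **strings, size_t n)
{
    if (n == 0)
        return;

    assert(strings != NULL);
    qsort(strings, n, sizeof *strings, sketch_compare_for_qsort);
}
```

For large arrays, the documentation quoted above suggests
precomputing collation keys and comparing those with strcmp()
instead of calling the collation function on every comparison.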

Regards,
                                        Owen




