Re: normalized strings in searches, completion, etc.

From: "Denis Jacquerye" <moyogo gmail com>
To: "Sven Neumann" <sven gimp org>
Cc: gnome-devel-list gnome org, GNOME I18N List <gnome-i18n gnome org>
Subject: Re: normalized strings in searches, completion, etc.
Date: Fri, 23 Mar 2007 09:46:12 +0100

On 3/23/07, Denis Jacquerye <moyogo gmail com> wrote:

On 3/23/07, Sven Neumann <sven gimp org> wrote:
> Hi,
>
> On Fri, 2007-03-23 at 03:25 +0100, Denis Jacquerye wrote:
>
> > I'm sure there are tons of places where this doesn't work and some
> > where it does. But it should work everywhere someone does a search or
> > compares strings unless in some specific cases. What's the best way of
> > tackling the issue?
>
> It should work if all places where strings are compared used
> g_utf8_collate(). I am surprised that this doesn't seem to be the case.
> Perhaps it's an issue that is often overlooked as many developers are
> not aware of the pitfalls of working with Unicode texts.

g_utf8_collate() uses G_NORMALIZE_ALL_COMPOSE = G_NORMALIZE_NFKC, so it
will treat ² and 2 as equivalent. Should that be the default for all
searches?
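
To make the ² versus 2 point concrete, here is a minimal sketch. It uses
Python's unicodedata module rather than GLib, on the assumption that its
"NFKC" and "NFC" forms are the same Unicode normalization forms that
G_NORMALIZE_NFKC and G_NORMALIZE_NFC name. The helper equal_under() is
hypothetical, made up for this illustration.

```python
import unicodedata


def equal_under(form: str, a: str, b: str) -> bool:
    """Compare two strings after normalizing both to the given Unicode form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


superscript = "m\u00b2"  # "m" followed by SUPERSCRIPT TWO
plain = "m2"

# Compatibility normalization folds the superscript into a plain digit.
nfkc_match = equal_under("NFKC", superscript, plain)  # True

# Canonical normalization keeps the two characters distinct.
nfc_match = equal_under("NFC", superscript, plain)  # False

# Canonically equivalent spellings still compare equal under NFC:
# precomposed U+00E9 versus "e" followed by U+0301 COMBINING ACUTE ACCENT.
canonical_match = equal_under("NFC", "\u00e9", "e\u0301")  # True

print(nfkc_match, nfc_match, canonical_match)
```

This only shows equality after normalization. g_utf8_collate() also
applies locale collation rules, which this sketch does not model.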

Which is better? Using g_utf8_collate() instead of strcmp() or a
combination of g_utf8_normalize() and then strcmp()?
If g_utf8_normalize() is used, which normalization should be used?

I'm now guessing it should be G_NORMALIZE_NFC =
G_NORMALIZE_DEFAULT_COMPOSE in most cases because this will match
canonically equivalent strings (e.g. precomposed é, U+00E9, and
decomposed é, U+0065 U+0301, are equivalent) but not compatibility
ones (e.g. ² and 2 are different). It will also not partially match
things like "Bise" with "Bisé" where the combining diacritic is at
the end of the string.


Actually, I take that back. With NFC, partial matching would be
inconsistent between sequences that have a precomposed form and those
that don't, e.g. "bise" would not match "bisé" but "bisɛ" would match
"bisɛ́". So unless there's a better option,
G_NORMALIZE_NFD = G_NORMALIZE_DEFAULT should be used.
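
The inconsistency described above can be reproduced with a small
sketch. Again this is Python's unicodedata standing in for
g_utf8_normalize(), assuming "NFC" and "NFD" correspond to
G_NORMALIZE_NFC and G_NORMALIZE_NFD. prefix_match() is a hypothetical
helper, and a plain prefix test stands in for whatever partial matching
an application actually does.

```python
import unicodedata


def prefix_match(form: str, needle: str, haystack: str) -> bool:
    """Normalize both strings to `form`, then test needle as a prefix."""
    n = unicodedata.normalize(form, needle)
    h = unicodedata.normalize(form, haystack)
    return h.startswith(n)


bise_acute = "bis\u00e9"          # e with acute has a precomposed form (U+00E9)
open_e_acute = "bis\u025b\u0301"  # open e + combining acute has none

# NFC: the result depends on whether a precomposed character exists.
nfc_e = prefix_match("NFC", "bise", bise_acute)             # False
nfc_open_e = prefix_match("NFC", "bis\u025b", open_e_acute)  # True

# NFD: both base letters are followed by a separate combining mark,
# so the two cases behave the same way.
nfd_e = prefix_match("NFD", "bise", bise_acute)             # True
nfd_open_e = prefix_match("NFD", "bis\u025b", open_e_acute)  # True

print(nfc_e, nfc_open_e, nfd_e, nfd_open_e)
```

Under NFD both prefixes match, which is at least consistent. Whether a
bare base letter should match its accented form at all is a separate
question that this sketch does not settle.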

I'm also guessing g_utf8_collate() is more appropriate for sorting
than for searching.
