Character normalization ?
- From: Daniel Veillard <veillard redhat com>
- To: otaylor redhat com
- Cc: gnome-hackers gnome org
- Subject: Character normalization ?
- Date: Mon, 25 Mar 2002 16:03:37 -0500
On Mon, Mar 25, 2002 at 03:50:39PM -0500, Gnome CVS User wrote:
> Log message:
> Mon Mar 25 15:46:54 2002 Owen Taylor <otaylor redhat com>
>
> * modules/basic/basic-*.c: Convert U+00A0 (NON BREAK SPACE)
> to U+0020 (SPACE)
Hmm, by the way, now that we have a decent internationalized
framework: one of the annoyances of Unicode is character normalization,
i.e. remapping certain sequences of Unicode characters to a single one.
The I18N working group at W3C is pushing hard for "early" normalization
[1], i.e. making sure that most APIs only ever see normalized content.
Can you tell me/us a bit about this issue? Is there anything in place,
and should we make a decision about it? It can affect a number of things,
like string searches and comparisons, which are otherwise a real pain.
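
To make the comparison problem concrete, here is a minimal sketch in Python
using the standard unicodedata module. It is only an illustration, not
something taken from the commit above; the helper name normalized_equal and
the sample strings are my own assumptions.

```python
import unicodedata


def normalized_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form.

    Hypothetical helper for illustration only; 'form' is one of the
    standard Unicode normalization forms (NFC, NFD, NFKC, NFKD).
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The same visible text, "e" with an acute accent, encoded two ways.
composed = "\u00e9"      # single code point U+00E9
decomposed = "e\u0301"   # U+0065 followed by combining acute U+0301

# A naive comparison sees two different strings.
naive_equal = composed == decomposed

# After normalization both sides become the same sequence.
canonical_equal = normalized_equal(composed, decomposed)

# U+00A0 (NO-BREAK SPACE) is only a compatibility equivalent of U+0020,
# so canonical normalization (NFC) keeps it while NFKC folds it to a space.
nbsp_nfc = unicodedata.normalize("NFC", "\u00a0")
nbsp_nfkc = unicodedata.normalize("NFKC", "\u00a0")
```

With "early" normalization the conversion would happen once, when text enters
the system, so that searches and comparisons further down can stay plain
code point comparisons.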
Daniel
[1] http://www.w3.org/TR/charmod/#IDAAC0R
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/