Re: Multibyte improvement - g_unichar_to_utf8()
- From: Jody Goldberg <jgoldberg home com>
- To: Yukihiro Nakai <ynakai redhat com>
- Cc: gnumeric-list gnome org, havill redhat com
- Subject: Re: Multibyte improvement - g_unichar_to_utf8()
- Date: Mon, 1 Jan 2001 17:09:50 -0500
On Sun, Dec 31, 2000 at 07:06:21AM +0900, Yukihiro Nakai wrote:
> The g_unichar_to_utf8() function in print-cell.c is broken in multibyte
> environments because it doesn't do any multibyte handling.
> I wrote a new g_unichar_to_utf8() function that can handle multibyte
> characters correctly in multibyte environments, using gint32.
> In EUC-JP, 2 bytes is normal, but some other codesets use 4 bytes.
The comment in src/print-cell.c says
    'This is cut & pasted from glib 1.3'
If your replacement is better, it should go into glib-1.3 and
gnumeric.  I'll wait for someone more experienced in these details
to make this decision.
> I used the #ifdef linux macro because the *BSDs don't have <langinfo.h>.
I'd prefer to see #ifdef HAVE_LANGINFO_H than #ifdef linux
> In this example, I use fullwidth 'ＡＢＣ' in EUC-JP multibyte and 'ABC' in ASCII.
> Below are the char codes for your reference:
>    |  ASCII(UTF-8)       EUC-JP   UTF-8(multibyte)
> ---+----------------------------------------------
> A  |          0x41    0xa3 0xc1    0xef 0xbc 0xa1
> B  |          0x42    0xa3 0xc1    0xef 0xbc 0xa2
> C  |          0x43    0xa3 0xc1    0xef 0xbc 0xa3
This confuses me.
1) It seems as if A == B == C in the EUC-JP case.
2) Where can I find some documentation on UTF-8 vs. UTF-8(multibyte)?
Thanks
    Jody