Multibyte improvement - g_unichar_to_utf8()




Hello,

The g_unichar_to_utf8() function in print-cell.c is broken in multibyte
environments because it doesn't do any multibyte handling.

I wrote a new g_unichar_to_utf8() function that handles multibyte
characters correctly in multibyte environments, using gint32.
In EUC-JP, 2 bytes per character is normal, but some other codesets use
4 bytes.
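
For readers without the attachment, the UTF-8 encoding step itself can be
sketched roughly as below. This is only an illustration and not the attached
code: sketch_unichar_to_utf8() is a hypothetical name, it uses uint32_t
instead of the glib types so that it compiles standalone, and it covers only
the step from a Unicode code point to UTF-8 bytes (the locale-dependent
conversion is left out).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the attached implementation): encode one
 * Unicode code point as UTF-8. Writes up to 4 bytes into out and
 * returns the number of bytes written, or 0 for an invalid code point. */
size_t sketch_unichar_to_utf8(uint32_t c, unsigned char *out)
{
    assert(out != NULL);

    if (c < 0x80) {                       /* 1 byte: plain ASCII */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                      /* 2 bytes */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {                    /* 3 bytes */
        if (c >= 0xD800 && c <= 0xDFFF)   /* surrogates are not characters */
            return 0;
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (c >> 18));
        out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

With this sketch, U+0041 encodes to the single byte 0x41, and the fullwidth
'A' (U+FF21) encodes to the three bytes 0xef 0xbc 0xa1.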

I used an #ifdef linux guard because the *BSDs don't have the
nl_langinfo() function. It's a good idea to use the improved functions
only on Linux or other OSes that implement the POSIX i18n framework, but
as an alternative I also suggest shipping an nl_langinfo() implementation
with gnumeric just for *BSD.
I'll send that in my next mail.
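
The guard described above could look roughly like the sketch below. The
names are assumptions for illustration only: sketch_locale_codeset() is a
hypothetical helper, HAVE_LANGINFO_CODESET stands for a configure-style
macro that a build system would have to define, and the "US-ASCII" fallback
is just a placeholder for platforms without nl_langinfo(). The sketch tests
__linux__ rather than linux because the former is also defined when the
compiler runs in strict ISO C mode.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__linux__) || defined(HAVE_LANGINFO_CODESET)
#include <langinfo.h>
#endif

/* Hypothetical helper: return the codeset name of the current locale,
 * e.g. "EUC-JP" or "UTF-8", or a fixed fallback where nl_langinfo()
 * is not available. The caller is expected to have called
 * setlocale(LC_ALL, "") beforehand. */
const char *sketch_locale_codeset(void)
{
#if defined(__linux__) || defined(HAVE_LANGINFO_CODESET)
    const char *cs = nl_langinfo(CODESET);

    /* POSIX says nl_langinfo() returns an empty string, not NULL,
     * when the item is unknown; the assert is only a sanity check. */
    assert(cs != NULL);
    if (cs[0] != '\0')
        return cs;
#endif
    return "US-ASCII";  /* assumed fallback, not from the original patch */
}
```

The returned name could then be used to decide whether the multibyte code
path is needed at all.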

In this example, I use fullwidth 'ABC' in EUC-JP (multibyte) and 'ABC' in ASCII.
Below are the character codes for your reference:

   |  ASCII(UTF-8)       EUC-JP   UTF-8(multibyte)
---+----------------------------------------------
A  |          0x41    0xa3 0xc1    0xef 0xbc 0xa1
B  |          0x42    0xa3 0xc2    0xef 0xbc 0xa2
C  |          0x43    0xa3 0xc3    0xef 0xbc 0xa3
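
The relation between the EUC-JP and Unicode columns for these characters can
be checked with a small sketch. It is only an illustration under stated
assumptions: sketch_eucjp_row3_to_unichar() is a hypothetical name, and it
handles only the fullwidth digits and Latin letters (lead byte 0xa3), not
EUC-JP in general. A real conversion would go through iconv() or a full
mapping table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map a 2-byte EUC-JP fullwidth digit or Latin
 * letter (lead byte 0xa3) to its Unicode code point.
 * Returns 0 for anything outside that small range. */
uint32_t sketch_eucjp_row3_to_unichar(unsigned char b1, unsigned char b2)
{
    uint32_t u;

    if (b1 != 0xA3)
        return 0;
    if (!((b2 >= 0xB0 && b2 <= 0xB9) ||    /* fullwidth 0-9 */
          (b2 >= 0xC1 && b2 <= 0xDA) ||    /* fullwidth A-Z */
          (b2 >= 0xE1 && b2 <= 0xFA)))     /* fullwidth a-z */
        return 0;

    /* The second byte minus 0xa0 lines up with the ASCII value minus
     * 0x20, and the fullwidth forms start at U+FF00 in the same order. */
    u = 0xFF00u + (uint32_t)(b2 - 0xA0);
    assert(u >= 0xFF10 && u <= 0xFF5A);
    return u;
}
```

For the first row, the bytes 0xa3 0xc1 map to U+FF21, whose UTF-8 encoding
is 0xef 0xbc 0xa1.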


Thank you.
---
Yukihiro Nakai, Red Hat Japan, Development

Attachment: Makefile
Description: Binary data

Attachment: main.c
Description: Binary data


