Re: [Anjuta-devel] Anjuta makes GNOME i18n broken




Hi,

On Mon, 6 Jan 2003 22:57:22 +1100
"Neil Hodgson" <neilh@scintilla.org> wrote:

>   Hi,
> 
> Yukihiro Nakai:
> 
> > - Scintilla has some double byte codes but it's not enough.
> 
>    It looks like the requirement here is to support mixed size character
> sets such as Shift JIS. I guess from the look of the code that there is no
> explicit locale being passed in to the code, instead, the current locale of
> the process is deciding how the gdk_mbstowcs and similar functions behave.

It aims to support the charset assumed by the currently selected locale,
and there is a trick to avoid a crash when the text contains chars the locale cannot handle.
mblen(3) is a locale-dependent function. In SciTE, gtk_set_locale() must be called before
gtk_init() to set the locale, but gnome_init() already includes it, so it is not in this Anjuta patch.
See the attached scite.diff. We should not use non-ASCII chars in GNOME application code; those
should be in the .po files for each language... But Scintilla is not a GNOME application, I know.
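
A minimal sketch of what such a guard around mblen(3) might look like. This is
not the code from the patch; SafeCharLen and CountChars are hypothetical names
used only for illustration. It assumes the caller has already selected the
locale (e.g. with setlocale(LC_CTYPE, "")), and it falls back to a length of
one byte whenever mblen() rejects the input:

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not from the patch): byte length of the character
// starting at s, according to the current LC_CTYPE locale.
// When mblen() reports an invalid or incomplete sequence (-1) or a NUL
// byte (0), treat it as a single byte so the caller always advances.
std::size_t SafeCharLen(const char *s, std::size_t remaining) {
    assert(s != nullptr);
    if (remaining == 0)
        return 0;
    std::mblen(nullptr, 0);  // reset any shift state before decoding
    int len = std::mblen(s, remaining);
    if (len <= 0)
        return 1;
    return static_cast<std::size_t>(len);
}

// Hypothetical helper: count characters in a buffer under the current
// locale, stepping over bytes the locale cannot decode one at a time.
std::size_t CountChars(const char *s, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; i += SafeCharLen(s + i, n - i))
        ++count;
    return count;
}
```

With this shape, a buffer in the wrong charset is still walked to the end; it
is just displayed or counted byte by byte instead of crashing.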

Shift JIS is the default Japanese charset in Windows/Mac documents,
but EUC-JP is the default Japanese charset on Red Hat Linux and other modern Linux/BSD systems.
(Solaris and other commercial Unices support a Shift JIS locale, though.)
And everything should be converted to iso-2022-jp on the network, so we always add auto-detect and
convert code to the editor. Emacs accepted such code after a long flame war. I would be happy if you
would consider it for Scintilla.

>    Is gdk_fontset_load completely equivalent to gdk_font_load? Are there
> performance implications of gdk_fontset_load over gdk_font_load? When
> enabling Unicode support, there is a large slowdown although I haven't
> tested whether this slowdown is caused by the specific call or the extra
> work involved caused by multiple fonts in a fontset.

There is no such slowdown for CJK locales. The slowdown in Unicode is presumably caused
by the big font file. If you want to use both loading funcs, the choice should be made
by charset, multibyte or not. Enlightenment also assumes that a comma-separated font name
means a multibyte charset, but that is broken.

> > - I don't know how the resource 'code.page' should be used
> > for  the locale behavior,
> >   so it is set as 1 in the patch.
> 
>    These are platform defined integer cookies that on Windows are the system
> defined CP_* constants. Apart from the common values 0 (default) and 65001
> used for Unicode, these can be defined to be anything on GTK+. If the actual
> locale is transmitted through the environment rather than an explicit
> parameter then just setting it to 1 will be fine.
> 
>    I'm reasonably happy with the changes to PlatGTK.cxx, although the case
> you comment as 'annoying' is a bit ugly. The changes to Document.cxx will
> require some more #ifdefing to only occur for GTK+ (the core Scintilla code
> runs on 5 graphics toolkits) or new calls added to the platform layer.

I believe the ScintillaGTK.cxx changes are also acceptable to you. They fix the positioning
of the XIM preedit area.

For Document.cxx, if Platform::IsDBCSLeadByte() could take not just 1 char but up to
MB_CUR_MAX chars, all the platform-dependent code (like mblen()) could go into PlatGTK.cxx.
Looking at 1 char alone doesn't make sense for a multibyte charset. In EUC-JP, a byte sequence
like this can exist:

A4 A2 A4 A2

'A4 A2' is one EUC-JP char, but 'A2 A4' is also another valid EUC-JP char. So,
to check whether the second byte 'A2' is a lead or a trail byte, you need to scan in order
from the first byte.
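
To illustrate the point, here is a small self-contained sketch that does not
depend on the locale. EucJpCharLen and IsCharStart are hypothetical names, not
functions from Scintilla or the patch, and the byte ranges are the standard
EUC-JP ones (0xA1-0xFE for two-byte chars, 0x8E and 0x8F as the single-shift
prefixes). It decides whether a given position is the start of a char by
walking from the beginning of the buffer:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: byte length of the EUC-JP character whose FIRST
// byte is b. Only meaningful when b really is at a character start.
std::size_t EucJpCharLen(unsigned char b) {
    if (b == 0x8F) return 3;               // SS3 prefix: JIS X 0212, 3 bytes
    if (b == 0x8E) return 2;               // SS2 prefix: half-width katakana
    if (b >= 0xA1 && b <= 0xFE) return 2;  // JIS X 0208, 2 bytes
    return 1;                              // ASCII and everything else
}

// Hypothetical helper: true if buf[pos] starts a character. The answer
// cannot be derived from buf[pos] alone, so scan from the first byte.
bool IsCharStart(const unsigned char *buf, std::size_t len, std::size_t pos) {
    assert(pos < len);
    std::size_t i = 0;                     // i is always at a char start
    while (i < len) {
        if (i == pos)
            return true;
        std::size_t n = EucJpCharLen(buf[i]);
        if (pos < i + n)                   // pos falls inside this char
            return false;
        i += n;
    }
    return false;
}
```

For the sequence A4 A2 A4 A2, IsCharStart reports positions 0 and 2 as char
starts and positions 1 and 3 as trail bytes, even though A2 taken on its own
is in the valid lead-byte range.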

Thanks.
--
Yukihiro Nakai, Red Hat Japan, Development

scite.diff


