Re: [Anjuta-devel] Anjuta makes GNOME i18n broken

From: "Neil Hodgson" <neilh scintilla org>
To: "Yukihiro Nakai" <ynakai redhat com>
Cc: <anjuta-devel lists sourceforge net>, <gnome-i18n gnome org>
Subject: Re: [Anjuta-devel] Anjuta makes GNOME i18n broken
Date: Mon, 13 Jan 2003 22:58:40 +1100

Yukihiro Nakai:

> It aims to support the charset that the currently selected locale assumes,
> and there is a trick to avoid a crash when there are chars the locale
> cannot handle.
> mblen(3) is a locale-dependent function. gtk_set_locale() is needed to set
> the locale before gtk_init() in SciTE, but gnome_init() includes it, so it
> is not in this Anjuta patch.

   OK. I still haven't seen this work and I expect I haven't set up the
locale correctly on Linux.

> Shift JIS is the default Japanese charset in Windows/Mac documents,
> but EUC-JP is the default Japanese charset in Red Hat Linux and other
> modern Linux/BSD systems.
> (Solaris and other commercial Unices support a Shift JIS locale, though.)
> And everything should be converted to iso-2022-jp on the network, so we
> always add auto-detect and convert code in the editor. Emacs accepted such
> code after a long flame war. I'd be happy if you would consider it for
> Scintilla.

   OK, although there is still code in the lexers that assumes only two-byte
DBCS, but that isn't as important.

> For Document.cxx, if Platform::IsDBCSLeadByte() could take not just 1 char
> but up to MB_CUR_MAX chars, the platform-dependent code (like mblen()) could
> all go into PlatGTK.cxx.
> Picking up 1 char doesn't make sense for a multibyte charset. In EUC-JP,
> a byte sequence like this can exist:
>
> A4 A2 A4 A2
>
> 'A4 A2' is one EUC-JP char, but 'A2 A4' is also another valid EUC-JP char.
> So, to check whether the second byte 'A2' is a head or a tail, you need to
> check in order from the first byte.
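
   To make the point about scanning from the first byte concrete, here is a
rough sketch. It is not code from the patch or from Scintilla: the function
names are made up for illustration, and the EUC-JP lengths are hard-coded
(0x8E and 0x8F as two- and three-byte prefixes, 0xA1 to 0xFE as two-byte lead
bytes) rather than taken from mblen, so that it does not depend on the locale
being set up. Trail bytes are not validated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: byte length of the EUC-JP character starting at s[i].
// Always returns at least 1 so callers make progress on unexpected bytes.
static std::size_t EucJpCharLen(const std::string &s, std::size_t i) {
    assert(i < s.size());
    const unsigned char b = static_cast<unsigned char>(s[i]);
    std::size_t len = 1;                        // ASCII or unrecognised byte
    if (b == 0x8E) len = 2;                     // SS2 prefix
    else if (b == 0x8F) len = 3;                // SS3 prefix
    else if (b >= 0xA1 && b <= 0xFE) len = 2;   // two-byte lead byte
    if (i + len > s.size()) len = 1;            // truncated: treat as one byte
    return len;
}

// Start offset of the character that contains byte `pos`.
// Walks forward from the start of the line, as the quoted text describes.
static std::size_t CharStart(const std::string &line, std::size_t pos) {
    assert(pos < line.size());
    std::size_t i = 0;
    while (i + EucJpCharLen(line, i) <= pos)
        i += EucJpCharLen(line, i);
    return i;
}

// True if `pos` falls inside a character rather than at its first byte.
static bool IsTrailByte(const std::string &line, std::size_t pos) {
    return CharStart(line, pos) != pos;
}
```

   For the bytes A4 A2 A4 A2, IsTrailByte reports offsets 1 and 3 as trail
bytes and offsets 0 and 2 as character starts, which cannot be decided by
looking at the byte values alone.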

   The patch didn't quite work. If you moved the caret to the end of a line
on Windows, it moved between the carriage return and line feed, making it
disappear. So I tweaked the loop a bit. To simplify coding there is now a
Platform::DBCSCharLength that calls mblen but always returns at least 1.
BackSpace should now work on 3 (or more) byte DBCS characters. IsDBCS is no
longer in Document, as LenChar covers its uses, and this reduced the code
size. The DBCS state is stored in the drawing Surface, and your fontset
loading code is triggered if the character set is in [SC_CHARSET_GB2312,
SC_CHARSET_HANGUL, SC_CHARSET_SHIFTJIS, SC_CHARSET_CHINESEBIG5].
The new code is available from CVS or from
http://www.scintilla.org/scite.zip Source
http://www.scintilla.org/wscite.zip Windows executable
   It would help if it could be tested to ensure EUC-JP works well.
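
   For anyone testing, the "calls mblen but always returns at least 1" idea
mentioned above looks roughly like the following. This is only a sketch under
that description, not the code in CVS: the function name and the `avail`
parameter are assumptions. It relies on the locale already having been set
(for example by gtk_set_locale() or setlocale(LC_CTYPE, "")); in the default
"C" locale it will simply report 1 for ordinary characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for the wrapper described in the message.
// Returns the byte length of the multibyte character at `s` in the
// current locale, but never less than 1.
static int DBCSCharLengthSketch(const char *s, std::size_t avail) {
    assert(s != nullptr);
    if (avail == 0)
        return 1;
    std::mblen(nullptr, 0);                 // reset any conversion state
    const int len = std::mblen(s, avail);   // -1 if invalid, 0 at a NUL byte
    return len < 1 ? 1 : len;               // clamp so callers always advance
}
```

   Clamping to 1 means a loop that steps through a line by this length
cannot stall or move backwards on bytes the locale rejects.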

   Neil
