Re: One key stroke --> two code-points

From: Simos Xenitellis <simos lists googlemail com>
To: Clytie Siddall <clytie riverland net au>
Cc: gnome-i18n gnome org
Subject: Re: One key stroke --> two code-points
Date: Sat, 14 Jun 2008 16:25:07 +0100

Clytie Siddall wrote:

Just checking: so this problem does not affect languages using precomposed Unicode?
Vietnamese users _should_ be using precomposed forms for our added and combined diacritics. But I wonder if we should be ready for the fact that they might not. I was using a keyboard layout for a while which was decomposed, and I didn't know it. That could happen to others, too.

With precomposed characters, the compose sequences look like

<dead_key_No1> <dead_key_No2> <Letter_A>  --->  single codepoint

Producing a single codepoint is well defined, and has been available from the start.


When no precomposed forms exist, then

<dead_key_No1> <dead_key_No2> <Letter_A>  --->  codepointA, codepointB

Until recently this form was not used in the X.Org Compose file (the Khmer compose
sequences, the first such sequences, were added to X.Org just a few days ago).
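
The two cases above can be illustrated with Unicode normalization. This is only a sketch using Python's standard unicodedata module; the helper name code_points is made up for the example, and the sample letters are my own choice, not taken from any Compose file.

```python
import unicodedata


def code_points(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Case 1: a precomposed form exists.
# 'a' + COMBINING CIRCUMFLEX ACCENT + COMBINING ACUTE ACCENT
# composes under NFC to a single code point.
with_precomposed = unicodedata.normalize("NFC", "a\u0302\u0301")

# Case 2: no precomposed form exists.
# 't' + COMBINING ACUTE ACCENT has nothing to compose to,
# so NFC leaves it as two code points.
without_precomposed = unicodedata.normalize("NFC", "t\u0301")

print(code_points(with_precomposed))
print(code_points(without_precomposed))
```

In the first case the input method can emit one codepoint; in the second it has to emit the letter followed by the combining mark.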

One thing I do not know about written Vietnamese is this:
are there characters (with combined diacritics) for which no corresponding precomposed forms exist? That is, are there characters that you cannot type using the typical dead keys?
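
One rough way to check this is to normalize every vowel and tone combination and see which ones stay longer than one codepoint. The sketch below assumes the lowercase vowel and tone inventory listed in the code, which is my assumption and not something stated in this thread; the function name is hypothetical.

```python
import unicodedata

# Assumed inventory (lowercase only): each vowel is a base letter,
# optionally followed by a combining breve, circumflex, or horn.
VOWELS = ["a", "a\u0306", "a\u0302", "e", "e\u0302", "i",
          "o", "o\u0302", "o\u031b", "u", "u\u031b", "y"]
# Assumed tone marks: grave, hook above, tilde, acute, dot below.
TONES = ["\u0300", "\u0309", "\u0303", "\u0301", "\u0323"]


def without_precomposed_form(vowels=VOWELS, tones=TONES):
    """Return the NFC forms that are still longer than one code point."""
    missing = []
    for vowel in vowels:
        for tone in tones:
            composed = unicodedata.normalize("NFC", vowel + tone)
            if len(composed) != 1:
                missing.append(composed)
    return missing


print(without_precomposed_form())
```

As far as I know, every combination in that inventory has a precomposed codepoint, so the list should come back empty. A decomposed keyboard layout would still produce the longer sequences, which is the situation Clytie describes.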

However, if there is a need for decomposed forms anyway, it is good to know about it.

For Vietnamese, it is important to look at the xkeyboard-config project and check
what the default layout does, and that it is a reasonable choice.

Simos

Clytie

On 10/06/2008, at 2:35 PM, Anousak Souphavanh wrote:
Thanks, Simos, for your kindness and time.

Much appreciation to Javier for bringing a good solution indeed.

The Lao input method needs a similar solution. Javier, please post your
solution (where and how to define a new table for Khmer) so I can
define these code points for Lao.
On Tue, Jun 10, 2008 at 1:58 AM, Simos Xenitellis
<simos lists googlemail com> wrote:
Javier SOLA wrote:
Thanks Simos !!

Actually, we have had these additions for a while in X11.
Hi Javier,

Checking at
http://gitweb.freedesktop.org/?p=xorg/lib/libX11.git;a=tree;f=nls/en_US.UTF-8
does not show these lines at the end. It is possible that these compose
sequences were added as a patch to the distribution package.
We will file an issue for GTK+, and use the variable in the meantime.

What file is it in GTK+? I have not been able to find it.
In GTK+ (HEAD), the relevant file is
http://svn.gnome.org/viewvc/gtk%2B/trunk/gtk/gtkimcontextsimple.c?view=markup
However, your case of compose sequences is different from the existing
compose sequences, which result in a single codepoint (you need to produce
two codepoints).
Therefore, the type of support you are looking for is similar to compose
sequences that result in letter+diacritic mark. Several languages have
characters for which no precomposed letters exist, so the compose sequence
produces letter+diacritic marks (more than one codepoint). Such support is
missing, and there are already bug reports for it.

Bug 341341 – Compose mechanism in simple input method doesn't support
decomposed forms
http://bugzilla.gnome.org/show_bug.cgi?id=341341

Bug 345254 – dead accents should at least produce combining characters
http://bugzilla.gnome.org/show_bug.cgi?id=345254

There is a shortcut for solving the above cases of compose sequences, so
I expect the solution to be different from the one for the Khmer compose
sequences.
Specifically, for the Latin compose sequences, such as (it's a made-up
example)

<dead_acute> <t> : "t́" # LETTER T WITH ACUTE

one could convert to something like    [ dead_acute, 't', 0].
We would put 0 for the resulting codepoint because we can deduce for this
category of compose sequences that the actual codepoints are 't' and 'acute'
(the resulting codepoints match the body of the compose sequence).
However, for the case of Khmer, the compose sequences look independent from
the resulting code points. Therefore, a new table would be required.
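
The idea of putting 0 in the table can be sketched roughly as follows. This is not the GTK+ code (which is C, in gtkimcontextsimple.c); the table layout, the names COMPOSE_TABLE, DEAD_KEY_TO_COMBINING and resolve, and the sample entries are all hypothetical and only illustrate the shortcut.

```python
# Hypothetical mapping from dead keys to their combining marks.
DEAD_KEY_TO_COMBINING = {
    "dead_acute": "\u0301",  # COMBINING ACUTE ACCENT
    "dead_grave": "\u0300",  # COMBINING GRAVE ACCENT
}

# Hypothetical table rows: (dead key, letter, resulting code point).
# A result of 0 means "no precomposed code point; derive the output
# from the sequence itself".
COMPOSE_TABLE = [
    ("dead_acute", "e", 0x00E9),  # precomposed e with acute
    ("dead_acute", "t", 0),       # made-up example: t with acute
]


def resolve(dead_key, letter):
    """Return the string a compose sequence produces, or None if unknown."""
    for table_key, table_letter, result in COMPOSE_TABLE:
        if (table_key, table_letter) == (dead_key, letter):
            if result != 0:
                return chr(result)
            # Shortcut: output the letter followed by the combining mark.
            return letter + DEAD_KEY_TO_COMBINING[table_key]
    return None


print([hex(ord(ch)) for ch in resolve("dead_acute", "t")])
```

The shortcut works only because the output can be read off the sequence. For the Khmer entries the key carries no hint of the two codepoints it should produce, which is why they would need a table that stores the full output string.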

To cut a long story short, I have filed a bug report for this,
Bug 537457 – Support compose sequences that produce two+ codepoints
http://bugzilla.gnome.org/show_bug.cgi?id=537457

Simos
Thanks,

Javier

Simos Xenitellis wrote:
Javier SOLA wrote:
Hi,

I am working on Khmer localization (KhmerOS project).
In Khmer, some of the basic vowels (which we include in the keyboard)
require two code-points, so one keystroke must generate two code-points.
It used to be that we could do the conversion in XKB by generating a
fictitious code-point (Pablo Saratxaga explained this to us a few years ago),
which was later translated to two real code-points by putting the conversion
in the en-US locale file. It did work at the time.
But now this seems to have stopped working. Does anybody know how we
can fix this?
These additions (pressing a single key and producing two codepoints) are
located at
/usr/share/X11/locale/en_US.UTF-8/Compose
The specific lines appear to be

# Khmer digraphs
# A keystroke has to generate several characters, so they are defined
# in this file

<U17fb>    :   "ុះ"
<U17fc>    :   "ុំ"
<U17fd>    :   "េះ"
<U17fe>    :   "ោះ"
<U17ff>    :   "ាំ"
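
To make the structure of these entries concrete, here is a small sketch that reads them into a table and shows that each placeholder key maps to a string of more than one codepoint. The regular expression is a simplification I wrote for this example (it ignores escapes and the optional keysym after the string that real Compose files allow), and parse_compose is a hypothetical name.

```python
import re

COMPOSE_LINES = """
# Khmer digraphs
<U17fb>    :   "ុះ"
<U17fc>    :   "ុំ"
<U17fd>    :   "េះ"
<U17fe>    :   "ោះ"
<U17ff>    :   "ាំ"
"""

# One or more <key> events, a colon, then a double-quoted result string.
LINE_RE = re.compile(r'^((?:<\w+>\s*)+):\s*"([^"]*)"')


def parse_compose(text):
    """Map each tuple of key names to the string it produces."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # not an entry this simplified pattern understands
        keys = tuple(re.findall(r"<(\w+)>", match.group(1)))
        table[keys] = match.group(2)
    return table


for keys, output in parse_compose(COMPOSE_LINES).items():
    print(keys, [f"U+{ord(ch):04X}" for ch in output])
```

Each key here is one of the placeholder code points U+17FB to U+17FF that the keyboard layout emits, and the value is the real two-codepoint output.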
GTK+ based applications duplicate the Compose file in the gtk+ library,
and currently the version of the Compose file that exists in gtk+ does not
include those specific compose sequences.
I think these are a recent addition.
Technically, it is possible for gtk+ to include compose sequences that
produce more than one code point (this requires a small change in the code);
however, these recent Khmer digraphs are the only compose sequences using the
facility now.
To cut a long story short, you can for now bypass the GTK+ version of
the Compose file and use the Compose file that comes with X.Org (shown
above) by setting the environment variable GTK_IM_MODULE to "xim".
This should not have any adverse effect on the OLPC software.
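
As a sketch of the workaround, the variable can be set just for one application when launching it. The helper names below are hypothetical, and the program passed to launch_with_xim is whatever GTK+ application you want to test.

```python
import os
import subprocess


def xim_environment(base=None):
    """Return a copy of the environment with GTK_IM_MODULE set to "xim"."""
    env = dict(os.environ if base is None else base)
    env["GTK_IM_MODULE"] = "xim"
    return env


def launch_with_xim(command):
    """Start a program (given as an argument list) using the XIM input module."""
    return subprocess.Popen(command, env=xim_environment())
```

For example, launch_with_xim(["gedit"]) would start gedit (assuming it is installed) so that it reads compose sequences through XIM, and therefore from the X.Org Compose file, while leaving the rest of the session unchanged.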
It is important that, if other keyboard layouts (such as Serbian) also
require compose sequences that produce two or more codepoints, these are
added to the X.Org Compose file. In the next update of GTK+, all these
compose sequences can make it in.

Simos
