Re: Future gtk-text widget




Dov Grobgeld <dov@pixel.weizmann.ac.il> writes:

> I did some brainstorming on what I would like to see in a future
> gtk text widget. I think that now is a good time to discuss these
> issues before coding actually starts. (Or did you start already, Owen?)

No, I'm only now finding time to look at the lower-level
issue of creating APIs for generic handling, shaping, line-breaking,
etc. of Unicode text.
 
> Personally I'm especially interested in working on Bi-Directional 
> support of the widget.

 
> Notes about an ideal text widget
> ================================

First, I'll note that in the short term, I'm not really
that interested in an all-singing, all-dancing word-processor
widget. While things like embedded subwidgets might be cool,
my priorities are a) stability and b) really, really good
internationalization support (Unicode, RTL, and a framework
good enough to handle complex-text languages like Thai or
Hindi).

Of course, I know that other people's priorities will vary.

> * The internal representation of the text widget is Unicode.
> 
> * It should be possible to toggle the display of "invisible" unicode
>   characters - e.g. zero-width-space, left-to-right-mark, left-to-right-
>   override, etc.
> 
> * The widget should support all BiDirectional aspects, including 
>   display, cursor movement, and selections. 
> 
> * The text widget should allow embedding of external objects, especially
>   images but more generally any gtk widget. The alignment of any embedded
>   object should be specifiable.
> 
> * Input of unicode characters is taken care of by X.

I'm not sure how much we should or can rely on X for this.
Although X in theory could support Unicode locales for XIM,
in practice it doesn't on any platform I know of, so you
end up converting from the locale-dependent encoding to 
Unicode.

But worse, XIM has a very complicated set of APIs that
are very much tied to the X event model and to X's ideas
of fonts and rendering. Trying to adapt them to a
more modern system is not easy, and I'm toying with the
idea of abandoning XIM entirely (though it should be
possible to provide a compatibility layer).
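A minimal Python sketch of that conversion step, assuming the input method hands back bytes in the locale's encoding. The helper name `xim_commit_to_unicode` and its parameters are hypothetical and do not correspond to any XIM call.

```python
def xim_commit_to_unicode(committed: bytes, codeset: str) -> str:
    """Convert input-method text in the locale's encoding to Unicode.

    Hypothetical helper: `committed` stands for the bytes an input method
    hands back and `codeset` for the locale's encoding name (on POSIX
    systems, locale.nl_langinfo(locale.CODESET) reports it). Neither
    parameter corresponds to an actual XIM call.
    """
    # bytes.decode raises LookupError for an unknown codeset, so a broken
    # locale setup fails loudly; undecodable bytes become U+FFFD.
    return committed.decode(codeset, errors="replace")


# Hebrew alef and bet as committed under an ISO-8859-8 locale.
print(ascii(xim_commit_to_unicode(b"\xe0\xe1", "iso-8859-8")))  # prints '\u05d0\u05d1'
```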
 
> * The text widget should support the idea of tags that span a certain 
>   text range. (Idea borrowed from the Tk text widget). 
>
> * The tags define the following properties (and possibly lots of other
>   properties):
> 
>     - Background color.
>     - Background pixmap.
>     - Foreground color.
>     - Relief(?)
>     - Unicode chars to font mapping.
> 
>   (Once the tag system is established it shouldn't be too difficult to
>   add arbitrary mouse bindings to a tag as well.)
> 
> * Copy and paste within and between two text widgets (including when
>   they belong to different processes) should preserve and
>   copy the tags. (Is this at all possible for plain gtk without GNOME?)

Certainly, you just need to define a data format for the 
cut-and-paste that can hold the tags.
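As a rough illustration, assuming JSON as the encoding and character (code point) offsets for tag ranges, such a format could carry the selected text together with its tag ranges so the receiving widget can reapply them. The payload layout and the `TagRange`, `encode_selection` and `decode_selection` names are invented for this sketch; the selection target name under which the data would be offered is left open.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TagRange:
    tag: str    # tag name, e.g. "latin1+hebrew"
    start: int  # offset in characters (code points), inclusive
    end: int    # offset in characters, exclusive


def encode_selection(text: str, ranges: list[TagRange]) -> bytes:
    """Serialize selected text plus its tag ranges for the selection owner."""
    payload = {"text": text, "tags": [asdict(r) for r in ranges]}
    # UTF-8 keeps the payload independent of either process's locale.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def decode_selection(data: bytes) -> tuple[str, list[TagRange]]:
    """Parse a payload produced by encode_selection on the pasting side."""
    payload = json.loads(data.decode("utf-8"))
    text = payload["text"]
    ranges = [TagRange(**r) for r in payload["tags"]]
    for r in ranges:
        # Reject ranges that do not fit inside the pasted text.
        if not 0 <= r.start <= r.end <= len(text):
            raise ValueError(f"tag range out of bounds: {r}")
    return text, ranges


# A tag over the Hebrew word "shalom", round-tripped through the format.
sel_text = "shalom \u05e9\u05dc\u05d5\u05dd"
sel_tags = [TagRange("latin1+hebrew", 7, 11)]
restored = decode_selection(encode_selection(sel_text, sel_tags))
```

Character offsets rather than byte offsets keep the ranges meaningful regardless of how either side stores the text internally.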
 
> * The Unicode chars to font mapping needs more explanation. The idea is
>   that it should be possible to map different ranges of unicode characters 
>   to a different font. E.g. to use "Helvetica" for the ISO-Latin-1
>   range and the font web1 for Hebrew characters, the following tag may
>   be defined:

I'm planning something a little more comprehensive for
GTK+ for the next release. I'll post some detailed design
plans in the next few days (probably just to gtk-i18n-list
until they get more fleshed out), but the basic idea is
that there will be language (or script) specific modules.

That is, instead of saying that U+05D0-U+05EA gets rendered
with "-*-web1-medium-r-*-*-*-160-*-*-*-*-*-8", there
will be some configuration data saying that U+05D0-U+05EA gets
handled by the "iw.so" module. This is necessary because
for a large set of languages, simple one-to-one
character-to-glyph mappings are not sufficient.

When the iw.so module goes to render to X, the font
can be determined from a simple font list. That is,
iw.so asks for an ISO-8859-8 font, and one is looked
up from the font list.
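A sketch of the two-level lookup described above, under stated assumptions: code points are first dispatched to a module by range, and the module's charset is then resolved through a font list. Only `iw.so` comes from the text; the other module name, the table layouts, and the font-list contents are illustrative, and a real module would do its own shaping rather than a one-to-one mapping.

```python
from __future__ import annotations

from bisect import bisect_right

# Illustrative configuration: (first, last, module), sorted by first code
# point. "basic.so" is a hypothetical module name.
MODULE_RANGES = [
    (0x0020, 0x007E, "basic.so"),
    (0x00A0, 0x00FF, "basic.so"),
    (0x05D0, 0x05EA, "iw.so"),
]
_STARTS = [first for first, _, _ in MODULE_RANGES]

# Charset each module asks for when it renders (hypothetical).
MODULE_CHARSET = {"basic.so": "iso8859-1", "iw.so": "iso8859-8"}

# Illustrative font list: charset -> XLFD names, most preferred first.
FONT_LIST = {
    "iso8859-1": ["-*-helvetica-medium-r-*-*-*-160-*-*-*-*-iso8859-1"],
    "iso8859-8": ["-*-web1-medium-r-*-*-*-160-*-*-*-*-iso8859-8"],
}


def module_for(cp: int) -> str | None:
    """Return the module configured for code point cp, or None."""
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0:
        first, last, module = MODULE_RANGES[i]
        if first <= cp <= last:
            return module
    return None


def font_for_char(cp: int) -> tuple[str, str] | None:
    """Return (module, font) for cp: module by range, font by charset."""
    module = module_for(cp)
    if module is None:
        return None
    fonts = FONT_LIST.get(MODULE_CHARSET[module], [])
    # A real lookup would try each entry until one can be loaded;
    # this sketch simply takes the first.
    return (module, fonts[0]) if fonts else None


# Hebrew alef is routed to iw.so, which gets an ISO-8859-8 font.
print(font_for_char(0x05D0))
```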
 
>     Tag: {
>        Name => "latin1+hebrew",
>        Background => "yellow",
>        Foreground => "red3",
>       
>        Fontmap => 
>        {
>            {
>                Font=>        "-*-helvetica-medium-r-*-*-*-160-*-*-*-*-*-1",
>                Mapping=> {
>                    From=>    {U+0021..U+007E, U+00A0..U+00FF},
>                    To=>      {0x21..0x7E, 0xA0..0xFF}
>                }
>            },
>            {
>                Font=>        "-*-web1-medium-r-*-*-*-160-*-*-*-*-*-8",
>                Mapping=> {
>                    From=>    {U+05D0..U+05EA},
>                    To=>      {0xE0..0xFA}
>                }
>            }
>        }
>     }
> 
>   A syntax for efficiently defining these tags needs to be established.
>      
> * A tag may inherit from a different tag and override a subset of
>   its properties.
> 
> * A special tag always spans the whole text widget; by changing the
>   attributes of this tag, the default attributes of the text widget are
>   changed.
> 
> * Another special tag defines the attributes of selected text. 
> 
> * It should be possible to import and export an XML representation of the
>   entire contents of the text widget.

Whether this is possible depends on whether you allow
user-defined tags. If you want to allow cut and paste of
tagged text between apps, you need some sort of text
representation. Using XML is good, but it has the drawback of
bringing in another library dependency, though a fairly light one.
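To illustrate the point about user-defined tags: an export has to carry each tag's definition as well as its name, or the importing side cannot reconstruct it. The sketch below uses Python's standard `xml.etree.ElementTree`; the element vocabulary and function names are invented for the example, and overlapping tags are flattened into runs split at every tag boundary.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def export_xml(text: str, tag_defs: dict[str, dict[str, str]],
               ranges: list[tuple[str, int, int]]) -> str:
    """Dump tag definitions and tagged text to an XML string.

    tag_defs maps each (possibly user-defined) tag name to its properties;
    ranges holds (tag, start, end) character offsets, end exclusive.
    The element names <buffer>, <tags>, <tag>, <text>, <span> are invented
    for this sketch. Control characters that XML 1.0 forbids are not handled.
    """
    root = ET.Element("buffer")
    defs = ET.SubElement(root, "tags")
    for name, props in tag_defs.items():
        ET.SubElement(defs, "tag", {"name": name, **props})

    # Cut the text at every tag boundary so that overlapping tags turn
    # into a flat sequence of runs, each listing the tags active over it.
    cuts = sorted({0, len(text)} | {p for _, s, e in ranges for p in (s, e)})
    body = ET.SubElement(root, "text")
    for a, b in zip(cuts, cuts[1:]):
        run = ET.SubElement(body, "span")
        active = [t for t, s, e in ranges if s <= a and b <= e]
        if active:
            run.set("tags", " ".join(active))
        run.text = text[a:b]
    return ET.tostring(root, encoding="unicode")


def import_xml(data: str) -> tuple[str, dict[str, dict[str, str]],
                                   list[tuple[str, int, int]]]:
    """Inverse of export_xml: rebuild text, tag definitions and ranges."""
    root = ET.fromstring(data)
    tag_defs: dict[str, dict[str, str]] = {}
    for el in root.find("tags"):
        props = dict(el.attrib)
        tag_defs[props.pop("name")] = props

    text, ranges, pos = "", [], 0
    last: dict[str, int] = {}  # tag name -> index of its latest range
    for run in root.find("text"):
        chunk = run.text or ""
        end = pos + len(chunk)
        for t in run.get("tags", "").split():
            i = last.get(t)
            if i is not None and ranges[i][2] == pos:
                ranges[i] = (t, ranges[i][1], end)  # merge contiguous runs
            else:
                last[t] = len(ranges)
                ranges.append((t, pos, end))
        text += chunk
        pos = end
    return text, tag_defs, ranges


xml_text = export_xml("abcdefghij",
                      {"bold": {"weight": "bold"}, "red": {"foreground": "red3"}},
                      [("bold", 0, 5), ("red", 3, 8)])
print(xml_text)
```

Flattening into runs keeps the XML a simple sequence even when tags overlap, at the cost of having to merge adjacent runs again on import.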
 
> * The concept of tags may be used for l10n as well, by making it possible
>   to define a fontmap in the gtkrc file so that the gettext resource files
>   can be interpreted as Unicode and displayed in the desired font for
>   all widgets that display text.

Yes, basically. I think the idea is that font lists defined
in the RC file will be used to render all GtkLabels. Where
language tagging is needed (e.g., to distinguish Japanese from
Chinese), it will be provided by the user's LANG setting.
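A sketch of how LANG could stand in for the missing language tag, assuming font lists keyed by language: Han characters are shared between Chinese and Japanese, but users expect different glyph shapes. The table format and function names are hypothetical (not actual gtkrc syntax), and the sketch only looks at the language part of the locale, so zh_CN and zh_TW, which in practice want different fonts, are treated alike.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Illustrative per-language font lists for Han characters, most preferred
# first. The charset registries are real X11 names; the table itself is
# a hypothetical stand-in for whatever the RC file would define.
JIS = "-*-*-medium-r-*-*-*-160-*-*-*-*-jisx0208.1983-0"
GB = "-*-*-medium-r-*-*-*-160-*-*-*-*-gb2312.1980-0"
HAN_FONT_LISTS = {
    "ja": [JIS, GB],
    "zh": [GB, JIS],
    "*": [GB, JIS],  # arbitrary fallback when LANG says nothing useful
}


def language_from_locale(name: str) -> str:
    """Return the language part of a locale name like 'ja_JP.eucJP'."""
    # POSIX locale names look like language[_territory][.codeset][@modifier].
    for sep in "_.@":
        name = name.split(sep, 1)[0]
    return name.lower()


def han_font_list(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Choose the Han font list from the user's LANG setting."""
    lang = language_from_locale(environ.get("LANG", ""))
    return HAN_FONT_LISTS.get(lang, HAN_FONT_LISTS["*"])


# Under a Japanese locale the JIS X 0208 font comes first.
print(han_font_list({"LANG": "ja_JP.eucJP"})[0])
```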
 
>   (How are the l10n concerns raised in the latest issue of The Perl
>   Journal addressed, by the way? There it is basically claimed that the
>   static gettext method doesn't work for a lot of the world's languages!
>   The article defines a way of writing subroutines that, given a set of
>   parameters, return a resolved string.)

I think they are ignored ;-). Generally, by picking the right
set of strings in English to translate, the set of cases
where the same English string needs to be translated in
different ways in different places can be made very small.
The info pages for gettext have some information and 
commentary about this issue.

The nice thing about gettext is that it is simple enough for
the program writer that even authors who don't care much
about translation can be convinced to use it. catgets()-style
message catalogs are worse in this regard, and the scheme
proposed in the Perl Journal is completely infeasible.

Regards,
                                        Owen


