Akira, here are the details. The "audit" file lists all .[ch] files as of the beginning of summer 2001. Files with no tag have not yet been audited for UTF-8 correctness. Those tagged just "UTF8" are okay; those tagged "!UTF8 (done)" had non-compliant code, but it has since been sorted out. (Ignore the "LAZY" tags; they relate to the now-removed "lazyprop" style of handling properties.)

I strongly suggest you start with the files tagged "!UTF8 (done)" to see how it's done. The main complexity is that the program must still work even without HAVE_UNICODE (we will drop non-Unicode support one day, but not yet), and it must also work in the GTK_CHARSET_MISMATCH situations: GTK_DOESNT_TALK_UTF8_WE_DO, which was my target for gtk1.4/GNOME1, or GTK_TALKS_UTF8_WE_DONT, which is currently the case on Win32. The final target is, of course, GTK_TALKS_UTF8 with !GTK_CHARSET_MISMATCH (the GNOME2/gtk2 case). I'll be happy to give more detailed explanations on specific cases... lib/prop_text.[ch] is certainly a good place to look at for the interaction between Dia and GTK in the various states of UTF-8 awareness.
BTW, I have another question, about gnome-print support. Dia can print without gnome-print, but supporting CJK printing is complex, so Dia can't handle it: if the printer has no CJK fonts, the current implementation won't print at all, even when it should be possible. So I think we need something like gnome-print so that printing works in every environment; I mean, the fonts should be embedded in the PostScript output. This problem should occur on Windows too, for example, though I haven't tried running it there. Hasn't anyone else run into this?
The solution currently implemented is the following: there is a module in lib/ps-utf8, called the "PS Unicoder", which follows the approach taken by Microsoft in its PostScript drivers: character maps are built on the fly, using only the characters actually used by the text. It of course works great in 8859-1; I haven't got reliable and consistent reports for non-8859-1 (let's say it "sorta works"). I'm not running 8859-1 anymore myself, but I don't have the time to test (and for only two different characters...).

Of course, turning the encoding tables into something the printer understands is only 50% of the job. Currently, Dia depends on the printer (or Ghostscript) including a very strict set of fonts; this is the main reason there is not a lot of freedom in the choice of fonts. Downloading more fonts means being able to locate the font files or glyph outlines on the local system (or a potentially remote font server) and turning them into something the printer can swallow and display nicely. This is not my area of expertise; Lars Clausen had some work in progress there, so you'd probably better ask him (he'll probably comment if he reads this <grin/>).

Happy hacking!

-- Cyrille

-- Grumpf.
Attachment:
audit
Description: Text document