Re: Gnopernicus and ISO-Latin2 characters
- From: Bill Haneman <Bill Haneman Sun COM>
- To: remus draica <rd baum ro>
- Cc: Jan Buchal <buchal brailcom org>, gnome-accessibility-list gnome org
- Subject: Re: Gnopernicus and ISO-Latin2 characters
- Date: Fri, 26 Aug 2005 12:03:40 +0100
Hi Remus:
In UTF-8, which is the default encoding on most of our platforms now,
there is only "one character set", which is the Unicode character set.
(more below)
remus draica wrote:
Hi,
Bill, for me one thing is still unclear. Is it possible to have more
than one graphical representation for a character (same code as a
number) depending on the character set? In this case, how is this
specified? In Windows I know that a code page is used for this.
Windows systems can and do use Unicode as well.
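
As a rough illustration of the code-page behaviour being asked about
(this sketch is not from the original message), the same byte value can
decode to different characters under two legacy 8-bit encodings. It
assumes Python 3 and its standard "iso-8859-1" and "iso-8859-2" codecs;
the byte 0xE3 is just one convenient example.

```python
# One byte value, interpreted under two legacy 8-bit code pages.
raw = bytes([0xE3])

as_latin1 = raw.decode("iso-8859-1")  # Western European code page
as_latin2 = raw.decode("iso-8859-2")  # Central European code page

# The byte is identical, but each code page maps it to a different
# Unicode code point, so the resulting characters differ.
print(as_latin1, f"U+{ord(as_latin1):04X}")
print(as_latin2, f"U+{ord(as_latin2):04X}")
```

Once text is held as Unicode, the character no longer depends on which
code page happens to be active.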
My impression was (and still is) that all characters are represented
from 0 to a huge number and some parts of the interval represent
character sets.
Yes, but those 'code pages' do not represent different languages. They
map approximately to different "scripts".
0......x......y.....z......t.....(something huge)
[latin1] [latin2]
where latin1 is the interval of chars with codes between x and y.
You said "different Latin characters"? What does that mean? The same
character in different languages? For example "a" in English and
German? Or the same character with a different sense depending on
something else (a "code page"?)?
As Samuel pointed out, by 'different Latin characters' I meant the
various Latin code pages in Unicode. They do not overlap, in the sense
that characters in Latin-2 are not redundant with Latin-1, and most real
languages that use the Latin character set include characters from more
than one Latin code page.
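
A minimal sketch of that last point, assuming Python 3. The block names
and ranges below are the first few Latin blocks from the Unicode code
charts; the helper names (LATIN_BLOCKS, block_of, blocks_used) and the
choice of a Romanian sample word are illustrative assumptions, not
something from the original discussion.

```python
# First few Latin blocks from the Unicode code charts (not exhaustive).
LATIN_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
]

def block_of(ch):
    """Return the name of the listed block containing ch, or 'other'."""
    cp = ord(ch)
    for start, end, name in LATIN_BLOCKS:
        if start <= cp <= end:
            return name
    return "other"

def blocks_used(text):
    """Return the set of block names used by the letters in text."""
    return {block_of(ch) for ch in text if ch.isalpha()}

# Romanian word spelled with escapes: i-circumflex (U+00EE), n, v,
# a-breve (U+0103), t-comma-below (U+021B), a.
word = "\u00eenv\u0103\u021ba"

for ch in word:
    print(f"U+{ord(ch):04X}", block_of(ch))
```

A single ordinary word here draws on several non-overlapping ranges,
which is why a per-language "code page" is not a good mental model for
Unicode text.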
When dealing with UTF-8 content, there is no ambiguity about what
character is intended by a particular sequence of bytes; each UTF-8
encoded character refers to a specific Unicode 'codepoint' or character.
Thus the 'a' in German and the 'a' in English and French refer to the
same Unicode code point, usually written as U+0061, which is encoded in
UTF-8 as a single byte, '0x61'. The 'a' with dieresis (ä) is in the
"Latin-1 Supplement" section of Unicode, written U+00E4, and is encoded
in UTF-8 as two bytes, '0xC3' '0xA4'.
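
The byte values above can be checked with a short sketch, assuming
Python 3; utf8_hex is a hypothetical helper name used only for this
illustration.

```python
def utf8_hex(ch):
    """Return the UTF-8 bytes of ch as a list of hex strings."""
    return [f"0x{b:02X}" for b in ch.encode("utf-8")]

# 'a' and 'a' with dieresis, the latter written as an escape.
for ch in ("a", "\u00e4"):
    print(f"U+{ord(ch):04X}", utf8_hex(ch))

# Decoding goes the other way without ambiguity: these two bytes can
# only mean U+00E4 when the content is UTF-8.
decoded = bytes([0xC3, 0xA4]).decode("utf-8")
```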
regards,
Bill
Remus