Re: Gnopernicus and ISO-Latin2 characters

From: Bill Haneman <Bill Haneman Sun COM>
To: remus draica <rd baum ro>
Cc: Jan Buchal <buchal brailcom org>, gnome-accessibility-list gnome org
Subject: Re: Gnopernicus and ISO-Latin2 characters
Date: Fri, 26 Aug 2005 12:03:40 +0100

Hi Remus:

In UTF-8, which is the default encoding on most of our platforms now, there is only "one character set", which is the Unicode character set. (More below.)


remus draica wrote:

Hi,

Bill, for me one thing is still unclear. Is it possible to have more than
one graphical representation for a character (same code as number)
depending on the character set? In this case, how is this specified? In
Windows I know that a code page is used for this.

Windows systems can and do use Unicode as well.
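
As an illustration only (a Python sketch, not Gnopernicus code), here is what a legacy 8-bit code page does: the same byte value is mapped to a different character depending on which code page is assumed. The byte 0xE3 and the variable names are chosen just for this example.

```python
# One byte, interpreted under two legacy 8-bit code pages.
raw = b"\xe3"

as_latin1 = raw.decode("iso8859_1")  # ISO-8859-1 (Latin-1)
as_latin2 = raw.decode("iso8859_2")  # ISO-8859-2 (Latin-2)

# The byte is identical, but the resulting Unicode code points differ.
for label, ch in (("Latin-1", as_latin1), ("Latin-2", as_latin2)):
    print(f"0xE3 as {label}: {ch} (U+{ord(ch):04X})")
```

Under Latin-1 the byte decodes to a-tilde (U+00E3); under Latin-2 it decodes to a-breve (U+0103). In Unicode these are two distinct code points, so no code page is needed to tell them apart.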

My impression was (and still is) that all characters are represented
from 0 to a huge number and some parts of the interval represent
character sets.

Yes, but those 'code pages' do not represent different languages. They map approximately to different "scripts".

0......x......y.....z......t.....(something huge)
      [latin1]    [latin2]
where latin1 is the interval of chars with codes between x and y.

You said "different Latin characters"? What does that mean? The same character
in different languages? For example "a" in English and German? Or the same
character with a different sense depending on something else (a "code
page"?)?

As Samuel pointed out, by 'different Latin characters' I meant the various Latin code pages in Unicode. They do not overlap, in the sense that characters in Latin-2 are not redundant with Latin-1, and most real languages that use the Latin character set include characters from more than one Latin code page.

When dealing with UTF-8 content, there is no ambiguity about what character is intended by a particular set of bytes; each UTF-8 encoded character refers to a specific Unicode 'codepoint' or character. Thus the 'a' in German and the 'a' in English and French refer to the same Unicode code point, usually written as U+0061, and all are encoded in UTF-8 as a single byte, '0x61'. The 'a' with dieresis (ä) is in the "Latin-1 Supplement" section of Unicode, written U+00E4, and is encoded in UTF-8 as two bytes, '0xC3' '0xA4'.
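
For anyone who wants to check these values, a minimal Python sketch follows. The helper name describe is made up for this example; it relies only on the standard library.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point, Unicode name, and UTF-8 bytes of one character."""
    code_point = f"U+{ord(ch):04X}"
    utf8_bytes = " ".join(f"0x{b:02X}" for b in ch.encode("utf-8"))
    return f"{code_point} {unicodedata.name(ch)} -> {utf8_bytes}"


# 'a' and 'a' with dieresis (written as an escape to avoid source-encoding issues)
for ch in ("a", "\u00e4"):
    print(describe(ch))
```

This prints one line per character: U+0061 maps to the single byte 0x61, and U+00E4 maps to the two bytes 0xC3 0xA4, regardless of the language of the surrounding text.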


regards,

Bill

Remus

