Re: Gnopernicus and ISO-Latin2 characters



Hi Remus:

In UTF-8, which is the default encoding on most of our platforms now, there is only "one character set", which is the Unicode character set. (More below.)

remus draica wrote:

Hi,

Bill, for me one thing is still unclear. Is it possible to have more than
one graphical representation for a character (same code as number)
depending on the character set? In this case, how is this specified? In
Windows I know that a code page is used for this.

Windows systems can and do use Unicode as well.
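
To make the legacy code-page case concrete, here is a minimal sketch (my own illustration, assuming Python 3 and its standard iso-8859-1 and iso-8859-2 codecs; the variable and function names are arbitrary). With the old 8-bit character sets, the same byte value really does mean a different character depending on which set is declared, whereas in Unicode each of those characters has its own code point:

```python
# One byte value, interpreted under two legacy 8-bit character sets.
raw = bytes([0xE3])

as_latin1 = raw.decode("iso-8859-1")  # a with tilde
as_latin2 = raw.decode("iso-8859-2")  # a with breve


def code_point(ch):
    """Format a character's Unicode code point in the usual U+XXXX style."""
    return f"U+{ord(ch):04X}"


# The two interpretations are distinct Unicode characters.
print(code_point(as_latin1), code_point(as_latin2))

# Encoded as UTF-8, they become different, unambiguous byte sequences.
print(as_latin1.encode("utf-8").hex(), as_latin2.encode("utf-8").hex())
```

In other words, the "code page" is only needed to interpret 8-bit data; once the text is Unicode (for example, UTF-8), no such extra information is required.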

My impression was (and still is) that all characters are represented
from 0 to a huge number and some parts of the interval represent
character sets.

Yes, but those 'code pages' do not represent different languages. They map approximately to different "scripts".

0......x......y.....z......t.....(something huge)
      [latin1]    [latin2]
where latin1 is the interval of chars with codes between x and y.

You said "different Latin characters"? What does that mean? The same character
in different languages? For example "a" in English and German? Or the same
character with a different sense depending on something else (a "code
page"?)?

As Samuel pointed out, by 'different Latin characters' I meant the various Latin 'code pages' (Unicode itself calls them "blocks") in Unicode. They do not overlap, in the sense that characters in Latin-2 are not redundant with Latin-1, and most real languages that use the Latin character set include characters from more than one Latin code page.

When dealing with UTF-8 content, there is no ambiguity about what character is intended by a particular set of bytes; each UTF-8 encoded character refers to a specific Unicode 'code point' or character. Thus the 'a' in German and the 'a' in English and French refer to the same Unicode code point, usually written as U+0061, which is encoded in UTF-8 as a single byte, 0x61. The 'a' with dieresis (ä) is in the "Latin-1 Supplement" section of Unicode, written U+00E4, and is encoded in UTF-8 as two bytes, 0xC3 0xA4.
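
If you want to check these values yourself, a minimal sketch follows (again my own illustration, assuming Python 3; the helper name `describe` is hypothetical). It returns the code point and the UTF-8 bytes for a given character:

```python
def describe(ch):
    """Return (code point, UTF-8 bytes) for one character, as display strings."""
    code_point = f"U+{ord(ch):04X}"
    utf8_bytes = " ".join(f"0x{b:02X}" for b in ch.encode("utf-8"))
    return code_point, utf8_bytes


# Plain 'a' and 'a' with dieresis (written as an escape to avoid
# depending on the encoding of this message).
for ch in ("a", "\u00e4"):
    print(*describe(ch))
# prints:
# U+0061 0x61
# U+00E4 0xC3 0xA4
```

The result is the same whether the 'a' came from German, English or French text, since the language plays no part in the encoding.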

regards,

Bill

Remus




