OT: On ISO decisions [codes and charsets] (was: Re: locale for Uzbekistan)
- From: Danilo Segan <dsegan gmx net>
- To: "Andrew W. Nosenko" <awn bcs zp ua>
- Cc: gnome-i18n gnome org
- Subject: OT: On ISO decisions [codes and charsets] (was: Re: locale for Uzbekistan)
- Date: Sat, 27 Sep 2003 01:37:19 +0200
On Friday, 26 September 2003, at 22:29:43 CEST, Andrew W. Nosenko wrote:
>
> Just one more point why I hate ISO.
I think ISO is great in this particular case -- it provides you with a
*meaningful* code (that is, it's distinct) for things such as
different scripts for the same language.
Character sets have (almost) nothing to do with the script. It is
crucial that I can differentiate between the *codes* (yes, they are
*codes*, not "words to be pronounced"). They are technical entities, in
the sense that we want machines to use them programmatically. Sure, it
helps if they remind us of their true meanings, and that purpose is
fulfilled too (you're not going to tell me that "Cyrl" doesn't remind
you of "cyrillic", are you? :-)
>
> One time he are think that iso-8859-5 should be used... (Question: is
> at least one cyrillyc specking people exists that uses this brain-
> dameged encoding?)
Sorry to disappoint you, but it is quite widely used on "Unix systems"
(don't tell those Open Group folks about my misuse of the name ;-) for
the Serbian language, where UTF-8 is not supported ;-)
It's also quite common to come across mail communication in ISO-8859-5
between Serbian speakers. Okay, I admit, there's no real standard
for Serbian, so that's why UTF-8 is gaining good acceptance here.
Btw, ISO-8859-5 is no more a "brain-damaged" encoding for Cyrillic than
Unicode is, since the latter maps it one-to-one from 0xa0--0xff
to 0x400--0x45f, or something like that (I'm not really sure about the
exact number of characters present, which determines the starting
character 0xa0).
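For anyone who wants the exact numbers, here is a minimal Python sketch
that checks the mapping. It assumes Python's built-in "iso8859_5" codec
is a faithful copy of the standard table; the constant OFFSET and the
function name are made up for illustration. It looks for bytes in
0xa0--0xff that do not land at "byte + 0x360" in Unicode.

```python
# Offset between an ISO-8859-5 byte and its Unicode code point for the
# Cyrillic letters (0xa1 maps to U+0401, so the difference is 0x360).
OFFSET = 0x360


def iso8859_5_exceptions():
    """Return {byte: code_point} for bytes that break the fixed offset."""
    exceptions = {}
    for byte in range(0xA0, 0x100):
        code_point = ord(bytes([byte]).decode("iso8859_5"))
        if code_point != byte + OFFSET:
            exceptions[byte] = code_point
    return exceptions


exceptions = iso8859_5_exceptions()
for byte, code_point in sorted(exceptions.items()):
    print(f"0x{byte:02x} -> U+{code_point:04X}")
```

Only the non-letters (no-break space, soft hyphen, the numero sign and
the section sign) fall outside the fixed offset; every Cyrillic letter
keeps its relative position.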
Also, if you think ISO-8859-5 is brain-damaged, you should take a look
at the 7-bit YUSCII encoding, which encodes some letters over "[", "]"
and similar characters ;-)
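To show what that means in practice, here is a small sketch. It assumes
the Latin variant of YUSCII (JUS I.B1.002) and covers only the ten
positions that differ from ASCII; the table and function names are
hypothetical, since Python ships no YUSCII codec.

```python
# Assumed table: the ASCII positions that Latin YUSCII reuses for letters.
YUSCII_LATIN = {
    "@": "Ž", "[": "Š", "\\": "Đ", "]": "Ć", "^": "Č",
    "`": "ž", "{": "š", "|": "đ", "}": "ć", "~": "č",
}
TABLE = str.maketrans(YUSCII_LATIN)


def yuscii_to_unicode(data: bytes) -> str:
    """Interpret 7-bit bytes as Latin YUSCII and return Unicode text."""
    return data.decode("ascii").translate(TABLE)


print(yuscii_to_unicode(b"~a{a"))      # a Serbian word
print(yuscii_to_unicode(b"array[0]"))  # program text gets mangled
```

The second call shows the cost: any text that really meant "[" or "]"
turns into letters.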
The main shortcoming of ISO-8859-5, as I see it, is that it is not in
the correct "collating" order for Serbian (which also means that
Unicode isn't either), but as the UCA[*] proves, it's possible to make
it work while keeping the collating sequence correct for Russian and
other Cyrillic languages.
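A rough illustration of the collation problem follows. This is not the
UCA itself, just a hand-made sort key built from the Serbian alphabet;
the names SERBIAN_ALPHABET, RANK and serbian_key are made up for the
example, and the handling of characters outside the alphabet is an
arbitrary choice.

```python
# The 30 letters of the Serbian Cyrillic alphabet, in alphabetical order.
SERBIAN_ALPHABET = "абвгдђежзијклљмнњопрстћуфхцчџш"
RANK = {ch: i for i, ch in enumerate(SERBIAN_ALPHABET)}


def serbian_key(word):
    """Sort key: alphabet position per letter; other characters go last."""
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word.lower()]


names = ["Џем", "Ана", "Ђорђе", "Ева", "Јован", "Иван"]
by_code_point = sorted(names)                 # plain code point order
by_alphabet = sorted(names, key=serbian_key)  # Serbian alphabetical order

print(by_code_point)
print(by_alphabet)
```

In code point order the letters Ђ, Ј and Џ sort before А, because they
sit in the block ahead of the basic Russian letters; the custom key puts
them back where a Serbian reader expects them.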
Actually, I don't see any advantage to the KOI8-* encodings, especially
since "stripping the high bit and getting readable text" is rarely
needed in most modern software.
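For those who haven't seen the trick, here is a short sketch using
Python's built-in "koi8_r" codec (the function name is made up). It
encodes Russian text as KOI8-R, clears the top bit of every byte, and
reads the result as ASCII.

```python
def strip_high_bit(text: str) -> str:
    """Encode as KOI8-R, clear bit 7 of each byte, decode as ASCII."""
    raw = text.encode("koi8_r")
    return bytes(b & 0x7F for b in raw).decode("ascii")


print(strip_high_bit("Привет"))
```

The output is a case-swapped Latin transliteration, which was handy on
7-bit mail gateways but buys little once everything is 8-bit clean.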
>
> Now these guys think that we should crack own tongue and brains (in
> physioligy sence) by reading his totally unliteracy abbreviations...
>
> Cyrillyc, not Cyrllyc!!!
> ^ note this `i'!!!
> Therefore, possible abbreviations are `Cyr' or `Cyril' or `Cyrill'
> but nat a `Cyrl' anyway!!!
Btw, I'd mention that it is "cyrillic" in English :-)
So, we've got four-letter codes (sorry, I don't know why 4-letter codes
were decided on, but let's accept that as a fact; why one would later
want *all* the codes to be of the *same* length is quite obvious, at
least I hope), and we've got to describe the Cyrillic script with one.
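To spell out the "same length" point: when language codes are 2 or 3
letters, script codes are always 4, and country codes are 2, a program
can tell the parts of a tag apart by length alone. A minimal sketch
follows; the function name and the tag layout are assumptions made for
illustration, not any particular library's format.

```python
def split_tag(tag: str) -> dict:
    """Split a tag such as 'uz_Cyrl_UZ' into language, script and region.

    Parts after the language are classified purely by their length:
    4 letters means a script code, 2 letters means a country code.
    """
    parts = tag.replace("_", "-").split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        if not (part.isascii() and part.isalpha()):
            continue  # ignore anything this sketch does not understand
        if len(part) == 4:
            result["script"] = part.title()
        elif len(part) == 2:
            result["region"] = part.upper()
    return result


print(split_tag("uz_Cyrl_UZ"))
print(split_tag("sr-Latn"))
```

With variable-length script names ("Cyr", "Cyril", "Cyrill") the same
parser would need a lookup table just to know which field it is reading.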
We have a choice: Cyri or Cyrl. To me, it's clearer that "Cyrl" is
cyrillic (the first one might be pronounced like "syrai", which bears
hardly any resemblance to "cyrillic"), and even more so since some
vowels can usually be dropped, and the word will still remind us of
the original.
Actually, there are several similar rules in the Serbian language for
constructing abbreviations from full names. One of the rules is to take
the first few consonants and construct an abbreviation from them.
That means that even "Crl." (I don't know if "y" is a vowel in English
or not; if it's not, it would be "Cyrl" itself ;-) could be legitimate.
Perhaps English has similar rules which allow that usage, and don't
forget that "cyrillic" is an English word.
Btw, I don't see any particular reason for "hating" ISO because of this
-- it's still easier to remember than some cryptic code (e.g. 0xf642 for
Cyrillic, 0xf4a7 for Latin, etc.)
Cheers,
Danilo
[*] Unicode Collation Algorithm, Unicode Technical Report 10, I believe