Re: UTF-8
- From: Ole Laursen <olau hardworking dk>
- To: gnome-i18n gnome org
- Subject: Re: UTF-8
- Date: 10 Jul 2002 19:32:45 +0200
Damien Donlon - Sun Ireland - Solaris Software - Localisation Engineer <damien.donlon@sun.com> writes:
> I think it may be impossible to distinguish between UTF-8 and 8859-1
> if no character is outside the 0-127 range. Can anyone confirm? Is
> this a big problem in identifying UTF-8 encoded files?
Yep. And no: if there are no characters above 127, it doesn't matter
how you interpret the file, so it is really not a problem. :-)
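A quick way to convince yourself, sketched in Python. The sample
string and the helper name is_pure_ascii are made up for illustration:

```python
# Bytes in the 0-127 range decode to the same text under both encodings,
# because UTF-8 and ISO 8859-1 both agree with ASCII in that range.
data = b"Plain ASCII text, nothing above 127."

def is_pure_ascii(raw: bytes) -> bool:
    """Return True if every byte is in the 0-127 range."""
    return all(b < 128 for b in raw)

# Decode the same bytes both ways and compare the resulting text.
same_text = data.decode("utf-8") == data.decode("iso-8859-1")
```

For input where is_pure_ascii() holds, the choice of decoder makes no
difference to the resulting text.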
The big problem is to decide whether the bytes above 127 are part of
multi-byte UTF-8 sequences or just ordinary ISO 8859-1 characters. This
is in general unsolvable, but with some clever coding you could
specialize for a lot of different languages, I suspect. Danish text
contains 'æ', 'ø', and 'å', for instance, so if you spot their UTF-8
encodings (which show up as 'Ã¦', 'Ã¸' and 'Ã¥' when read as ISO
8859-1), you can be pretty sure that it is UTF-8 and not ISO 8859-1.
This is how I do it myself. ;-)
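A minimal sketch of that kind of heuristic in Python, not my actual
code. The name guess_encoding and the return labels are invented for
this example, and the strict UTF-8 validity check is an extra step on
top of the letter spotting described above:

```python
# Hypothetical helper for illustration; not part of any library.
DANISH_LETTERS = "æøåÆØÅ"
# UTF-8 byte sequences for the Danish letters, e.g. b"\xc3\xa6" for 'æ'.
UTF8_SEQUENCES = [ch.encode("utf-8") for ch in DANISH_LETTERS]

def guess_encoding(raw: bytes) -> str:
    """Guess whether raw bytes holding Danish text are UTF-8 or ISO 8859-1."""
    if all(b < 128 for b in raw):
        # No bytes above 127: both interpretations give the same text.
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not well-formed UTF-8, so fall back to ISO 8859-1.
        return "iso-8859-1"
    if any(seq in raw for seq in UTF8_SEQUENCES):
        # Well-formed UTF-8 containing an encoded Danish letter.
        return "utf-8"
    # Well-formed UTF-8 but none of the expected letters: undecided.
    return "unknown"
```

For example, guess_encoding("rødgrød".encode("utf-8")) gives "utf-8",
while the same word encoded as ISO 8859-1 gives "iso-8859-1". A text in
ISO 8859-1 that happens to contain 'Ã' followed by '¦' would still fool
it, which is why this stays a heuristic.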
--
Ole Laursen
http://sunsite.dk/olau/