Re: Natilus using UTF-8 for filenames regardless of locale

From: Maciej Katafiasz <mnews22 wp pl>
To: cyrille bollu aweurope be
Cc: Mika Fischer <mika_fischer gmx net>, nautilus-list gnome org, nautilus-list-bounces gnome org
Subject: Re: Natilus using UTF-8 for filenames regardless of locale
Date: Fri, 16 Jul 2004 18:22:01 +0200

On Fri, 16-07-2004 at 17:52, cyrille bollu aweurope be wrote:
> > Also, can you tell me why Nautilus displays Filenames in different
> > character sets correctly? How the hell does it figure out the correct
> > character set? This has the same taste as IE guessing the character set
> > used in HTML-Pages...
> 
> I think the UTF-8 standard is backward compatible with all the
> ISO-8859-... standards. This means, it's UTF-8 that guesses character
> set not nautilus.
> So, as, I guess, nautilus is using UTF-8 to display characters, it is
> able to display lots of character sets.

No. UTF-8 is indeed compatible with ASCII (text using only characters in
the range 0-127 is represented by the same bytes), and, as a Unicode
encoding, it shares its code points with ISO-8859-1 (the numbers assigned
to characters that are in ISO-8859-1 are the same, but their byte
representation in ISO-8859-1 and in UTF-8 is not). UTF-8 is not
byte-compatible with any of the ISO-8859-* charsets, and it certainly
cannot guess anything: an encoding is a way of transforming character
numbers into byte sequences; it cannot "do" anything.
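
To make the distinction concrete, here is a small Python sketch (my own
illustration, not anything Nautilus does; the sample strings are
arbitrary). It shows that ASCII text has identical bytes in all three
encodings, while a character such as U+00E9 keeps the same number but
gets different bytes in ISO-8859-1 and UTF-8:

```python
# Illustrative only: compare how the same characters are represented
# in ASCII, ISO-8859-1 and UTF-8.

ascii_text = "nautilus"   # only characters in the range 0-127
accented = "\u00e9"       # e with acute accent, present in ISO-8859-1

# ASCII-only text yields identical bytes in all three encodings.
ascii_same = (
    ascii_text.encode("ascii")
    == ascii_text.encode("iso-8859-1")
    == ascii_text.encode("utf-8")
)

# The character number (code point) is the same in Unicode and ISO-8859-1...
code_point = ord(accented)                     # 0xE9

# ...but the byte representation differs between the two encodings.
latin1_bytes = accented.encode("iso-8859-1")   # one byte
utf8_bytes = accented.encode("utf-8")          # two bytes

# The ISO-8859-1 byte on its own is not even valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(ascii_same, hex(code_point), latin1_bytes, utf8_bytes, latin1_is_valid_utf8)
# prints: True 0xe9 b'\xe9' b'\xc3\xa9' False
```

The last check is the important one: a file name written in ISO-8859-1
with non-ASCII characters is generally not a valid UTF-8 byte sequence,
so nothing about UTF-8 makes it display correctly by itself.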

What IE (and other browsers) do is apply heuristics that make it
possible to choose among a subset of known encodings, based on the
presence of byte sequences that are very unlikely (or impossible) to
appear in the encoding currently assumed. But this usually works only for
some combinations of multibyte encodings (e.g. Chinese or Japanese),
isn't very reliable, and is meant only as a way to avoid some very broken
cases. Detection is by no means a way to fix the encoding mess.
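
A rough Python sketch of the "impossible sequence" idea follows. It is
an assumption-laden toy, not what IE or any browser actually implements:
the function name guess_encoding and the candidate list are made up for
illustration. It tries each candidate strictly and keeps the first one
that decodes without error:

```python
def guess_encoding(raw: bytes, candidates=("utf-8", "iso-8859-2")) -> str:
    """Return the first candidate encoding that decodes raw without error.

    Hypothetical helper for illustration; real detectors also weigh how
    likely the decoded text is, not just whether decoding succeeds.
    """
    for name in candidates:
        try:
            raw.decode(name)  # strict decoding rejects impossible sequences
        except UnicodeDecodeError:
            continue
        return name
    return "unknown"


sample = "\u0105"  # Polish a with ogonek

# Valid UTF-8 is accepted as UTF-8.
from_utf8 = guess_encoding(sample.encode("utf-8"))

# The ISO-8859-2 byte 0xB1 is a lone continuation byte, impossible in UTF-8,
# so UTF-8 is ruled out and the next candidate is taken.
from_latin2 = guess_encoding(sample.encode("iso-8859-2"))

# The weakness: every byte is valid in every single-byte ISO-8859-* charset,
# so the same byte decodes "successfully" as ISO-8859-1 too, just wrongly.
misread = sample.encode("iso-8859-2").decode("iso-8859-1")

print(from_utf8, from_latin2, misread)
```

Ruling out UTF-8 works because UTF-8 has strict structure; telling one
single-byte charset from another cannot work this way, which is why the
result depends entirely on the order of the candidate list.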

HTH,
Maciej

-- 
"A tautologism is something tautological"
     Mathrick <mathrick swat pl>
       http://mathrick.blog.pl
