Re: Quotation marks: Using “” instead of ""



On Mon, 2008-06-16 at 18:59 +0200, Dave Neary wrote:
> Iain * wrote:
> > On Mon, Jun 16, 2008 at 11:20 AM, Alan Cox <alan lxorguk ukuu org uk> wrote:
> >> In LANG=C you call gtk_label_new with UTF-8 strings. What happens at that
> >> point depends if gtk_label_new ever calls a single C library function
> >> that is locale dependant (eg strcasecmp).
> > 
> > All of GTK is utf-8 compatible.
> > This is the point we're trying to make.
> 
> I'm increasingly mystified by this discussion.
> 
> Lots of people use non-UTF8 locales - most of the people I know send
> iso-8859-15 emails and when I send UTF8 emails they end up seeing €¥ or
> whatever instead of é.

The header for the email you just sent has this:

Content-Type: text/plain; charset="utf-8"

Any mail client that tries to interpret your email as ISO-8859-1
is simply broken.  Things like web sites and email specify their
character encoding, precisely because not everybody is using the
same one.  When you get an email from me, regardless of your
character encoding, the email is to be interpreted as UTF-8.

This, of course, has nothing to do with UTF-8 strings being
passed to GTK+.

>  I can imagine that lots of Linux users use
> non-UTF8 locales for their UI. I don't know if there are any stats for
> this, but a couple of people working with distributions have said so.
> 
> Alan's made a reasonable argument that we shouldn't be using non-ascii
> in C source files. It's not standard. He's made a reasonable argument
> that in the case where a string is untranslated, or the user chooses the
> C locale, the output string will be the input string, and if the input
> string is non-ASCII UTF-8, then strange and unexpected things will happen.

Input to GTK+ functions is defined to be UTF-8.  Always,
regardless of a user's character encoding.  That's the
point Iain has been making.  When you have this:

gtk_label_new (_("foo"));

The output of the _ function had better be UTF-8, because
that's what GTK+ is going to treat it as.  If the input
is non-ASCII, the worst-case scenario is that _ returns
the input unchanged, because it's untranslated.  If that
non-ASCII input is UTF-8, we're golden, because that's
what GTK+ wants.

Again, *regardless of the user's locale*, the input to
gtk_label_new is UTF-8.  There is *never* any confusion
about what gets passed to gtk_label_new.

I very much doubt any of us are taking these strings
and passing them anywhere they don't belong.  They're
being used as simple keys.  If we do anything with the
strings, we do it with the translated strings, which
means we have to use UTF-8-capable functions anyway.

All of that means that there are no run-time problems.
The only actual concern is whether compilers will choke
on UTF-8 source files.  Alan says that, according to the
standard, a compiler would be perfectly right to choke.
I believe him.  I also don't care.

> This is starting to sound to me like change for change's sake. I don't
> see any decent reason to make the change (other than the "proper" quotes
> look better, even if they're harder to type), and credible people have
> pointed out a significant potential for breakage in a change like this.

They don't just look better.  They're easier to read.
What's more, they're correct.  And if we could get over
this barrier, maybe we could start using proper dashes.
Good typography makes text more legible.

--
Shaun