Re: G_UTF8String: Boxed Type Proposal

From: Simon McVittie <simon mcvittie collabora co uk>
To: gtk-devel-list gnome org
Subject: Re: G_UTF8String: Boxed Type Proposal
Date: Thu, 17 Mar 2016 20:48:34 +0000

On 17/03/16 20:29, Matthias Clasen wrote:

> Terminology can certainly be confusing at times, but I think that a
> Unicode character is a perfectly well-defined entity, notwithstanding
> the fact that it can be represented in various encodings (a UTF-8
> sequence, a UCS-4 word, a UTF-16 surrogate pair, etc).


You mean a code point, then (that's basically what gunichar is). I think
Unicode people are so pedantic about "code point" because a code point
may or may not be what you actually mean when you say "character",
whereas I rarely see "code point" used with a meaning other than its
Unicode one.
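
To illustrate the gap between "character" and "code point", here is a
minimal sketch in Python (not GLib; the helper name code_points is my
own, purely for illustration). It assumes only that a Python str is a
sequence of code points, and shows one user-perceived character, "é",
spelled as either one or two code points:

```python
import unicodedata

def code_points(text):
    """List the code points in text as U+XXXX strings (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

print(code_points(precomposed))  # one code point
print(code_points(decomposed))   # two code points, same visible "character"

# NFC normalization composes the pair into the single code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Both strings render as the same thing to a reader, which is why counting
code points is not the same as counting what people call characters.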

More precisely, a Unicode code point is an abstract entity indexed by a
number, such as U+0041 LATIN CAPITAL LETTER A or U+262D HAMMER AND
SICKLE, which can only be concretely represented as some particular byte
sequence by passing it through an encoding like UCS-4, UTF-8 or
ISO-8859-1. Some encodings are more obvious than others, and in
particular non-Unicode encodings like ISO-8859-1 cannot represent every
Unicode code point.
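
As a rough sketch of the above, again in Python rather than GLib (the
function name encode_code_point is made up for this example, and I'm
using Python's "utf-32-be" codec as a stand-in for UCS-4), the same two
code points can be passed through three encodings:

```python
def encode_code_point(code_point, encoding):
    """Return the bytes for one code point in the given encoding,
    or None if that encoding cannot represent it (hypothetical helper)."""
    try:
        return chr(code_point).encode(encoding)
    except UnicodeEncodeError:
        return None

for cp in (0x0041, 0x262D):
    for enc in ("utf-32-be", "utf-8", "iso-8859-1"):
        encoded = encode_code_point(cp, enc)
        shown = encoded.hex(" ") if encoded is not None else "not representable"
        print(f"U+{cp:04X} in {enc}: {shown}")
```

U+0041 comes out as a different byte sequence in UCS-4 than in UTF-8 or
ISO-8859-1, and U+262D has no ISO-8859-1 representation at all, so the
helper returns None for it.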

-- 
Simon McVittie
Collabora Ltd. <http://www.collabora.com/>
