Re: character set in URIs for drag and drop
- From: Daniel Veillard <veillard redhat com>
- To: Darin Adler <darin bentspoon com>
- Cc: gnome-hackers gnome org
- Subject: Re: character set in URIs for drag and drop
- Date: Fri, 24 Aug 2001 16:26:28 -0400
On Fri, Aug 24, 2001 at 01:06:55PM -0700, Darin Adler wrote:
> When dealing with drag and drop of files, we often deal with URIs for
> files on the local system. Currently, the URIs typically use the raw name
> from the file system with URI encoding. For example, if a file has an
> upside-down question mark in its name, and my file system has file names
> encoded with the Latin-1 character set, then the file might have this URI:
>
> file:///home/darin/%BFQue%3F
>
> But I think that this URI would not be what GNOME 2 programs would expect.
> They would instead expect the URI to be encoded with UTF-8:
This time it's not in RFC 2396, which is too old; maybe it has been superseded:
http://www.faqs.org/rfcs/rfc2396.html
2.1 URI and non-ASCII characters
"For original character sequences that contain non-ASCII characters,
however, the situation is more difficult. Internet protocols that
transmit octet sequences intended to represent character sequences
are expected to provide some way of identifying the charset used, if
there might be more than one [RFC2277]. However, there is currently
no provision within the generic URI syntax to accomplish this
identification. An individual URI scheme may require a single
charset, define a default charset, or provide a way to indicate the
charset used.
It is expected that a systematic treatment of character encoding
within URI will be developed as a future modification of this
specification."
However, each time I have discussed this with people from the I18N
group at W3C, I was told that the URI should first be converted to
UTF-8, and then the normalization would occur. There is some recent
prose on this issue in the XPointer specification:
http://www.w3.org/TR/xptr#uri-escaping
---------------------
1. Each disallowed character is converted to UTF-8 [IETF RFC 2279]
as one or more bytes.
2. Any bytes corresponding to a disallowed character are escaped
with the URI escaping mechanism (that is, converted to %HH,
where HH is the hexadecimal notation of the byte value).
3. The original character is replaced by the resulting character sequence.
---------------------
This part has had a lot of review, so I would trust it.
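
As an illustration only, the three XPointer steps quoted above could be
sketched as follows in Python. The function name escape_uri_path, the
ALLOWED set (the RFC 2396 "unreserved" characters plus "/"), and the
charset parameter are assumptions made for this sketch; they do not come
from either specification or from any GNOME library.

```python
# Characters left unescaped in this sketch: RFC 2396 "unreserved"
# characters plus "/" as the path separator (an assumption).
ALLOWED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "-_.!~*'()/"
)


def escape_uri_path(path, charset="utf-8"):
    """Escape every disallowed character of `path` as %HH sequences.

    Hypothetical helper: `charset` selects how a disallowed character
    is turned into bytes before escaping (UTF-8 per the quoted steps).
    """
    pieces = []
    for ch in path:
        if ch in ALLOWED:
            pieces.append(ch)
        else:
            # Step 1: convert the character to one or more bytes.
            raw = ch.encode(charset)
            # Step 2: escape each byte as %HH.
            # Step 3: the escaped text replaces the original character.
            pieces.append("".join("%%%02X" % b for b in raw))
    return "".join(pieces)


name = "/home/darin/\u00bfQue?"  # upside-down question mark, then "Que?"
print("file://" + escape_uri_path(name))             # UTF-8 first
print("file://" + escape_uri_path(name, "latin-1"))  # raw Latin-1 bytes
```

With charset="latin-1" the sketch reproduces the %BF form from Darin's
example, while the default gives the two-byte UTF-8 form %C2%BF. The
standard library's urllib.parse.quote also encodes as UTF-8 by default.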
Daniel
--
Daniel Veillard | Red Hat Network http://redhat.com/products/network/
veillard redhat com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/