Re: deploying UTF-8 in new programs
- From: Owen Taylor <otaylor redhat com>
- To: Darin Adler <darin bentspoon com>
- Cc: gtk-devel-list gnome org
- Subject: Re: deploying UTF-8 in new programs
- Date: 19 Jul 2001 18:53:17 -0400
Darin Adler <darin bentspoon com> writes:
> Some Unicode-related questions (maybe I should be asking these on the
> gnome 2.0 list or the gnome i18n list):
>
> 1) How did the po files for projects like gtk+ get transcoded to UTF-8?
> Who did it? With what tools?
Mostly Robert Brady. See gtk+/po/README.translators for information
about tools.
> 2) Is there a standard way to detect at runtime that the gettext
> translations are in the wrong charset? Should we bother doing that?
Things will die horribly with warnings all over the place. If we take
the GTK+ approach of putting all .po files in UTF-8 then this is
solely a translator problem, since such .po files will work whether
or not your system has bind_textdomain_codeset().
> 3) How should programs figure out what character set file names are in?
> Should we add something to glib and/or gnome-vfs to help with this?
Basically, this is a "Unix is screwed" problem. The file system isn't
tagged, and file names are far too short to autodetect. (You might not
do too badly assuming UTF-8 and falling back to the locale if it
isn't legitimate UTF-8, but maybe not. But that doesn't help on saving.)
I don't think anyone has come up with a satisfactory solution yet.
The only thing that is going to half-way work is if everybody simply
switches over to UTF-8 locales and converts all their filenames.
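
As a rough illustration of the "assume UTF-8, fall back to the locale" heuristic mentioned above, here is a minimal sketch in plain C. The names looks_like_utf8() and guess_filename_encoding() are made up for this example and are not part of GLib; inside GLib-based code, g_utf8_validate() would normally do the checking.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the NUL-terminated byte string is well-formed UTF-8, else 0.
 * Rejects overlong forms, surrogates, and code points above U+10FFFF.
 * (Hypothetical helper for illustration only.) */
int looks_like_utf8(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;

    assert(name != NULL);

    while (*p) {
        unsigned int cp;   /* code point being assembled */
        unsigned int min;  /* smallest code point allowed for this length */
        int extra;         /* number of continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; min = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; min = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; min = 0x10000; }
        else return 0;     /* stray continuation byte or invalid lead byte */
        p++;

        while (extra-- > 0) {
            /* A NUL terminator also fails this test, so we never read
             * past the end of the string. */
            if ((*p & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }

        if (cp < min) return 0;                       /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogate */
        if (cp > 0x10FFFF) return 0;                  /* out of range */
    }
    return 1;
}

typedef enum { NAME_IS_UTF8, NAME_IS_LOCALE } name_encoding;

/* Guess: treat the name as UTF-8 if it validates, otherwise assume it is
 * in the locale's charset. */
name_encoding guess_filename_encoding(const char *name)
{
    return looks_like_utf8(name) ? NAME_IS_UTF8 : NAME_IS_LOCALE;
}
```

This only answers the reading side. A short name in a legacy charset can still happen to be valid UTF-8, and the guess says nothing about which encoding to use when creating a new file, which is the "doesn't help on saving" problem.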
> 4) Are there functions in the platform for converting file names
> and paths and the like to and from UTF-8?
Yes:
  /* Convert between the operating system (or C runtime)
   * representation of file names and UTF-8.
   */
  gchar* g_filename_to_utf8   (const gchar  *opsysstring,
                               gssize        len,
                               gsize        *bytes_read,
                               gsize        *bytes_written,
                               GError      **error);
  gchar* g_filename_from_utf8 (const gchar  *utf8string,
                               gssize        len,
                               gsize        *bytes_read,
                               gsize        *bytes_written,
                               GError      **error);
The implementation of these is nothing very magic on Unix:
 * Normally they are no-ops.
 * If the G_BROKEN_FILENAMES environment variable is set, they
   reduce to g_locale_to/from_utf8.
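
The two cases above can be sketched roughly as follows. This is not GLib's source, only an illustration of the described behaviour in plain C. All names here (locale_converter, broken_filenames_set, filename_to_utf8_sketch, latin1_to_utf8) are hypothetical, and the Latin-1 converter merely stands in for g_locale_to_utf8() under the assumption of an ISO-8859-1 locale.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: converts a locale-encoded string into a newly
 * allocated UTF-8 string. In GLib, g_locale_to_utf8() plays this role. */
typedef char *(*locale_converter)(const char *input);

/* Returns 1 if the G_BROKEN_FILENAMES environment variable is set. */
int broken_filenames_set(void)
{
    return getenv("G_BROKEN_FILENAMES") != NULL;
}

/* Sketch of the behaviour described above: hand back a plain copy unless
 * "broken" is nonzero, in which case go through the locale conversion.
 * The caller owns (and frees) the returned string. */
char *filename_to_utf8_sketch(const char *opsysstring, int broken,
                              locale_converter convert)
{
    char *copy;
    size_t n;

    assert(opsysstring != NULL);

    if (broken && convert != NULL)
        return convert(opsysstring);

    n = strlen(opsysstring) + 1;
    copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, opsysstring, n);
    return copy;
}

/* Stand-in converter for the example: ISO-8859-1 to UTF-8. Every Latin-1
 * byte maps to at most two UTF-8 bytes. */
char *latin1_to_utf8(const char *input)
{
    const unsigned char *p = (const unsigned char *)input;
    char *out = malloc(2 * strlen(input) + 1);
    char *q = out;

    if (out == NULL)
        return NULL;
    for (; *p; p++) {
        if (*p < 0x80) {
            *q++ = (char)*p;
        } else {
            *q++ = (char)(0xC0 | (*p >> 6));
            *q++ = (char)(0x80 | (*p & 0x3F));
        }
    }
    *q = '\0';
    return out;
}
```

A caller would pass broken_filenames_set() as the second argument. Taking the flag as a parameter rather than reading the environment inside the function is just a choice made here to keep the sketch easy to exercise.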
> 5) When making file: URIs, should the % sequences encode the
> actual file names, or the UTF-8 equivalent of the file names, taking
> into account the character set used for file names?
No clue. The one thing I'd consider is that many places need to
unescape filenames for display, and that's only possible if you have
the filename in UTF-8 form.
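
To make the display point concrete, here is a small plain-C sketch of percent-escaping and unescaping a path. It does not settle which bytes ought to go into the URI; it only shows that whatever comes back out of the % sequences is directly displayable only if those bytes are UTF-8. The function names are invented for this example, and the set of characters left unescaped (the RFC 3986 unreserved characters plus '/') is an assumption of the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ASCII-only test, independent of the current locale. */
static int is_unescaped_char(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~' || c == '/';
}

/* Value of a hex digit, or -1 if c is not one. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Percent-escape every byte that is not in the unescaped set.
 * Returns a newly allocated string, or NULL on allocation failure. */
char *uri_escape_path(const char *bytes)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *p = (const unsigned char *)bytes;
    char *out, *q;

    assert(bytes != NULL);

    out = malloc(3 * strlen(bytes) + 1);  /* worst case: every byte escaped */
    if (out == NULL)
        return NULL;
    q = out;
    for (; *p; p++) {
        if (is_unescaped_char(*p)) {
            *q++ = (char)*p;
        } else {
            *q++ = '%';
            *q++ = hex[*p >> 4];
            *q++ = hex[*p & 0x0F];
        }
    }
    *q = '\0';
    return out;
}

/* Reverse the escaping. Returns NULL on a malformed % sequence, on "%00"
 * (which would embed a NUL), or on allocation failure. */
char *uri_unescape_path(const char *escaped)
{
    const unsigned char *p = (const unsigned char *)escaped;
    char *out, *q;

    assert(escaped != NULL);

    out = malloc(strlen(escaped) + 1);    /* output is never longer */
    if (out == NULL)
        return NULL;
    q = out;
    while (*p) {
        if (*p == '%') {
            int hi = hex_value(p[1]);
            /* Only look at p[2] once p[1] is known to be a hex digit,
             * so the terminator is never stepped over. */
            int lo = (hi < 0) ? -1 : hex_value(p[2]);

            if (hi < 0 || lo < 0 || (hi == 0 && lo == 0)) {
                free(out);
                return NULL;
            }
            *q++ = (char)(hi * 16 + lo);
            p += 3;
        } else {
            *q++ = (char)*p++;
        }
    }
    *q = '\0';
    return out;
}
```

If the escaped bytes were the UTF-8 form of the name, the result of uri_unescape_path() can be shown to the user as is. If they were raw locale-encoded bytes, a further conversion (and knowledge of the charset) is needed before display.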
Regards,
Owen