Conversion functions

From: Owen Taylor <otaylor redhat com>
To: gtk-devel-list gnome org
Subject: Conversion functions
Date: 03 Jan 2001 18:06:11 -0500

Currently, GLib has the following conversion functions:

[ Ignoring g_convert_with_fallback(), and g_iconv(), which really
  don't affect this discussion. ]

gchar* g_convert               (const gchar  *str,
				gint          len,
				const gchar  *to_codeset,
				const gchar  *from_codeset,
				gint         *bytes_read,
				gint         *bytes_written,
				GError      **error);

gchar* g_locale_to_utf8   (const gchar  *opsysstring,
			   GError      **error);
gchar* g_locale_from_utf8 (const gchar  *utf8string,
			   GError      **error);

gchar* g_filename_to_utf8   (const gchar  *opsysstring,
			     GError      **error);
gchar* g_filename_from_utf8 (const gchar  *utf8string,
			     GError      **error);

gunichar2 *g_utf8_to_utf16 (const gchar     *str,
			    gint             len);
gunichar * g_utf8_to_ucs4  (const gchar     *str,
			    gint             len);
gunichar * g_utf16_to_ucs4 (const gunichar2 *str,
			    gint             len);
gchar *    g_utf16_to_utf8 (const gunichar2 *str,
			    gint             len);
gunichar * g_ucs4_to_utf16 (const gunichar  *str,
			    gint             len);
gchar *    g_ucs4_to_utf8  (const gunichar  *str,
			    gint             len);

So, there are three basic prototypes here:

 gchar* g_convert (const gchar  *str,
                   gint          len,
                   const gchar  *to_codeset,
                   const gchar  *from_codeset,
                   gint         *bytes_read,
                   gint         *bytes_written,
                   GError      **error);

gchar* g_locale_to_utf8 (const gchar  *opsysstring,
                         GError      **error);

gunichar2 *g_utf8_to_utf16 (const gchar     *str,
			    gint             len);

In theory all the functions should have the full set of arguments
from g_convert - so, we'd have:

 gunichar2 *g_utf8_to_utf16 (const gchar  *str,
                             gint          len,
                             gint         *bytes_read,
                             gint         *items_written,
                             GError      **error);

All of these arguments can be useful. In approximate order of importance:

 @len:           Needed to be able to convert part of a larger string.
 @bytes_read:    Needed to handle incomplete input (partial
                 characters at the end of a UTF-8 or UTF-16 string).
 @error:         Useful for displaying a human-readable error
                 message on invalid input.
 @bytes_written: Can be a useful optimization (the count of items
                 written, for the UTF-16 and UCS-4 variants).
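
To make the role of each argument concrete, here is a minimal sketch of a UTF-8 to UTF-16 converter taking the full argument set, written in plain C without GLib. Everything named sketch_* or SKETCH_* is hypothetical, a plain int stands in for GError, and the choice to treat a truncated trailing character as "stop early" rather than as an error is an assumption of this sketch, not something GLib is known to do.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error codes; GLib would report these through a GError. */
enum { SKETCH_OK = 0, SKETCH_ILLEGAL_SEQUENCE = 1 };

/* Decode one UTF-8 character from p (n >= 1 bytes available).
 * Returns the number of bytes used, 0 if the character is truncated,
 * or -1 if the bytes are invalid. */
static int
sketch_decode (const unsigned char *p, int n, uint32_t *out)
{
  int need, i;
  uint32_t c, min;

  if (p[0] < 0x80) { *out = p[0]; return 1; }
  else if ((p[0] & 0xE0) == 0xC0) { need = 2; c = p[0] & 0x1F; min = 0x80; }
  else if ((p[0] & 0xF0) == 0xE0) { need = 3; c = p[0] & 0x0F; min = 0x800; }
  else if ((p[0] & 0xF8) == 0xF0) { need = 4; c = p[0] & 0x07; min = 0x10000; }
  else return -1;

  for (i = 1; i < need; i++)
    {
      if (i >= n)
        return 0;                 /* ran out of input mid-character */
      if ((p[i] & 0xC0) != 0x80)
        return -1;                /* not a continuation byte */
      c = (c << 6) | (p[i] & 0x3F);
    }
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return -1;                    /* overlong, out of range, or surrogate */
  *out = c;
  return need;
}

/* len < 0 means "str is NUL-terminated". The out-parameters may be NULL. */
uint16_t *
sketch_utf8_to_utf16 (const char *str, int len,
                      int *bytes_read, int *items_written, int *error)
{
  const unsigned char *p = (const unsigned char *) str;
  uint16_t *result;
  int in = 0, out = 0, status = SKETCH_OK;

  if (len < 0)
    len = (int) strlen (str);

  /* Each input byte yields at most one UTF-16 unit; +1 for the terminator. */
  result = malloc (((size_t) len + 1) * sizeof (uint16_t));
  if (result == NULL)
    return NULL;

  while (in < len)
    {
      uint32_t c;
      int used = sketch_decode (p + in, len - in, &c);

      if (used == 0)
        break;                    /* partial character: stop before it */
      if (used < 0)
        {
          status = SKETCH_ILLEGAL_SEQUENCE;
          break;
        }
      if (c < 0x10000)
        result[out++] = (uint16_t) c;
      else
        {
          c -= 0x10000;           /* encode as a surrogate pair */
          result[out++] = (uint16_t) (0xD800 + (c >> 10));
          result[out++] = (uint16_t) (0xDC00 + (c & 0x3FF));
        }
      in += used;
    }

  if (bytes_read) *bytes_read = in;
  if (error) *error = status;
  if (status != SKETCH_OK)
    {
      free (result);
      return NULL;
    }
  result[out] = 0;
  if (items_written) *items_written = out;
  return result;
}
```

A caller feeding chunks of a larger buffer would pass the chunk length as len and, after the call, carry the bytes past bytes_read over into the next chunk. That is exactly the case that a version without bytes_read cannot handle.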

That said, the other functions are basically convenience wrappers
around g_convert() (in theory, if not in practice), so the programmer
has the option of falling back to g_convert() if they need the
full details.

However, going to g_convert isn't very convenient, especially for
g_locale_to_utf8() and g_filename_to_utf8().

My instinct now is to standardize on having every argument except
the read/written counts in the individual functions, so we have:

 gunichar2 *g_utf8_to_utf16 (const gchar  *str,
                             gint          len,
                             GError      **error);

And force people to go to g_convert() for partial input. (If you
are converting a stream, you really want some other interface
anyway.)

And for the simple case:

 g_utf8_to_utf16 (str, -1, NULL);
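
For a feel of how the reduced (str, len, error) shape reads on another member of the family, here is a minimal sketch of a UCS-4 to UTF-8 converter in plain C without GLib. The names sketch_ucs4_to_utf8 and SKETCH_* are hypothetical, and a plain int stands in for GError. UCS-4 input has no partial characters, so nothing is lost here by dropping the read count.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical error codes; GLib would report these through a GError. */
enum { SKETCH_OK = 0, SKETCH_ILLEGAL_SEQUENCE = 1 };

/* Hypothetical stand-in for the proposed (str, len, error) form.
 * len < 0 means "str is 0-terminated"; error may be NULL. */
char *
sketch_ucs4_to_utf8 (const uint32_t *str, int len, int *error)
{
  char *result;
  unsigned char *q;
  int i;

  if (len < 0)
    for (len = 0; str[len] != 0; len++)
      ;

  /* A character needs at most 4 bytes in UTF-8; +1 for the NUL. */
  result = malloc ((size_t) len * 4 + 1);
  if (result == NULL)
    return NULL;
  q = (unsigned char *) result;

  for (i = 0; i < len; i++)
    {
      uint32_t c = str[i];

      if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        {
          /* Out of range or a lone surrogate: invalid input. */
          if (error) *error = SKETCH_ILLEGAL_SEQUENCE;
          free (result);
          return NULL;
        }
      if (c < 0x80)
        *q++ = (unsigned char) c;
      else if (c < 0x800)
        {
          *q++ = (unsigned char) (0xC0 | (c >> 6));
          *q++ = (unsigned char) (0x80 | (c & 0x3F));
        }
      else if (c < 0x10000)
        {
          *q++ = (unsigned char) (0xE0 | (c >> 12));
          *q++ = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
          *q++ = (unsigned char) (0x80 | (c & 0x3F));
        }
      else
        {
          *q++ = (unsigned char) (0xF0 | (c >> 18));
          *q++ = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
          *q++ = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
          *q++ = (unsigned char) (0x80 | (c & 0x3F));
        }
    }
  *q = '\0';
  if (error) *error = SKETCH_OK;
  return result;
}
```

The simple case is then sketch_ucs4_to_utf8 (str, -1, NULL), and a caller who wants a message on bad input passes a non-NULL error instead.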

Any opinions on this?
                                              Owen