Re: Exposing get_filename_charset
- From: Matthias Clasen <mclasen redhat com>
- To: gtk-devel-list gnome org
- Subject: Re: Exposing get_filename_charset
- Date: Tue, 02 Nov 2004 12:11:35 -0500
On Mon, 2004-11-01 at 10:21 +0100, Alexander Larsson wrote:
> On Sun, 2004-10-31 at 20:19 +0000, Tor Lillqvist wrote:
> > Alexander Larsson writes:
> > > The local files part of nautilus_file_get_display_name currently goes
> > > like:
> >
> > If I understood that code correctly, it patches the display name
> > together from valid UTF-8 snippets in the string and question marks? I
> > think instead of question marks it would be more useful to use
> > something like g_strescape() of the whole string. I don't think file
> > names that are partially in UTF-8 and partially in something else
> > occur very often, so using the portions of the string that happen to
> > be valid UTF-8 as such is probably wrong, and it would be better to
> > just output all of the non-ASCII bytes in octal or hex.
> >
> > Would this be OK:
>
> Not trying to convert from various likely encodings makes this fail for
> many common cases. The eel fallback code with question marks might not
> be ideal, but in reality it isn't hit that often. I'm not sure that
> showing escaped characters in the user interface is any better though.
Alex, does this look like a reasonable first attempt?
It is based on the nautilus code you posted earlier, and has only been
tested very superficially:
static gchar *
make_valid_utf8 (const gchar *name)
{
  GString *string;
  const gchar *remainder, *invalid;
  gint remaining_bytes, valid_bytes;

  string = NULL;
  remainder = name;
  remaining_bytes = strlen (name);

  while (remaining_bytes != 0)
    {
      if (g_utf8_validate (remainder, remaining_bytes, &invalid))
        break;
      valid_bytes = invalid - remainder;

      if (string == NULL)
        string = g_string_sized_new (remaining_bytes);

      g_string_append_len (string, remainder, valid_bytes);
      g_string_append_c (string, '?');

      remaining_bytes -= valid_bytes + 1;
      remainder = invalid + 1;
    }

  if (string == NULL)
    return g_strdup (name);

  g_string_append (string, remainder);
  g_string_append (string, " (invalid encoding)");

  g_assert (g_utf8_validate (string->str, -1, NULL));

  return g_string_free (string, FALSE);
}

gchar *
g_filename_display_name (const gchar *filename)
{
  gint i;
  const gchar **charsets;
  gchar *display_name = NULL;
  gboolean is_utf8;

  is_utf8 = g_get_filename_charsets (&charsets);

  if (is_utf8)
    {
      if (g_utf8_validate (filename, -1, NULL))
        display_name = g_strdup (filename);
    }

  if (!display_name)
    {
      /* Try to convert from the filename charsets to UTF-8.
       * Skip the first charset if it is UTF-8.
       */
      for (i = is_utf8 ? 1 : 0; charsets[i]; i++)
        {
          display_name = g_convert (filename, -1, "UTF-8", charsets[i],
                                    NULL, NULL, NULL);
          if (display_name)
            break;
        }
    }

  /* if all conversions failed, we replace invalid UTF-8
   * by a question mark
   */
  if (!display_name)
    display_name = make_valid_utf8 (filename);

  return display_name;
}
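
To make the fallback behaviour easier to see without building against
GLib, here is a rough standalone sketch of the same idea: every byte that
does not start a well-formed UTF-8 sequence becomes a question mark, and
a marker is appended if anything was replaced. The names
sketch_make_valid_utf8 and utf8_sequence_length are made up for this
illustration and are not GLib API; the hand-written validator only
approximates what g_utf8_validate checks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not GLib): return the byte length (1 to 4) of the
 * well-formed UTF-8 sequence starting at s, or 0 if there is none.
 * n is the number of bytes available at s.
 */
static size_t
utf8_sequence_length (const unsigned char *s, size_t n)
{
  size_t len, i;
  unsigned long cp;

  if (n == 0)
    return 0;
  if (s[0] < 0x80)
    return 1;

  if ((s[0] & 0xE0) == 0xC0)      { len = 2; cp = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
  else
    return 0;                     /* stray continuation or bad lead byte */

  if (len > n)
    return 0;                     /* sequence is cut off */

  for (i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;
      cp = (cp << 6) | (s[i] & 0x3F);
    }

  /* Reject overlong encodings, surrogates and out-of-range values. */
  if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
      (len == 4 && cp < 0x10000))
    return 0;
  if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
    return 0;

  return len;
}

/* Hypothetical stand-in for make_valid_utf8(): returns a newly allocated
 * string (free with free()), or NULL if allocation fails.
 */
char *
sketch_make_valid_utf8 (const char *name)
{
  static const char suffix[] = " (invalid encoding)";
  const unsigned char *p = (const unsigned char *) name;
  size_t remaining, step, out_len = 0;
  int had_invalid = 0;
  char *out;

  assert (name != NULL);
  remaining = strlen (name);

  /* Each bad byte maps to exactly one '?', so the result is never
   * longer than the input plus the suffix and the terminating NUL.
   */
  out = malloc (remaining + sizeof suffix);
  if (out == NULL)
    return NULL;

  while (remaining > 0)
    {
      step = utf8_sequence_length (p, remaining);
      if (step == 0)
        {
          out[out_len++] = '?';
          had_invalid = 1;
          step = 1;
        }
      else
        {
          memcpy (out + out_len, p, step);
          out_len += step;
        }
      p += step;
      remaining -= step;
    }

  if (had_invalid)
    {
      memcpy (out + out_len, suffix, sizeof suffix - 1);
      out_len += sizeof suffix - 1;
    }
  out[out_len] = '\0';

  return out;
}
```

With this sketch, a Latin-1 name such as "caf\xE9" ".txt" comes back as
"caf?.txt (invalid encoding)", while a name that is already valid UTF-8
is returned as an unchanged copy.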