Re: Encoding in g_filename_to_uri()



On Fri, 2004-04-16 at 02:17, Federico Mena Quintero wrote:
> Here is what I think is a bug.  Do this:
> 
> 0. Make sure you are using GtkFileSystemUnix.
> 1. export LANG=es_MX.ISO8859-1
> 2. export G_FILENAME_ENCODING=@locale
> 3. gedit
> 4. Save a file called áéíóú
> 5. Hit File/Open
> 6. Select that file
> 7. Gedit will tell you that the file does not exist and would you like
> to create it.
> 
> The filename on disk is 5 bytes long with non-ASCII characters, as it is
> in ISO8859-1.  If it had been created without G_FILENAME_ENCODING, it
> would be 10 bytes long and in UTF-8.
> 
> Internally, Gedit uses gtk_file_chooser_get_uris(), and then for each
> one of those it does gnome_vfs_uri_exists() --- note that this has
> nothing to do with whether you are using GtkFileSystemUnix or
> GtkFileSystemGnomeVFS; Gedit does use gnome-vfs for itself.
> 
> gtk_file_chooser_get_uris() gets the list of internal GtkFilePath, which
> for the unix backend are filenames in the local encoding, and converts
> them to URIs using g_filename_to_uri().
> 
> However, g_filename_to_uri() does essentially this:
> 
> char *
> g_filename_to_uri (char *filename)
> {
>   char *utf8_filename;
>   char *escaped;
> 
>   utf8_filename = g_filename_to_utf8 (filename);
>   escaped = g_escape_file_uri (utf8_filename);
> 
>   return escaped;
> }
> 
> g_filename_to_utf8() takes the local encoding for filenames and uses it
> to convert the filename to UTF-8.  So, our 5-byte filename from above
> gets converted into a 10-byte UTF-8 string.  Later, g_escape_file_uri()
> turns this into a percent-escaped string and prepends a "file://".  The
> end result is something like
> 
> 	file:///home/federico/%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA
> 
> which is a valid URI, but does *not* refer to the filename above.  I
> think the result should be
> 
> 	file:///home/federico/%E1%E9%ED%F3%FA
> 
> That is, the hexadecimal representation of "áéíóú" in ISO8859-1.
> 
> When gnome-vfs gets the URI to see if it exists, it decodes it and fails
> to locate the file because the URI is encoded incorrectly in glib.
> 
> I think g_filename_to_uri() should not call g_filename_to_utf8() and
> just pass the filename to g_escape_file_uri().
> 
> Is this analysis correct?

Yes. The problem is that glib and gnome-vfs disagree on the format of
file:// uris. glib always tries to convert the filename to utf8, while
gnome-vfs has no defined encoding for its uris. 
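
To make the disagreement concrete, here is a minimal sketch in plain C
(no glib or gnome-vfs calls). raw_path_to_uri() is a hypothetical helper
invented for this illustration, not a function from either library, and
the set of characters it leaves unescaped is an assumption. It
percent-escapes the filename bytes exactly as given, with no charset
conversion, which is roughly what the proposal above amounts to:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of glib or gnome-vfs): percent-escape the
 * bytes of `path` exactly as given and prefix "file://".  No charset
 * conversion is performed.  Returns 0 on success, -1 if `out` is too small. */
static int
raw_path_to_uri (const char *path, char *out, size_t out_size)
{
  static const char prefix[] = "file://";
  size_t n = sizeof prefix - 1;
  const unsigned char *p;

  assert (path != NULL && out != NULL);

  if (out_size < n + 1)
    return -1;
  memcpy (out, prefix, n);

  for (p = (const unsigned char *) path; *p != '\0'; p++)
    {
      /* Assumption: only ASCII letters, digits and "/-._~" pass through
       * unescaped; every other byte becomes %XX. */
      int safe = *p < 0x80
                 && (isalnum (*p) || strchr ("/-._~", *p) != NULL);
      size_t need = safe ? 1 : 3;

      if (n + need + 1 > out_size)
        return -1;

      if (safe)
        out[n++] = (char) *p;
      else
        {
          snprintf (out + n, 4, "%%%02X", (unsigned int) *p);
          n += 3;
        }
    }

  out[n] = '\0';
  return 0;
}
```

Given the 5-byte ISO8859-1 name "\xE1\xE9\xED\xF3\xFA" under
/home/federico/, this produces file:///home/federico/%E1%E9%ED%F3%FA.
Given the 10-byte UTF-8 form of the same name, it produces
file:///home/federico/%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA. The two URIs name
different byte sequences on disk, so a consumer that unescapes the URI
and uses the bytes as-is only finds the file if the producer escaped
the on-disk bytes.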

This is a longstanding problem, and I actually sat down and wrote a
freedesktop standard about how to use file:// uris. See:
http://freedesktop.org/Standards/file-uri-spec

I dunno how to fix this, though. Can we just change g_filename_to_uri()?
Apps might be using it, and they could break from the change...

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                            Red Hat, Inc 
                   alexl redhat com    alla lysator liu se 
He's a suave zombie assassin in drag. She's a pregnant foul-mouthed pearl 
diver who hides her beauty behind a pair of thick-framed spectacles. They 
fight crime! 



