Re: g_filename_to_uri() issue in glib-win32



On 23 May 2012, at 00:22, Krzysztof Kosiński wrote:

> 
> What you get is a URI encoding of the UTF-8 bytes. I think this is
> the expected and correct behavior: there are multiple incompatible
> locale encodings and there's no way for this function to know what
> encoding you want to use for the URI. It would also fail if you had
> characters not representable in the locale encoding.
> 
> This is at most a documentation bug. It should be stated that this
> function converts the string byte-by-byte, and everything outside of
> the 0-127 range is converted to hex escapes.
> 

Thanks for the prompt reply Krzysztof,

I can see where you're coming from on this but there's another way to look at it.  In my example (Göran) the UTF-8 byte sequence (for my particular code page) would have been:-

47 C3 B6 72 61 6E

This would get displayed as:-

G [ some codepage dependent character ] r a n

But whatever that (second) character looked like, its decimal value would always be 246 (because the UTF-8 sequence C3 B6 decodes to code point U+00F6, i.e. decimal 246).

The URI translation of decimal 246 is %F6.
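
The arithmetic behind that figure can be checked with a short sketch. The helper name decode_two_byte_utf8 is made up for illustration and is not part of GLib; it only handles the two-byte case used in this example.

```python
def decode_two_byte_utf8(lead: int, cont: int) -> int:
    """Decode a two-byte UTF-8 sequence (110xxxxx 10xxxxxx) to a code point.

    Hypothetical helper for illustration only; not a GLib function.
    """
    # Lead bytes 0xC2..0xDF start a valid two-byte sequence
    # (0xC0 and 0xC1 would be overlong encodings).
    if not 0xC2 <= lead <= 0xDF or cont & 0xC0 != 0x80:
        raise ValueError("not a valid two-byte UTF-8 sequence")
    # The low 5 bits of the lead byte are the high bits of the code point;
    # the low 6 bits of the continuation byte are the low bits.
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


code_point = decode_two_byte_utf8(0xC3, 0xB6)
print(code_point, hex(code_point))  # 246 0xf6
```

Here 0xC3 & 0x1F is 3 and 0xB6 & 0x3F is 54, so the result is 3 * 64 + 54 = 246.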

Therefore it should be possible to translate from UTF-8 [47 C3 B6 72 61 6E] into the URI "G%F6ran" regardless of the user's code page.  On my system this would say "Göran", whereas on someone else's system it might look different, but that's not really relevant.  The conversion itself is valid and shouldn't be affected by code pages.  Code pages will only affect the displayed appearance.
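
To make the two candidate results concrete, here is a rough sketch that produces both from the same bytes. It uses Python's urllib rather than GLib, so it says nothing about what g_filename_to_uri() does internally. It also assumes that "one escape per code point" means encoding the text as ISO-8859-1 first, which only works for code points up to 255.

```python
from urllib.parse import quote

# The UTF-8 bytes from the example: 47 C3 B6 72 61 6E
name_utf8 = bytes.fromhex("47C3B672616E")

# Byte-by-byte escaping of the UTF-8 bytes, as described in the quoted reply.
uri_from_utf8_bytes = quote(name_utf8)

# Decode to code points first, then escape one byte per code point.
# Assumption: ISO-8859-1 stands in for "the decimal value of the character";
# encode() raises UnicodeEncodeError for any code point above 255.
text = name_utf8.decode("utf-8")
uri_from_code_points = quote(text.encode("latin-1"))

print(uri_from_utf8_bytes)   # G%C3%B6ran
print(uri_from_code_points)  # G%F6ran
```

Both strings are well-formed percent-encodings; they differ in which byte sequence they claim the name consists of.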

Of course, this is only with my simple example.  There might be other examples where my theory breaks down.  I've only considered this particular case.  But if what you said were true, Krzysztof, g_filename_to_utf8() would suffer from the same problem - but it doesn't.  If (on Windows) you pass it a UTF-8 filename, it correctly recognises that the name is already UTF-8 and returns the original string (i.e. it doesn't attempt a new byte-by-byte conversion).

So 'g_filename_to_uri()' is misbehaving AFAICT.

John

