Re: utf8 odd behavior with Gtk2
- From: Torsten Schoenfeld <kaffeetisch gmx de>
- To: gtk-perl-list gnome org
- Subject: Re: utf8 odd behavior with Gtk2
- Date: Sun, 01 Jul 2012 19:24:28 +0200
On 01.07.2012 14:32, zentara wrote:
> What you will notice, or I do with perl 5.14.1, is that the placement
> of the "use utf8::all" changes what is decoded properly.  If that use line
> comes before the Gtk2 modules, it doesn't decode input. If placed after, it
> works fine.  Furthermore, if you comment out the Gtk2 modules, it works
> right.
This is due to Gtk2's treatment of @ARGV.  When you call Gtk2::init (for 
example via 'use Gtk2 -init'), it copies @ARGV into a C array and passes 
it on to gtk_init, which might remove entries from it.  To make these 
changes visible to the Perl programmer, Gtk2::init then clears @ARGV and 
copies the contents of the C array back into it.  The problem you found 
occurs because all this copying does not take the UTF8 flag into account 
(it simply uses SvPV and newSVpv).
So when you use utf8::all before Gtk2, @ARGV contains strings whose 
internal representation is UTF-8 and that have the UTF8 flag set.  When 
Gtk2::init then reconstructs @ARGV from the C array, it creates Perl 
strings from UTF-8-encoded byte sequences but does not mark the strings 
as such (i.e. it does not set the UTF8 flag).  When you print these 
strings, perl sees no UTF8 flag and so assumes they contain 
Latin1-encoded byte sequences and tries to convert them to UTF-8.  This 
leads to the doubly-encoded output that you see.
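The effect can be reproduced without Gtk2 at all.  The following is only 
a sketch that imitates the flag-losing copy in plain Perl (it is not the 
actual Gtk2 code); the variable names are made up for illustration, and 
utf8::encode stands in for the SvPV/newSVpv round trip.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A decoded character string, as utf8::all would leave it in @ARGV.
my $arg = "\x{e9}t\x{e9}";      # "ete" with acute accents, 3 characters
utf8::upgrade($arg);            # UTF-8 internal representation, flag on

# Imitate the round trip through C: keep the internal bytes,
# but hand them back as a string without the UTF8 flag.
my $copied = $arg;
utf8::encode($copied);          # 5 bytes, flag off

# What an output layer that encodes to UTF-8 does with such a string:
# every byte is treated as a Latin-1 character and encoded again.
my $printed = encode('UTF-8', $copied);

printf "original: %d characters\n", length $arg;       # 3
printf "copied:   %d characters\n", length $copied;    # 5
printf "printed:  %d bytes\n",      length $printed;   # 9
```

Decoding the copied string once more, e.g. with decode('UTF-8', $copied), 
gives back the original three characters, which is why the damage looks 
like a plain double encoding.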
So the diagnosis is easy enough.  I'm not so certain about the correct 
fix, though.
- Do we continue to use SvPV/newSVpv but also store the UTF8 flag, and 
  if it was set, restore it?
- Do we switch to always using SvPVutf8/newSVpvn_utf8, assuming that 
  @ARGV always contains UTF-8-encoded data?
- Do we switch to always using SvPVbyte/newSVpv, assuming that @ARGV 
  always contains Latin1-encoded data?
I'm leaning towards the first option, but I'm not sure.  I don't have a 
firm grasp on the Perl/UTF-8/XS complex yet, and I've yet to see clear 
documentation for XS authors.
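To make the first option more concrete, here is a rough model of it in 
plain Perl rather than XS.  Everything in it is hypothetical: the 
function names, the %consumed hash that stands in for whatever gtk_init 
removes, and the shortcut of remembering flags per byte sequence (real XS 
code would more likely track the flag per array element).

```perl
use strict;
use warnings;

# Hypothetical model of the current behaviour: each argument comes back
# as its raw internal bytes with the UTF8 flag off.  Entries listed in
# %$consumed are dropped, as gtk_init might do.
sub lossy_round_trip {
    my ($args, $consumed) = @_;
    my @out;
    for my $arg (@$args) {
        next if $consumed->{$arg};
        my $bytes = $arg;
        utf8::encode($bytes) if utf8::is_utf8($bytes);
        push @out, $bytes;
    }
    return @out;
}

# Option 1, modelled in Perl: remember which byte sequences came from
# flagged strings, then turn the flag back on for the survivors.
sub flag_preserving_round_trip {
    my ($args, $consumed) = @_;
    my %was_flagged;
    for my $arg (@$args) {
        next unless utf8::is_utf8($arg);
        my $bytes = $arg;
        utf8::encode($bytes);
        $was_flagged{$bytes} = 1;
    }
    my @out = lossy_round_trip($args, $consumed);
    for my $str (@out) {
        utf8::decode($str) if $was_flagged{$str};
    }
    return @out;
}

my @argv     = ("\x{263a}", '--display', 'plain');
my %consumed = ('--display' => 1);

my @lossy = lossy_round_trip(\@argv, \%consumed);
my @kept  = flag_preserving_round_trip(\@argv, \%consumed);

printf "lossy: %d entries, first has length %d\n", scalar @lossy, length($lossy[0]);
printf "kept:  %d entries, first has length %d\n", scalar @kept,  length($kept[0]);
```

In this model the lossy version returns the smiley as three separate 
bytes, while the flag-preserving version returns it as one character 
again and leaves the unflagged 'plain' entry untouched.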