Old Windows-specific bug in GOption



Hello

First, I would like to send a big thank you to Murray Cumming for
committing the fix for the lingering libsigc++ accumulator bug! It
will allow everyone to write more powerful signals code.

There's a second serious bug that affects Inkscape, and it's one of
the reasons we still use libpopt for option parsing. This time it's
in GLib.
https://bugzilla.gnome.org/show_bug.cgi?id=522131

The comment thread in the bug is rather long, so I will outline the
problem here. I'll call it "The Immigrant Problem", since it is most
likely to affect people who have changed their country of residence.

On Windows, the command line passed to main() is in the "system
codepage" (GLib term: locale encoding). Usually this is an
ASCII-compatible single-byte encoding, but on Asian systems it might
be multibyte. Everything works as long as the user only opens files
whose names are in their local language. Problems start when, for
example, someone tries to open Russian-named files on an English copy
of Windows. More precisely, they occur whenever someone tries to open
a file whose name contains characters that cannot be represented in
the locale encoding.

The arguments of main() simply do not contain the information needed
to unambiguously locate the Russian-named files. If there are two
plain text files with three-character Russian names in one folder,
both of them will be passed to main() as ???.txt and there is no way
to tell which one should be opened. It means the arguments of main()
are useless in this case.
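To make the collision concrete, here is a minimal sketch of the lossy step. It is not the real Windows conversion: the hypothetical narrow_to_ascii() assumes, for simplicity, that the locale encoding is plain ASCII and that every unrepresentable UTF-16 code unit degrades to '?'. Under that assumption two different Cyrillic names collapse to the same argv string.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the system code page conversion Windows
 * performs when it builds argv. Simplifying assumption: the locale
 * encoding is plain ASCII, so every UTF-16 code unit outside ASCII
 * becomes '?'. Output is truncated to fit out_size (must be > 0). */
static void narrow_to_ascii(const uint16_t *wide, char *out, size_t out_size)
{
    size_t i = 0;
    for (; wide[i] != 0 && i + 1 < out_size; i++)
        out[i] = wide[i] < 0x80 ? (char)wide[i] : '?';
    out[i] = '\0';
}

/* Hypothetical helper: returns 1 if two different UTF-16 file names turn
 * into the same narrow string, i.e. main() cannot tell them apart. */
static int collide_after_narrowing(const uint16_t *a, const uint16_t *b)
{
    char na[256], nb[256];
    narrow_to_ascii(a, na, sizeof na);
    narrow_to_ascii(b, nb, sizeof nb);
    return strcmp(na, nb) == 0;
}
```

With the UTF-16 names "дом.txt" and "кот.txt" as input, both narrow to "???.txt" and collide_after_narrowing() returns 1, which is exactly the ambiguity described above.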

Right now, GOption assumes that the argv passed to it is in the locale
encoding and converts it to UTF-8 when parsing, so even if we obtain
the Unicode command line from somewhere else, we cannot parse it with
GOption. Converting the Unicode command line back to the locale
encoding would throw away exactly the extra information we need, and
more complicated workarounds undermine the point of using GOption at
all.

On Windows, the Unicode command line can be retrieved in one of the
following ways:
- GetCommandLineW() - retrieves the full command line as a single
string, exactly as it was entered, without shell expansion.
- __wargv - undocumented global variable, equivalent to __argv, that
contains argv in UTF-16.
- _wmain() - undocumented alternative entry point that receives argv
in UTF-16. I have heard that MinGW doesn't support it, though I
haven't checked.
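Whichever source is used, each UTF-16 argument still has to become UTF-8 before a UTF-8 parser can consume it. On Windows one would normally call WideCharToMultiByte() with CP_UTF8, or GLib's g_utf16_to_utf8(); the sketch below hand-rolls the same conversion only so that it is portable and self-contained. The function name utf16_to_utf8() is hypothetical, and uint16_t stands in for Windows' 16-bit wchar_t (an assumption).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: converts a NUL-terminated UTF-16 string (e.g. one
 * element of __wargv) into a newly allocated NUL-terminated UTF-8 string.
 * Unpaired surrogates become U+FFFD. Returns NULL if allocation fails. */
static char *utf16_to_utf8(const uint16_t *in)
{
    size_t len = 0;
    while (in[len] != 0)
        len++;

    /* One code unit needs at most 3 UTF-8 bytes; a surrogate pair
     * (2 units) needs 4, so 3 bytes per unit is always enough. */
    char *out = malloc(len * 3 + 1);
    if (out == NULL)
        return NULL;

    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            /* High + low surrogate: combine into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* unpaired surrogate */
        }

        if (cp < 0x80) {
            out[o++] = (char)cp;
        } else if (cp < 0x800) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return out;
}
```

Unlike the locale conversion, this keeps every character: the UTF-16 name "дом.txt" becomes the UTF-8 bytes D0 B4 D0 BE D0 BC followed by ".txt". The caller frees the result with free().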

There are a few possible solutions to this problem:
1. On Windows, ignore the arguments of g_option_context_parse, and use
__wargv instead.
2. Same as above, but use the new behavior for a new method and
deprecate g_option_context_parse.
3. Introduce a helper function that retrieves UTF-8 command-line
arguments, based on global variables and the arguments of main(), and
provide a g_option_context_parse_utf8 method that takes a UTF-8
encoded argv.
4. Same as 3, but use an option context property to select the input encoding.
5. Introduce a method that always parses argv based on global data.
6. Same as 1, but try to guess whether the passed argv was modified by
comparing it to __wargv. If it wasn't, use __wargv.
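Solution 6 boils down to a check like the one below. This is a hypothetical sketch, not the code from my patches: argv_matches_wargv() and narrowed_equals() are illustrative names, and the narrowing again assumes an ASCII locale where unrepresentable characters become '?'. If argv still looks like a lossy copy of __wargv, the UTF-16 arguments can be used; otherwise the caller has to fall back to the lossy argv.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if 'narrow' is what 'wide' would become after the (assumed)
 * ASCII-only locale conversion, where non-ASCII code units become '?'. */
static int narrowed_equals(const uint16_t *wide, const char *narrow)
{
    size_t i = 0;
    for (; wide[i] != 0; i++) {
        char c = wide[i] < 0x80 ? (char)wide[i] : '?';
        if (narrow[i] != c)
            return 0; /* also stops at narrow's NUL, since c != '\0' */
    }
    return narrow[i] == '\0';
}

/* Hypothetical heuristic for solution 6: returns 1 if argv appears to be
 * an unmodified, narrowed copy of wargv (so wargv may be used instead),
 * or 0 if the program changed argv (so only the lossy argv is usable). */
static int argv_matches_wargv(int argc, char **argv,
                              int wargc, const uint16_t *const *wargv)
{
    if (argc != wargc)
        return 0;
    for (int i = 0; i < argc; i++)
        if (!narrowed_equals(wargv[i], argv[i]))
            return 0;
    return 1;
}
```

The second return path is the weak spot: as soon as a program edits argv (for example by replacing or dropping an argument), the check fails and the Russian file name is back to "???.txt".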

Each has a downside:
1: Can break programs that modify the arguments of main() before
passing them to GOption.
2: Requires changes to existing code and disallows modifying argv
before parsing when using GOption (probably not a bad thing).
3: Requires changes to existing code and using a non-obvious function
that does nothing on GNOME's primary platform (Unix), so most programs
will probably not use it and will remain broken on Windows until
someone discovers this relatively rare bug in each of them.
4: Overloads one method to take two different types of data.
5: Requires depending on highly platform-specific features. There is a
__wargv equivalent called __libc_argv, which works at least on Linux,
but probably not on other Unixes.
6: Still doesn't fix the problem if argv is modified, which causes
non-obvious breakage.

I would favor solution #2, since it's the simplest and the least likely
to cause confusion, but the bug commenters apparently wanted to go with
#6. I sent a few patches implementing #6, but they weren't committed.
What do others think about this?

Regards, Krzysztof Kosiński

