Re: [Gnome-bindings] Strings and bindings

Guillaume Laurent <glaurent telegraph-road org> writes:

> Owen Taylor <otaylor redhat com> writes:
> > C++
> > ===
> [...]
> > There are at least three ways I can think of to handle 
> > GTK+'s utf-8 strings in Unicode:
> > 
> >  - convert them to wstring
> > 
> >  - convert them to basic_string<gunichar> 
> >  
> >    that is, avoid the problem of the unspecified width, by
> >    defining a new string type using a type of specified
> >    width.
> > 
> >  - Create an STL-string-like wrapper for a utf8 string. The
> >    problem here is that you don't get O(1) random access, which
> >    will no doubt disturb some of the people reading this.
> If simply reusing wstring is an option, then I suggest we do. No need
> to define new types if we can avoid it, plus it makes interoperability
> with the rest of the world easier.

The trouble is, wstring doesn't specify an encoding. It's just
wide characters. So, if the locale has Unicode wide characters,
then you are compatible with the rest of the world. If it
has JIS-encoded wide characters (not common on Linux, but common
on other Unices), then you are incompatible, even though you
are both using wstring.
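
To make that concrete, here is a minimal sketch of how a program could
ask what its wchar_t actually is. The helper names are made up for
illustration. The __STDC_ISO_10646__ macro is the one the C and C++
standards describe as defined only when wchar_t values are ISO 10646
code points; an implementation may leave it undefined even when its
wide characters happen to be Unicode, so treat a "false" as "no
promise", not as "definitely not Unicode".

```cpp
#include <climits>
#include <cstddef>

// Width of wchar_t in bits on this implementation.
// The language does not fix this value (hypothetical helper name).
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// True only when the implementation promises that wchar_t values are
// ISO 10646 (Unicode) code points (hypothetical helper name).
constexpr bool wchar_is_unicode() {
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}
```

Neither answer is visible in the wstring type itself, which is the
interoperability problem.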

Another problem here is that released versions of GNU libstdc++
still don't have wstring support. (I believe they are waiting
for glibc-2.2.) So, using wstring would be pushing the bleeding
edge in some places.

> > If one did use the standard STL wstring type, then one would
> > run into the problem that there will be no 
> > 
> >  wstring (const char *eightbit_string);
> > 
> > constructor so you would probably have to subclass it to add
> > that converter in any case. But I'm not enough of a C++ expert
> > to really comment.
> I'm not sure what you mean here. Is eightbit_string in utf8 ? There is 
> a wstring (const char *); ctor defined.

Yes, eightbit_string is in utf8. I assume the wstring (const char *)
ctor takes the string and simply widens each byte, which
is not the right behavior for utf8; it is only the right behavior if
eightbit_string is latin-1.
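
A minimal sketch of the difference, assuming a C++11 compiler and using
std::u32string (32-bit code units, the same width as gunichar) as a
stand-in for basic_string<gunichar>. The function names are made up for
illustration and are not part of GTK+ or any binding. The decoder is
deliberately simple: it rejects bad lead bytes, bad continuation bytes
and truncated sequences, but does not check for overlong forms or
surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode UTF-8 bytes into 32-bit code points (hypothetical helper).
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Widen each byte to one wide character, as a naive converter would.
// Correct only when the input is Latin-1 (hypothetical helper).
std::u32string widen_bytes(const std::string& in) {
    std::u32string out;
    for (unsigned char b : in)
        out.push_back(b);
    return out;
}

// U+00E9 (e with acute accent) is the two bytes C3 A9 in UTF-8.
void example() {
    const std::string utf8 = "\xC3\xA9";
    assert(utf8_to_utf32(utf8).size() == 1);   // one character
    assert(utf8_to_utf32(utf8)[0] == 0xE9);
    assert(widen_bytes(utf8).size() == 2);     // two bogus characters
}
```

Byte widening turns the single character into U+00C3 followed by
U+00A9, which is why a plain widening constructor cannot be used on
GTK+'s strings.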
