On Tue, Mar 23, 2004 at 08:57:21PM +0100, Murray Cumming wrote:
> On Tue, 2004-03-23 at 18:49, Paul Elliott wrote:
> > On Tue, Mar 23, 2004 at 10:18:06AM +0100, Murray Cumming wrote:
> > > On Tue, 2004-03-23 at 03:37, Paul Elliott wrote:
> > > > I am told that there are some systems where sizeof(wchar_t) == 2.
> > > > On these systems the c locale can only support UCS-2 not UCS-4!
> > > >
> > > > Has Gtkmm or Glibmm been ported to any of these systems?
> > >
> > > I have no idea. I think linux and GTK+ usually use UTF8 rather than UCS2
> > > or UCS4, and there is a working Windows port.
> > >
> >
> >
> > My understanding is that the buffers in Glib::ustring and Gtk::TextBuffer
> > contain UTF-8, but the iterators iterate over gunichar, which is UCS-4.
>
> The TextBuffer iterators iterate over characters.
Yes, and those characters are UCS-4, even though the buffer itself
stores them as UTF-8.
> Please point me to
> exactly the API reference (or a code example) for what you mean.
> Iterating over bytes would be almost useless.
>
Proof:
ustring_Iterator is defined in ustring.h. Looking there, we see
that operator*() returns a gunichar:
> typedef gunichar value_type;
> inline value_type operator*() const;
>
The implementation of operator*() is further down in the file:
>template <class T> inline
>typename ustring_Iterator<T>::value_type ustring_Iterator<T>::operator*() const
>{
> return Glib::get_unichar_from_std_iterator(pos_);
>}
If we look at the documentation for get_unichar_from_std_iterator in:
http://www.gtkmm.org/gtkmm2/docs/reference/html/namespaceGlib.html
we find that it returns a UCS-4 character:
>gunichar get_unichar_from_std_iterator(std::string::const_iterator pos )
>Extract a UCS-4 character from UTF-8 data.
>
>
>Convert a single UTF-8 (multibyte) character starting at pos to a
>UCS-4 wide character. This may read up to 6 bytes after the start
>position, depending on the UTF-8 character width. You have to make
>sure the source contains at least one valid UTF-8 character.
>
>
>This is mainly used by the implementation of Glib::ustring::iterator,
>but it might be useful as utility function if you prefer using
>std::string even for UTF-8 encoding.
Thus, even though the buffer contains variable-width UTF-8 "characters",
the iterator returns fixed-size 32-bit UCS-4 characters.
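
To make the width difference concrete, here is a minimal sketch of that
kind of decoding. It is not the glibmm implementation: decode_utf8_at is
a hypothetical name, std::uint32_t stands in for gunichar, and it only
handles the 1- to 4-byte forms of valid UTF-8.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of glibmm): decode the UTF-8 sequence that
// starts at pos into one 32-bit code point. Assumes the input is valid UTF-8.
inline std::uint32_t decode_utf8_at(std::string::const_iterator pos)
{
  const unsigned char lead = static_cast<unsigned char>(*pos);
  if (lead < 0x80)
    return lead;                              // 1 byte: plain ASCII

  int extra;                                  // number of continuation bytes
  std::uint32_t cp;                           // payload bits gathered so far
  if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
  else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
  else                            { extra = 3; cp = lead & 0x07; }

  for (int i = 0; i < extra; ++i)
  {
    ++pos;
    const unsigned char byte = static_cast<unsigned char>(*pos);
    assert((byte & 0xC0) == 0x80);            // continuation bytes are 10xxxxxx
    cp = (cp << 6) | (byte & 0x3F);
  }
  return cp;
}
```

A 4-byte sequence such as "\xF0\x9D\x84\x9E" decodes to 0x1D11E, a value
that needs more than 16 bits.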
My point is that these are too big to fit in a wchar_t on
those systems where sizeof(wchar_t) == 2.
My original question is: "Do gtkmm and glibmm run on any systems
where sizeof(wchar_t) == 2?" (I want my code to use another
library (boost::regex) that needs these characters to be in wchar_t.)
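
For what it is worth, here is a rough sketch of what a conversion to
16-bit units would have to do. It assumes the 16-bit side is treated as
UTF-16, so values above 0xFFFF become surrogate pairs; strict UCS-2
cannot represent them at all, and I do not know whether the consuming
library would accept surrogates. The names ucs4_t, fits_in_16_bits and
append_utf16 are made up for illustration, and std::uint16_t stands in
for a 2-byte wchar_t.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for gunichar, used so the sketch needs no glib headers.
typedef std::uint32_t ucs4_t;

// True when the code point fits in a single 16-bit unit.
inline bool fits_in_16_bits(ucs4_t c)
{
  return c <= 0xFFFF;
}

// Append c to out as UTF-16: one unit if it fits, otherwise a surrogate pair.
inline void append_utf16(ucs4_t c, std::vector<std::uint16_t>& out)
{
  assert(c <= 0x10FFFF);                      // upper limit of UTF-16
  if (fits_in_16_bits(c))
  {
    out.push_back(static_cast<std::uint16_t>(c));
    return;
  }
  c -= 0x10000;                               // 20 bits remain
  out.push_back(static_cast<std::uint16_t>(0xD800 + (c >> 10)));    // high surrogate
  out.push_back(static_cast<std::uint16_t>(0xDC00 + (c & 0x3FF)));  // low surrogate
}
```

With this, 0x20AC stays a single unit, while 0x1D11E becomes the two
units 0xD834 0xDD1E, so one iterator step may no longer map to one
wchar_t.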
--
Paul Elliott 1(512)837-1096
pelliott io com PMB 181, 11900 Metric Blvd Suite J
http://www.io.com/~pelliott/pme/ Austin TX 78758-3117