[glibmm] Glib::ustring docs: Say more clearly that it must be only UTF-8
- From: Kjell Ahlstedt <kjellahl src gnome org>
- To: commits-list gnome org
- Cc:
- Subject: [glibmm] Glib::ustring docs: Say more clearly that it must be only UTF-8
- Date: Wed, 5 Mar 2014 17:56:35 +0000 (UTC)
commit 1d5cad9272f074bfa0a8885f460ce0ef6d201f73
Author: Kjell Ahlstedt <kjell ahlstedt bredband net>
Date: Wed Mar 5 18:52:35 2014 +0100
Glib::ustring docs: Say more clearly that it must be only UTF-8
* glib/glibmm/ustring.h: Mention in the documentation of ustring and
ustring_Iterator that bad things can happen if the string does not contain
only valid UTF-8 data. Bug #681560.
glib/glibmm/ustring.h | 21 ++++++++++++++++++++-
1 files changed, 20 insertions(+), 1 deletions(-)
---
diff --git a/glib/glibmm/ustring.h b/glib/glibmm/ustring.h
index f0fa217..6e09ecd 100644
--- a/glib/glibmm/ustring.h
+++ b/glib/glibmm/ustring.h
@@ -84,6 +84,20 @@ struct IteratorTraits<const T*>
* the old UTF-8 character and the new one to write could be different.
* Therefore, any write operation would invalidate all other iterators
* pointing into the same string.
+ *
+ * The Glib::ustring iterated over must contain only valid UTF-8 data.
+ * If it does not, operator++(), operator-\-() and operator*() may make
+ * accesses outside the bounds of the string. A loop such as the following
+ * one would not stop at the end of the string.
+ * @code
+ * // Bad code! Don't do this!
+ * const char not_utf8[] = { '\x80', '\xef', '\x80', '\x80', '\xef', '\x80' };
+ * const Glib::ustring s(not_utf8, not_utf8 + sizeof not_utf8);
+ * for (Glib::ustring::const_iterator it = s.begin(); it != s.end(); ++it)
+ * std::cout << *it << std::endl;
+ * @endcode
+ *
+ * @tparam T std::string::iterator or std::string::const_iterator
*/
template <class T>
class ustring_Iterator
@@ -156,8 +170,13 @@ gunichar get_unichar_from_std_iterator(std::string::const_iterator pos) G_GNUC_P
* character, and <tt>std::string::length()</tt> returns the number of bytes
* rather than characters. So don't do that without a good reason.
* @par
+ * Many member functions and operators of Glib::ustring and Glib::ustring_Iterator
+ * assume that the string contains only valid UTF-8 data. If it does not, memory
+ * outside the bounds of the string can be accessed.
+ * @par
* In a perfect world the C++ Standard Library would contain a UTF-8 string
- * class. Unfortunately, the C++ standard doesn't mention UTF-8 at all. Note
+ * class. Unfortunately, the C++98 standard doesn't mention UTF-8 at all.
+ * C++11 has UTF-8 literals but no UTF-8 string class. Note
* that std::wstring is not a UTF-8 string class because it contains only
* fixed-width characters (where width could be 32, 16, or even 8 bits).
*
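
A minimal, standard-library-only sketch of why the loop in the documentation example above does not stop, and of a check that could be run before iterating. It assumes that a UTF-8 iterator advances by the sequence length implied by the lead byte alone. The helper names (step_from_lead_byte, offset_after_walk, is_structurally_utf8, count_code_points, bad_example_bytes) are hypothetical and are not part of glibmm. The check is structural only and does not reject overlong encodings or surrogates. In real glibmm code, Glib::ustring::validate() is the call to use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not glibmm API: number of bytes a UTF-8 sequence
// occupies according to its first byte alone. Bytes that cannot start a
// sequence (for example 0x80) are treated as length 1; that is an
// assumption of this sketch.
std::size_t step_from_lead_byte(unsigned char lead)
{
  if (lead >= 0xF0) return 4;
  if (lead >= 0xE0) return 3;
  if (lead >= 0xC0) return 2;
  return 1;
}

// Walks the bytes the way a lead-byte-driven iterator would and returns the
// first offset that is >= bytes.size(). A result larger than bytes.size()
// means the walk stepped over the end without ever being equal to it.
std::size_t offset_after_walk(const std::string& bytes)
{
  std::size_t pos = 0;
  while (pos < bytes.size())
    pos += step_from_lead_byte(static_cast<unsigned char>(bytes[pos]));
  return pos;
}

// Structural check: each lead byte must be followed, inside the buffer, by
// the expected number of continuation bytes (10xxxxxx).
bool is_structurally_utf8(const std::string& bytes)
{
  std::size_t pos = 0;
  while (pos < bytes.size())
  {
    const unsigned char lead = static_cast<unsigned char>(bytes[pos]);
    if ((lead & 0xC0) == 0x80 || lead >= 0xF8)
      return false; // stray continuation byte or impossible lead byte
    const std::size_t len = step_from_lead_byte(lead);
    if (len > bytes.size() - pos)
      return false; // sequence is cut off by the end of the buffer
    for (std::size_t k = 1; k < len; ++k)
      if ((static_cast<unsigned char>(bytes[pos + k]) & 0xC0) != 0x80)
        return false; // expected a continuation byte
    pos += len;
  }
  return true;
}

// Counts code points. Precondition: the bytes passed the check above.
std::size_t count_code_points(const std::string& bytes)
{
  assert(is_structurally_utf8(bytes));
  std::size_t n = 0;
  for (std::size_t pos = 0; pos < bytes.size();
       pos += step_from_lead_byte(static_cast<unsigned char>(bytes[pos])))
    ++n;
  return n;
}

// The byte sequence used in the documentation example.
std::string bad_example_bytes()
{
  const char not_utf8[] = { '\x80', '\xef', '\x80', '\x80', '\xef', '\x80' };
  return std::string(not_utf8, not_utf8 + sizeof not_utf8);
}
```

Under the stated assumption, the walk over the six example bytes visits offsets 0, 1 and 4 and then lands on 7, so a comparison against the end position (offset 6) never becomes equal. Running the structural check first rejects the input at its very first byte.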