Re: possible deadlock on invalid UTF-8 data



On Tue, 2001-11-27 at 21:54, Havoc Pennington wrote:
> 
> Daniel Elstner <daniel elstner gmx net> writes: 
> > Yes, but as long as the pointer is not dereferenced it should work. 
> > (Although ANSI C only guarantees that moving the pointer to a position
> > immediately after the last element will work, I consider failures when
> > moving it six bytes after the end very rare.)
> 
> How many next_char loops don't dereference the char?

Well, the special case I'm talking about is the gtkmm wrapper for UTF-8
encoded strings.  For example, it uses g_utf8_pointer_to_offset() to
calculate the length of a string.  (It doesn't use g_utf8_strlen()
because the size in bytes is already known, and because it has to work
with strings that contain the '\0' character.)
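
To make the length calculation concrete, here is a minimal sketch of
counting characters in a buffer whose size in bytes is known.  It is not
the GLib or gtkmm code: utf8_skip() and utf8_count_chars() are
hypothetical names, and the skip lengths (including the old 5- and 6-byte
forms) are my assumption about how a next_char step behaves.  Each step
is clamped to the bytes that remain, so a truncated sequence at the end
can neither run past the buffer nor keep the loop from terminating, which
is the risk when a loop stops only on exact pointer equality.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not GLib API): number of bytes a sequence
 * claims to occupy, judged from its lead byte alone. */
static size_t utf8_skip(unsigned char c)
{
    if (c < 0xC0) return 1;   /* ASCII, or a stray continuation byte */
    if (c < 0xE0) return 2;
    if (c < 0xF0) return 3;
    if (c < 0xF8) return 4;
    if (c < 0xFC) return 5;   /* legacy 5-byte form */
    if (c < 0xFE) return 6;   /* legacy 6-byte form */
    return 1;                 /* 0xFE and 0xFF never start a sequence */
}

/* Count characters in data[0 .. n_bytes).  Embedded '\0' bytes count as
 * ordinary characters because the length is given, not searched for. */
size_t utf8_count_chars(const char *data, size_t n_bytes)
{
    assert(data != NULL || n_bytes == 0);

    size_t pos = 0;
    size_t count = 0;

    while (pos < n_bytes) {
        size_t step = utf8_skip((unsigned char)data[pos]);
        size_t left = n_bytes - pos;

        /* Clamp the step so a truncated sequence cannot overshoot. */
        pos += (step < left) ? step : left;
        ++count;
    }
    return count;
}
```

This only bounds the walk; it does not validate anything.  Invalid input
still gives a meaningless count, so validating first remains necessary.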

Glib::ustring of course has a validate() method.  But calling it is the
app programmer's responsibility.

> > I absolutely agree with the policy.  But if we can easily avoid an
> > endless loop even in case the programmer makes an error, shouldn't we
> > try to do so?
> 
> On the other hand, the advantage of the endless loop (vs. reading
> invalid memory) is that the bug is immediately evident, and pretty
> easy to track down.

True.  Though, in most cases the ustring will be passed to a GTK+
function, which will in turn validate the string and print a warning.

However, the problem isn't really important to me.  It's just one more
thing we have to drum into the heads of all gtkmm users:  if app users
start reporting livelocks, it probably has something to do with
invalid UTF-8 strings. :-)

Cheers,
--Daniel



