Re: possible deadlock on invalid UTF-8 data
- From: Daniel Elstner <daniel elstner gmx net>
- To: Owen Taylor <otaylor redhat com>
- Cc: Jon Trowbridge <trow ximian com>, Havoc Pennington <hp redhat com>, gtk-devel-list <gtk-devel-list gnome org>
- Subject: Re: possible deadlock on invalid UTF-8 data
- Date: 28 Nov 2001 17:30:17 +0100
Am Mit, 2001-11-28 um 00.57 schrieb Owen Taylor:
>
> Jon Trowbridge <trow ximian com> writes:
>
> > On Tue, 2001-11-27 at 14:54, Havoc Pennington wrote:
> > >
> > > On the other hand, the advantage of the endless loop (vs. reading
> > > invalid memory) is that the bug is immediately evident, and pretty
> > > easy to track down.
> >
> > Wouldn't it be even more immediately evident and even easier to track
> > down if it returned NULL or g_assert-ed or g_error-ed or something.
> >
> >
> > It seems pathological for a library to signal an error by deadlocking.
>
> #define g_utf8_next_char(p) (char *)((p) + g_utf8_skip[*(guchar *)(p)])
>
> g_utf8_next_char() turns out to be a very time critical operation;
> strings often get iterated over again and again, and checking each
> time for valid UTF-8 is a heavy penalty. You really need to check
> on input strings and not every time you process strings.
>
> I don't really have a strong preference on the deadlock versus
> continue incorrectly issue; note that the g_utf8_skip array is
> currently inconsistent on the issue - it has 1 for the 0x80-0xA0 range
> which isn't valid for the initial character, but 0 for 0xfe, 0xff.
>
> The tradeoff here is basically:
>
> - Easy to debug
>
> vs.
>
> - If encountered, hopefully continue working "well enough"
> to be minimally useful for the user.
>
> If I recall correctly, I originally had it 0 for the 0x80-0xA0 range
> as well and changed it to 1 on the theory that while a lockup
> is easier to debug for a developer, they can be _very_ confusing
> to a user, worse than a lockup.
That's what I think too. Especially if a warning will be printed by the
GTK+ functions anyway.
> Strings are validated at enough places that the chance of invalid
> UTF-8 not getting caught at all is low.
>
> So, on balance I think it's worth making the 0xfe, oxff entries
> correspond.
OK to check in the attached patch?
--Daniel
Index: ChangeLog
===================================================================
RCS file: /cvs/gnome/glib/ChangeLog,v
retrieving revision 1.947
diff -u -3 -r1.947 ChangeLog
--- ChangeLog 2001/11/27 23:30:07 1.947
+++ ChangeLog 2001/11/28 16:26:22
@@ -1,3 +1,8 @@
+2001-11-28 Daniel Elstner <daniel elstner gmx net>
+
+ * glib/gutf8.c: In order to avoid infinite loops on invalid UTF-8
+ strings, change the skip count for 0xfe and 0xff from 0 to 1.
+
2001-11-28 Tor Lillqvist <tml iki fi>
* glibconfig.h.win32.in: Add GLIB_SIZEOF_SIZE_T here, too.
Index: glib/gutf8.c
===================================================================
RCS file: /cvs/gnome/glib/glib/gutf8.c,v
retrieving revision 1.28
diff -u -3 -r1.28 gutf8.c
--- glib/gutf8.c 2001/09/27 02:49:05 1.28
+++ glib/gutf8.c 2001/11/28 16:26:25
@@ -109,7 +109,7 @@
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
- 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,6,6,0,0
+ 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,6,6,1,1
};
const gchar * const g_utf8_skip = utf8_skip_data;
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]