Re: possible deadlock on invalid UTF-8 data

From: Daniel Elstner <daniel elstner gmx net>
To: Owen Taylor <otaylor redhat com>
Cc: Jon Trowbridge <trow ximian com>, Havoc Pennington <hp redhat com>, gtk-devel-list <gtk-devel-list gnome org>
Subject: Re: possible deadlock on invalid UTF-8 data
Date: 28 Nov 2001 17:30:17 +0100

On Wed, 2001-11-28 at 00:57, Owen Taylor wrote:
> 
> Jon Trowbridge <trow ximian com> writes:
> 
> > On Tue, 2001-11-27 at 14:54, Havoc Pennington wrote:
> > >
> > > On the other hand, the advantage of the endless loop (vs. reading
> > > invalid memory) is that the bug is immediately evident, and pretty
> > > easy to track down.
> > 
> > Wouldn't it be even more immediately evident and even easier to track
> > down if it returned NULL or g_assert-ed or g_error-ed or something?
> > 
> > 
> > It seems pathological for a library to signal an error by deadlocking.
> 
> #define g_utf8_next_char(p) (char *)((p) + g_utf8_skip[*(guchar *)(p)])
> 
> g_utf8_next_char() turns out to be a very time critical operation;
> strings often get iterated over again and again, and checking each
> time for valid UTF-8 is a heavy penalty. You really need to check
> on input strings and not every time you process strings.
> 
> I don't really have a strong preference on the deadlock versus
> continue incorrectly issue; note that the g_utf8_skip array is
> currently inconsistent on the issue - it has 1 for the 0x80-0xA0 range
> which isn't valid for the initial character, but 0 for 0xfe, 0xff.
> 
> The tradeoff here is basically:
> 
>  - Easy to debug
> 
> vs.
> 
>  - If encountered, hopefully continue working "well enough"
>    to be minimally useful for the user.
> 
> If I recall correctly, I originally had it 0 for the 0x80-0xA0 range
> as well and changed it to 1 on the theory that while a lockup
> is easier to debug for a developer, it can be _very_ confusing
> to a user, worse than continuing incorrectly.

That's what I think too.  Especially if a warning will be printed by the
GTK+ functions anyway.

> Strings are validated at enough places that the chance of invalid
> UTF-8 not getting caught at all is low.
> 
> So, on balance I think it's worth making the 0xfe, 0xff entries
> correspond.

OK to check in the attached patch?

--Daniel

Index: ChangeLog
===================================================================
RCS file: /cvs/gnome/glib/ChangeLog,v
retrieving revision 1.947
diff -u -3 -r1.947 ChangeLog
--- ChangeLog	2001/11/27 23:30:07	1.947
+++ ChangeLog	2001/11/28 16:26:22
@@ -1,3 +1,8 @@
+2001-11-28  Daniel Elstner  <daniel elstner gmx net>
+
+	* glib/gutf8.c: In order to avoid infinite loops on invalid UTF-8
+	strings, change the skip count for 0xfe and 0xff from 0 to 1.
+
 2001-11-28  Tor Lillqvist  <tml iki fi>
 
 	* glibconfig.h.win32.in: Add GLIB_SIZEOF_SIZE_T here, too.
Index: glib/gutf8.c
===================================================================
RCS file: /cvs/gnome/glib/glib/gutf8.c,v
retrieving revision 1.28
diff -u -3 -r1.28 gutf8.c
--- glib/gutf8.c	2001/09/27 02:49:05	1.28
+++ glib/gutf8.c	2001/11/28 16:26:25
@@ -109,7 +109,7 @@
   1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
   1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
   2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
-  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,6,6,0,0
+  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,6,6,1,1
 };
 
 const gchar * const g_utf8_skip = utf8_skip_data;
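
To illustrate what the one-line change above does, here is a minimal,
self-contained sketch. It does not use GLib: build_skip_table(),
count_steps() and walk_with_skip() are hypothetical helpers written for
this example, and the table is rebuilt from the ranges visible in the
patch context rather than copied from gutf8.c. The step budget is only a
stand-in for "loops forever", so the old behaviour can be observed
without actually hanging.

```c
#include <stddef.h>

/* Hypothetical helper: fill a 256-entry skip table with the same values
 * as the rows shown in the patch. fe_ff_skip is the value used for the
 * two invalid lead bytes 0xfe and 0xff (0 before the patch, 1 after). */
void build_skip_table(unsigned char table[256], unsigned char fe_ff_skip)
{
  int i;

  for (i = 0; i < 256; i++)
    {
      if (i < 0xC0)
        table[i] = 1;           /* ASCII and stray continuation bytes */
      else if (i < 0xE0)
        table[i] = 2;
      else if (i < 0xF0)
        table[i] = 3;
      else if (i < 0xF8)
        table[i] = 4;
      else if (i < 0xFC)
        table[i] = 5;
      else if (i < 0xFE)
        table[i] = 6;
      else
        table[i] = fe_ff_skip;  /* 0xfe, 0xff */
    }
}

/* Hypothetical helper: advance through a NUL-terminated string the way
 * the g_utf8_next_char() macro does, i.e. p += table[first byte].
 * Returns the number of steps taken, or -1 if max_steps is exhausted,
 * which here stands in for an endless loop. */
long count_steps(const unsigned char table[256], const char *s, long max_steps)
{
  const char *p = s;
  long steps = 0;

  while (*p != '\0')
    {
      if (steps >= max_steps)
        return -1;              /* pointer is no longer advancing */
      p += table[(unsigned char) *p];
      steps++;
    }

  return steps;
}

/* Convenience wrapper: build the table and walk the string in one call. */
long walk_with_skip(unsigned char fe_ff_skip, const char *s, long max_steps)
{
  unsigned char table[256];

  build_skip_table(table, fe_ff_skip);
  return count_steps(table, s, max_steps);
}
```

With a skip count of 0, walk_with_skip(0, "a\xfe" "b", 1000) never gets
past the 0xfe byte and runs out of steps; with a skip count of 1 the same
string is walked in three steps. Neither variant validates the input;
that still has to happen where strings enter the program.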
