Re: possible deadlock on invalid UTF-8 data
- From: Owen Taylor <otaylor redhat com>
- To: Jon Trowbridge <trow ximian com>
- Cc: Havoc Pennington <hp redhat com>, Daniel Elstner <daniel elstner gmx net>, gtk-devel-list <gtk-devel-list gnome org>
- Subject: Re: possible deadlock on invalid UTF-8 data
- Date: 27 Nov 2001 18:57:29 -0500
Jon Trowbridge <trow ximian com> writes:
> On Tue, 2001-11-27 at 14:54, Havoc Pennington wrote:
> >
> > On the other hand, the advantage of the endless loop (vs. reading
> > invalid memory) is that the bug is immediately evident, and pretty
> > easy to track down.
>
> Wouldn't it be even more immediately evident and even easier to track
> down if it returned NULL or g_assert-ed or g_error-ed or something?
>
>
> It seems pathological for a library to signal an error by deadlocking.
#define g_utf8_next_char(p) (char *)((p) + g_utf8_skip[*(guchar *)(p)])
g_utf8_next_char() turns out to be a very time-critical operation;
strings often get iterated over again and again, and checking each
time for valid UTF-8 is a heavy penalty. You really need to validate
strings on input, not every time you process them.
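
As a rough illustration of "validate once on input, then iterate without
checks", here is a minimal self-contained sketch. It does not use GLib:
skip[], NEXT_CHAR(), looks_like_utf8() and count_chars() are hypothetical
names made up for this example, the table is not GLib's actual
g_utf8_skip data, and the validator is a simplified structural check
rather than a full one.

```c
#include <stddef.h>

/* Hypothetical skip table (NOT GLib's actual data): bytes to advance,
   indexed by the first byte of a character. */
#define R16(v) v,v,v,v,v,v,v,v,v,v,v,v,v,v,v,v
static const unsigned char skip[256] = {
    R16(1),R16(1),R16(1),R16(1),R16(1),R16(1),R16(1),R16(1), /* 0x00-0x7F */
    R16(1),R16(1),R16(1),R16(1),  /* 0x80-0xBF: continuation bytes */
    R16(2),R16(2),                /* 0xC0-0xDF */
    R16(3),                       /* 0xE0-0xEF */
    4,4,4,4,4,4,4,4,1,1,1,1,1,1,1,1  /* 0xF0-0xFF */
};

/* Unchecked step: one table lookup, no validation at all. */
#define NEXT_CHAR(p) ((p) + skip[*(const unsigned char *)(p)])

/* Simplified structural check, meant to run once on input: each lead
   byte must be followed by the right number of 10xxxxxx bytes. It does
   not reject every overlong form or surrogate. */
static int looks_like_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    while (*p) {
        unsigned n, i;
        if (*p < 0x80) n = 1;
        else if (*p >= 0xC2 && *p < 0xE0) n = 2;
        else if (*p >= 0xE0 && *p < 0xF0) n = 3;
        else if (*p >= 0xF0 && *p < 0xF5) n = 4;
        else return 0;  /* stray continuation byte or invalid lead byte */
        for (i = 1; i < n; i++)
            if ((p[i] & 0xC0) != 0x80)
                return 0;  /* also stops at a premature NUL */
        p += n;
    }
    return 1;
}

/* Cheap iteration; only safe on strings that passed the check above. */
static long count_chars(const char *s)
{
    long n = 0;
    const char *p;
    for (p = s; *p; p = NEXT_CHAR(p))
        n++;
    return n;
}
```

A caller would run looks_like_utf8() once where the string enters the
program, and after that count_chars() (or any other loop built on
NEXT_CHAR) can be run as often as needed without paying for validation
again.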
I don't really have a strong preference on the deadlock versus
continue incorrectly issue; note that the g_utf8_skip array is
currently inconsistent on the issue - it has 1 for the 0x80-0xA0 range
which isn't valid for the initial character, but 0 for 0xfe, 0xff.
The tradeoff here is basically:
- Easy to debug
vs.
- If encountered, hopefully continue working "well enough"
to be minimally useful for the user.
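
The two behaviours being weighed can be sketched as follows. This is
only an illustration under stated assumptions: skip_for() and walk() are
hypothetical helpers, not GLib functions, and the sketch treats the whole
continuation-byte range 0x80-0xBF plus 0xF8-0xFF as bytes that cannot
start a character. The invalid_skip parameter stands in for whatever the
table holds for such bytes (0 or 1).

```c
#include <stddef.h>

/* Hypothetical helper: how far to advance for first byte b.
   invalid_skip plays the role of the table entry for bytes that can
   never begin a character. */
static unsigned skip_for(unsigned char b, unsigned invalid_skip)
{
    if (b < 0x80) return 1;
    if (b < 0xC0) return invalid_skip;  /* continuation byte */
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    if (b < 0xF8) return 4;
    return invalid_skip;                /* includes 0xfe and 0xff */
}

/* Walks s the way an unchecked next-char loop would. Returns the number
   of steps taken, or -1 where a skip of 0 would leave the pointer stuck
   (the real loop would spin forever at that point).
   Assumes s does not end in a truncated multi-byte sequence. */
static long walk(const char *s, unsigned invalid_skip)
{
    const unsigned char *p = (const unsigned char *)s;
    long steps = 0;
    while (*p) {
        unsigned n = skip_for(*p, invalid_skip);
        if (n == 0)
            return -1;
        p += n;
        steps++;
    }
    return steps;
}
```

With an entry of 0 the walk over a string containing 0xff never gets past
that byte, which is the lockup; with an entry of 1 it steps over the bad
byte and reaches the end, treating it as a bogus one-byte character.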
If I recall correctly, I originally had it 0 for the 0x80-0xA0 range
as well and changed it to 1 on the theory that while a lockup
is easier for a developer to debug, lockups can be _very_ confusing
to a user, worse than continuing incorrectly.
Strings are validated at enough places that the chance of invalid
UTF-8 not getting caught at all is low.
So, on balance I think it's worth making the 0xfe, 0xff entries
correspond.
Regards,
Owen