Re: Faster UTF-8 decoding in GLib
- From: Behdad Esfahbod <behdad behdad org>
- To: Daniel Elstner <daniel kitta googlemail com>
- Cc: gtk-devel-list gnome org
- Subject: Re: Faster UTF-8 decoding in GLib
- Date: Sun, 28 Mar 2010 16:34:10 -0400
On 03/27/2010 06:57 PM, Daniel Elstner wrote:
> However, for other invalid conditions to result in defined behavior,
> explicit checks would be required in the code. I see no reason to pay
> the cost for insufficient validation checks in light of the fact that
> the documentation explicitly states that the behavior is undefined if
> the input is not valid UTF-8. It might be a different matter if it
> would write past the end of a buffer or something, but that's not the
> case here.
Well, there's a bit more to it. Just because some bytes in a file are invalid
according to the spec doesn't mean your text editor should refuse to open the
file. While g_utf8_get_char() and friends do assume valid UTF-8 data, it's an
unwritten assumption that for invalid bytes they simply skip the byte and
return -1. I want to keep it that way and perhaps even document it. IIRC I
rely on that in Pango.
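
To make the "skip the byte and return -1" idea concrete, here is a rough,
self-contained sketch. It is not GLib's code: lenient_decode() and its
signature are made up for illustration, and it is stricter than GLib may be
in what it treats as invalid (overlong forms, surrogates). It only shows the
contract I have in mind: an invalid sequence costs exactly one byte and
yields -1, so the caller can keep going.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a GLib function.
 * Decodes one code point from s[0..len). On success, stores the number of
 * bytes consumed in *adv and returns the code point. On an invalid or
 * truncated sequence, consumes exactly one byte and returns -1. */
static int32_t lenient_decode(const unsigned char *s, size_t len, size_t *adv)
{
    assert(s != NULL && adv != NULL);

    if (len == 0) {            /* nothing to read */
        *adv = 0;
        return -1;
    }

    *adv = 1;                  /* default: skip a single byte */
    unsigned char c = s[0];
    if (c < 0x80)
        return c;              /* plain ASCII */

    size_t need;               /* total bytes in the sequence */
    int32_t cp, min;           /* accumulated value, smallest non-overlong value */
    if ((c & 0xE0) == 0xC0)      { need = 2; cp = c & 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { need = 3; cp = c & 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { need = 4; cp = c & 0x07; min = 0x10000; }
    else
        return -1;             /* stray continuation byte or invalid lead byte */

    if (need > len)
        return -1;             /* truncated at end of buffer */

    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;         /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates, and values past U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;

    *adv = need;
    return cp;
}
```

A caller walks the buffer by adding *adv to its position after every call and
can substitute U+FFFD (or whatever it likes) whenever it gets -1 back, which
is how an editor can still open a file with a few bad bytes in it.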
Anyway, getting way off-topic here.
behdad