Re: Faster UTF-8 decoding in GLib



On 03/27/2010 06:57 PM, Daniel Elstner wrote:
> However, for other invalid conditions to result in defined behavior,
> explicit checks would be required in the code.  I see no reason to pay
> the cost for insufficient validation checks in light of the fact that
> the documentation explicitly states that the behavior is undefined if
> the input is not valid UTF-8.  It might be a different matter if it
> would write past the end of a buffer or something, but that's not the
> case here.

Well, there's a bit more to it.  Just because some bytes in a file are invalid
according to the spec doesn't mean your text editor should refuse to open the
file.  While g_utf8_get_char() and friends do assume valid UTF-8 data, there is
an unwritten assumption that on an invalid byte they simply skip that byte and
return -1.  I want to keep it that way, and perhaps even document it.  I think
I rely on that in Pango.
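To make that lenient convention concrete, here is a minimal sketch in plain C
(no GLib dependency): an invalid or truncated sequence yields (uint32_t)-1 and
advances by exactly one byte, so a caller such as an editor can keep walking
the buffer instead of giving up.  The names lenient_utf8_get_char(),
count_chars() and LENIENT_INVALID are hypothetical.  This is not GLib's
implementation, and GLib's documentation still calls the result undefined for
invalid input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel, mirroring the "(gunichar) -1" convention. */
#define LENIENT_INVALID ((uint32_t) -1)

/* Hypothetical lenient decoder (NOT GLib's code).  Decodes one character
 * at p with len bytes available.  On success, stores the sequence length
 * in *consumed and returns the code point.  On any invalid or truncated
 * sequence, stores 1 in *consumed (skip only the offending byte) and
 * returns LENIENT_INVALID. */
static uint32_t
lenient_utf8_get_char (const unsigned char *p, size_t len, size_t *consumed)
{
  uint32_t cp = 0, min = 0;
  size_t n = 0, i;

  *consumed = 1;
  if (len == 0)
    return LENIENT_INVALID;

  if (p[0] < 0x80)
    return p[0];                                         /* ASCII */
  else if ((p[0] & 0xE0) == 0xC0) { n = 2; cp = p[0] & 0x1F; min = 0x80; }
  else if ((p[0] & 0xF0) == 0xE0) { n = 3; cp = p[0] & 0x0F; min = 0x800; }
  else if ((p[0] & 0xF8) == 0xF0) { n = 4; cp = p[0] & 0x07; min = 0x10000; }
  else
    return LENIENT_INVALID;       /* stray continuation byte or 0xF8..0xFF */

  if (len < n)
    return LENIENT_INVALID;       /* sequence truncated by end of buffer */

  for (i = 1; i < n; i++)
    {
      if ((p[i] & 0xC0) != 0x80)
        return LENIENT_INVALID;   /* expected a continuation byte */
      cp = (cp << 6) | (p[i] & 0x3F);
    }

  /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF. */
  if (cp < min || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
    return LENIENT_INVALID;

  *consumed = n;
  return cp;
}

/* Counts characters in buf, counting each invalid byte as one position
 * (and tallying it in *n_invalid) instead of rejecting the whole buffer. */
static size_t
count_chars (const unsigned char *buf, size_t len, size_t *n_invalid)
{
  size_t pos = 0, count = 0, step;

  *n_invalid = 0;
  while (pos < len)
    {
      if (lenient_utf8_get_char (buf + pos, len - pos, &step) == LENIENT_INVALID)
        (*n_invalid)++;
      pos += step;                /* always >= 1, so the loop terminates */
      count++;
    }
  return count;
}
```

For example, the bytes 'a', 0xFF, 0xE2 0x82 0xAC (U+20AC), 'b' count as four
positions, one of them invalid.  Skipping a single byte per error is one
possible choice; a decoder could instead skip a whole malformed sequence, which
changes how many positions a broken run occupies.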

Anyway, getting way off-topic here.

behdad

