Re: Faster UTF-8 decoding in GLib
- From: Daniel Elstner <daniel kitta googlemail com>
- To: Daniel Elstner <daniel kitta googlemail com>
- Cc: gtk-devel-list gnome org
- Subject: Re: Faster UTF-8 decoding in GLib
- Date: Fri, 26 Mar 2010 23:16:19 +0100
Hi again,
On Friday, 26.03.2010, at 22:43 +0100, Daniel Elstner wrote:
> On Friday, 26.03.2010, at 13:25 -0400, Behdad Esfahbod wrote:
>
> > * The construct borrowed from glibmm, as beautiful as it is, is WRONG for
> > 6-byte-long UTF-8. It just doesn't work. We historically support those
> > sequences.
>
> What? In what way exactly is it wrong?
OK, I just ran a test program and it works just fine for me. Phew, you
scared me there for a moment. :)
--Daniel
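#include <stdio.h>

/* Decode one UTF-8 sequence starting at pos, including the historical
   5- and 6-byte forms. No validation is performed. */
unsigned int decode_utf8(const char* pos)
{
  unsigned int result = (unsigned char) *pos;

  if ((result & 0x80) != 0)
  {
    /* After each shift, mask lines up with the next length bit
       of the lead byte. */
    unsigned int mask = 0x40;

    do
    {
      const unsigned int c = (unsigned char) *++pos;

      result <<= 6;
      mask <<= 5;
      result += c - 0x80;  /* add the 6 payload bits of the continuation byte */
      printf("result = %.8X, mask = %.8X\n", result, mask);
    }
    while ((result & mask) != 0);

    result &= mask - 1;  /* strip the remaining length bits of the lead byte */
  }

  return result;
}

int main(void)
{
  /* 6-byte sequence; expected to decode to 0x7FFFFFFE */
  printf("U+%.8X\n", decode_utf8("\xFD\xBF\xBF\xBF\xBF\xBE"));
  return 0;
}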
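
To look beyond the single 6-byte case, here is a rough sketch that runs the same loop over the largest value of every sequence length from 1 to 6 bytes. The names decode_utf8_quiet and check_length_maxima are made up for this illustration and are not part of GLib or glibmm. It assumes unsigned int is at least 32 bits wide and that the input is well-formed, since the loop itself does no validation.

```c
#include <assert.h>
#include <stddef.h>

/* Same loop as the test program, minus the printf tracing.
   Hypothetical name; assumes well-formed input and an unsigned int
   of at least 32 bits. */
unsigned int decode_utf8_quiet(const char *pos)
{
    unsigned int result = (unsigned char) *pos;

    if ((result & 0x80) != 0)
    {
        unsigned int mask = 0x40;

        do
        {
            const unsigned int c = (unsigned char) *++pos;

            result <<= 6;
            mask <<= 5;
            result += c - 0x80;
        }
        while ((result & mask) != 0);

        result &= mask - 1;
    }

    return result;
}

/* Hypothetical helper: decodes the largest value of each sequence
   length and returns the number of cases checked. */
int check_length_maxima(void)
{
    static const struct
    {
        const char  *bytes;
        unsigned int expected;
    } cases[] = {
        { "\x7F",                     0x7Fu       },  /* 1 byte,   7 bits */
        { "\xDF\xBF",                 0x7FFu      },  /* 2 bytes, 11 bits */
        { "\xEF\xBF\xBF",             0xFFFFu     },  /* 3 bytes, 16 bits */
        { "\xF7\xBF\xBF\xBF",         0x1FFFFFu   },  /* 4 bytes, 21 bits */
        { "\xFB\xBF\xBF\xBF\xBF",     0x3FFFFFFu  },  /* 5 bytes, 26 bits */
        { "\xFD\xBF\xBF\xBF\xBF\xBF", 0x7FFFFFFFu },  /* 6 bytes, 31 bits */
    };
    const size_t n = sizeof cases / sizeof cases[0];
    size_t i;

    for (i = 0; i < n; ++i)
        assert(decode_utf8_quiet(cases[i].bytes) == cases[i].expected);

    return (int) n;
}
```

Calling check_length_maxima() from main should return 6 without tripping an assertion. In the 6-byte case the final mask is 0x80000000, so the length bits of the lead byte are shifted out of a 32-bit unsigned int, which is well-defined for unsigned arithmetic.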