Re: Faster UTF-8 decoding in GLib
- From: Daniel Elstner <daniel kitta googlemail com>
- To: Daniel Elstner <daniel kitta googlemail com>
- Cc: gtk-devel-list gnome org
- Subject: Re: Faster UTF-8 decoding in GLib
- Date: Fri, 26 Mar 2010 23:16:19 +0100
Hi again,
On Friday, 26.03.2010, 22:43 +0100, Daniel Elstner wrote:
> On Friday, 26.03.2010, 13:25 -0400, Behdad Esfahbod wrote:
> 
> >     * The construct borrowed from glibmm, as beautiful as it is, is WRONG for
> > 6-byte-long UTF-8.  It just doesn't work.  We historically support those
> > sequences.
> 
> What?  In what way exactly is it wrong?
OK, I just ran the test program below on a 6-byte sequence and it works
just fine for me.  Phew, you scared me there for a moment. :)
--Daniel
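#include <stdio.h>

/* Decode one UTF-8 sequence starting at pos, including the historical
   5- and 6-byte forms.  Assumes well-formed input and a 32-bit
   unsigned int. */
unsigned int decode_utf8(const char* pos)
{
  unsigned int result = (unsigned char) *pos;
  if ((result & 0x80) != 0)
  {
    /* result moves left by 6 per continuation byte and mask by 5, so
       each pass tests the next lower length bit of the lead byte. */
    unsigned int mask = 0x40;
    do
    {
      const unsigned int c = (unsigned char) *++pos;
      result <<= 6;
      mask   <<= 5;
      result += c - 0x80;   /* append the 6 payload bits */
      printf("result = %.8X, mask = %.8X\n", result, mask);
    }
    while ((result & mask) != 0);   /* stop at the lead byte's 0 bit */
    result &= mask - 1;             /* clear the lead byte's length bits */
  }
  return result;
}

int main(void)
{
  /* 6-byte sequence encoding U+7FFFFFFE */
  printf("U+%.8X\n", decode_utf8("\xFD\xBF\xBF\xBF\xBF\xBE"));
  return 0;
}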