Re: Faster UTF-8 decoding in GLib



Hi again,

On Friday, 26 March 2010, at 22:43 +0100, Daniel Elstner wrote:
> On Friday, 26 March 2010, at 13:25 -0400, Behdad Esfahbod wrote:
> 
> >     * The construct borrowed from glibmm, as beautiful as it is, is WRONG for
> > 6-byte-long UTF-8.  It just doesn't work.  We historically support those
> > sequences.
> 
> What?  In what way exactly is it wrong?

OK, I just ran a test program and it works just fine for me.  Phew, you
scared me there for a moment. :)

--Daniel

#include <stdio.h>

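unsigned int decode_utf8(const char* pos)
{
  /* Needs an unsigned int of at least 32 bits; does no input validation. */
  unsigned int result = (unsigned char) *pos;

  if ((result & 0x80) != 0)
  {
    /* mask starts on bit 6 of the lead byte.  result moves up 6 bits per
     * continuation byte but mask only 5, so after k bytes mask sits on
     * bit (6 - k) of the lead byte.  The loop stops at the first 0 bit,
     * i.e. after (leading 1 bits - 1) continuation bytes. */
    unsigned int mask = 0x40;

    do
    {
      const unsigned int c = (unsigned char) *++pos;

      result <<= 6;
      mask   <<= 5;
      result += c - 0x80;   /* append the 6 payload bits */

      printf("result = %.8X, mask = %.8X\n", result, mask);
    }
    while ((result & mask) != 0);

    /* Strip the lead byte's length prefix (the mask bit and above). */
    result &= mask - 1;
  }
  return result;
}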

int main()
{
  printf("U+%.8X\n", decode_utf8("\xFD\xBF\xBF\xBF\xBF\xBE"));
  return 0;
}

