Re: [patch] treatment of wrongly encoded 8-bit messages
- From: Jeffrey Stedfast <fejj ximian com>
- To: Albrecht Dreß <albrecht dress arcor de>
- Cc: Pawel Salek <pawsa theochem kth se>,Balsa-Liste <balsa-list gnome org>
- Subject: Re: [patch] treatment of wrongly encoded 8-bit messages
- Date: 03 Apr 2003 14:35:53 -0500
If you guys want to play with black magic, feel free to check out
http://primates.ximian.com/~fejj/charset-foo.[c,h]
a little something I've been working on for the past 2 nights to try to
auto-detect what charset a given stream of text is in. It seems to work
okay. (Don't expect iso-8859-8 or iso-8859-4 texts to be recognized yet,
though, and iso-8859-5 is also a bit sketchy. koi8-r seems to work well,
which is probably more important than iso-8859-5 for Russian anyway.)
just so you know, these are the charsets it *attempts* to check for:
{ "iso-8859-1", 0x20 },
{ "iso-8859-2", 0x40 },
{ "iso-8859-4", 0x80 },
{ "iso-8859-5", 0x100 },
{ "iso-8859-7", 0x200 },
{ "iso-8859-8", 0x400 },
{ "iso-8859-9", 0x800 },
{ "iso-8859-13", 0x1000 },
{ "iso-8859-15", 0x2000 },
{ "windows-1251", 0x4000 },
{ "koi8-r", 0x8000 },
{ "koi8-u", 0x10000 },
{ "shift-jis", 0x20000 },
{ "gb2312", 0x40000 },
{ "euc-jp", 0x80000 },
{ "euc-kr", 0x100000 },
{ "euc-tw", 0x200000 },
{ "big5", 0x400000 },
hmmm, I should remove euc-tw... don't have any samples for that and it
is very uncommon anyway.
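
For anyone wondering what the one-bit-per-charset values are good for, here is
a rough sketch of one way such a table could be used: start with every
candidate bit set and clear bits as the bytes rule charsets out. This is NOT
taken from charset-foo.c. The function names (all_candidates,
narrow_candidates, first_candidate) and the single elimination rule are
assumptions made up for illustration. The rule only relies on the fact that
bytes 0x80-0x9F are C1 control codes in iso-8859-1, so they rarely show up in
real text.

```c
#include <assert.h>
#include <stddef.h>

/* One bit per candidate charset, using three entries from the list above. */
struct charset_flag {
    const char *name;
    unsigned int bit;
};

static const struct charset_flag charset_flags[] = {
    { "iso-8859-1", 0x20 },
    { "koi8-r",     0x8000 },
    { "shift-jis",  0x20000 },
};

#define N_FLAGS (sizeof(charset_flags) / sizeof(charset_flags[0]))

/* Mask with every known candidate bit set (hypothetical helper). */
unsigned int all_candidates(void)
{
    unsigned int mask = 0;
    size_t i;

    for (i = 0; i < N_FLAGS; i++)
        mask |= charset_flags[i].bit;
    return mask;
}

/* Illustrative rule only: a byte in 0x80-0x9F clears the iso-8859-1 bit.
 * A real detector would need many more rules than this one. */
unsigned int narrow_candidates(const unsigned char *buf, size_t len)
{
    unsigned int mask = all_candidates();
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] >= 0x80 && buf[i] <= 0x9F)
            mask &= ~0x20u;
    }
    return mask;
}

/* Name of the first charset in table order whose bit is still set,
 * or NULL if no candidate is left. */
const char *first_candidate(unsigned int mask)
{
    size_t i;

    for (i = 0; i < N_FLAGS; i++) {
        if (mask & charset_flags[i].bit)
            return charset_flags[i].name;
    }
    return NULL;
}
```

Calling narrow_candidates() on a buffer and then first_candidate() on the
result gives a best guess. With separate bits, several charsets can stay
"possible" at once until the text rules them out.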
btw, if any of you have text documents in any of those charsets (in
particular -4, -5, -8 and shift-jis since I am severely lacking in those
departments currently), feel free to send them to me so I can improve
support for detecting those charsets.
(note: make sure they contain nothing personal)
Jeff
On Thu, 2003-04-03 at 13:48, Albrecht Dreß wrote:
> On 03.04.03 11:08, Pawel Salek wrote:
> > When I try to reply to such a misformatted message, I get loads of
> >
> > (balsa:9949): Gtk-CRITICAL **: file gtktextbuffer.c: line 543
> > (gtk_text_buffer_emit_insert): assertion `g_utf8_validate (text, len,
> > NULL)' failed
>
> Ooops... As the replacement of badly encoded chars was moved out of
> libmutt, content2reply now gets the wrong stream with bad chars... An
> extra libbalsa_utf_sanitize() fixes it (see below). The same problem
> occurs with printing, btw (also fixed below).
>
> The patch also removes an extra paranoia check in the gpg stuff, which is
> a result of the wonders of copy & paste, but completely silly at that
> point.
>
> Sorry for the chaos,
>
> Cheers,
>
> Albrecht.
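
For reference, the Gtk-CRITICAL above fires because GtkTextBuffer refuses text
that fails g_utf8_validate(). Below is a minimal sketch of the kind of
sanitizing pass that avoids it: any byte that does not start a valid UTF-8
sequence is replaced with '?'. It is plain C without GLib and is not the
actual libbalsa_utf_sanitize() code, whose behavior I am not describing here.
The names utf8_seq_len and sanitize_utf8 are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Length (1-4) of the valid UTF-8 sequence starting at s, or 0 if the
 * bytes at s do not form one. avail is the number of bytes left. */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    unsigned char c = s[0];
    unsigned long cp;
    size_t need, i;

    if (c < 0x80)
        return 1;
    else if (c >= 0xC2 && c <= 0xDF) { need = 2; cp = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { need = 3; cp = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { need = 4; cp = c & 0x07; }
    else
        return 0;               /* stray continuation, 0xC0/0xC1, or > 0xF4 */

    if (avail < need)
        return 0;               /* sequence cut off at end of buffer */
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (need == 3 && cp < 0x800)
        return 0;               /* overlong 3-byte form */
    if (need == 4 && (cp < 0x10000 || cp > 0x10FFFF))
        return 0;               /* overlong or beyond Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* UTF-16 surrogates are not valid */
    return need;
}

/* Replace every offending byte in buf with '?', in place.
 * Returns the number of bytes that were replaced. */
size_t sanitize_utf8(unsigned char *buf, size_t len)
{
    size_t i = 0, replaced = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        size_t n = utf8_seq_len(buf + i, len - i);

        if (n == 0) {
            buf[i] = '?';
            replaced++;
            i++;
        } else {
            i += n;
        }
    }
    return replaced;
}
```

Replacing in place keeps the buffer length unchanged, so callers that already
computed a length do not need to be touched. The price is that the original
bytes are lost, so this belongs right before display or printing, not before
any attempt to guess the real charset.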
--
Jeffrey Stedfast
Evolution Hacker - Ximian, Inc.
fejj@ximian.com - www.ximian.com