new patch attached I've unified the append_*() usage, switched to using memchr() for places that could benefit from it, and added a test-suite fwiw, I checked your example against mozilla-mail's implementation and it results in the same garbage as evolution (with my patch)... but yea, I don't know what to do about that. clearly your example should not be decoded, but users want the following to work too: Subject: =?ISO-8859-1?B?SWYgeW91IGNh biByZWFkIHRoaXMgeW8=?= so it seems we either follow the spec like we do now and get everything right (as defined by the way things are supposed to work according to the rfc) or we apply my patch and get some stuff wrong but make users quit their belly aching :) I'm OK with either - e.g. I'm not going to take it personal if this patch doesn't ever make it in. perhaps this patch could be put someplace for people to apply if they really feel strongly about communicating with their friend's broken mailers and leave it at that. I really don't like the idea of not handling valid cases in favour of handling broken ones Jeff On Mon, 2005-02-28 at 12:22 +0800, Not Zed wrote: > > So, how many valid mails does this break? > > e.g. > > =?iso-8859-1?b?foo this is a hidden message not for evolution users > bar?= > > > And why did you copy the append_ functions? There should be only one > of each in the code. This is a messy enough hack as it is. > > Instead of a test folder, any test data must be added to the > regression test we already have for rfc2047 decoding. > > BTW, using if (p = memchr(inptr, '=', end-start-2) && p[1] == '?') is > highly likely to be much more efficient than (while (inptr < inend-2 > && !strcmp("=?")) inptr++)) > > On Fri, 2005-02-25 at 13:47 -0500, Jeffrey Stedfast wrote: > > unfortunately, there exist a bountiful number of shitful mailers out > > there who's authors clearly couldn't be bothered to read or understand > > the MIME specifications and so just pulled an encoding scheme out of > > their proverbials. > > > > the attached patch tries to deal with the scenarios that I'm aware of, > > namely illegal characters in the encoded text portion of the > > encoded-word token (which, sadly, even includes SPACE and TAB) > > > > for the convenience of anyone reading this message who hasn't yet read > > rfc2047, here's a good quote from the end of section 2: > > > > IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's > > by an RFC 822 parser. As a consequence, unencoded white space > > characters (such as SPACE and HTAB) are FORBIDDEN within an > > 'encoded-word'. For example, the character sequence > > > > =?iso-8859-1?q?this is some text?= > > > > would be parsed as four 'atom's, rather than as a single 'atom' (by > > an RFC 822 parser) or 'encoded-word' (by a parser which understands > > 'encoded-words'). The correct way to encode the string "this is some > > text" is to encode the SPACE characters as well, e.g. > > > > =?iso-8859-1?q?this=20is=20some=20text?= > > > > so yes, the behaviour of the broken mailers is explicitly FORBIDDEN but > > that hasn't stopped them. oh well. > > > > > > I've also attached a test mbox for everyone's convenience in testing > > this patch (feel free to add to it) > > > > With the patch applied, both Mozilla-Mail and Evolution render the > > subjects (and other headers) exactly the same afaict. > > -- Jeffrey Stedfast Evolution Hacker - Novell, Inc. fejj ximian com - www.novell.com
? broken-rfc2047.patch ? camel-mime-tables.c ? providers/imap4/imap4-XGWMOVE.patch ? providers/imap4/imap4.patch ? tests/message/test4 ? tests/misc/rfc2047 ? tests/misc/test2 ? tests/misc/url-scan Index: ChangeLog =================================================================== RCS file: /cvs/gnome/evolution-data-server/camel/ChangeLog,v retrieving revision 1.2432 diff -u -p -r1.2432 ChangeLog --- ChangeLog 25 Feb 2005 03:49:26 -0000 1.2432 +++ ChangeLog 28 Feb 2005 21:17:52 -0000 @@ -1,3 +1,13 @@ +2005-02-25 Jeffrey Stedfast <fejj novell com> + + * camel-mime-utils.c (quoted_decode): Allow spaces in the text we + are decoding. + (append_quoted_pair): Changed to take charset params and convert + un-quoted-pair'd strings to UTF-8. + (header_decode_text): Rewritten to work around broken rfc2047 + encoded-words sent by mailers who's authors couldn't be bothered + to read the specs. + 2005-02-24 Not Zed <NotZed Ximian com> ** See bug #68459 Index: camel-mime-utils.c =================================================================== RCS file: /cvs/gnome/evolution-data-server/camel/camel-mime-utils.c,v retrieving revision 1.223 diff -u -p -r1.223 camel-mime-utils.c --- camel-mime-utils.c 31 Jan 2005 06:56:28 -0000 1.223 +++ camel-mime-utils.c 28 Feb 2005 21:17:52 -0000 @@ -814,8 +814,12 @@ quoted_decode(const unsigned char *in, s *outptr++ = 0x20; } else if (c==' ' || c==0x09) { /* FIXME: this is an error! ignore for now ... */ +#if ADHERE_TO_SPEC ret = -1; break; +#else + *outptr++ = c; +#endif } else { *outptr++ = c; } @@ -915,7 +919,7 @@ rfc2047_decode_word(const char *in, size /* quick check to see if this could possibly be a real encoded word */ if (len < 8 || !(in[0] == '=' && in[1] == '?' && in[len-1] == '=' && in[len-2] == '?')) { - d(printf("invalid\n")); + d(printf("rfc2047_decode_word: invalid token\n")); return NULL; } @@ -1058,31 +1062,51 @@ append_8bit (GString *out, const char *i } -static GString * -append_quoted_pair (GString *str, const char *in, gssize inlen) +static void +append_atom (GString *str, const char *in, ssize_t inlen, const char *default_charset, const char *locale_charset) +{ + g_string_append_len (str, in, inlen); +} + +static void +append_text (GString *str, const char *in, ssize_t inlen, const char *default_charset, const char *locale_charset) +{ + if ((default_charset == NULL || !append_8bit (str, in, inlen, default_charset)) + && (locale_charset == NULL || !append_8bit (str, in, inlen, locale_charset))) + append_latin1 (str, in, inlen); +} + +static void +append_quoted_pair (GString *str, const char *in, ssize_t inlen, const char *default_charset, const char *locale_charset) { register const char *inptr = in; const char *inend = in + inlen; + GString *unquoted; char c; + unquoted = g_string_new (""); + while (inptr < inend) { c = *inptr++; if (c == '\\' && inptr < inend) - g_string_append_c (str, *inptr++); + g_string_append_c (unquoted, *inptr++); else - g_string_append_c (str, c); + g_string_append_c (unquoted, c); } - - return str; + + append_text (str, unquoted->str, unquoted->len, default_charset, locale_charset); + g_string_free (unquoted, TRUE); } +#ifdef ADHERE_TO_SPEC + /* decodes a simple text, rfc822 + rfc2047 */ static char * header_decode_text (const char *in, size_t inlen, int ctext, const char *default_charset) { GString *out; const char *inptr, *inend, *start, *chunk, *locale_charset; - GString *(* append) (GString *, const char *, gssize); + void (* append) (GString *, const char *, ssize_t, const char * const char *); char *dword = NULL; guint32 mask; @@ -1093,7 +1117,7 @@ header_decode_text (const char *in, size append = append_quoted_pair; } else { mask = (CAMEL_MIME_IS_LWSP); - append = g_string_append_len; + append = append_atom; } out = g_string_new (""); @@ -1107,10 +1131,10 @@ header_decode_text (const char *in, size inptr++; if (inptr == inend) { - append (out, start, inptr - start); + append (out, start, inptr - start, default_charset, locale_charset); break; } else if (dword == NULL) { - append (out, start, inptr - start); + append (out, start, inptr - start, default_charset, locale_charset); } else { chunk = start; } @@ -1127,9 +1151,7 @@ header_decode_text (const char *in, size if (!chunk) chunk = start; - if ((default_charset == NULL || !append_8bit (out, chunk, inptr-chunk, default_charset)) - && (locale_charset == NULL || !append_8bit(out, chunk, inptr-chunk, locale_charset))) - append_latin1(out, chunk, inptr-chunk); + append_text (out, chunk, inptr - chunk, default_charset, locale_charset); } chunk = NULL; @@ -1140,6 +1162,81 @@ header_decode_text (const char *in, size return dword; } + +#else /* ! ADHERE_TO_SPEC */ + +static char * +header_decode_text (const char *in, size_t inlen, int ctext, const char *default_charset) +{ + void (* append) (GString *, const char *, ssize_t, const char *, const char *); + const char *inptr, *inend, *start, *encword, *locale_charset; + char *dword = NULL; + GString *out; + + locale_charset = e_iconv_locale_charset (); + + if (ctext) + append = append_quoted_pair; + else + append = append_text; + + out = g_string_new (""); + inptr = in; + inend = inptr + inlen; + + while (inptr < inend) { + start = inptr; + + while (inptr < (inend - 8) && strncmp (inptr, "=?", 2) != 0) { + if (!camel_mime_is_lwsp (*inptr)) + dword = NULL; + inptr++; + } + + if (inptr >= (inend - 8)) { + append (out, start, inend - start, default_charset, locale_charset); + break; + } + + /* could be an encoded word (or a broken encoded word which is why this code is so damn hairy) */ + encword = inptr; + + inptr += 2; + inptr = memchr (inptr, '?', (inend - 4) - inptr); + if (inptr && (inptr[1] == 'B' || inptr[1] == 'b' || inptr[1] == 'Q' || inptr[1] == 'q') && inptr[2] == '?') { + /* looking more and more like an encoded word... */ + inptr += 3; + inptr = memchr (inptr, '?', (inend - 1) - inptr); + if (!(inptr && inptr[1] == '=')) + goto not_encword; + + if (!dword) + append (out, start, encword - start, default_charset, locale_charset); + + inptr += 2; + + if ((dword = rfc2047_decode_word (encword, inptr - encword))) { + g_string_append (out, dword); + g_free (dword); + } else { + append (out, encword, inptr - encword, default_charset, locale_charset); + } + } else { + /* not an encoded word */ + not_encword: + dword = NULL; + inptr = encword + 2; + + append (out, start, inptr - start, default_charset, locale_charset); + } + } + + dword = out->str; + g_string_free (out, FALSE); + + return dword; +} +#endif /* ADHERE_TO_SPEC */ char * camel_header_decode_string (const char *in, const char *default_charset) Index: tests/misc/Makefile.am =================================================================== RCS file: /cvs/gnome/evolution-data-server/camel/tests/misc/Makefile.am,v retrieving revision 1.13 diff -u -p -r1.13 Makefile.am --- tests/misc/Makefile.am 13 Dec 2004 03:08:52 -0000 1.13 +++ tests/misc/Makefile.am 28 Feb 2005 21:17:52 -0000 @@ -20,9 +20,10 @@ check_PROGRAMS = \ url-scan \ utf7 \ split \ + rfc2047 \ test2 -TESTS = url utf7 split url-scan test2 +TESTS = url utf7 split url-scan rfc2047 test2 Index: tests/misc/rfc2047.c =================================================================== RCS file: tests/misc/rfc2047.c diff -N tests/misc/rfc2047.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ tests/misc/rfc2047.c 28 Feb 2005 21:17:52 -0000 @@ -0,0 +1,96 @@ +/* -*- Mode: C; tab-width: 8; indent-tabs-mode: t; c-basic-offset: 8 -*- */ +/* + * Authors: Jeffrey Stedfast <fejj novell com> + * + * Copyright 2005 Novell, Inc. (www.novell.com) + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Street #330, Boston, MA 02111-1307, USA. + * + */ + + +#ifdef HAVE_CONFIG_H +#include <config.h> +#endif + +#include <ctype.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <camel/camel-object.h> +#include <camel/camel-mime-utils.h> + +#include "camel-test.h" + +struct { + const char *encoded; + const char *decoded; + int ctext; +} tests[] = { + /* the first half are rfc compliant cases (which are the most important) */ + { "=?iso-8859-1?q?this=20is=20some=20text?=", "this is some text", 0 }, + { "this =?iso-8859-1?q?is_some?= text", "this is some text", 0 }, + { "=?iso-8859-1?q?th?= =?iso-8859-1?q?is?= is some text", "this is some text", 0 }, + { "=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?= =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=", + "If you can read this you understand the example.", 0 }, + + /* second half: brokenly encoded rfc2047 words */ + { "foo=?UTF-8?Q?bar?=baz", "foobarbaz", 0 }, + { "foo =?UTF-8?Q?bar?=baz", "foo barbaz", 0 }, + { "foo=?UTF-8?Q?bar?= baz", "foobar baz", 0 }, + { "=?UTF-8?Q?foo bar?=baz", "foo barbaz", 0 }, + { "foo=?UTF-8?Q?bar baz?=", "foobar baz", 0 }, + { "=?UTF-8?Q?foo?==?UTF-8?Q?bar baz?=", "foobar baz", 0 }, + { "=?UTF-8?Q?foo?= =?UTF-8?Q?bar baz?=", "foobar baz", 0 }, + { "=?UTF-8?Q?foo?= bar =?UTF-8?Q?baz?=", "foo bar baz", 0 }, + { "=?foo=?UTF-8?Q?bar baz?=", "=?foobar baz", 0 }, + { "=?foo?=?UTF-8?Q?bar baz?=", "=?foo?bar baz", 0 }, + { "=?foo?Q=?UTF-8?Q?bar baz?=", "=?foo?Qbar baz", 0 }, + { "=?foo?Q?=?UTF-8?Q?bar baz?=", "=?foo?Q?bar baz", 0 }, + { "=?UTF-8?Q?bar ? baz?=", "=?UTF-8?Q?bar ? baz?=", 0 }, + + /* ctext tests */ + { "Test of ctext (=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=)", "Test of ctext (ab)", 1 }, + { "ctext with \\\"quoted\\\" pairs", "ctext with \"quoted\" pairs", 1 } +}; + +int main (int argc, char ** argv) +{ + char *decoded; + int i; + + camel_test_init (argc, argv); + + camel_test_start ("rfc2047 decoding"); + + for (i = 0; i < G_N_ELEMENTS (tests); i++) { + camel_test_push ("rfc2047 decoding[%d] '%s'", i, tests[i].encoded); + + if (tests[i].ctext) + decoded = camel_header_format_ctext (tests[i].encoded, "iso-8859-1"); + else + decoded = camel_header_decode_string (tests[i].encoded, "iso-8859-1"); + + check (decoded && strcmp (decoded, tests[i].decoded) == 0); + + g_free (decoded); + + camel_test_pull (); + } + + camel_test_end(); + + return 0; +}
Attachment:
smime.p7s
Description: S/MIME cryptographic signature