Re: [evolution-patches] patch to revert the fix for bug #42170 (fixes bug #46331)



Not Zed wrote:

>I don't agree with this patch.  The problem is the header is invalid
>anyway since it's got 8 bit data in it.
>

yeah, but so was the other header :-)

>
>Having pango abort and display nothing on bad text is a bigger problem
>than a badly displayed, bad text.
>

I'll agree that we should not feed non-UTF-8 text to etable, so that we don't 
get an abort() in pango, but that doesn't mean we have to sacrifice being 
able to handle raw iso-8859-1 headers (though I agree they are invalid, and 
normally I wouldn't feel bad about them not working, but the original patch 
was wrong...).

>
>I guess something is wrong with the other patch, maybe something is
>re-encoding utf8 twice or something, or his locale default is utf8, in
>which case it *was* doing the right thing ...  But something needs to be
>there to address 42170.
>

header_decode_word() was converting word tokens into UTF-8, but the code 
using header_decode_word() expected raw word tokens. That code then converted 
the result to UTF-8 a second time via header_decode_string(), which is where 
the conversion *should* happen...
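
(side note: here's a tiny standalone illustration of why converting twice 
mangles things - it is *not* the actual camel code, and the iso-8859-1 
charset and the variable names are just my pick:)

#include <glib.h>
#include <stdio.h>

int main (void)
{
   const char *raw = "Jos\xe9";   /* "José" encoded as iso-8859-1 */
   char *once, *twice;

   /* first conversion: done where the charset is actually known */
   once = g_convert (raw, -1, "UTF-8", "ISO-8859-1", NULL, NULL, NULL);
   printf ("once : %s\n", once);    /* "José" */

   /* second conversion: the caller wrongly assumes the data is still
    * iso-8859-1 and converts it again */
   twice = g_convert (once, -1, "UTF-8", "ISO-8859-1", NULL, NULL, NULL);
   printf ("twice: %s\n", twice);   /* "JosÃ©" - mojibake */

   g_free (once);
   g_free (twice);

   return 0;
}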

since lewing's original bug was just that local-part tokens of addr-specs 
would be left in a non-UTF-8 state and thus break pango... I suggest that we 
either convert addr->str to UTF-8 when we're done constructing it, *or* 
simply replace the 8bit chars in the addr-spec string with '?'.

so perhaps something like this:

if (!g_utf8_validate (addr->str, addr->len, NULL)) {
   unsigned char *ptr;

   /* addr-specs are supposed to be us-ascii anyway, so just
    * neuter any 8bit chars rather than guess at a charset */
   ptr = (unsigned char *) addr->str;
   while (*ptr) {
      if (*ptr >= 128)
         *ptr = '?';
      ptr++;
   }
}


the only reason I suggest '?' (or '_' or 'x' or something) rather than 
trying to convert to UTF-8 is that addr-specs don't allow anything but 
us-ascii anyway, so the address is already invalid (100% guarantee that it 
is spam). Someone might argue that hostnames will allow UTF-8 in the future, 
but we've already got that covered: the token is *not* valid UTF-8, so it is 
still invalid either way.

if people really want, we could try to convert to UTF-8. I don't care much 
one way or the other... I just don't feel it's worth it.
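
if we did go that route, something roughly like this might do it (just a 
sketch, assuming addr is a GString as the addr->str/addr->len usage above 
suggests; the iso-8859-1 guess and the fallback branch are my assumptions, 
not what the code does today):

if (!g_utf8_validate (addr->str, addr->len, NULL)) {
   char *utf8;

   /* guess a charset for the raw 8bit data (iso-8859-1 here) */
   utf8 = g_convert (addr->str, addr->len, "UTF-8", "ISO-8859-1",
                     NULL, NULL, NULL);
   if (utf8) {
      g_string_assign (addr, utf8);
      g_free (utf8);
   } else {
      /* conversion failed: fall back to neutering 8bit chars */
      unsigned char *ptr = (unsigned char *) addr->str;

      while (*ptr) {
         if (*ptr >= 128)
            *ptr = '?';
         ptr++;
      }
   }
}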

Jeff

>
>
>On Fri, 2003-07-25 at 06:40, Jeffrey Stedfast wrote:
>  
>
>>I think a better approach to bug #42170 might be to check the parsed
>>addr->str when we're done and do charset conversions there instead? or
>>maybe just replace those invalid 8bit chars with a '?' or something?
>>it's 100% invalid... those can't be real email addresses, so no sense
>>trying to "make it work".
>>
>>Jeff
>>    
>>
>
>  
>



