Hi Jeff: Thanks a lot for your feedback! Am 22.10.17 16:54 schrieb(en) Jeffrey Stedfast via balsa-list:
I think, though, that proxying stuff back to the GMimeParser would get ugly quickly, so it probably does make more sense to have a callback on GMimeParserOptions like you've done.
Actually, I don't see a real use case for feeding much information back into the parser. IMO, the GMime use cases in this context are • parser for a MUA, as it is used now in Balsa, which works just perfectly • “message validator”, for which the patch can be used, either stand-alone or as extension for the MUA • /maybe/ a “message sanitiser” for re-writing the message to a sane format (see below)
I think your patch seems reasonable for a start. I would prefer "Callback" over "CB" to be consistent with glib/gtk naming conventions and maybe use the terms "Warn/Warning" instead of "Issue/Notify/Error", but those are minor nitpicks ;-)
Yes, will do… I also have to change the Eclipse formatter as to meet your coding style. Do you have guidelines, or even a Eclipse/uncrustify/indent/… rule set?
As far as the actual logic of the patch goes, I noticed that when a header exists already, you don't still add the header anyway. I couldn't tell at a glance whether you did the same with parameters, but I don't think that this patch should change behavior - it should only add a mechanism for alerting the application of potential issues that may arise when interpreting the MIME.
I basically removed the headers because GMime didn't scan the embedded multipart in my “evil” example (maybe it does after switching the order of the duplicated Content-Type headers). Of course, this should *not* be the standard behaviour! As I noted in my (broken) 1st message, it might be an idea to make it configurable through GMimeParserOptions, though: • By default, the extra headers are kept, and the only callback is invoked. • In “drop extra headers” mode, GMime could be used to re-write a suspicious message into a well-defined format (see use case #3 above; this is what some security appliances do). We know that we may lose some information in this case, but we also know *exactly* how it will be interpreted by the final MUA, so we can scan for other threats (like viruses).
Now, *perhaps*, the callback could return some sort of value to the inner workings of the parser logic that would determine if the header/parameter/etc should be dropped, but it shouldn't happen without application feedback because that would alter the message, breaking the work I did to make sure that parsing a message and then writing it back to a stream should produce an exact byte-for-byte copy of the original message stream by default.
The return value from the callback (a gboolean would be sufficient) is an interesting idea. However, as outlined above, the sanitiser is a /very/ special (and controversial, as it may actually change the contents which might even be illegal). Thus, thinking again about it, I guess it's better to omit this feature for the first approach. And it could be added later easily if someone *really* needs it.
The only gotchas that I can think of with regards to a return value from the callback is that the callback may need more context than what we can easily provide right now and that different issues/warnings may require different sets of return values (which may make some return values have undefined behavior depending on the context). We might need to try to figure out what sort of actions make sense for each warning condition and see if they are consistent.
I think it is possible to distinguish between stuff which is contrary to the intentions of MIME (like multiple contradictory Content-Types) and can be used for an attack, and minor standard violations which are otherwise harmless (like unencoded 8-bit chars or whitespace in RFC 2047 encoded-words). I.e. keep the latter and if requested sanitize the former. An error code would probably be sufficient to distinguish these cases.
(Note: what if you get multiple headers or parameters but want to only keep the last one found rather than the first?)
Good point. Hmmm…
Another thing that I just thought of is that I've noticed that some clients with include the same parameters 2 times, once encoded using rfc2231 and once using the (incorrect-but-widely-used-anyway) rfc2047 encoding. In the "valid" cases, these values will be identical, it's just that the sending client is trying to be "helpful" if the receiving client doesn't support one form of encoding or the other. To be honest, I'm not sure this is really all that helpful, but I digress ;-)
Also a good point. It's easy, though, to limit this feature to the “boundary” (and perhaps other?) parameter which shall not be present multiple times and may indicate an attack.
Anyway, I look forward to seeing your next patch. If you have a github account, using a PR would be really helpful too (if you don't have one, it's not a huge deal).
I don't have a Github account, but will get one if you prefer it. As I am far from understanding your code in detail, a review of any changes would be great (i.e. necessary…), though. Please don't expect a full-featured patch too early, as I mentioned, this is not the work I'm paid for (unfortunately)… Cheers, Albrecht.
Attachment:
pgpMbTXXYnVEb.pgp
Description: PGP signature