"Found the next address, delimited by a comma" #70
Comments
It seems the parser parses mail addresses after decoding the header, which causes this sort of problem (unquoted commas in address headers are meant to split addresses). The parser should be fixed to parse the headers first and then decode the names.
Comma in the "displayName" was splitting 1 address into multiple, invalid ones. Fixes bertjohnson#70
The patch is wrong :-\ Don't decode the address until after you have completely parsed it and determined which part is the name and which part is the address and then you only want to decode the name portion.
Also: yikes, that address parser code is very... brute force. It should really be token-based like the parser I wrote in MimeKit.
Main loop for parsing address lists: https://github.com/jstedfast/MimeKit/blob/master/MimeKit/InternetAddressList.cs#L546
Logic for parsing an individual address (used by above loop): https://github.com/jstedfast/MimeKit/blob/master/MimeKit/InternetAddress.cs#L555
You'll also note that MimeKit's parser handles unquoted commas in the name portion of the address as well (see https://github.com/jstedfast/MimeKit/blob/master/MimeKit/InternetAddress.cs#L623).
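To illustrate the "parse first, decode the name afterwards" order described above, here is a minimal sketch in Python (not the library's C# code and not MimeKit's parser). The helper names `decode_words` and `parse_mailbox` and the sample address are made up for illustration; the sketch only handles a single mailbox of the form `name <address>` or a bare address.

```python
from email.header import decode_header, make_header


def decode_words(text: str) -> str:
    """Decode any RFC 2047 encoded-words found in text."""
    return str(make_header(decode_header(text)))


def parse_mailbox(raw: str) -> tuple[str, str]:
    """Split one raw mailbox into (name, address), then decode the name only."""
    raw = raw.strip()
    if raw.endswith(">") and "<" in raw:
        # Structure is determined on the still-encoded text.
        name, _, addr = raw[:-1].rpartition("<")
        return decode_words(name.strip().strip('"')), addr.strip()
    return "", raw


# Hypothetical sample: the display name "Doe, John" is encoded, so the raw
# header contains no literal comma.
raw = "=?UTF-8?Q?Doe=2C_John?= <john@example.com>"

# Parsing the raw text and decoding afterwards keeps the mailbox intact.
print(parse_mailbox(raw))  # ('Doe, John', 'john@example.com')

# Decoding first exposes the comma, so a naive split sees two "addresses".
print(decode_words(raw).split(","))
```

The point of the ordering is that the comma only comes into existence after decoding, so it can never be mistaken for a list separator if the structure has already been determined.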
Keep in mind that the rfc2047 encoded-word token, once decoded, might contain |
For anyone that wants to try a different approach to my parser in MimeKit, you could always try parsing the address list in reverse. This approach probably helps simplify the parser logic a bit because parsing forward makes it difficult to know what the tokens belong to (is it the name token? or is it the local-part of an addr-spec? hard to know until I consume a few more tokens...). For example, consider the following BNF grammar:
Now consider the following email address: The first token you read will be If, however, you parse the address in reverse, things become a little simpler because you know immediately what to expect the next token to be a part of. It's been an approach that I've been meaning to try, but since my current parser in MimeKit works well, I haven't been able to summon the enthusiasm to rewrite it (as the old saying goes, "if it ain't broke, don't fix it"). |
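One possible reading of the reverse-parsing idea, sketched in Python. This is an untested-in-production illustration, not MimeKit's approach (which parses forward). The function name `parse_reverse` is hypothetical, and the rule "a comma-separated segment to the left of an angle address belongs to the display name unless it contains `@` or ends with `>`" is an assumption made for this sketch; it ignores comments, groups, and quoted names that contain `@` or `<`.

```python
def parse_reverse(header: str) -> list[tuple[str, str]]:
    """Parse an address list right to left into (name, address) pairs."""
    mailboxes = []
    rest = header.rstrip()
    while rest:
        if rest.endswith(">"):
            # The last token is an angle address, so read the addr-spec first.
            start = rest.rfind("<")
            if start < 0:
                raise ValueError("unbalanced '>' in address list")
            addr = rest[start + 1:-1].strip()
            rest = rest[:start].rstrip()
            # Whatever precedes "<" must be display-name text. Keep consuming
            # segments leftwards until one looks like a previous mailbox.
            name_parts = []
            while rest:
                head, _, segment = rest.rpartition(",")
                if "@" in segment or segment.rstrip().endswith(">"):
                    break
                name_parts.append(segment.strip())
                rest = head.rstrip()
            name = ", ".join(p for p in reversed(name_parts) if p).strip('"')
            mailboxes.append((name, addr))
        else:
            # No angle bracket: the last segment is a bare addr-spec.
            head, _, segment = rest.rpartition(",")
            if segment.strip():
                mailboxes.append(("", segment.strip()))
            rest = head.rstrip()
    mailboxes.reverse()
    return mailboxes


print(parse_reverse("Doe, John <john@example.com>, jane@example.com"))
```

Because the address is read before the name, the parser already knows that the text to its left is display-name material, which is what makes the unquoted comma in the name unambiguous here.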
It may be a good idea to test against all of the address examples in my unit tests as well: https://github.com/jstedfast/MimeKit/blob/master/UnitTests/InternetAddressListTests.cs For example, the patch here will likely break for addresses like this:
You can't just assume that all commas split addresses.
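As a rough illustration of that point, the following Python sketch splits an address list only on commas that sit outside quoted strings and angle brackets. The function name `split_address_list` and the sample addresses are hypothetical, and the sketch is deliberately incomplete: it does not handle comments, group syntax, or unquoted commas in the name portion (which MimeKit's parser is said above to handle).

```python
def split_address_list(header: str) -> list[str]:
    """Split on commas that are outside quoted strings and <...>."""
    parts: list[str] = []
    current: list[str] = []
    in_quotes = False
    in_angle = False
    escaped = False
    for ch in header:
        if escaped:
            escaped = False  # character after a backslash is taken literally
        elif in_quotes and ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "<":
            in_angle = True
        elif not in_quotes and ch == ">":
            in_angle = False
        elif ch == "," and not in_quotes and not in_angle:
            # A real separator: close the current mailbox and start a new one.
            parts.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


print(split_address_list('"Doe, John" <john@example.com>, jane@example.com'))
# ['"Doe, John" <john@example.com>', 'jane@example.com']
```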
With the newest commit (forget the first one) from the PR, but sadly tested only using Mozilla Thunderbird client, I tested out both encoded ( I gotta agree that the
I saw some emails encoded as |
Okay, cool. |
Ok. While this fixed the address issue, it introduced a new bug as apparently |
The above is patched in e6cd1fc, hopefully without any further surprises.
MailAddressCollection.Parse fails to parse an address when it contains a comma in the display name. It splits a single address into multiple ones, which produces one or more invalid addresses (the first part of the display name) and a valid address with an incorrect display name. In the example above it's the From address, so when I use MailMessage.From I get the first, invalid address from the 1st screenshot.