Fix bug with eml attachments containing html content #106

Open
wants to merge 1 commit into
base: master

wszostak

Fixes a bug introduced in v0.1.33 (see: 03e43b5).

Status

  • In Progress
  • Ready
  • On Hold - (Reason for hold)

Related Issues

No issue. I was unable to create one in this repository.

Description

The bug was introduced in v0.1.33 (see: 03e43b5).

Version v0.1.32 works correctly.

Emails that have another eml file attached, where the attached file's body is in HTML format, are not parsed properly. The output contains no attachments, and the HTML content of the attached file is set as the HTML of the original email.

bug-example.eml file:

To: [email protected]
From: [email protected]
Subject: Your Subject
Date: 14 Jan 2025 12:00:00 +0000
Content-Type: multipart/mixed; boundary="000000000000e915c3062bcd115c"

--000000000000e915c3062bcd115c
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: base64

Email with attached another email

--000000000000e915c3062bcd115c
Content-Type: message/rfc822; name="original_message.eml"
Content-Disposition: attachment; filename="original_message.eml"
Content-Transfer-Encoding: 8bit
X-Attachment-Id: f0af9d461a78b41c_0.1

From: [email protected]
To: [email protected]
Date: 16 Jan 2025 05:31:24 +0000
Subject: =?utf-8?B?QXR0YWNoZWQgZW1haWwgc3ViamVjdA==?=
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: base64

PG1ldGEgaHR0cC1lcXVpdj0iQ29udGVudC1UeXBlIiBjb250ZW50PSJ0ZXh0L2h0b
Ww7IGNoYXJzZXQ9dXRmLTgiPg0KPHA+QXR0YWNoZWQgZW1haWwgSFRNTDwvcD4=
--000000000000e915c3062bcd115c--
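
For reference, the encoded values in the attached message can be decoded with the Python standard library alone, which shows where the subject and HTML in the outputs below come from. This is only an illustrative sketch; the variable names are mine and it does not use the parser under discussion.

```python
import base64
from email.header import decode_header, make_header

# Values copied from the attached message in bug-example.eml.
encoded_subject = "=?utf-8?B?QXR0YWNoZWQgZW1haWwgc3ViamVjdA==?="
encoded_body = (
    "PG1ldGEgaHR0cC1lcXVpdj0iQ29udGVudC1UeXBlIiBjb250ZW50PSJ0ZXh0L2h0b"
    "Ww7IGNoYXJzZXQ9dXRmLTgiPg0KPHA+QXR0YWNoZWQgZW1haWwgSFRNTDwvcD4="
)

# The Subject header is an RFC 2047 encoded-word (base64, UTF-8).
subject = str(make_header(decode_header(encoded_subject)))

# The body uses Content-Transfer-Encoding: base64 with charset utf-8.
html = base64.b64decode(encoded_body).decode("utf-8")

print(subject)
print(repr(html))
```

The decoded subject and HTML belong to the attached email, so they should only appear in the entry for original_message.eml.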

Code:

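import json

# Import path assumed from the parse-emails package layout.
from parse_emails.parse_emails import EmailParser

# max_depth=2 allows the parser to descend into the attached eml file.
email_parser = EmailParser(file_path="bug-example.eml", max_depth=2)
parsed_email = email_parser.parse()
print(json.dumps(parsed_email))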

Buggy output (the HTML comes from the attachment, and no attached files appear in the output):

{
  "To": "[email protected]",
  "CC": "",
  "BCC": "",
  "From": "[email protected]",
  "Subject": "Your Subject",
  "HTML": "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\r\n<p>Attached email HTML</p>",
  "Text": "Email with attached another email",
  "HeadersMap": {
    "To": "[email protected]",
    "From": "[email protected]",
    "Subject": "Your Subject",
    "Date": "14 Jan 2025 12:00:00 +0000",
    "Content-Type": "multipart/mixed; boundary=\"000000000000e915c3062bcd115c\""
  },
  "Headers": [
    {
      "name": "To",
      "value": "[email protected]"
    },
    {
      "name": "From",
      "value": "[email protected]"
    },
    {
      "name": "Subject",
      "value": "Your Subject"
    },
    {
      "name": "Date",
      "value": "14 Jan 2025 12:00:00 +0000"
    },
    {
      "name": "Content-Type",
      "value": "multipart/mixed; boundary=\"000000000000e915c3062bcd115c\""
    }
  ],
  "Attachments": "",
  "AttachmentNames": [],
  "AttachmentsData": [],
  "Format": "multipart/mixed",
  "Depth": 0,
  "FileName": "bug-example.eml"
}

Expected output (both the original and the attached file are parsed properly):

[
  {
    "To": "[email protected]",
    "CC": "",
    "BCC": "",
    "From": "[email protected]",
    "Subject": "Your Subject",
    "HTML": "",
    "Text": "Email with attached another email",
    "HeadersMap": {
      "To": "[email protected]",
      "From": "[email protected]",
      "Subject": "Your Subject",
      "Date": "14 Jan 2025 12:00:00 +0000",
      "Content-Type": "multipart/mixed; boundary=\"000000000000e915c3062bcd115c\""
    },
    "Headers": [
      {
        "name": "To",
        "value": "[email protected]"
      },
      {
        "name": "From",
        "value": "[email protected]"
      },
      {
        "name": "Subject",
        "value": "Your Subject"
      },
      {
        "name": "Date",
        "value": "14 Jan 2025 12:00:00 +0000"
      },
      {
        "name": "Content-Type",
        "value": "multipart/mixed; boundary=\"000000000000e915c3062bcd115c\""
      }
    ],
    "Attachments": "original_message.eml",
    "AttachmentNames": [
      "original_message.eml"
    ],
    "AttachmentsData": [
      {
        "Name": "original_message.eml",
        "Content-ID": null,
        "Content-Disposition": "attachment; filename=\"original_message.eml\"",
        "FileData": "From: [email protected]\nTo: [email protected]\nDate: 16 Jan 2025 05:31:24 +0000\nSubject: =?utf-8?B?QXR0YWNoZWQgZW1haWwgc3ViamVjdA==?=\nContent-Type: text/html; charset=\"utf-8\"\nContent-Transfer-Encoding: base64\n\nPG1ldGEgaHR0cC1lcXVpdj0iQ29udGVudC1UeXBlIiBjb250ZW50PSJ0ZXh0L2h0b\nWw7IGNoYXJzZXQ9dXRmLTgiPg0KPHA+QXR0YWNoZWQgZW1haWwgSFRNTDwvcD4="
      }
    ],
    "Format": "multipart/mixed",
    "Depth": 0,
    "FileName": "bug-example.eml"
  },
  {
    "To": "[email protected]",
    "CC": "",
    "BCC": "",
    "From": "[email protected]",
    "Subject": "Attached email subject",
    "HTML": "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\r\n<p>Attached email HTML</p>",
    "Text": "",
    "HeadersMap": {
      "From": "[email protected]",
      "To": "[email protected]",
      "Date": "16 Jan 2025 05:31:24 +0000",
      "Subject": "Attached email subject",
      "Content-Type": "text/html; charset=\"utf-8\"",
      "Content-Transfer-Encoding": "base64"
    },
    "Headers": [
      {
        "name": "From",
        "value": "[email protected]"
      },
      {
        "name": "To",
        "value": "[email protected]"
      },
      {
        "name": "Date",
        "value": "16 Jan 2025 05:31:24 +0000"
      },
      {
        "name": "Subject",
        "value": "Attached email subject"
      },
      {
        "name": "Content-Type",
        "value": "text/html; charset=\"utf-8\""
      },
      {
        "name": "Content-Transfer-Encoding",
        "value": "base64"
      }
    ],
    "Attachments": "",
    "AttachmentNames": [],
    "AttachmentsData": [],
    "Format": "text/html",
    "Depth": 1,
    "FileName": "original_message.eml",
    "ParentFileName": "bug-example.eml"
  }
]
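
The difference between the two outputs is what one would see if body extraction descended into the message/rfc822 part instead of treating it as an attachment. Whether that is the actual cause in 03e43b5 is an assumption on my part. The sketch below reproduces the symptom with the standard library `email` module only; the helper names (`naive_html`, `walk_own_parts`, `own_html_and_attached`) are hypothetical, and the sample message is a simplified stand-in with placeholder addresses.

```python
import email

# Simplified stand-in for bug-example.eml: placeholder addresses,
# bodies left unencoded to keep the sketch short.
RAW = """\
To: recipient@example.com
From: sender@example.com
Subject: Your Subject
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain; charset="UTF-8"

Email with attached another email

--BOUNDARY
Content-Type: message/rfc822; name="original_message.eml"
Content-Disposition: attachment; filename="original_message.eml"

From: inner@example.com
To: recipient@example.com
Subject: Attached email subject
Content-Type: text/html; charset="utf-8"

<p>Attached email HTML</p>
--BOUNDARY--
"""


def naive_html(msg):
    """Collect HTML from every part, descending into attached emails."""
    return [
        part.get_payload().strip()
        for part in msg.walk()
        if part.get_content_type() == "text/html"
    ]


def walk_own_parts(part):
    """Like Message.walk(), but do not descend into message/rfc822 parts."""
    yield part
    if part.get_content_type() == "message/rfc822":
        return
    if part.is_multipart():
        for sub in part.get_payload():
            yield from walk_own_parts(sub)


def own_html_and_attached(msg):
    """Return the email's own HTML bodies and the names of attached emails."""
    html, attached = [], []
    for part in walk_own_parts(msg):
        content_type = part.get_content_type()
        if content_type == "message/rfc822":
            attached.append(part.get_filename())
        elif content_type == "text/html":
            html.append(part.get_payload().strip())
    return html, attached


msg = email.message_from_string(RAW)
print(naive_html(msg))             # HTML of the attached email leaks into the parent
print(own_html_and_attached(msg))  # parent has no HTML; the eml is listed as attached
```

Stopping at message/rfc822 keeps the parent's HTML empty and leaves the attached email to be parsed separately at Depth 1, which matches the expected output above.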

Screenshots

No screenshots. All information is in description.

Must have

  • Tests
  • Documentation
