
Data structures: Emails and Mailboxes

Jack Dodds edited this page Jan 4, 2021 · 8 revisions

This page is based on reading code from commit c9bd001 dated 2020-11-11 and files generated by it. There may be errors or omissions!

The Mailpile instance on which this page is based receives incoming emails from three IMAP mail sources. It also includes old emails imported once, when the instance was created, from a Thunderbird Local Folders mbox directory structure. The mail received from all the sources is stored in the local Mailpile directory. The Mailpile code supports the retrieval of email using other types of mail sources and mailboxes (see code at mailpile/mail-source and mailpile/mailboxes). This page does not document the data structures for those additional types.

External format - Emails

Email messages are stored in RFC822 format, one message per file, in a directory hierarchy under subdirectory mail of the homedir. As with most Mailpile files, the RFC822 plaintext of each file is optionally encrypted.

A message file name consists of a number (10 digits, hex, lower case), sometimes followed by the suffix !2,s. The 10-digit number is unique in the mail directory tree and is referenced in the message location pointer (field 2) of the Metadata Index record for the message.

When the suffix !2,s is present there may be two files, sometimes but not always identical, with different numbers in their file names, both file names being listed (without suffix) in the same metadata index entry.
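The file naming rule described above can be sketched as a small parser. This is only an illustration based on the description on this page, not code taken from Mailpile: the function name parse_message_file_name and the type MessageFileName are hypothetical, and the pattern assumes the suffix always starts with "!2," followed by zero or more flag letters.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: 10 lower-case hex digits, optionally followed by a
# suffix such as "!2,s" (the only suffix observed on this page).
_NAME_RE = re.compile(r"(?P<key>[0-9a-f]{10})(?P<suffix>!2,[A-Za-z]*)?")


class MessageFileName(NamedTuple):
    key: str               # the 10-digit hex number
    suffix: Optional[str]  # e.g. "!2,s", or None when absent


def parse_message_file_name(name: str) -> Optional[MessageFileName]:
    """Split a message file name into its number and optional suffix.

    Returns None when the name does not fit the assumed pattern.
    """
    match = _NAME_RE.fullmatch(name)
    if match is None:
        return None
    return MessageFileName(match.group("key"), match.group("suffix"))
```

The key part is what the metadata index would list, since the page notes that file names appear there without the suffix.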

External format - Mailboxes

A mailbox is represented by a first-level subdirectory of subdirectory mail in the Mailpile homedir. The subdirectory name consists of a number (5 digits, hex, lower case). Each first-level subdirectory in turn contains subdirectories cur, new and tmp, which contain the email message files.

The first-level subdirectories also contain a file wervd.ver which in this version of the software always contains "0" (see mailpile/mailboxes/wervd.pyc and wiki page WERVD Storage). In addition, mail itself contains first-level subdirectories cur, new and tmp, which appear to be unused.
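To explore this layout on disk, a directory walk along the following lines may help. It is a sketch under the assumptions stated on this page (5-digit hex mailbox directories, each holding cur, new and tmp); the function name list_mailbox_dirs is hypothetical and nothing here uses Mailpile's own code. It only lists file names and does not try to read or decrypt message contents.

```python
import re
from pathlib import Path
from typing import Dict, List

# Assumed naming rule for mailbox directories: 5 lower-case hex digits.
_MAILBOX_DIR_RE = re.compile(r"[0-9a-f]{5}")
_MAILDIR_PARTS = ("cur", "new", "tmp")


def list_mailbox_dirs(mail_root: Path) -> Dict[str, List[str]]:
    """Map each mailbox directory name to the message file names it holds.

    Directories that do not match the naming rule, or that lack any of
    cur, new and tmp, are skipped. This also skips the apparently unused
    cur, new and tmp directories directly under mail_root.
    """
    result: Dict[str, List[str]] = {}
    for entry in sorted(mail_root.iterdir()):
        if not entry.is_dir() or not _MAILBOX_DIR_RE.fullmatch(entry.name):
            continue
        if not all((entry / part).is_dir() for part in _MAILDIR_PARTS):
            continue
        names: List[str] = []
        for part in _MAILDIR_PARTS:
            names.extend(p.name for p in (entry / part).iterdir() if p.is_file())
        result[entry.name] = sorted(names)
    return result
```

Calling list_mailbox_dirs on the mail subdirectory of the homedir would give one entry per mailbox, which can then be compared with the mailbox entries in mailpile.cfg.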

Mailboxes are defined by entries in mailpile.cfg. Each account has a 12-digit identifier. Each mailbox in the account has an entry config/sources/[account id]/mailbox/[mailbox id]. The [mailbox id] is 4 characters from the set (0..9,a..z) and is used to identify the mailbox in the message location pointer (field 2) of the metadata index. The mailpile.cfg entry contains three parameters: local, which is the virtual file system path /Mailpile$/mail/[subdirectory] to the mailbox in the subdirectory mail of the homedir; name, which is the user's name for the mailbox; and path. The path is either a reference to another mailpile.cfg entry that identifies a mail source (e.g. for mailboxes receiving emails from an IMAP server) or a file path from which the mailbox was imported (e.g. in the case of a Thunderbird mbox file).
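As an illustration of how the local parameter relates to a real directory, the sketch below maps a virtual path of the form /Mailpile$/mail/[subdirectory] onto a given homedir. It assumes, based only on the description above, that the /Mailpile$/ prefix stands for the homedir. The function name resolve_local_path is hypothetical, and Mailpile's own virtual file system code may behave differently.

```python
from pathlib import Path, PurePosixPath

# Assumption from this page: this prefix stands for the Mailpile homedir.
VFS_PREFIX = "/Mailpile$/"


def resolve_local_path(local: str, homedir: Path) -> Path:
    """Translate a '/Mailpile$/...' virtual path into a path under homedir."""
    if not local.startswith(VFS_PREFIX):
        raise ValueError(f"not a Mailpile virtual path: {local!r}")
    relative = PurePosixPath(local[len(VFS_PREFIX):])
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"unexpected path components in {local!r}")
    return homedir.joinpath(*relative.parts)
```

For example, with a homedir of /home/user/.mailpile (a made-up location), the value /Mailpile$/mail/0000a would resolve to /home/user/.mailpile/mail/0000a.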

Mailboxes are also represented by pickled data structures in files in the homedir with file names pickled-mailbox.[mailbox id]. The first line of each of these files indicates that the data structure is of class mailpile.mailboxes.wervd.MailpileMailbox. Each file contains a list of the emails in the mailbox, in the form of their file names in the mail subdirectory hierarchy. Attributes of the pickled object contain absolute paths to the mail subdirectory associated with the mailbox and to its cur, new and tmp subdirectories.

Internal format - Mailboxes

There are multiple classes called MailpileMailbox. Some relate to POP3 mail sources (not documented here) or to sources identified as "obsolete, handled as local" in comments in "defaults.py". Class mailboxes.maildir.MailpileMailbox is used only in relation to these POP3 or obsolete mail sources.

This leaves class mailboxes.wervd.MailpileMailbox, which is derived from the Python library class mailbox.Maildir using the factory class mailpile.mailboxes.UnorderedPicklable.
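Because the base class is the standard library's mailbox.Maildir, a mailbox directory can be inspected with the standard library alone, as sketched below. This does not use Mailpile's subclass, and it does not read message contents, which may be encrypted. The function name list_maildir_keys is hypothetical. The default separator "!" is an assumption taken from the !2,s suffix seen in file names; whether Mailpile sets the colon attribute this way is not confirmed here.

```python
import mailbox
from pathlib import Path
from typing import List


def list_maildir_keys(mailbox_dir: Path, separator: str = "!") -> List[str]:
    """Return the sorted message keys found in cur and new of a Maildir.

    mailbox.Maildir splits each file name at its `colon` attribute to get
    the key. The default here is "!" (an assumption based on the observed
    "!2,s" suffix) instead of the standard ":".
    """
    md = mailbox.Maildir(str(mailbox_dir), factory=None, create=False)
    md.colon = separator  # instance-level override of the class default
    try:
        return sorted(md.keys())
    finally:
        md.close()
```

With this separator, a file named with the !2,s suffix and one without it both yield their bare 10-digit number as the key.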
