From 966d1e0f9a8879e26a337214bcff7f330e07c0e8 Mon Sep 17 00:00:00 2001
From: Sebastian Wagner
Date: Mon, 8 Jul 2024 15:05:08 +0200
Subject: [PATCH] improve documentation of mail collectors and csv parser based on user feedback

---
 CHANGELOG.md      |  1 +
 docs/user/bots.md | 22 ++++++++++++++++++----
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a507edd43..209bb1896 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -38,6 +38,7 @@
 - `intelmq.bots.outputs.smtp_batch.output`: Documentation on multiple recipients added (PR#2501 by Edvard Rejthar).
 
 ### Documentation
+- Bots: Clarify some sections of Mail collectors and the Generic CSV Parser (PR#2510 by Sebastian Wagner).
 
 ### Packaging
 
diff --git a/docs/user/bots.md b/docs/user/bots.md
index b7a1e2653..21097cb56 100644
--- a/docs/user/bots.md
+++ b/docs/user/bots.md
@@ -350,6 +350,7 @@ the line) or not. Defaults to true.
 ### Generic Mail URL Fetcher
 
 Extracts URLs from e-mail messages and downloads the content from the URLs.
+It uses the [`imbox`](https://github.com/martinrusev/imbox) library.
 
 The resulting reports contain the following special fields:
 
@@ -360,6 +361,8 @@ The resulting reports contain the following special fields:
 - `extra.email_message_id`: The email's message ID.
 - `extra.file_name`: The file name of the downloaded file (extracted from the HTTP Response Headers if possible).
 
+The fields can be used by parsers to identify the feed and are not automatically passed on to events.
+
 **Chunking**
 
 For line-based inputs the bot can split up large reports into smaller chunks. This is particularly important for setups
@@ -392,6 +395,10 @@ limitation set `chunk_size` to something like 384000000 (~384 MB).
 
 (optional, boolean) Whether the mail server uses TLS or not. Defaults to true.
 
+**`mail_starttls`**
+
+(optional, boolean) Whether the mail server uses STARTTLS or not. Defaults to false.
+
 **`folder`**
 
 (optional, string) Folder in which to look for e-mail messages. Defaults to INBOX.
@@ -422,6 +429,7 @@ certificate is not found, the IMAP connection will fail on handshake. Defaults t
 ### Generic Mail Attachment Fetcher
 
 This bot collects messages from mailboxes and downloads the attachments.
+It uses the [`imbox`](https://github.com/martinrusev/imbox) library.
 
 The resulting reports contains the following special fields:
 
@@ -432,6 +440,8 @@ The resulting reports contains the following special fields:
 - `extra.file_name`: The file name of the attachment or the file name in the attached archive if attachment is to
   uncompress.
 
+The fields can be used by parsers to identify the feed and are not automatically passed on to events.
+
 **Module:** `intelmq.bots.collectors.mail.collector_mail_attach`
 
 **Parameters (also expects [feed parameters](#feed-parameters)):**
@@ -442,7 +452,7 @@ The resulting reports contains the following special fields:
 
 **`mail_port`**
 
-(optional, integer) IMAP server port: 143 without TLS, 993 with TLS. Defaults to 143.
+(optional, integer) IMAP server port: 143 without TLS, 993 with TLS. Default depends on SSL setting.
 
 **`mail_user`**
 
@@ -456,6 +466,10 @@ The resulting reports contains the following special fields:
 
 (optional, boolean) Whether the mail server uses TLS or not. Defaults to true.
 
+**`mail_starttls`**
+
+(optional, boolean) Whether to use STARTTLS before authenticating to the server. Defaults to false.
+
 **`folder`**
 
 (optional, string) Folder in which to look for e-mail messages. Defaults to INBOX.
@@ -466,7 +480,7 @@ The resulting reports contains the following special fields:
 
 **`attach_regex`**
 
-(optional, string) Regular expression of the name of the attachment. Defaults to csv.zip.
+(optional, string) All attachments which match this [regular expression](https://docs.python.org/3/library/re.html#re.search) will be processed. Defaults to `csv.zip`.
 
 **`extract_files`**
 
@@ -1697,8 +1711,8 @@ available with their index.
 
 **`skip_header`**
 
-(optional, boolean/integer) Whether to skip the first N lines of the input (True -> 1, False -> 0). Lines starting
-with `#` will be skipped additionally, make sure you do not skip more lines than needed!
+(optional, boolean/integer) Whether to skip the first N lines of the input (true equals 1, false equals 0). Lines starting
+with `#` will be skipped additionally; make sure you do not skip more lines than needed! Defaults to false/0.
 
 **`time_format`**
 
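The new `attach_regex` wording says that attachments are selected with `re.search` semantics. That search is unanchored, and in the default `csv.zip` the `.` is unescaped, so it matches any character. The standalone Python sketch below illustrates which file names such a pattern would select. The helper `selected_attachments` and the sample file names are hypothetical and not taken from the collector's code. The sketch assumes the parameter value is passed unchanged to `re.search`.

```python
import re

# Default value of attach_regex according to the documentation above.
ATTACH_REGEX = "csv.zip"


def selected_attachments(filenames, pattern=ATTACH_REGEX):
    """Hypothetical helper: keep names the pattern matches anywhere (re.search semantics)."""
    return [name for name in filenames if re.search(pattern, name)]


names = ["report.csv.zip", "csv.zip.sig", "events-csvxzip.txt", "report.json.zip"]

# Unanchored search with an unescaped ".": the last three characters need not be ".zip"
# and the "." may be any character, so "csv.zip.sig" and "events-csvxzip.txt" also match.
print(selected_attachments(names))
# ['report.csv.zip', 'csv.zip.sig', 'events-csvxzip.txt']

# An escaped, end-anchored pattern selects only names ending in ".csv.zip".
print(selected_attachments(names, r"\.csv\.zip$"))
# ['report.csv.zip']
```

If only attachments whose names end in `.csv.zip` should be processed, an escaped and anchored value such as `\.csv\.zip$` is the safer choice under this assumption.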