From 966d1e0f9a8879e26a337214bcff7f330e07c0e8 Mon Sep 17 00:00:00 2001
From: Sebastian Wagner <sebix@sebix.at>
Date: Mon, 8 Jul 2024 15:05:08 +0200
Subject: [PATCH] improve documentation of mail collectors and csv parser

based on user feedback
---
 CHANGELOG.md      |  1 +
 docs/user/bots.md | 22 ++++++++++++++++++----
 2 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index a507edd43..209bb1896 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -38,6 +38,7 @@
 - `intelmq.bots.outputs.smtp_batch.output`: Documentation on multiple recipients added (PR#2501 by Edvard Rejthar).
 
 ### Documentation
+- Bots: Clarify some sections of the Mail collectors and the Generic CSV Parser (PR#2510 by Sebastian Wagner).
 
 ### Packaging
 
diff --git a/docs/user/bots.md b/docs/user/bots.md
index b7a1e2653..21097cb56 100644
--- a/docs/user/bots.md
+++ b/docs/user/bots.md
@@ -350,6 +350,7 @@ the line) or not. Defaults to true.
 ### Generic Mail URL Fetcher <div id="intelmq.bots.collectors.mail.collector_mail_url" />
 
 Extracts URLs from e-mail messages and downloads the content from the URLs.
+It uses the [`imbox`](https://github.com/martinrusev/imbox) library.
 
 The resulting reports contain the following special fields:
 
@@ -360,6 +361,8 @@ The resulting reports contain the following special fields:
 - `extra.email_message_id`: The email's message ID.
 - `extra.file_name`: The file name of the downloaded file (extracted from the HTTP Response Headers if possible).
 
+The fields can be used by parsers to identify the feed and are not automatically passed on to events.
+
 **Chunking**
 
 For line-based inputs the bot can split up large reports into smaller chunks. This is particularly important for setups
@@ -392,6 +395,10 @@ limitation set `chunk_size` to something like 384000000 (~384 MB).
 
 (optional, boolean) Whether the mail server uses TLS or not. Defaults to true.
 
+**`mail_starttls`**
+
+(optional, boolean) Whether the mail server uses STARTTLS or not. Defaults to false.
+
 **`folder`**
 
 (optional, string) Folder in which to look for e-mail messages. Defaults to INBOX.
@@ -422,6 +429,7 @@ certificate is not found, the IMAP connection will fail on handshake. Defaults t
 ### Generic Mail Attachment Fetcher <div id="intelmq.bots.collectors.mail.collector_mail_attach" />
 
 This bot collects messages from mailboxes and downloads the attachments.
+It uses the [`imbox`](https://github.com/martinrusev/imbox) library.
 
 The resulting reports contains the following special fields:
 
@@ -432,6 +440,8 @@ The resulting reports contains the following special fields:
 - `extra.file_name`: The file name of the attachment or the file name in the attached archive if attachment is to
   uncompress.
 
+The fields can be used by parsers to identify the feed and are not automatically passed on to events.
+
 **Module:** `intelmq.bots.collectors.mail.collector_mail_attach`
 
 **Parameters (also expects [feed parameters](#feed-parameters)):**
@@ -442,7 +452,7 @@ The resulting reports contains the following special fields:
 
 **`mail_port`**
 
-(optional, integer) IMAP server port: 143 without TLS, 993 with TLS. Defaults to 143.
+(optional, integer) IMAP server port: 143 without TLS, 993 with TLS. The default depends on the TLS setting.
 
 **`mail_user`**
 
@@ -456,6 +466,10 @@ The resulting reports contains the following special fields:
 
 (optional, boolean) Whether the mail server uses TLS or not. Defaults to true.
 
+**`mail_starttls`**
+
+(optional, boolean) Whether to use STARTTLS before authenticating to the server. Defaults to false.
+
 **`folder`**
 
 (optional, string) Folder in which to look for e-mail messages. Defaults to INBOX.
@@ -466,7 +480,7 @@ The resulting reports contains the following special fields:
 
 **`attach_regex`**
 
-(optional, string) Regular expression of the name of the attachment. Defaults to csv.zip.
+(optional, string) All attachments which match this [regular expression](https://docs.python.org/3/library/re.html#re.search) will be processed. Defaults to `csv.zip`.
 
 **`extract_files`**
 
@@ -1697,8 +1711,8 @@ available with their index.
 
 **`skip_header`**
 
-(optional, boolean/integer) Whether to skip the first N lines of the input (True -> 1, False -> 0). Lines starting
-with `#` will be skipped additionally, make sure you do not skip more lines than needed!
+(optional, boolean/integer) Whether to skip the first N lines of the input (true equals 1, false equals 0). Lines starting
+with `#` will be skipped additionally, make sure you do not skip more lines than needed! Defaults to false/0.
 
 **`time_format`**