Shadowserver dynamic config #2372

Merged on Dec 18, 2023 (75 commits)
Changes from 67 commits

Commits
19a1972
remove obsolete tests and data
elsif2 Apr 11, 2023
2dec6ec
remove json parser - csv provides better performance
elsif2 Apr 11, 2023
0454961
dynamic configuration model
elsif2 Apr 11, 2023
0a39e0d
revised tests
elsif2 Apr 12, 2023
94b22fb
Updated to reset report type on reload #2361
elsif2 May 8, 2023
b2f9bc3
Added schema download on startup and additional logging
elsif2 May 23, 2023
d5cf063
Added version support to the schema update function.
elsif2 May 23, 2023
1e6ea89
Documentation and style updates.
elsif2 May 28, 2023
fc3f5b0
Added schema.json.test.license.
elsif2 May 30, 2023
b996e0e
Updates in response to feedback.
elsif2 Jul 27, 2023
661a964
Removed file_format parameter
elsif2 Jul 28, 2023
a045bee
Minor changes based on feedback 2023-08-24
elsif2 Aug 24, 2023
0660a89
Added VAR_STATE_PATH check.
elsif2 Aug 24, 2023
33370cf
Changes based on feedback 2023-08-25.
elsif2 Aug 25, 2023
bd76ab7
Added INTELMQ_SKIP_INTERNET check
elsif2 Aug 25, 2023
6e5e110
Added debug logging for CI test.
elsif2 Aug 25, 2023
01dcd5e
Refactored test_download_schema to utilize mocking.
elsif2 Aug 25, 2023
9314c84
Added docstring for test_update_schema().
elsif2 Aug 28, 2023
2f11b2a
Removed logging output.
elsif2 Aug 29, 2023
46f2ca7
Removed the assertion regarding report fields.
elsif2 Aug 31, 2023
d0311e0
remove obsolete tests and data
elsif2 Apr 11, 2023
b5416c7
remove json parser - csv provides better performance
elsif2 Apr 11, 2023
876a414
dynamic configuration model
elsif2 Apr 11, 2023
b917a94
revised tests
elsif2 Apr 12, 2023
eafa15b
Updated to reset report type on reload #2361
elsif2 May 8, 2023
b2753cb
Added schema download on startup and additional logging
elsif2 May 23, 2023
fd0a8fd
Added version support to the schema update function.
elsif2 May 23, 2023
357aad5
Documentation and style updates.
elsif2 May 28, 2023
37c6745
Added schema.json.test.license.
elsif2 May 30, 2023
ee8ce87
Updates in response to feedback.
elsif2 Jul 27, 2023
4a73f0b
Removed file_format parameter
elsif2 Jul 28, 2023
e413fb5
Minor changes based on feedback 2023-08-24
elsif2 Aug 24, 2023
df6e622
Added VAR_STATE_PATH check.
elsif2 Aug 24, 2023
9195213
Changes based on feedback 2023-08-25.
elsif2 Aug 25, 2023
cc48565
Added INTELMQ_SKIP_INTERNET check
elsif2 Aug 25, 2023
16daee4
Added debug logging for CI test.
elsif2 Aug 25, 2023
f102f2c
Refactored test_download_schema to utilize mocking.
elsif2 Aug 25, 2023
b103282
Added docstring for test_update_schema().
elsif2 Aug 28, 2023
356b956
Removed logging output.
elsif2 Aug 29, 2023
c72d553
Removed the assertion regarding report fields.
elsif2 Aug 31, 2023
3b60c2f
Skip and log a warning message for fields not in the IDF.
elsif2 Oct 16, 2023
28d306d
Merge branch 'shadowserver-dynamic-config' of https://github.com/cert…
elsif2 Oct 16, 2023
afce131
Merge branch 'develop' into shadowserver-dynamic-config
elsif2 Oct 24, 2023
473f6a6
remove obsolete tests and data
elsif2 Apr 11, 2023
a33fa64
remove json parser - csv provides better performance
elsif2 Apr 11, 2023
cd3338a
dynamic configuration model
elsif2 Apr 11, 2023
b081509
revised tests
elsif2 Apr 12, 2023
c6108d6
Updated to reset report type on reload #2361
elsif2 May 8, 2023
308ec67
Added schema download on startup and additional logging
elsif2 May 23, 2023
9ecf366
Added version support to the schema update function.
elsif2 May 23, 2023
9c4a1a4
Documentation and style updates.
elsif2 May 28, 2023
e4f9ac4
Added schema.json.test.license.
elsif2 May 30, 2023
460344f
Updates in response to feedback.
elsif2 Jul 27, 2023
fec1fd2
Removed file_format parameter
elsif2 Jul 28, 2023
fe2a37c
Minor changes based on feedback 2023-08-24
elsif2 Aug 24, 2023
ec066ce
Added VAR_STATE_PATH check.
elsif2 Aug 24, 2023
d1427f3
Changes based on feedback 2023-08-25.
elsif2 Aug 25, 2023
ae54e7c
Added INTELMQ_SKIP_INTERNET check
elsif2 Aug 25, 2023
e4e5063
Added debug logging for CI test.
elsif2 Aug 25, 2023
1280482
Refactored test_download_schema to utilize mocking.
elsif2 Aug 25, 2023
2a60d2e
Added docstring for test_update_schema().
elsif2 Aug 28, 2023
e401e2c
Removed logging output.
elsif2 Aug 29, 2023
66ae9f5
Removed the assertion regarding report fields.
elsif2 Aug 31, 2023
e04dfee
Skip and log a warning message for fields not in the IDF.
elsif2 Oct 16, 2023
6f23883
Updated convert_http_host_and_url and added category_or_detail.
elsif2 Oct 31, 2023
606fc10
Merge branch 'shadowserver-dynamic-config' of https://github.com/cert…
elsif2 Oct 31, 2023
a0b34cb
Avoid exception when a conversion function is not available in the cu…
elsif2 Oct 31, 2023
61c756d
Added exception for missing schema and added intelmq user to the cron…
elsif2 Nov 4, 2023
a3a3aee
Merge branch 'shadowserver-dynamic-config' into develop
elsif2 Nov 13, 2023
307386d
Documentation update.
elsif2 Nov 13, 2023
ac04471
Removed old unsorted doc and updated the taxonomy functions for the s…
elsif2 Nov 16, 2023
04c63a4
Merge branch 'develop' into shadowserver-dynamic-config
elsif2 Nov 16, 2023
7a7a6a6
Merge branch 'develop' into shadowserver-dynamic-config
kamil-certat Nov 27, 2023
0c0cb68
Merge branch 'develop' into shadowserver-dynamic-config
elsif2 Dec 12, 2023
4743ba9
Merge branch 'develop' into shadowserver-dynamic-config
aaronkaplan Dec 18, 2023
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -123,11 +123,13 @@ CHANGELOG
- added support for `Subject NOT LIKE` queries,
- added support for multiple values in ticket subject queries.
- `intelmq.bots.collectors.rsync`: Support for optional private key, relative time parsing for the source path, extra rsync parameters and strict host key checking (PR#2241 by Mateo Durante).
- `intelmq.bots.collectors.shadowserver.collector_reports_api`:
- The 'json' option is no longer supported as the 'csv' option provides better performance.

#### Parsers
- `intelmq.bots.parsers.shadowserver._config`:
- Reset detected `feedname` at shutdown to re-detect the feedname on reloads (PR#2361 by @elsif2, fixes #2360).
- `intelmq.bots.parsers.shadowserver._config`:
- Switch to dynamic configuration to decouple report schema changes from IntelMQ releases.
Review comment on lines +159 to +165 (Member):

The changelog entry is in the wrong section (3.2.0) instead of 3.2.2.

Reply (Member):

This can be fixed with #2447.


- Added 'IPv6-Vulnerable-Exchange' alias and 'Accessible-WS-Discovery-Service' report. (PR#2338)
- Removed unused `p0f_genre` and `p0f_detail` from the 'DNS-Open-Resolvers' report. (PR#2338)
- Added 'Accessible-SIP' report. (PR#2348)
169 changes: 56 additions & 113 deletions docs/user/bots.rst
@@ -673,6 +673,23 @@ The resulting reports contain the following special field:

* `extra.file_name`: The name of the downloaded file, with fixed filename extension. The API returns file names with the extension `.csv`, although the files are JSON, not CSV. Therefore, for clarity and better error detection in the parser, the file name in `extra.file_name` uses `.json` as extension.

**Sample configuration**

.. code-block:: yaml

  shadowserver-collector:
    description: Our bot responsible for getting reports from Shadowserver
    enabled: true
    group: Collector
    module: intelmq.bots.collectors.shadowserver.collector_reports_api
    name: Shadowserver_Collector
    parameters:
      destination_queues:
        _default: [shadowserver-parser-queue]
      file_format: csv
      api_key: "$API_KEY_received_from_the_shadowserver_foundation"
      secret: "$SECRET_received_from_the_shadowserver_foundation"
    run_mode: continuous

.. _intelmq.bots.collectors.shodan.collector_stream:

@@ -1557,17 +1574,15 @@ This does not affect URLs which already include the scheme.


.. _intelmq.bots.parsers.shadowserver.parser:
.. _intelmq.bots.parsers.shadowserver.parser_json:

Shadowserver
^^^^^^^^^^^^

There are two Shadowserver parsers, one for data in ``CSV`` format (``intelmq.bots.parsers.shadowserver.parser``) and one for data in ``JSON`` format (``intelmq.bots.parsers.shadowserver.parser_json``).
The latter was added in IntelMQ 2.3 and is meant to be used together with the Shadowserver API collector.
The Shadowserver parser operates on ``CSV`` formatted data.

**Information**

* `name:` `intelmq.bots.parsers.shadowserver.parser` (for CSV data) or `intelmq.bots.parsers.shadowserver.parser_json` (for JSON data)
* `name:` `intelmq.bots.parsers.shadowserver.parser`
* `public:` yes
* `description:` Parses different reports from Shadowserver.

@@ -1603,107 +1618,43 @@ A list of possible feeds can be found in the table below in the column "feed nam

**Supported reports**

These are the supported feed name and their corresponding file name for automatic detection:

======================================= =========================
feed name file name
======================================= =========================
Accessible-ADB `scan_adb`
Accessible-AFP `scan_afp`
Accessible-AMQP `scan_amqp`
Accessible-ARD `scan_ard`
Accessible-Cisco-Smart-Install `cisco_smart_install`
Accessible-CoAP `scan_coap`
Accessible-CWMP `scan_cwmp`
Accessible-MS-RDPEUDP `scan_msrdpeudp`
Accessible-FTP `scan_ftp`
Accessible-Hadoop `scan_hadoop`
Accessible-HTTP `scan_http`
Accessible-Radmin `scan_radmin`
Accessible-RDP `scan_rdp`
Accessible-Rsync `scan_rsync`
Accessible-SMB `scan_smb`
Accessible-Telnet `scan_telnet`
Accessible-Ubiquiti-Discovery-Service `scan_ubiquiti`
Accessible-VNC `scan_vnc`
Blacklisted-IP (deprecated) `blacklist`
Blocklist `blocklist`
Compromised-Website `compromised_website`
Device-Identification IPv4 / IPv6 `device_id`/`device_id6`
DNS-Open-Resolvers `scan_dns`
Honeypot-Amplification-DDoS-Events `event4_honeypot_ddos_amp`
Honeypot-Brute-Force-Events `event4_honeypot_brute_force`
Honeypot-Darknet `event4_honeypot_darknet`
Honeypot-HTTP-Scan `event4_honeypot_http_scan`
HTTP-Scanners `hp_http_scan`
ICS-Scanners `hp_ics_scan`
IP-Spoofer-Events `event4_ip_spoofer`
Microsoft-Sinkhole-Events IPv4 `event4_microsoft_sinkhole`
Microsoft-Sinkhole-Events-HTTP IPv4 `event4_microsoft_sinkhole_http`
NTP-Monitor `scan_ntpmonitor`
NTP-Version `scan_ntp`
Open-Chargen `scan_chargen`
Open-DB2-Discovery-Service `scan_db2`
Open-Elasticsearch `scan_elasticsearch`
Open-IPMI `scan_ipmi`
Open-IPP `scan_ipp`
Open-LDAP `scan_ldap`
Open-LDAP-TCP `scan_ldap_tcp`
Open-mDNS `scan_mdns`
Open-Memcached `scan_memcached`
Open-MongoDB `scan_mongodb`
Open-MQTT `scan_mqtt`
Open-MSSQL `scan_mssql`
Open-NATPMP `scan_nat_pmp`
Open-NetBIOS-Nameservice `scan_netbios`
Open-Netis `netis_router`
Open-Portmapper `scan_portmapper`
Open-QOTD `scan_qotd`
Open-Redis `scan_redis`
Open-SNMP `scan_snmp`
Open-SSDP `scan_ssdp`
Open-TFTP `scan_tftp`
Open-XDMCP `scan_xdmcp`
Outdated-DNSSEC-Key `outdated_dnssec_key`
Outdated-DNSSEC-Key-IPv6 `outdated_dnssec_key_v6`
Sandbox-URL `cwsandbox_url`
Sinkhole-DNS `sinkhole_dns`
Sinkhole-Events `event4_sinkhole`/`event6_sinkhole`
Sinkhole-Events IPv4 `event4_sinkhole`
Sinkhole-Events IPv6 `event6_sinkhole`
Sinkhole-HTTP-Events `event4_sinkhole_http`/`event6_sinkhole_http`
Sinkhole-HTTP-Events IPv4 `event4_sinkhole_http`
Sinkhole-HTTP-Events IPv6 `event6_sinkhole_http`
Sinkhole-Events-HTTP-Referer `event4_sinkhole_http_referer`/`event6_sinkhole_http_referer`
Sinkhole-Events-HTTP-Referer IPv4 `event4_sinkhole_http_referer`
Sinkhole-Events-HTTP-Referer IPv6 `event6_sinkhole_http_referer`
Spam-URL `spam_url`
SSL-FREAK-Vulnerable-Servers `scan_ssl_freak`
SSL-POODLE-Vulnerable-Servers `scan_ssl_poodle`/`scan6_ssl_poodle`
Vulnerable-Exchange-Server `*` `scan_exchange`
Vulnerable-ISAKMP `scan_isakmp`
Vulnerable-HTTP `scan_http`
Vulnerable-SMTP `scan_smtp_vulnerable`
======================================= =========================

`*` This report can also contain data on active webshells (column `tag` is `exchange;webshell`), and are therefore not only vulnerable but also actively infected.

In addition, the following legacy reports are supported:

=========================== =================================================== ========================
feed name                   successor feed name                                 file name
=========================== =================================================== ========================
Amplification-DDoS-Victim   Honeypot-Amplification-DDoS-Events                  ``ddos_amplification``
CAIDA-IP-Spoofer            IP-Spoofer-Events                                   ``caida_ip_spoofer``
Darknet                     Honeypot-Darknet                                    ``darknet``
Drone                       Sinkhole-Events                                     ``botnet_drone``
Drone-Brute-Force           Honeypot-Brute-Force-Events, Sinkhole-HTTP-Events   ``drone_brute_force``
Microsoft-Sinkhole          Sinkhole-HTTP-Events                                ``microsoft_sinkhole``
Sinkhole-HTTP-Drone         Sinkhole-HTTP-Events                                ``sinkhole_http_drone``
IPv6-Sinkhole-HTTP-Drone    Sinkhole-HTTP-Events                                ``sinkhole6_http``
=========================== =================================================== ========================

More information on these legacy reports can be found in `Changes in Sinkhole and Honeypot Report Types and Formats <https://www.shadowserver.org/news/changes-in-sinkhole-and-honeypot-report-types-and-formats/>`_.
The report configuration is stored in a `shadowserver-schema.json` file downloaded from https://interchange.shadowserver.org/intelmq/v1/schema.

The parser will attempt to download a schema update on startup when the *auto_update* option is enabled.

Schema downloads can also be scheduled as a cron job:

.. code-block:: bash

02 01 * * * intelmq.bots.parsers.shadowserver.parser --update-schema


For air-gapped systems automation will be required to download and copy the file to VAR_STATE_PATH/shadowserver-schema.json.
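
For hosts without outbound access, the download half of that automation could be scripted on a connected machine roughly as follows. This is a minimal sketch, not the bot's own update routine: `fetch_schema` and `install_schema` are hypothetical helper names, and the only details taken from the text above are the schema URL and the target file name.

```python
import json
import os
import tempfile
import urllib.request

# URL and file name come from the documentation above.
SCHEMA_URL = "https://interchange.shadowserver.org/intelmq/v1/schema"
SCHEMA_FILE = "shadowserver-schema.json"


def fetch_schema(url: str = SCHEMA_URL, timeout: float = 30.0) -> bytes:
    """Download the schema and make sure it parses as JSON (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = response.read()
    json.loads(payload)  # raises ValueError on a truncated or non-JSON body
    return payload


def install_schema(payload: bytes, state_dir: str) -> str:
    """Write the schema into state_dir via a temporary file and an atomic rename."""
    target = os.path.join(state_dir, SCHEMA_FILE)
    fd, tmp_path = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        os.replace(tmp_path, target)  # readers never see a half-written file
    except BaseException:
        os.unlink(tmp_path)  # clean up the temporary file on any failure
        raise
    return target
```

The file produced by `install_schema` can then be transferred to the air-gapped host and placed at VAR_STATE_PATH/shadowserver-schema.json. Writing through a temporary file is a design choice of this sketch so that a parser watching the file never reads a partial copy.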

The parser will automatically reload the configuration when the file changes.
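
One common way to notice a changed file is to compare its modification time between reads. The sketch below illustrates that idea only; it is an assumption about how such a reload could work, not a description of the parser's internals, and `SchemaCache` is a hypothetical class.

```python
import json
import os


class SchemaCache:
    """Keep a parsed schema in memory and re-read it when the file's mtime changes."""

    def __init__(self, path: str):
        self.path = path
        self._mtime_ns = None  # modification time of the copy held in memory
        self._schema = {}

    def get(self) -> dict:
        """Return the current schema, reloading from disk only if the file changed."""
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            with open(self.path, encoding="utf-8") as handle:
                self._schema = json.load(handle)
            self._mtime_ns = mtime_ns
        return self._schema
```

Calling `get()` once per processed report keeps the cost to a single `stat` call when nothing has changed.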

**Schema contract**

Once set in the schema, the `classification.identifier`, `classification.taxonomy`, and `classification.type` fields will remain static for a specific report.
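
A consumer who wants to verify this contract between two schema revisions could compare the three fields per report. The following is a sketch under a loud assumption: it uses a simplified, hypothetical layout of `{report_name: {field: value}}`, which is not necessarily how `shadowserver-schema.json` is structured, and `contract_violations` is an illustrative name.

```python
CONTRACT_FIELDS = (
    "classification.identifier",
    "classification.taxonomy",
    "classification.type",
)


def contract_violations(old: dict, new: dict) -> list:
    """List (report, field, old_value, new_value) for contract fields that changed.

    Assumes a simplified, hypothetical layout: {report_name: {field: value}}.
    Reports missing from `new` and fields not yet set in `old` are ignored.
    """
    violations = []
    for report, old_fields in old.items():
        if report not in new:
            continue  # a removed report is outside the scope of this check
        new_fields = new[report]
        for field in CONTRACT_FIELDS:
            if field in old_fields and new_fields.get(field) != old_fields[field]:
                violations.append((report, field, old_fields[field], new_fields.get(field)))
    return violations
```

An empty list means every field that was already set kept its value; new reports and newly set fields do not count as violations.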

The schema revision history is maintained at https://github.com/The-Shadowserver-Foundation/report_schema/.

**Sample configuration**

.. code-block:: yaml

  shadowserver-parser:
    bot_id: shadowserver-parser
    name: Shadowserver Parser
    enabled: true
    group: Parser
    groupname: parsers
    module: intelmq.bots.parsers.shadowserver.parser
    parameters:
      destination_queues:
        _default: [file-output-queue]
      auto_update: true
    run_mode: continuous

**Development**

@@ -1715,14 +1666,6 @@ The parser consists of two files:

Both files are required for the parser to work properly.

**Add new Feedformats**

Add a new feed format and conversions if required to the file
``_config.py``. Don't forget to update the ``mapping`` dict.
It is required to look up the correct configuration.

Look at the documentation in the bot's ``_config.py`` file for more information.


.. _intelmq.bots.parsers.shodan.parser:

24 changes: 9 additions & 15 deletions intelmq/bots/collectors/shadowserver/collector_reports_api.py
@@ -34,15 +34,13 @@ class ShadowServerAPICollectorBot(CollectorBot, HttpMixin, CacheMixin):
             A list of strings or a comma-separated list of the mailing lists you want to process.
         types (list):
             A list of strings or a string of comma-separated values with the names of reporttypes you want to process. If you leave this empty, all the available reports will be downloaded and processed (i.e. 'scan', 'drones', 'intel', 'sandbox_connection', 'sinkhole_combined').
-        file_format (str): File format to download ('csv' or 'json'). The default is 'json' for compatibility. Using 'csv' is recommended for best performance.
     """

     country = None
     api_key = None
     secret = None
     types = None
     reports = None
-    file_format = None
     rate_limit: int = 86400
     redis_cache_db: int = 12
     redis_cache_host: str = "127.0.0.1"  # TODO: type could be ipadress
@@ -66,15 +64,15 @@ def init(self):
             self.logger.warn("Deprecated parameter 'country' found. Please use 'reports' instead. The backwards-compatibility will be removed in IntelMQ version 4.0.0.")
             self._report_list.append(self.country)

-        if self.file_format is not None:
-            if not (self.file_format == 'csv' or self.file_format == 'json'):
-                raise ValueError('Invalid file_format')
-        else:
-            self.file_format = 'json'
-            self.logger.info("For best performance, set 'file_format' to 'csv' and use intelmq.bots.parsers.shadowserver.parser.")
-
         self.preamble = f'{{ "apikey": "{self.api_key}" '

+    def check(parameters: dict):
+        for key in parameters:
+            if key == 'file_format':
+                return [["error", "The file_format parameter is no longer supported. All reports are CSV."]]
+            elif key == 'country':
+                return [["warning", "Deprecated parameter 'country' found. Please use 'reports' instead. The backwards-compatibility will be removed in IntelMQ version 4.0.0."]]
+
     def _headers(self, data):
         return {'HMAC2': hmac.new(self.secret.encode(), data.encode('utf-8'), digestmod=hashlib.sha256).hexdigest()}
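
As written in this hunk, `check` returns at the first matching key, so a configuration that contains both `file_format` and `country` yields only one of the two messages, depending on key order. A variant that reports every finding could look like the sketch below. `check_parameters` is a hypothetical standalone function rather than the bot's method; only the two message texts and the `[level, message]` list shape are taken from the diff.

```python
from typing import List, Optional

FILE_FORMAT_ERROR = "The file_format parameter is no longer supported. All reports are CSV."
COUNTRY_WARNING = (
    "Deprecated parameter 'country' found. Please use 'reports' instead. "
    "The backwards-compatibility will be removed in IntelMQ version 4.0.0."
)


def check_parameters(parameters: dict) -> Optional[List[List[str]]]:
    """Collect every finding instead of returning at the first match (hypothetical variant)."""
    findings = []
    if "file_format" in parameters:
        findings.append(["error", FILE_FORMAT_ERROR])
    if "country" in parameters:
        findings.append(["warning", COUNTRY_WARNING])
    # None when nothing is wrong, matching the implicit return of the loop in the hunk.
    return findings or None
```

With this shape, `check_parameters({"country": "austria", "file_format": "json"})` returns both the error and the warning in a fixed order.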

@@ -123,11 +121,7 @@ def _report_download(self, reportid: str):
         data = self.preamble
         data += f',"id": "{reportid}"}}'
         self.logger.debug('Downloading report with data: %s.', data)
-
-        if (self.file_format == 'json'):
-            response = self.http_session().post(APIROOT + 'reports/download', data=data, headers=self._headers(data))
-        else:
-            response = self.http_session().get(DLROOT + reportid)
+        response = self.http_session().get(DLROOT + reportid)
         response.raise_for_status()

         return response.text
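
The removed POST branch passed `headers=self._headers(data)`; that helper, shown in the previous hunk, signs the request body with HMAC-SHA256 and sends the hex digest in an `HMAC2` header. A self-contained sketch of the signing step follows, together with a constant-time comparison of the kind a receiver might use. `sign_body` and `signature_matches` are hypothetical names, and the verification side is an assumption, since the diff only shows the client.

```python
import hashlib
import hmac


def sign_body(secret: str, body: str) -> dict:
    """Return an HMAC2 header holding the HMAC-SHA256 hex digest of the body."""
    digest = hmac.new(secret.encode(), body.encode("utf-8"), digestmod=hashlib.sha256).hexdigest()
    return {"HMAC2": digest}


def signature_matches(secret: str, body: str, headers: dict) -> bool:
    """Recompute the digest and compare in constant time (receiver side, assumed)."""
    expected = sign_body(secret, body)["HMAC2"]
    return hmac.compare_digest(expected, headers.get("HMAC2", ""))
```

Because the digest covers the exact byte string that is sent, the body must be serialized once and the same string used for both signing and transmission.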
@@ -144,7 +138,7 @@ def process(self):

         for item in reportslist:
             filename = item['file']
-            filename_fixed = FILENAME_PATTERN.sub('.' + self.file_format, filename, count=1)
+            filename_fixed = FILENAME_PATTERN.sub('.csv', filename, count=1)
             if self.cache_get(filename):
                 self.logger.debug('Processed file %r (fixed: %r) already.', filename, filename_fixed)
                 continue
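
The loop above normalizes each report's file extension to `.csv` and skips files that were already processed. A standalone sketch of that logic is given below. Two things in it are assumptions: the diff does not show the definition of `FILENAME_PATTERN`, so a stand-in pattern matching a trailing extension is used, and a plain `set` replaces the bot's cache (`self.cache_get`). `select_new_reports` is a hypothetical name.

```python
import re

# Assumption: the real FILENAME_PATTERN is not shown in the diff; this stand-in
# matches the final extension of a file name.
FILENAME_PATTERN = re.compile(r"\.[^./]+$")


def select_new_reports(reports: list, seen: set) -> list:
    """Return (original, fixed) file-name pairs for reports not processed before."""
    selected = []
    for item in reports:
        filename = item["file"]
        filename_fixed = FILENAME_PATTERN.sub(".csv", filename, count=1)
        if filename in seen:  # stands in for self.cache_get(filename)
            continue
        seen.add(filename)  # the sketch marks files as seen immediately
        selected.append((filename, filename_fixed))
    return selected
```

The lookup key is the original file name, as in the hunk, so the de-duplication does not depend on how the extension is rewritten.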