From 1162fb6e610a4fdc96051d55bef4026439cda08b Mon Sep 17 00:00:00 2001
From: Kamil Mankowski
Date: Wed, 14 Jun 2023 14:01:09 +0200
Subject: [PATCH 1/5] FIX: Ensure rejecting URLs starting with space

The gh-102153 change in CPython modified how urllib.parse handles
URLs with leading spaces. To preserve the previous behaviour, an
additional check was added.

Closes: #2377
---
 CHANGELOG.md                 | 4 +++-
 intelmq/lib/harmonization.py | 4 ++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5a2311d03..cc09d11d6 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -17,7 +17,9 @@ CHANGELOG
 - `intelmq.lib.upgrages`: Fix a bug in the upgrade function for version 3.1.0 which caused an exception if a generic csv parser instance had no parameter `type` (PR#2319 by Filip Pokorný).
 - `intelmq.lib.datatypes`: Adds `TimeFormat` class to be used for the `time_format` bot parameter (PR#2329 by Filip Pokorný).
 - `intelmq.lib.exceptions`: Fixes a bug in `InvalidArgument` exception (PR#2329 by Filip Pokorný).
-- `intelmq.lib.harmonization`: Changes signature and names of `DateTime` conversion functions for consistency, backwards compatible (PR#2329 by Filip Pokorný).
+- `intelmq.lib.harmonization`:
+  - Changes signature and names of `DateTime` conversion functions for consistency, backwards compatible (PR#2329 by Filip Pokorný).
+  - Ensure rejecting URLs with leading whitespaces after changes in CPython (fixes [#2377](https://github.com/certtools/intelmq/issues/2377))
 
 ### Development
 
diff --git a/intelmq/lib/harmonization.py b/intelmq/lib/harmonization.py
index 3c983cea5..0114c906d 100644
--- a/intelmq/lib/harmonization.py
+++ b/intelmq/lib/harmonization.py
@@ -34,6 +34,7 @@
 import json
 import re
 import socket
+import string
 import warnings
 import urllib.parse as parse
 from typing import Optional, Union
@@ -1090,6 +1091,9 @@ def is_valid(value: str, sanitize: bool = False) -> bool:
         if not GenericType.is_valid(value):
             return False
 
+        if value[0] in string.whitespace:
+            return False
+
         result = parse.urlsplit(value)
         if result.netloc == "":
             return False

From 57298f6622de2337725b99d2167577c19483abba Mon Sep 17 00:00:00 2001
From: Kamil Mankowski
Date: Wed, 21 Jun 2023 15:20:33 +0200
Subject: [PATCH 2/5] FIX: Pin codespell version

Without a pinned version, the codespell check regularly fails
whenever a new codespell release ships an updated dictionary.
---
 .github/workflows/codespell.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/codespell.yml b/.github/workflows/codespell.yml
index e58994f3b..e709d8c16 100644
--- a/.github/workflows/codespell.yml
+++ b/.github/workflows/codespell.yml
@@ -26,6 +26,6 @@ jobs:
     - name: Checkout repository
       uses: actions/checkout@v2
     - name: Install codespell
-      run: pip install codespell
+      run: pip install "codespell==2.2.4"
     - name: Run codespell
       run: /home/runner/.local/bin/codespell

From 3ee7a3fa8d3bc145dc94fa31ca4ba0a5bf5b7b21 Mon Sep 17 00:00:00 2001
From: Kamil Mankowski
Date: Wed, 21 Jun 2023 17:15:01 +0200
Subject: [PATCH 3/5] Add changelog

---
 CHANGELOG.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index cc09d11d6..2cda8729d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -22,6 +22,7 @@ CHANGELOG
   - Ensure rejecting URLs with leading whitespaces after changes in CPython (fixes [#2377](https://github.com/certtools/intelmq/issues/2377))
 
 ### Development
+- CI: pin the Codespell version to omit troubles caused by its new releases (PR #2379).
 
 ### Bots
 

From a7b269b42a7140360604d4d141a93be308d38514 Mon Sep 17 00:00:00 2001
From: Kamil Mankowski
Date: Thu, 22 Jun 2023 13:39:52 +0200
Subject: [PATCH 4/5] TST: Skip tests failing due to urllib change

Stricter validation in urllib causes trouble when processing
invalid URLs. The correct solution on our side is unclear at the
moment, see #2382
---
 CHANGELOG.md                                                  | 2 ++
 .../tests/bots/parsers/html_table/test_parser_column_split.py | 1 +
 2 files changed, 3 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index e789265a6..dd5335f03 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -42,6 +42,7 @@ CHANGELOG
 - `intelmq.bots.parsers.html_table.parser`: Changes `time_format` parameter to use new `TimeFormat` class (PR#2329 by Filip Pokorný).
 - `intelmq.bots.parsers.turris.parser.py` Updated to the latest data format (issue #2167). (PR#2373 by Filip Pokorný).
 
+
 #### Experts
 - `intelmq.bots.experts.sieve`:
   - Allow empty lists in sieve rule files (PR#2341 by Mikk Margus Möll).
@@ -66,6 +67,7 @@ CHANGELOG
 - SECURITY: fixed a low-risk bug causing the tool to change owner of `/` if run with the `INTELMQ_PATHS_NO_OPT` environment variable set. This affects only the PIP package as the DEB/RPM packages don't contain this tool. (PR#2355 by Kamil Mańkowski, fixes #2354)
 
 ### Known Errors
+- `intelmq.parsers.html_table` may not process invalid URLs in patched Python version due to changes in `urllib`. See #2382
 
 3.1.0 (2023-02-10)
 ------------------
diff --git a/intelmq/tests/bots/parsers/html_table/test_parser_column_split.py b/intelmq/tests/bots/parsers/html_table/test_parser_column_split.py
index 2c6ce2903..06d7121db 100644
--- a/intelmq/tests/bots/parsers/html_table/test_parser_column_split.py
+++ b/intelmq/tests/bots/parsers/html_table/test_parser_column_split.py
@@ -70,6 +70,7 @@ def test_event_with_split(self):
         self.run_bot()
         self.assertMessageEqual(0, EXAMPLE_EVENT)
 
+    @unittest.skip("Change in urllib prevent invalid URLs to be processed, see #2377")
     def test_event_without_split(self):
         self.sysconfig = {"columns": ["time.source", "source.url", "malware.hash.md5",
                                       "source.ip", "__IGNORE__"],

From 834290caed1e2be177e070f57c0c6470a8c5482c Mon Sep 17 00:00:00 2001
From: Sebastian
Date: Tue, 27 Jun 2023 08:34:47 +0200
Subject: [PATCH 5/5] Update CHANGELOG.md: Remove empty line

---
 CHANGELOG.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index dd5335f03..01be7ef2a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -42,7 +42,6 @@ CHANGELOG
 - `intelmq.bots.parsers.html_table.parser`: Changes `time_format` parameter to use new `TimeFormat` class (PR#2329 by Filip Pokorný).
 - `intelmq.bots.parsers.turris.parser.py` Updated to the latest data format (issue #2167). (PR#2373 by Filip Pokorný).
 
-
 #### Experts
 - `intelmq.bots.experts.sieve`:
   - Allow empty lists in sieve rule files (PR#2341 by Mikk Margus Möll).
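
Note on PATCH 1/5: the leading-whitespace check can be tried in isolation with the sketch below. It is only an illustration, not the IntelMQ implementation. The function name `looks_like_valid_url` is hypothetical, and the emptiness test stands in for `GenericType.is_valid`, which is assumed here to reject empty values. Only the `string.whitespace` and `netloc` checks are taken from the patch.

```python
import string
from urllib.parse import urlsplit


def looks_like_valid_url(value: str) -> bool:
    """Hypothetical helper mirroring only the checks visible in the patch."""
    # Stand-in (assumption) for GenericType.is_valid: reject empty input
    # so that value[0] below cannot raise IndexError.
    if not value:
        return False
    # Reject leading whitespace explicitly rather than relying on how
    # urlsplit treats it, since that differs between Python releases.
    if value[0] in string.whitespace:
        return False
    # A URL without a network location is treated as invalid.
    return urlsplit(value).netloc != ""
```

Because the whitespace test runs before `urlsplit`, the result for values such as `" http://example.com/"` should not depend on whether the interpreter includes the gh-102153 change.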