Merge pull request #12 from Ericsson/CWE-502-doc

BartyBoi1128 · web-flow · commit c63f32302797 · 2024-05-22T11:32:51.000+01:00
Adding docs for CWE-502
diff --git a/CWE-664/CWE-502/README.md b/CWE-664/CWE-502/README.md
@@ -0,0 +1,327 @@
+# CWE-502: Deserialization of Untrusted Data
+
+The `pickle` module is known to be vulnerable [[docs.python.org 2023]](https://docs.python.org/3.9/library/pickle.html) against unwanted code execution during deserialization and should only be used if there is no architectural text-based alternative.
+Even if data has been created from a trusted source we need to verify that it has not been tampered with during transport.
+
+Security-related concerns during object serialization and deserialization include:
+
+* Prefer text-based formats such as `JSON` or `YAML` if possible.
+* Consider using `Base64` encoding for binary data
+* Only unpickle data you trust [docs.python.org 2023].
+* Restricting Globals during deserialization.
+* Prefer `xmlrpc.client` for network operations that are already `XML` based.
+* Sign data that is crossing trust boundaries with `hmac`.
+* Use Input validation.
+
+## Noncompliant Code Example
+
+The `noncompliant01.py`  code demonstrates arbitrary code execution [Checkoway Oct 2013] using `os.system` to launch a program during unpickling when `pickle.loads()`.
+
+*[noncompliant01.py](noncompliant01.py):*
+
+```py
+""" Non-Compliant Code Example """
+import platform
+import pickle
+
+
+class Message(object):
+    """Sample Message Object"""
+    sender_id = 42
+    text = "Some text"
+
+    def printout(self):
+        """prints content to stdout to demonstrate active content"""
+        print(f"Message:sender_id={self.sender_id} text={self.text}")
+
+
+class Preserver(object):
+    """Demonstrating deserialisation"""
+
+    def can(self, _message: Message) -> bytes:
+        """Serializes a Message object.
+            Parameters:
+                _message (Message): Message object
+            Returns:
+                _jar (bytes): pickled jar as string
+        """
+        return pickle.dumps(_message)
+
+    def uncan(self, _jar) -> Message:
+        """De-serializes a Message object.
+            Parameters:
+                _jar (String): Pickled jar
+            Returns:
+                (Message): Message object
+        """
+        return pickle.loads(_jar)
+
+
+# serialization of a normal package
+p1 = Preserver()
+message = Message()
+message.printout()
+jar = p1.can(message)
+
+# sending or storing would happen here
+p2 = Preserver()
+message = None
+message = p2.uncan(jar)
+message.printout()
+
+#####################
+# exploiting above code example
+#####################
+print("-" * 10)
+print("Attacker trying to read the message")
+message = pickle.loads(jar)
+message.printout()
+
+print("-" * 10)
+if platform.system() == "Windows":
+    PAYLOAD = b"""cos
+system
+(S'calc.exe'
+tR."""
+else:
+    PAYLOAD = b"""cos
+system
+(S'whoami;uptime;uname -a;ls -la /etc/shadow'
+tR."""
+print("Attacker trying to inject PAYLOAD")
+p3 = Preserver()
+message = None
+message = p3.uncan(PAYLOAD)
+```
+
+The deserializating `Preserver.uncan()` method has no solution to verify the content prior to unpickling it and runs the PAYLOAD even before turning it into an object. On Windows you have `calc.exe`  and on Unix a bunch of commands such as `uname -a and ls -la /etc/shadow`.
+
+> [!CAUTION]
+> The `compliant01.py` code only demonstrates integrity protection with hmac.
+> The pickled object is not encrypted and key-handling is inappropriate!
+> Consider using proper key management with `x509` and encryption [[pyca/cryptography 2023]](https://cryptography.io/en/latest/).
+
+*[compliant01.py](compliant01.py):*
+
+```py
+""" Compliant Code Example """
+import hashlib
+import hmac
+import platform
+import pickle
+import secrets
+
+
+class Message(object):
+    """Sample Message Object"""
+    sender_id = 42
+    text = "Some text"
+
+    def printout(self):
+        """prints content to stdout to demonstrate active content"""
+        print(f"Message:sender_id={self.sender_id} text={self.text}")
+
+
+class Preserver(object):
+    """Demonstrating deserialisation"""
+    def __init__(self, _key):
+        self._key = _key
+
+    def can(self, _message: Message) -> tuple:
+        """Serializes a Message object.
+            Parameters:
+                _message (Message): Message object
+            Returns:
+                _digest (String): HMAC digest string
+                _jar (bytes): pickled jar as string
+        """
+        _jar = pickle.dumps(_message)
+        _digest = hmac.new(self._key, _jar, hashlib.sha256).hexdigest()
+        return _digest, _jar
+
+    def uncan(self, _expected_digest, _jar) -> Message:
+        """Verifies and de-serializes a Message object.
+            Parameters:
+                _expected_digest (String): Message HMAC digest
+                _jar (bytes): Pickled jar
+            Returns:
+                (Message): Message object
+        """
+        _digest = hmac.new(self._key, _jar, hashlib.sha256).hexdigest()
+        if _expected_digest != _digest:
+            raise ValueError("Integrity of jar compromised")
+        return pickle.loads(_jar)
+
+
+# serialization of a normal package
+key = secrets.token_bytes()
+print(f"key={key}")
+p1 = Preserver(key)
+message = Message()
+message.printout()
+digest, jar = p1.can(message)
+
+# sending or storing would happen here
+p2 = Preserver(key)
+message = None
+message = p2.uncan(digest, jar)
+message.printout()
+
+#####################
+# exploiting above code example
+#####################
+print("-" * 10)
+print("Attacker trying to read the message")
+message = pickle.loads(jar)
+message.printout()
+
+print("-" * 10)
+if platform.system() == "Windows":
+    PAYLOAD = b"""cos
+system
+(S'calc.exe'
+tR."""
+else:
+    PAYLOAD = b"""cos
+system
+(S'whoami;uptime;uname -a;ls -la /etc/shadow'
+tR."""
+print("Attacker trying to inject PAYLOAD")
+p3 = Preserver(b"dont know")
+message = None
+message = p3.uncan(digest, PAYLOAD)
+```
+
+The integrity verification in `compliant01.py` throws an exception `ValueError: Integrity of jar compromised prior to deserializationunpickling to prevent the PAYLOAD executed.`
+
+## Compliant Solution JSON without pickle
+
+Text-based formats, such as `JSON` and `YAML`, should always be preferred. They have a lower set of capabilities and reduce the attack surface [python.org comparison-with-json 2023] when compared to `pickle`.
+
+The `compliant02.py`  code only allows serializing and deserialization of object data and not object methods as in `noncompliant01.py` or `compliant01.py`.
+
+Consider converting binary data into text using `Base64` encoding for performance and size irrelevant operations.
+
+*[compliant02.py](compliant02.py):*
+
+```py
+""" Compliant Code Example """
+import platform
+import json
+ 
+ 
+class Message(object):
+    """Sample Message Object"""
+    sender_id = int()
+    text = str()
+ 
+    def __init__(self):
+        self.sender_id = 42
+        self.text = "Some text"
+ 
+    def printout(self):
+        print(f"sender_id: {self.sender_id}\ntext: {self.text}")
+ 
+ 
+class Preserver(object):
+    """Demonstrating deserialisation"""
+ 
+    def can(self, _message: Message) -> str:
+        """Serializes a Message object.
+            Parameters:
+                _message (Message): Message object
+            Returns:
+                _jar (bytes): jar as string
+        """
+        return json.dumps(vars(_message))
+ 
+    def uncan(self, _jar) -> Message:
+        """Verifies and de-serializes a Message object.
+            Parameters:
+                _jar (String): Pickled jar
+            Returns:
+                (Message): Message object
+        """
+        j = json.loads(_jar)
+        _message = Message()
+        _message.sender_id = int(j["sender_id"])
+        _message.text = str(j["text"])
+        return _message
+ 
+ 
+# serialization of a normal package
+p1 = Preserver()
+message = Message()
+jar = p1.can(message)
+print(jar)
+print(type(json.loads(jar)))
+ 
+# sending or storing would happen here
+p2 = Preserver()
+message = None
+message = p2.uncan(jar)
+message.printout()
+print(message.sender_id)
+ 
+#####################
+# exploiting above code example
+#####################
+print("-" * 10)
+print("Attacker trying to read the message")
+print(jar)
+message.printout()
+ 
+print("-" * 10)
+if platform.system() == "Windows":
+    PAYLOAD = b"""cos
+system
+(S'calc.exe'
+tR."""
+else:
+    PAYLOAD = b"""cos
+system
+(S'whoami;uptime;uname -a;ls -la /etc/shadow'
+tR."""
+print("Attacker trying to inject PAYLOAD")
+p3 = Preserver()
+message = None
+message = p3.uncan(PAYLOAD)
+```
+
+The `compliant02.py` stops with the unpacking with a `json.decoder.JSONDecodeError`.
+
+## Exceptions
+
+Serialized data from a trusted input source does not require sanitization, provided that the code clearly documents that it relies on the input source being trustworthy. For example, if a library is being audited, a routine of that library may have a documented precondition that its callers pre-sanitize any passed-in serialized data or confirm the input source as trustworthy.
+
+## Automated Detection
+
+|Tool|Version|Checker|Description|
+|:----|:----|:----|:----|
+|Bandit|1.7.4|B301|Pickle and modules that wrap it can be unsafe when used to de-serialize untrusted data, possible security issue.Bandit can only detect a pickle module in use and is unable to detect an acceptable implementation code that combines pickle with `hmac` and proper key managment.|
+
+## Related Vulnerabilities
+
+|Product|CVE|Description|CVSS Rating|Comment|
+|:----|:----|:----|:----|:----|
+|TensorFlow using the pickle module|[CVE-2021-37678](https://www.cvedetails.com/cve/CVE-2021-37678/)|TensorFlow machine learning platform allows code execution when de-serializing a Keras model from `YAML` format.|v3.1: 8.8 High||
+|NVFLARE < 2.1.4|[CVE-2022-34668](https://www.cvedetails.com/cve/CVE-2022-34668/)|Deserialization of Untrusted Data with Pickle may allow an unprivileged network attacker to cause Remote Code Execution (RCE).|v3.1: 9.8 Critical|Exploit available on [exploit-db.com](https://www.exploit-db.com/exploits/51051)|
+|Graphite 0.9.5 through 0.9.10|[CVE-2013-5093](https://www.cvedetails.com/cve/CVE-2013-5093/)|The renderLocalView function in render/views.py uses the pickle Python module unsafely, which allows remote attackers to execute arbitrary code via a crafted serialized object|n/a|Exploit available on [exploit-db.com](https://www.exploit-db.com/exploits/27752)|
+|Superset prior to 0.23|[CVE-2018-8021](https://www.cvedetails.com/cve/CVE-2018-8021/)|TUnsafe load method from the pickle library to deserialize data leading to possible RCE|v3.1: 9.8 Critical|Exploit available on [exploit-db.com](https://www.exploit-db.com/exploits/45933)|
+|rpc.py through 0.6.0|[CVE-2022-35411](https://www.cvedetails.com/cve/CVE-2022-35411/)|HTTP HEADERS set to `"serializer: pickle"` triggers `rcp.py` to de-serialize with `pickle` instead of the default `JSON` allowing Allows Remote Code Execution|v3.1:9.8 Critical|Exploit available on [https://github.com/](https://github.com/ehtec/rpcpy-exploit/blob/main/rpcpy-exploit.py)|
+
+## Related Guidelines
+
+|||
+|:---|:---|
+|[SEI CERT Coding Standard for Java](https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java)|[SER01-J. Do not deviate from the proper signatures of serialization methods](https://wiki.sei.cmu.edu/confluence/display/java/SER01-J.+Do+not+deviate+from+the+proper+signatures+of+serialization+methods)|
+|[MITRE CWE](http://cwe.mitre.org/)|Pillar [CWE-664: Improper Control of a Resource Through its Lifetime (4.13) (mitre.org)](https://cwe.mitre.org/data/definitions/664.html)|
+|[MITRE CWE](http://cwe.mitre.org/)|Base [CWE-502, Deserialization of Untrusted Data](http://cwe.mitre.org/data/definitions/502.html)|
+
+## Biblography
+
+|||
+|:---|:---|
+|[[docs.python.org 2023]](https://docs.python.org/)|pickle — Python object serialization. Available from: <https://docs.python.org/3.9/library/pickle.html> \[Accessed 07 May 2024]|
+|[python.org comparison-with-json 2023]|pickle - Comparison with JSON. Available from: <https://docs.python.org/3.9/library/pickle.html#comparison-with-json> \[Acessed 07 May 2024]|
+|[pyca/cryptography 2023]|Welcome to pyca/cryptography. Available from: <https://cryptography.io/en/latest/> \[Acessed 07 May 2024]|
diff --git a/README.md b/README.md
@@ -2,11 +2,10 @@
 
 Promote secure products by knowing the difference between secure compliant
 and non-compliant code with `CPython >= 3.9` using modules listed on
-
-[[Python Module Index 2023]](https://docs.python.org/3.9/py-modindex.html) \[Python 2023].
+[Python Module Index](https://docs.python.org/3.9/py-modindex.html)\[Python 2023].
 
 This page is in initiative by Ericsson to improve secure coding in Python by providing a location for study. Its structure is based on
-Common Weakness Enamurator (CWE) [Pillar Weakness](https://cwe.mitre.org/documents/glossary/#Pillar%20Weakness) \[mitre.org 2023].
+Common Weakness Enamurator (CWE) [Pillar Weakness](https://cwe.mitre.org/documents/glossary/#Pillar%20Weakness) [mitre.org 2023].
 It currently contains *only* the code examples, documentation will follow.
 
 ## Disclaimer
@@ -40,12 +39,12 @@ It is **not production code** and requires code-style or python best practices t
 
 |[CWE-664: Improper Control of a Resource Through its Lifetime](https://cwe.mitre.org/data/definitions/664.html)|Prominent CVE|
 |:-----------------------------------------------------------------------------------------------------------------------------------------------|:----|
-|[CWE-134: Use of Externally-Controlled Format String](CWE-664/CWE-134/.)|[CVE-2022-27177](https://www.cvedetails.com/cve/CVE-2022-27177/),<br>CVSSv3.1: **9.8**,<br>EPSS:**00.37**(01.12.2023)|
+|[CWE-134: Use of Externally-Controlled Format String](CWE-664/CWE-134/.)|[CVE-2022-27177](https://www.cvedetails.com/cve/CVE-2022-27177/),<br/>CVSSv3.1: **9.8**,<br/>EPSS:**00.37**(01.12.2023)|
 |[CWE-197: Numeric Truncation Error](CWE-664/CWE-197/.)||
 |[CWE-400: Uncontrolled Resource Consumption](CWE-664/CWE-400/README.md)||
 |[CWE-409: Improper Handling of Highly Compressed Data (Data Amplification)](CWE-664/CWE-409/.)||
 |[CWE-410: Insufficient Resource Pool](CWE-664/CWE-410/.)||
-|[CWE-502: Deserialization of Untrusted Data)](CWE-664/CWE-502/.)||
+|[CWE-502: Deserialization of Untrusted Data)](CWE-664/CWE-502/README.md)||
 |[CWE-665: Improper Initialization](CWE-664/CWE-665/.)||
 |[CWE-681: Improper Control of a Resource Through its Lifetime](CWE-664/CWE-681/.)||
 |[CWE-833: Deadlock](CWE-664/CWE-833/README.md)||