This repository was archived by the owner on Oct 11, 2024. It is now read-only.

Commit c63f323

Merge pull request #12 from Ericsson/CWE-502-doc
Adding docs for CWE-502
2 parents 01115c1 + cb69856 commit c63f323

2 files changed

+331
-5
lines changed

CWE-664/CWE-502/README.md

Lines changed: 327 additions & 0 deletions
# CWE-502: Deserialization of Untrusted Data

The `pickle` module is known to be vulnerable [[docs.python.org 2023]](https://docs.python.org/3.9/library/pickle.html) to unwanted code execution during deserialization and should only be used if the architecture allows no text-based alternative.
Even if data has been created by a trusted source, we need to verify that it has not been tampered with during transport.

Security-related considerations during object serialization and deserialization include:

* Prefer text-based formats such as `JSON` or `YAML` if possible.
* Consider using `Base64` encoding for binary data.
* Only unpickle data you trust [docs.python.org 2023].
* Restrict globals during deserialization.
* Prefer `xmlrpc.client` for network operations that are already `XML` based.
* Sign data that is crossing trust boundaries with `hmac`.
* Use input validation.

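The "restrict globals" item can be sketched by subclassing `pickle.Unpickler` and overriding `find_class()`, the approach described in the Python `pickle` documentation. The allow-list `SAFE_BUILTINS` and the helper `restricted_loads()` are illustrative names and not part of the examples in this directory; a real allow-list depends on what the application needs to load.

```python
"""Sketch: restricting globals during unpickling"""
import builtins
import io
import pickle

# Hypothetical allow-list, adjust to what the application really needs
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that only resolves allow-listed builtins"""

    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Stand-in for pickle.loads() that uses the restricted unpickler"""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers and allow-listed builtins still load
print(restricted_loads(pickle.dumps([1, 2, range(3)])))

# An os.system payload is rejected when the global is looked up
try:
    restricted_loads(b"cos\nsystem\n(S'echo pwned'\ntR.")
except pickle.UnpicklingError as error:
    print(f"Blocked: {error}")
```

With this allow-list, application classes such as a `Message` object are rejected as well, so each class that must be loadable has to be added deliberately.
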
## Noncompliant Code Example

The `noncompliant01.py` code demonstrates arbitrary code execution [Checkoway Oct 2013] by using `os.system` to launch a program when `pickle.loads()` unpickles the payload.

*[noncompliant01.py](noncompliant01.py):*

```py
""" Non-Compliant Code Example """
import platform
import pickle


class Message(object):
    """Sample Message Object"""
    sender_id = 42
    text = "Some text"

    def printout(self):
        """prints content to stdout to demonstrate active content"""
        print(f"Message:sender_id={self.sender_id} text={self.text}")


class Preserver(object):
    """Demonstrating deserialisation"""

    def can(self, _message: Message) -> bytes:
        """Serializes a Message object.
            Parameters:
                _message (Message): Message object
            Returns:
                _jar (bytes): pickled jar as bytes
        """
        return pickle.dumps(_message)

    def uncan(self, _jar) -> Message:
        """De-serializes a Message object.
            Parameters:
                _jar (bytes): Pickled jar
            Returns:
                (Message): Message object
        """
        return pickle.loads(_jar)


# serialization of a normal package
p1 = Preserver()
message = Message()
message.printout()
jar = p1.can(message)

# sending or storing would happen here
p2 = Preserver()
message = None
message = p2.uncan(jar)
message.printout()

#####################
# exploiting above code example
#####################
print("-" * 10)
print("Attacker trying to read the message")
message = pickle.loads(jar)
message.printout()

print("-" * 10)
if platform.system() == "Windows":
    PAYLOAD = b"""cos
system
(S'calc.exe'
tR."""
else:
    PAYLOAD = b"""cos
system
(S'whoami;uptime;uname -a;ls -la /etc/shadow'
tR."""
print("Attacker trying to inject PAYLOAD")
p3 = Preserver()
message = None
message = p3.uncan(PAYLOAD)
```

The deserializing `Preserver.uncan()` method has no way to verify the content before unpickling it, so it runs the PAYLOAD even before turning it into an object. On Windows the payload launches `calc.exe`; on Unix it runs a series of commands such as `uname -a` and `ls -la /etc/shadow`.

> [!CAUTION]
> The `compliant01.py` code only demonstrates integrity protection with `hmac`.
> The pickled object is not encrypted and key-handling is inappropriate!
> Consider using proper key management with `x509` and encryption [[pyca/cryptography 2023]](https://cryptography.io/en/latest/).

*[compliant01.py](compliant01.py):*

```py
""" Compliant Code Example """
import hashlib
import hmac
import platform
import pickle
import secrets


class Message(object):
    """Sample Message Object"""
    sender_id = 42
    text = "Some text"

    def printout(self):
        """prints content to stdout to demonstrate active content"""
        print(f"Message:sender_id={self.sender_id} text={self.text}")


class Preserver(object):
    """Demonstrating deserialisation"""
    def __init__(self, _key):
        self._key = _key

    def can(self, _message: Message) -> tuple:
        """Serializes a Message object.
            Parameters:
                _message (Message): Message object
            Returns:
                _digest (String): HMAC digest string
                _jar (bytes): pickled jar as bytes
        """
        _jar = pickle.dumps(_message)
        _digest = hmac.new(self._key, _jar, hashlib.sha256).hexdigest()
        return _digest, _jar

    def uncan(self, _expected_digest, _jar) -> Message:
        """Verifies and de-serializes a Message object.
            Parameters:
                _expected_digest (String): Message HMAC digest
                _jar (bytes): Pickled jar
            Returns:
                (Message): Message object
        """
        _digest = hmac.new(self._key, _jar, hashlib.sha256).hexdigest()
        # constant-time comparison avoids leaking the digest through timing
        if not hmac.compare_digest(_expected_digest, _digest):
            raise ValueError("Integrity of jar compromised")
        return pickle.loads(_jar)


# serialization of a normal package
key = secrets.token_bytes()
print(f"key={key}")
p1 = Preserver(key)
message = Message()
message.printout()
digest, jar = p1.can(message)

# sending or storing would happen here
p2 = Preserver(key)
message = None
message = p2.uncan(digest, jar)
message.printout()

#####################
# exploiting above code example
#####################
print("-" * 10)
print("Attacker trying to read the message")
message = pickle.loads(jar)
message.printout()

print("-" * 10)
if platform.system() == "Windows":
    PAYLOAD = b"""cos
system
(S'calc.exe'
tR."""
else:
    PAYLOAD = b"""cos
system
(S'whoami;uptime;uname -a;ls -la /etc/shadow'
tR."""
print("Attacker trying to inject PAYLOAD")
p3 = Preserver(b"dont know")
message = None
message = p3.uncan(digest, PAYLOAD)
```

The integrity verification in `compliant01.py` raises the exception `ValueError: Integrity of jar compromised` prior to unpickling, which prevents the PAYLOAD from being executed.

## Compliant Solution JSON without pickle

Text-based formats, such as `JSON` and `YAML`, should always be preferred. They have a smaller set of capabilities and a reduced attack surface [python.org comparison-with-json 2023] when compared to `pickle`.

The `compliant02.py` code only allows serialization and deserialization of object data, not of object methods as in `noncompliant01.py` or `compliant01.py`.

Consider converting binary data into text using `Base64` encoding for operations where performance and size are not critical.

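The `Base64` suggestion can be sketched as follows, assuming a message that carries a binary attachment next to its id. The functions `can_binary()` and `uncan_binary()` and the `blob` field are hypothetical names chosen for illustration; they do not appear in the example files.

```python
"""Sketch: carrying binary data in JSON via Base64"""
import base64
import json


def can_binary(sender_id: int, blob: bytes) -> str:
    """Serializes an id and a binary blob into a JSON string"""
    encoded = base64.b64encode(blob).decode("ascii")
    return json.dumps({"sender_id": sender_id, "blob": encoded})


def uncan_binary(jar: str) -> tuple:
    """Restores the id and the binary blob from a JSON string"""
    j = json.loads(jar)
    # validate=True rejects characters outside the Base64 alphabet
    blob = base64.b64decode(j["blob"], validate=True)
    return int(j["sender_id"]), blob


jar = can_binary(42, b"\x00\xffbinary\x10data")
print(jar)
print(uncan_binary(jar))
```

Base64 grows the payload by roughly one third, which is the size cost referred to above.
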
*[compliant02.py](compliant02.py):*

```py
""" Compliant Code Example """
import platform
import json


class Message(object):
    """Sample Message Object"""
    sender_id = int()
    text = str()

    def __init__(self):
        self.sender_id = 42
        self.text = "Some text"

    def printout(self):
        """prints content to stdout"""
        print(f"sender_id: {self.sender_id}\ntext: {self.text}")


class Preserver(object):
    """Demonstrating deserialisation"""

    def can(self, _message: Message) -> str:
        """Serializes a Message object.
            Parameters:
                _message (Message): Message object
            Returns:
                _jar (str): jar as JSON string
        """
        return json.dumps(vars(_message))

    def uncan(self, _jar) -> Message:
        """Verifies and de-serializes a Message object.
            Parameters:
                _jar (str): JSON jar
            Returns:
                (Message): Message object
        """
        j = json.loads(_jar)
        _message = Message()
        _message.sender_id = int(j["sender_id"])
        _message.text = str(j["text"])
        return _message


# serialization of a normal package
p1 = Preserver()
message = Message()
jar = p1.can(message)
print(jar)
print(type(json.loads(jar)))

# sending or storing would happen here
p2 = Preserver()
message = None
message = p2.uncan(jar)
message.printout()
print(message.sender_id)

#####################
# exploiting above code example
#####################
print("-" * 10)
print("Attacker trying to read the message")
print(jar)
message.printout()

print("-" * 10)
if platform.system() == "Windows":
    PAYLOAD = b"""cos
system
(S'calc.exe'
tR."""
else:
    PAYLOAD = b"""cos
system
(S'whoami;uptime;uname -a;ls -la /etc/shadow'
tR."""
print("Attacker trying to inject PAYLOAD")
p3 = Preserver()
message = None
message = p3.uncan(PAYLOAD)
```

`compliant02.py` stops unpacking the PAYLOAD with a `json.decoder.JSONDecodeError`.

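Well-formed `JSON` can still carry unexpected fields or types, so the "use input validation" item applies after decoding as well. The following is a minimal sketch under assumed rules; `validate_message()` and `MAX_TEXT_LENGTH` are hypothetical and not part of `compliant02.py`.

```python
"""Sketch: validating decoded JSON before use"""
import json

MAX_TEXT_LENGTH = 1024  # assumed limit for illustration


def validate_message(jar: str) -> dict:
    """Decodes a JSON jar and rejects unexpected structure or types"""
    j = json.loads(jar)
    if not isinstance(j, dict) or set(j) != {"sender_id", "text"}:
        raise ValueError("Unexpected fields in jar")
    # bool is a subclass of int, so it is excluded explicitly
    if not isinstance(j["sender_id"], int) or isinstance(j["sender_id"], bool):
        raise ValueError("sender_id must be an integer")
    if not isinstance(j["text"], str) or len(j["text"]) > MAX_TEXT_LENGTH:
        raise ValueError("text must be a string of limited length")
    return j


print(validate_message('{"sender_id": 42, "text": "Some text"}'))
for bad in ('{"sender_id": "42", "text": "x"}',
            '{"sender_id": 1, "text": "x", "admin": true}',
            '[1, 2]'):
    try:
        validate_message(bad)
    except ValueError as error:
        print(f"Rejected: {error}")
```

Because `json.JSONDecodeError` is a subclass of `ValueError`, a caller can handle malformed and invalid jars with a single `except ValueError` clause.
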
## Exceptions

Serialized data from a trusted input source does not require sanitization, provided that the code clearly documents that it relies on the input source being trustworthy. For example, if a library is being audited, a routine of that library may have a documented precondition that its callers pre-sanitize any passed-in serialized data or confirm the input source as trustworthy.

## Automated Detection

|Tool|Version|Checker|Description|
|:----|:----|:----|:----|
|Bandit|1.7.4|B301|Pickle and modules that wrap it can be unsafe when used to de-serialize untrusted data, possible security issue. Bandit can only detect that the pickle module is in use and is unable to recognize an acceptable implementation that combines pickle with `hmac` and proper key management.|

## Related Vulnerabilities

|Product|CVE|Description|CVSS Rating|Comment|
|:----|:----|:----|:----|:----|
|TensorFlow using the pickle module|[CVE-2021-37678](https://www.cvedetails.com/cve/CVE-2021-37678/)|TensorFlow machine learning platform allows code execution when de-serializing a Keras model from `YAML` format.|v3.1: 8.8 High||
|NVFLARE < 2.1.4|[CVE-2022-34668](https://www.cvedetails.com/cve/CVE-2022-34668/)|Deserialization of Untrusted Data with Pickle may allow an unprivileged network attacker to cause Remote Code Execution (RCE).|v3.1: 9.8 Critical|Exploit available on [exploit-db.com](https://www.exploit-db.com/exploits/51051)|
|Graphite 0.9.5 through 0.9.10|[CVE-2013-5093](https://www.cvedetails.com/cve/CVE-2013-5093/)|The renderLocalView function in render/views.py uses the pickle Python module unsafely, which allows remote attackers to execute arbitrary code via a crafted serialized object.|n/a|Exploit available on [exploit-db.com](https://www.exploit-db.com/exploits/27752)|
|Superset prior to 0.23|[CVE-2018-8021](https://www.cvedetails.com/cve/CVE-2018-8021/)|Unsafe load method from the pickle library is used to deserialize data, leading to possible RCE.|v3.1: 9.8 Critical|Exploit available on [exploit-db.com](https://www.exploit-db.com/exploits/45933)|
|rpc.py through 0.6.0|[CVE-2022-35411](https://www.cvedetails.com/cve/CVE-2022-35411/)|HTTP headers set to `"serializer: pickle"` trigger `rpc.py` to de-serialize with `pickle` instead of the default `JSON`, allowing remote code execution.|v3.1: 9.8 Critical|Exploit available on [https://github.com/](https://github.com/ehtec/rpcpy-exploit/blob/main/rpcpy-exploit.py)|

## Related Guidelines

|||
|:---|:---|
|[SEI CERT Coding Standard for Java](https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java)|[SER01-J. Do not deviate from the proper signatures of serialization methods](https://wiki.sei.cmu.edu/confluence/display/java/SER01-J.+Do+not+deviate+from+the+proper+signatures+of+serialization+methods)|
|[MITRE CWE](http://cwe.mitre.org/)|Pillar [CWE-664: Improper Control of a Resource Through its Lifetime (4.13) (mitre.org)](https://cwe.mitre.org/data/definitions/664.html)|
|[MITRE CWE](http://cwe.mitre.org/)|Base [CWE-502, Deserialization of Untrusted Data](http://cwe.mitre.org/data/definitions/502.html)|

## Bibliography

|||
|:---|:---|
|[[docs.python.org 2023]](https://docs.python.org/)|pickle - Python object serialization. Available from: <https://docs.python.org/3.9/library/pickle.html> \[Accessed 07 May 2024]|
|[python.org comparison-with-json 2023]|pickle - Comparison with JSON. Available from: <https://docs.python.org/3.9/library/pickle.html#comparison-with-json> \[Accessed 07 May 2024]|
|[pyca/cryptography 2023]|Welcome to pyca/cryptography. Available from: <https://cryptography.io/en/latest/> \[Accessed 07 May 2024]|

README.md

Lines changed: 4 additions & 5 deletions
@@ -2,11 +2,10 @@
 
 Promote secure products by knowing the difference between secure compliant
 and non-compliant code with `CPython >= 3.9` using modules listed on
-
-[[Python Module Index 2023]](https://docs.python.org/3.9/py-modindex.html) \[Python 2023].
+[Python Module Index](https://docs.python.org/3.9/py-modindex.html)\[Python 2023].
 
 This page is in initiative by Ericsson to improve secure coding in Python by providing a location for study. Its structure is based on
-Common Weakness Enamurator (CWE) [Pillar Weakness](https://cwe.mitre.org/documents/glossary/#Pillar%20Weakness) \[mitre.org 2023].
+Common Weakness Enamurator (CWE) [Pillar Weakness](https://cwe.mitre.org/documents/glossary/#Pillar%20Weakness) [mitre.org 2023].
 It currently contains *only* the code examples, documentation will follow.
 
 ## Disclaimer
@@ -40,12 +39,12 @@ It is **not production code** and requires code-style or python best practices t
 
 |[CWE-664: Improper Control of a Resource Through its Lifetime](https://cwe.mitre.org/data/definitions/664.html)|Prominent CVE|
 |:-----------------------------------------------------------------------------------------------------------------------------------------------|:----|
-|[CWE-134: Use of Externally-Controlled Format String](CWE-664/CWE-134/.)|[CVE-2022-27177](https://www.cvedetails.com/cve/CVE-2022-27177/),<br>CVSSv3.1: **9.8**,<br>EPSS:**00.37**(01.12.2023)|
+|[CWE-134: Use of Externally-Controlled Format String](CWE-664/CWE-134/.)|[CVE-2022-27177](https://www.cvedetails.com/cve/CVE-2022-27177/),<br/>CVSSv3.1: **9.8**,<br/>EPSS:**00.37**(01.12.2023)|
 |[CWE-197: Numeric Truncation Error](CWE-664/CWE-197/.)||
 |[CWE-400: Uncontrolled Resource Consumption](CWE-664/CWE-400/README.md)||
 |[CWE-409: Improper Handling of Highly Compressed Data (Data Amplification)](CWE-664/CWE-409/.)||
 |[CWE-410: Insufficient Resource Pool](CWE-664/CWE-410/.)||
-|[CWE-502: Deserialization of Untrusted Data)](CWE-664/CWE-502/.)||
+|[CWE-502: Deserialization of Untrusted Data)](CWE-664/CWE-502/README.md)||
 |[CWE-665: Improper Initialization](CWE-664/CWE-665/.)||
 |[CWE-681: Improper Control of a Resource Through its Lifetime](CWE-664/CWE-681/.)||
 |[CWE-833: Deadlock](CWE-664/CWE-833/README.md)||
