| 1 | +# CWE-502: Deserialization of Untrusted Data |
| 2 | + |
| 3 | +The `pickle` module is known to allow arbitrary code execution during deserialization [[docs.python.org 2023]](https://docs.python.org/3.9/library/pickle.html) and should only be used when no text-based alternative fits the architecture. |
| 4 | +Even if data has been created by a trusted source, we need to verify that it has not been tampered with during transport. |
| 5 | + |
| 6 | +Recommendations for secure object serialization and deserialization include: |
| 7 | + |
| 8 | +* Prefer text-based formats such as `JSON` or `YAML` if possible. |
| 9 | +* Consider using `Base64` encoding for binary data. |
| 10 | +* Only unpickle data you trust [docs.python.org 2023]. |
| 11 | +* Restrict globals during deserialization. |
| 12 | +* Prefer `xmlrpc.client` for network operations that are already `XML` based. |
| 13 | +* Sign data that is crossing trust boundaries with `hmac`. |
| 14 | +* Use input validation. |
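
The "restrict globals" recommendation above can be sketched as follows. This is a minimal illustration, assuming that only a handful of harmless builtins ever need to be resolved; the allowlist `SAFE_BUILTINS` and the helper `restricted_loads` are hypothetical names chosen for this example. It follows the approach of overriding `pickle.Unpickler.find_class()` and is not a complete defense on its own.

```python
import builtins
import io
import pickle

# Assumption: these builtins are the only globals our data ever needs.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside the allowlist."""

    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Hypothetical helper, used in place of pickle.loads()."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers and allowlisted builtins still round-trip.
safe_result = restricted_loads(pickle.dumps([1, 2, range(3)]))

# A payload that tries to resolve os.system is rejected before it can run.
try:
    restricted_loads(b"cos\nsystem\n(S'echo pwned'\ntR.")
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

An allowlist is used rather than a denylist because any global that is not explicitly permitted is rejected by default.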
| 15 | + |
| 16 | +## Noncompliant Code Example |
| 17 | + |
| 18 | +The `noncompliant01.py` code demonstrates arbitrary code execution [Checkoway Oct 2013] using `os.system` to launch a program during unpickling when `pickle.loads()` is called. |
| 19 | + |
| 20 | +*[noncompliant01.py](noncompliant01.py):* |
| 21 | + |
| 22 | +```py |
| 23 | +""" Non-Compliant Code Example """ |
| 24 | +import platform |
| 25 | +import pickle |
| 26 | + |
| 27 | + |
| 28 | +class Message(object): |
| 29 | +    """Sample Message Object""" |
| 30 | +    sender_id = 42 |
| 31 | +    text = "Some text" |
| 32 | + |
| 33 | +    def printout(self): |
| 34 | +        """prints content to stdout to demonstrate active content""" |
| 35 | +        print(f"Message:sender_id={self.sender_id} text={self.text}") |
| 36 | + |
| 37 | + |
| 38 | +class Preserver(object): |
| 39 | +    """Demonstrating deserialisation""" |
| 40 | + |
| 41 | +    def can(self, _message: Message) -> bytes: |
| 42 | +        """Serializes a Message object. |
| 43 | +            Parameters: |
| 44 | +                _message (Message): Message object |
| 45 | +            Returns: |
| 46 | +                _jar (bytes): pickled jar as bytes |
| 47 | +        """ |
| 48 | +        return pickle.dumps(_message) |
| 49 | + |
| 50 | +    def uncan(self, _jar) -> Message: |
| 51 | +        """De-serializes a Message object. |
| 52 | +            Parameters: |
| 53 | +                _jar (bytes): Pickled jar |
| 54 | +            Returns: |
| 55 | +                (Message): Message object |
| 56 | +        """ |
| 57 | +        return pickle.loads(_jar) |
| 58 | + |
| 59 | + |
| 60 | +# serialization of a normal package |
| 61 | +p1 = Preserver() |
| 62 | +message = Message() |
| 63 | +message.printout() |
| 64 | +jar = p1.can(message) |
| 65 | + |
| 66 | +# sending or storing would happen here |
| 67 | +p2 = Preserver() |
| 68 | +message = None |
| 69 | +message = p2.uncan(jar) |
| 70 | +message.printout() |
| 71 | + |
| 72 | +##################### |
| 73 | +# exploiting above code example |
| 74 | +##################### |
| 75 | +print("-" * 10) |
| 76 | +print("Attacker trying to read the message") |
| 77 | +message = pickle.loads(jar) |
| 78 | +message.printout() |
| 79 | + |
| 80 | +print("-" * 10) |
| 81 | +if platform.system() == "Windows": |
| 82 | +    PAYLOAD = b"""cos |
| 83 | +system |
| 84 | +(S'calc.exe' |
| 85 | +tR.""" |
| 86 | +else: |
| 87 | +    PAYLOAD = b"""cos |
| 88 | +system |
| 89 | +(S'whoami;uptime;uname -a;ls -la /etc/shadow' |
| 90 | +tR.""" |
| 91 | +print("Attacker trying to inject PAYLOAD") |
| 92 | +p3 = Preserver() |
| 93 | +message = None |
| 94 | +message = p3.uncan(PAYLOAD) |
| 95 | +``` |
| 96 | + |
| 97 | +The deserializing `Preserver.uncan()` method has no way to verify the content prior to unpickling it and runs the PAYLOAD even before turning it into an object. On Windows the payload launches `calc.exe`; on Unix it runs a series of commands such as `uname -a` and `ls -la /etc/shadow`. |
| 98 | + |
| 99 | +> [!CAUTION] |
| 100 | +> The `compliant01.py` code only demonstrates integrity protection with hmac. |
| 101 | +> The pickled object is not encrypted and key-handling is inappropriate! |
| 102 | +> Consider using proper key management with `x509` and encryption [[pyca/cryptography 2023]](https://cryptography.io/en/latest/). |
| 103 | +
|
| 104 | +*[compliant01.py](compliant01.py):* |
| 105 | + |
| 106 | +```py |
| 107 | +""" Compliant Code Example """ |
| 108 | +import hashlib |
| 109 | +import hmac |
| 110 | +import platform |
| 111 | +import pickle |
| 112 | +import secrets |
| 113 | + |
| 114 | + |
| 115 | +class Message(object): |
| 116 | +    """Sample Message Object""" |
| 117 | +    sender_id = 42 |
| 118 | +    text = "Some text" |
| 119 | + |
| 120 | +    def printout(self): |
| 121 | +        """prints content to stdout to demonstrate active content""" |
| 122 | +        print(f"Message:sender_id={self.sender_id} text={self.text}") |
| 123 | + |
| 124 | + |
| 125 | +class Preserver(object): |
| 126 | +    """Demonstrating deserialisation""" |
| 127 | +    def __init__(self, _key): |
| 128 | +        self._key = _key |
| 129 | + |
| 130 | +    def can(self, _message: Message) -> tuple: |
| 131 | +        """Serializes a Message object. |
| 132 | +            Parameters: |
| 133 | +                _message (Message): Message object |
| 134 | +            Returns: |
| 135 | +                _digest (String): HMAC digest string |
| 136 | +                _jar (bytes): pickled jar as bytes |
| 137 | +        """ |
| 138 | +        _jar = pickle.dumps(_message) |
| 139 | +        _digest = hmac.new(self._key, _jar, hashlib.sha256).hexdigest() |
| 140 | +        return _digest, _jar |
| 141 | + |
| 142 | +    def uncan(self, _expected_digest, _jar) -> Message: |
| 143 | +        """Verifies and de-serializes a Message object. |
| 144 | +            Parameters: |
| 145 | +                _expected_digest (String): Message HMAC digest |
| 146 | +                _jar (bytes): Pickled jar |
| 147 | +            Returns: |
| 148 | +                (Message): Message object |
| 149 | +        """ |
| 150 | +        _digest = hmac.new(self._key, _jar, hashlib.sha256).hexdigest() |
| 151 | +        if not hmac.compare_digest(_expected_digest, _digest):  # constant-time comparison |
| 152 | +            raise ValueError("Integrity of jar compromised") |
| 153 | +        return pickle.loads(_jar) |
| 154 | + |
| 155 | + |
| 156 | +# serialization of a normal package |
| 157 | +key = secrets.token_bytes() |
| 158 | +print(f"key={key}") |
| 159 | +p1 = Preserver(key) |
| 160 | +message = Message() |
| 161 | +message.printout() |
| 162 | +digest, jar = p1.can(message) |
| 163 | + |
| 164 | +# sending or storing would happen here |
| 165 | +p2 = Preserver(key) |
| 166 | +message = None |
| 167 | +message = p2.uncan(digest, jar) |
| 168 | +message.printout() |
| 169 | + |
| 170 | +##################### |
| 171 | +# exploiting above code example |
| 172 | +##################### |
| 173 | +print("-" * 10) |
| 174 | +print("Attacker trying to read the message") |
| 175 | +message = pickle.loads(jar) |
| 176 | +message.printout() |
| 177 | + |
| 178 | +print("-" * 10) |
| 179 | +if platform.system() == "Windows": |
| 180 | +    PAYLOAD = b"""cos |
| 181 | +system |
| 182 | +(S'calc.exe' |
| 183 | +tR.""" |
| 184 | +else: |
| 185 | +    PAYLOAD = b"""cos |
| 186 | +system |
| 187 | +(S'whoami;uptime;uname -a;ls -la /etc/shadow' |
| 188 | +tR.""" |
| 189 | +print("Attacker trying to inject PAYLOAD") |
| 190 | +p3 = Preserver(b"dont know") |
| 191 | +message = None |
| 192 | +message = p3.uncan(digest, PAYLOAD) |
| 193 | +``` |
| 194 | + |
| 195 | +The integrity verification in `compliant01.py` raises the exception `ValueError: Integrity of jar compromised` prior to unpickling, which prevents the PAYLOAD from being executed. |
| 196 | + |
| 197 | +## Compliant Solution JSON without pickle |
| 198 | + |
| 199 | +Text-based formats, such as `JSON` and `YAML`, should always be preferred. They offer a smaller set of capabilities than `pickle` and thereby reduce the attack surface [python.org comparison-with-json 2023]. |
| 200 | + |
| 201 | +The `compliant02.py` code only allows serialization and deserialization of object data, not of object methods as in `noncompliant01.py` or `compliant01.py`. |
| 202 | + |
| 203 | +Consider converting binary data into text using `Base64` encoding for operations where performance and size are not critical. |
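
As a minimal sketch of the `Base64` suggestion, the snippet below carries binary data inside a `JSON` document. The field name `blob` and the sample bytes are assumptions made up for this illustration and do not appear in `compliant02.py`.

```python
import base64
import json

binary_blob = bytes(range(8))  # stand-in for arbitrary binary data

# Serialize: encode the bytes as ASCII text so they fit into a JSON string.
jar = json.dumps({
    "sender_id": 42,
    "blob": base64.b64encode(binary_blob).decode("ascii"),
})

# Deserialize: validate=True rejects characters outside the Base64 alphabet.
record = json.loads(jar)
restored = base64.b64decode(record["blob"], validate=True)
```

Base64 grows the payload by roughly one third, which is why it suits cases where size and speed are secondary.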
| 204 | + |
| 205 | +*[compliant02.py](compliant02.py):* |
| 206 | + |
| 207 | +```py |
| 208 | +""" Compliant Code Example """ |
| 209 | +import platform |
| 210 | +import json |
| 211 | + |
| 212 | + |
| 213 | +class Message(object): |
| 214 | +    """Sample Message Object""" |
| 215 | +    sender_id = int() |
| 216 | +    text = str() |
| 217 | + |
| 218 | +    def __init__(self): |
| 219 | +        self.sender_id = 42 |
| 220 | +        self.text = "Some text" |
| 221 | + |
| 222 | +    def printout(self): |
| 223 | +        print(f"sender_id: {self.sender_id}\ntext: {self.text}") |
| 224 | + |
| 225 | + |
| 226 | +class Preserver(object): |
| 227 | +    """Demonstrating deserialisation""" |
| 228 | + |
| 229 | +    def can(self, _message: Message) -> str: |
| 230 | +        """Serializes a Message object. |
| 231 | +            Parameters: |
| 232 | +                _message (Message): Message object |
| 233 | +            Returns: |
| 234 | +                _jar (str): jar as JSON string |
| 235 | +        """ |
| 236 | +        return json.dumps(vars(_message)) |
| 237 | + |
| 238 | +    def uncan(self, _jar) -> Message: |
| 239 | +        """De-serializes a Message object. |
| 240 | +            Parameters: |
| 241 | +                _jar (String): JSON jar |
| 242 | +            Returns: |
| 243 | +                (Message): Message object |
| 244 | +        """ |
| 245 | +        j = json.loads(_jar) |
| 246 | +        _message = Message() |
| 247 | +        _message.sender_id = int(j["sender_id"]) |
| 248 | +        _message.text = str(j["text"]) |
| 249 | +        return _message |
| 250 | + |
| 251 | + |
| 252 | +# serialization of a normal package |
| 253 | +p1 = Preserver() |
| 254 | +message = Message() |
| 255 | +jar = p1.can(message) |
| 256 | +print(jar) |
| 257 | +print(type(json.loads(jar))) |
| 258 | + |
| 259 | +# sending or storing would happen here |
| 260 | +p2 = Preserver() |
| 261 | +message = None |
| 262 | +message = p2.uncan(jar) |
| 263 | +message.printout() |
| 264 | +print(message.sender_id) |
| 265 | + |
| 266 | +##################### |
| 267 | +# exploiting above code example |
| 268 | +##################### |
| 269 | +print("-" * 10) |
| 270 | +print("Attacker trying to read the message") |
| 271 | +print(jar) |
| 272 | +message.printout() |
| 273 | + |
| 274 | +print("-" * 10) |
| 275 | +if platform.system() == "Windows": |
| 276 | +    PAYLOAD = b"""cos |
| 277 | +system |
| 278 | +(S'calc.exe' |
| 279 | +tR.""" |
| 280 | +else: |
| 281 | +    PAYLOAD = b"""cos |
| 282 | +system |
| 283 | +(S'whoami;uptime;uname -a;ls -la /etc/shadow' |
| 284 | +tR.""" |
| 285 | +print("Attacker trying to inject PAYLOAD") |
| 286 | +p3 = Preserver() |
| 287 | +message = None |
| 288 | +message = p3.uncan(PAYLOAD) |
| 289 | +``` |
| 290 | + |
| 291 | +The `compliant02.py` code stops the deserialization with a `json.decoder.JSONDecodeError`. |
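
Well-formed `JSON` can still carry unexpected values, so the parsed data may also be validated before use. The following is a rough sketch of such input validation; the helper `parse_message` and the limit `MAX_TEXT_LENGTH` are hypothetical and only illustrate the idea for the two fields used in this guideline.

```python
import json

MAX_TEXT_LENGTH = 280  # assumed limit, purely illustrative


def parse_message(jar: str) -> dict:
    """Hypothetical validator: returns the checked fields or raises ValueError."""
    try:
        data = json.loads(jar)
    except json.JSONDecodeError as error:
        raise ValueError("jar is not valid JSON") from error
    # Accept only an object with exactly the expected keys.
    if not isinstance(data, dict) or set(data) != {"sender_id", "text"}:
        raise ValueError("unexpected structure")
    sender_id, text = data["sender_id"], data["text"]
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(sender_id, bool) or not isinstance(sender_id, int) or sender_id < 0:
        raise ValueError("sender_id must be a non-negative integer")
    if not isinstance(text, str) or len(text) > MAX_TEXT_LENGTH:
        raise ValueError("text must be a string of limited length")
    return {"sender_id": sender_id, "text": text}


checked = parse_message('{"sender_id": 42, "text": "Some text"}')
```

Rejecting unknown keys and wrong types up front keeps later code from having to guess what it received.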
| 292 | + |
| 293 | +## Exceptions |
| 294 | + |
| 295 | +Serialized data from a trusted input source does not require sanitization, provided that the code clearly documents that it relies on the input source being trustworthy. For example, if a library is being audited, a routine of that library may have a documented precondition that its callers pre-sanitize any passed-in serialized data or confirm the input source as trustworthy. |
| 296 | + |
| 297 | +## Automated Detection |
| 298 | + |
| 299 | +|Tool|Version|Checker|Description| |
| 300 | +|:----|:----|:----|:----| |
| 301 | +|Bandit|1.7.4|B301|Pickle and modules that wrap it can be unsafe when used to de-serialize untrusted data, possible security issue. Bandit can only detect that the pickle module is in use and is unable to recognize an acceptable implementation that combines pickle with `hmac` and proper key management.| |
| 302 | + |
| 303 | +## Related Vulnerabilities |
| 304 | + |
| 305 | +|Product|CVE|Description|CVSS Rating|Comment| |
| 306 | +|:----|:----|:----|:----|:----| |
| 307 | +|TensorFlow using the pickle module|[CVE-2021-37678](https://www.cvedetails.com/cve/CVE-2021-37678/)|TensorFlow machine learning platform allows code execution when de-serializing a Keras model from `YAML` format.|v3.1: 8.8 High|| |
| 308 | +|NVFLARE < 2.1.4|[CVE-2022-34668](https://www.cvedetails.com/cve/CVE-2022-34668/)|Deserialization of Untrusted Data with Pickle may allow an unprivileged network attacker to cause Remote Code Execution (RCE).|v3.1: 9.8 Critical|Exploit available on [exploit-db.com](https://www.exploit-db.com/exploits/51051)| |
| 309 | +|Graphite 0.9.5 through 0.9.10|[CVE-2013-5093](https://www.cvedetails.com/cve/CVE-2013-5093/)|The renderLocalView function in render/views.py uses the pickle Python module unsafely, which allows remote attackers to execute arbitrary code via a crafted serialized object|n/a|Exploit available on [exploit-db.com](https://www.exploit-db.com/exploits/27752)| |
| 310 | +|Superset prior to 0.23|[CVE-2018-8021](https://www.cvedetails.com/cve/CVE-2018-8021/)|Unsafe load method from the pickle library to deserialize data leading to possible RCE|v3.1: 9.8 Critical|Exploit available on [exploit-db.com](https://www.exploit-db.com/exploits/45933)| |
| 311 | +|rpc.py through 0.6.0|[CVE-2022-35411](https://www.cvedetails.com/cve/CVE-2022-35411/)|HTTP headers set to `"serializer: pickle"` trigger `rpc.py` to de-serialize with `pickle` instead of the default `JSON`, allowing remote code execution|v3.1: 9.8 Critical|Exploit available on [https://github.com/](https://github.com/ehtec/rpcpy-exploit/blob/main/rpcpy-exploit.py)| |
| 312 | + |
| 313 | +## Related Guidelines |
| 314 | + |
| 315 | +||| |
| 316 | +|:---|:---| |
| 317 | +|[SEI CERT Coding Standard for Java](https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java)|[SER01-J. Do not deviate from the proper signatures of serialization methods](https://wiki.sei.cmu.edu/confluence/display/java/SER01-J.+Do+not+deviate+from+the+proper+signatures+of+serialization+methods)| |
| 318 | +|[MITRE CWE](http://cwe.mitre.org/)|Pillar [CWE-664: Improper Control of a Resource Through its Lifetime (4.13) (mitre.org)](https://cwe.mitre.org/data/definitions/664.html)| |
| 319 | +|[MITRE CWE](http://cwe.mitre.org/)|Base [CWE-502, Deserialization of Untrusted Data](http://cwe.mitre.org/data/definitions/502.html)| |
| 320 | + |
| 321 | +## Bibliography |
| 322 | + |
| 323 | +||| |
| 324 | +|:---|:---| |
| 325 | +|[[docs.python.org 2023]](https://docs.python.org/)|pickle — Python object serialization. Available from: <https://docs.python.org/3.9/library/pickle.html> \[Accessed 07 May 2024]| |
| 326 | +|[python.org comparison-with-json 2023]|pickle - Comparison with JSON. Available from: <https://docs.python.org/3.9/library/pickle.html#comparison-with-json> \[Accessed 07 May 2024]| |
| 327 | +|[pyca/cryptography 2023]|Welcome to pyca/cryptography. Available from: <https://cryptography.io/en/latest/> \[Accessed 07 May 2024]| |