---
description: GGUF model attempting template injection for arbitrary code execution
title: GGUF-SSTI
type: modelThreat
---

## Overview

A GGUF model may contain a [Jinja2](https://palletsprojects.com/projects/jinja/) chat template that performs server-side template injection ([SSTI](https://portswigger.net/web-security/server-side-template-injection)), leading to **execution of malicious Python code** when the model is loaded.

<u>Important Note</u> - The only publicly known case where loading a GGUF model leads to dangerous server-side template injection is the [CVE-2024-34359](https://github.com/abetlen/llama-cpp-python/security/advisories/GHSA-56xg-wfcc-g829) ("Llama Drama") vulnerability. It is only exploitable when loading a GGUF model using a vulnerable version of the [llama-cpp-python](https://pypi.org/project/llama-cpp-python/) library (`llama-cpp-python` < 0.2.72). This means that arbitrary code execution <u>will not occur</u> when loading a malicious GGUF model into `llama-cpp-python` >= 0.2.72 or into other GGUF-compatible libraries, such as [ollama](https://ollama.com/download).

The GGUF model format is a binary format optimized for quickly loading and storing models. One of the features of the GGUF format is the ability to store a **chat template** in the model's metadata (`tokenizer.chat_template`).



A chat template defines what a chat (prompt) interaction with the model looks like. The chat template in a GGUF model is written in the Jinja2 template language. Since Jinja2 supports execution of arbitrary Python code, loading an arbitrary Jinja2 template from a GGUF model into an **unsandboxed** Jinja2 engine leads to arbitrary code execution.
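
For illustration, a benign chat template might look like this (a simplified sketch; real templates are usually longer and model-specific):

```
{% for message in messages %}{{ message['role'] }}: {{ message['content'] }}
{% endfor %}
```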

For example - the following Jinja2 template, which can be inserted into a GGUF model's `chat_template` metadata parameter, will run the shell command `touch /tmp/retr0reg` if the GGUF model is loaded in an unsandboxed environment -

```
{% for x in ().__class__.__base__.__subclasses__() %}{% if "warning" in x.__name__ %}{{x()._module.__builtins__['__import__']('os').popen("touch /tmp/retr0reg")}}{%endif%}{% endfor %}
```
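
The fix shipped in `llama-cpp-python` 0.2.72 was to render chat templates with Jinja2's sandboxed environment. A minimal sketch of the difference, assuming only the `jinja2` package (the template below is a harmless probe for the same gadget chain):

```python
from jinja2 import Environment
from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# The same object-graph walk the malicious chat template relies on.
probe = "{{ ().__class__.__base__.__subclasses__() }}"

# Default environment: the template can freely reach __import__ / os.popen.
print(Environment().from_string(probe).render()[:80])

# Sandboxed environment: access to underscore attributes raises SecurityError.
try:
    ImmutableSandboxedEnvironment().from_string(probe).render()
except SecurityError as err:
    print("Blocked by the sandbox:", err)
```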

## Time of Infection

**[v] Model Load**

[ ] Model Query

[ ] Other

## Evidence Extraction and False Positive Elimination

To safely determine if the suspected GGUF model contains a malicious Jinja2 template (a parsing sketch follows the list below) -

1. Parse the GGUF model's metadata parameters and extract the `tokenizer.chat_template` string

2. Inspect the chat template data for suspicious strings such as `__class__`, `os`, `subprocess`, `eval` and `exec`
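
A minimal sketch of both steps, assuming the `gguf` Python package published from the llama.cpp repository (field-decoding details may differ slightly between package versions):

```python
import re

from gguf import GGUFReader  # pip install gguf

# Word-boundary patterns reduce false positives on short tokens like "os".
SUSPICIOUS = [r"__class__", r"__subclasses__", r"__import__",
              r"\bos\b", r"\bsubprocess\b", r"\beval\b", r"\bexec\b"]

def extract_chat_template(path):
    reader = GGUFReader(path)
    field = reader.fields.get("tokenizer.chat_template")
    if field is None:
        return None
    # For string fields, field.data indexes the part holding the raw bytes.
    return bytes(field.parts[field.data[-1]]).decode("utf-8", errors="replace")

def scan_template(template):
    return [p for p in SUSPICIOUS if re.search(p, template)]

template = extract_chat_template("suspect.gguf")
if template:
    print("Suspicious tokens:", scan_template(template))
```

A match is evidence, not proof - legitimate chat templates rarely touch dunder attributes, but any hit should be reviewed manually before flagging the model as malicious.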

JFrog conducts metadata extraction and detailed analysis on each GGUF model in order to determine whether any malicious code is present.

## Additional Information

* https://github.com/abetlen/llama-cpp-python/security/advisories/GHSA-56xg-wfcc-g829
* https://github.com/huggingface/smol-course/blob/main/1_instruction_tuning/chat_templates.md
* https://techtonics.medium.com/secure-templating-with-jinja2-understanding-ssti-and-jinja2-sandbox-environment-b956edd60456
---
description: TensorFlow H5 model with Lambda Layers containing malicious code
title: H5-LAMBDA
type: modelThreat
---

## Overview

A TensorFlow HDF5/H5 model may contain a "Lambda" layer, which embeds Python code in binary form. **This code may contain malicious instructions** which will be executed when the model is loaded.

The HDF5/H5 format is a legacy format used by TensorFlow and Keras to store ML models.



Internally, this format contains an embedded JSON section called `model_config`, which specifies the configuration of the ML model.

The model configuration lists all the layers of the model, and may include a **Lambda** layer.

A Lambda layer holds custom operations defined by the model author, expressed as a raw Python code object (Python bytecode).



**Since arbitrary Python bytecode can contain any operation, including malicious operations**, loading an untrusted HDF5/H5 model is considered dangerous.
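
To see how such bytecode ends up in a model, consider the following sketch (it assumes a TensorFlow 2.x environment where the legacy H5 saving path is still available; the lambda here is benign, but any callable would be marshalled the same way):

```python
import tensorflow as tf

# The Lambda layer's backing function is compiled Python; on save,
# its code object is marshalled and Base64-encoded into model_config.
model = tf.keras.Sequential([
    tf.keras.layers.Lambda(lambda x: x * 2, input_shape=(1,)),
])
model.save("model.h5")  # the .h5 suffix selects the legacy HDF5 format
```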

## Time of Infection

**[v] Model Load**

[ ] Model Query

[ ] Other

## Evidence Extraction and False Positive Elimination

To safely determine if the suspected HDF5 model contains malicious code (a parsing sketch follows the list below) -

1. Parse the `model_config` JSON embedded in the HDF5 model to identify `Lambda` layers

2. Extract and decode the Base64-encoded data of the `Lambda` layer to obtain a Python code object

3. Decompile the raw Python code object, e.g. using [pycdc](https://github.com/zrax/pycdc)

4. Examine the decompiled Python code to determine if it contains any malicious instructions
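
A minimal sketch of steps 1-2, plus a bytecode disassembly as a lightweight stand-in for step 3. It assumes `h5py` is installed and the common Keras H5 layout, where `model_config` is a root attribute and a serialized function is stored as a `[code, defaults, closure]` triple; the exact nesting can vary between Keras versions:

```python
import base64
import dis
import json
import marshal

import h5py

def find_lambda_payloads(path):
    with h5py.File(path, "r") as f:
        config = json.loads(f.attrs["model_config"])
    payloads = []
    for layer in config["config"]["layers"]:
        if layer.get("class_name") == "Lambda":
            encoded = layer["config"]["function"][0]  # Base64 of marshal.dumps(code)
            payloads.append(base64.b64decode(encoded))
    return payloads

for raw in find_lambda_payloads("suspect.h5"):
    code = marshal.loads(raw)  # deserializes the code object; does NOT run it
    dis.dis(code)              # inspect the bytecode without executing it
```

Note that the `marshal` format is tied to the Python version that produced it, which is why a cross-version decompiler such as pycdc is more robust in practice.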

JFrog conducts extraction, decompilation and detailed analysis on each TensorFlow HDF5 model in order to determine whether any malicious code is present.

## Additional Information

* https://hiddenlayer.com/innovation-hub/models-are-code/#Code-Execution-via-Lambda
---
description: Keras model with Lambda Layers containing malicious code
title: KERAS-LAMBDA
type: modelThreat
---

## Overview

A Keras model may contain a "Lambda" layer, which embeds Python code in binary form. **This code may contain malicious instructions** which will be executed when the model is loaded.

The Keras v3 format is the latest format used by TensorFlow and Keras to store ML models.



Internally, this format is a ZIP archive containing a JSON file called `config.json`, which specifies the configuration of the ML model.

The model configuration lists all the layers of the model, and may include a **Lambda** layer.

A Lambda layer holds custom operations defined by the model author, expressed as a raw Python code object (Python bytecode).



**Since arbitrary Python bytecode can contain any operation, including malicious operations**, loading an untrusted Keras v3 model is considered dangerous.
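
On the loading side, Keras 3 mitigates this by default: `keras.saving.load_model` takes a `safe_mode` parameter (default `True`) that refuses to deserialize Lambda bytecode. A quick illustration, assuming `keras>=3`:

```python
import keras

# Default: safe_mode=True raises an error if the archive contains
# a Lambda layer serialized as raw Python bytecode.
model = keras.saving.load_model("untrusted.keras")

# Opting out executes whatever bytecode the model author embedded.
model = keras.saving.load_model("untrusted.keras", safe_mode=False)
```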

## Time of Infection

**[v] Model Load**

[ ] Model Query

[ ] Other

## Evidence Extraction and False Positive Elimination

To safely determine if the suspected Keras v3 model contains malicious code (a parsing sketch follows the list below) -

1. Extract and parse the `config.json` file from the Keras v3 model ZIP archive to identify `Lambda` layers

2. Extract and decode the Base64-encoded data of the `Lambda` layer to obtain a Python code object

3. Decompile the raw Python code object, e.g. using [pycdc](https://github.com/zrax/pycdc)

4. Examine the decompiled Python code to determine if it contains any malicious instructions
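
A minimal sketch of steps 1-2 using only the standard library; the `__lambda__`/`code` nesting shown here follows recent Keras serialization and may vary between versions:

```python
import base64
import json
import zipfile

def find_lambda_payloads(path):
    with zipfile.ZipFile(path) as archive:
        config = json.loads(archive.read("config.json"))
    payloads = []
    for layer in config["config"]["layers"]:
        if layer.get("class_name") == "Lambda":
            fn = layer["config"]["function"]
            if isinstance(fn, dict) and fn.get("class_name") == "__lambda__":
                payloads.append(base64.b64decode(fn["config"]["code"]))
    return payloads

# Each payload is a marshalled code object, ready for pycdc (step 3).
for raw in find_lambda_payloads("suspect.keras"):
    print("Found Lambda bytecode payload:", len(raw), "bytes")
```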

JFrog conducts extraction, decompilation and detailed analysis on each Keras v3 model in order to determine whether any malicious code is present.

## Additional Information

* https://hiddenlayer.com/innovation-hub/models-are-code/#Code-Execution-via-Lambda