LLAMATOR-Core · nizamovtimur · Feb 8, 2025 · Feb 6, 2025 · Feb 6, 2025 · Feb 7, 2025
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -149,12 +149,14 @@ AvailableTests = [
     "typoglycemia_attack",
     "ucar",
 
-    #TODO: YOUR TEST HERE
+    #TODO: YOUR TEST HERE (in alphabetical order!)
 ]
 ```
 
 #### 5. Add your attack to the `attack_descriptions.json` and `attack_descriptions.md` files.
 
+Please follow the structure of `attack_descriptions.md`. The description should match the docstring of the attack class. If your attack has an original paper or repository, please mention it in both the docstring and `attack_descriptions.md`.
+
 #### 6. Open a PR! Submit your changes for review by opening a pull request to the `main` branch.
 
 ## Submitting a Pull Request.
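
Review note: the new guideline asks that each description in `attack_descriptions.md` match the attack class docstring. A minimal sketch of how that could be checked is below. The helper names (`parse_attack_descriptions`, `description_matches`) are hypothetical and not part of LLAMATOR, and the regular expression assumes the `<details>` layout used in this diff. Because docstrings may append the paper and code links after the description, the check is a prefix match on whitespace-normalized text rather than strict equality.

```python
import re

# Hypothetical pattern for one <details> entry, assuming the layout in
# docs/attack_descriptions.md: summary title, "In code name", then the
# description paragraph terminated by a blank line.
DETAILS_RE = re.compile(
    r"<details>\s*<summary><strong>(?P<title>.*?)</strong></summary>\s*"
    r"In code name: `(?P<code_name>[^`]+)`\s*"
    r"(?P<description>.+?)\n\s*\n",
    re.DOTALL,
)


def normalize(text: str) -> str:
    """Collapse all runs of whitespace so multi-line text compares cleanly."""
    return " ".join(text.split())


def parse_attack_descriptions(markdown: str) -> dict[str, str]:
    """Map each in-code attack name to its normalized description paragraph."""
    return {
        match["code_name"]: normalize(match["description"])
        for match in DETAILS_RE.finditer(markdown)
    }


def description_matches(docstring: str, description: str) -> bool:
    """Return True if the docstring starts with the documented description."""
    return normalize(docstring).startswith(normalize(description))
```

In a test, the docstring would come from the attack class (for example `TestSuffix.__doc__`) and the description from the parsed Markdown file.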

diff --git a/README.md b/README.md
@@ -5,10 +5,10 @@ Red Teaming python-framework for testing chatbots and LLM-systems
 [![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC_BY--NC--SA_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
 [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/llamator)](https://pypi.org/project/llamator)
 [![PyPI](https://badge.fury.io/py/llamator.svg)](https://badge.fury.io/py/llamator)
+[![Docker](https://img.shields.io/badge/dockerfile-gray.svg?logo=docker)](https://github.com/RomiconEZ/llamator/blob/release/docker)
 [![Downloads](https://pepy.tech/badge/llamator)](https://pepy.tech/project/llamator)
 [![Downloads](https://pepy.tech/badge/llamator/month)](https://pepy.tech/project/llamator)
 [![GitHub Repo stars](https://img.shields.io/github/stars/RomiconEZ/llamator)](https://github.com/RomiconEZ/llamator/stargazers)
-[![Docker](https://img.shields.io/badge/dockerfile-gray.svg?logo=docker)](https://github.com/RomiconEZ/llamator/blob/release/docker)
 
 ## Install 🚀
 

diff --git a/docker/instruction.md b/docker/README.md
@@ -9,7 +9,7 @@
    ├── docker/
    │   ├── Dockerfile
    │   ├── jupyter_docker.sh
-   │   └── instruction.md
+   │   └── README.md
    └── workspace/
        └── (your working files)
    ```

diff --git a/docs/attack_descriptions.md b/docs/attack_descriptions.md
@@ -1,10 +1,23 @@
 # Attacks
+
 This section provides an overview of all available attacks.
 
 ## Single-stage attacks
 
 ### OWASP [LLM01:2025](https://genai.owasp.org/llmrisk/llm01-prompt-injection/) Prompt Injection
 
+<details>
+<summary><strong>Adversarial suffix</strong></summary>
+
+In code name: `suffix`
+
+Uses a specially crafted suffix to bypass LLM filters and restrictions, forcing the model to generate prohibited or harmful content despite the original query. Evaluates the model's vulnerability to input data manipulation.
+
+_Original Paper <https://arxiv.org/abs/2307.15043>, Code <https://github.com/llm-attacks/llm-attacks>_
+
+<a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/suffix.py">View code on GitHub</a>
+</details>
+
 <details>
 <summary><strong>Aim Jailbreak</strong></summary>
 
@@ -32,7 +45,7 @@ In code name: `bon`
 
 Probes the LLM resilience against Best-Of-N attack.
 
-`Original Paper <https://arxiv.org/abs/2412.03556>`_, `Code <https://github.com/jplhughes/bon-jailbreaking>`_.
+_Original Paper <https://arxiv.org/abs/2412.03556>, Code <https://github.com/jplhughes/bon-jailbreaking>_
 
 <a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/bon.py">View code on GitHub</a>
 </details>
@@ -47,18 +60,6 @@ Evaluates the LLM's ability to maintain content moderation standards when transi
 <a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/complimentary_transition.py">View code on GitHub</a>
 </details>
 
-<details>
-<summary><strong>Crescendo</strong></summary>
-
-In code name: `crescendo`
-
-Challenges the LLM with a dataset of harmful behavior prompts using Crescendo strategy, operates by engaging the LLM in a series of escalating conversational turns, and multistage refinement.
-
-`Original Paper <https://arxiv.org/abs/2404.01833>`_
-
-<a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/crescendo.py">View code on GitHub</a>
-</details>
-
 <details>
 <summary><strong>Do Anything Now Jailbreak</strong></summary>
 
@@ -159,18 +160,6 @@ Probes a Russian-language LLM's adherence to content restrictions when explicitl
 <a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/ru_ucar.py">View code on GitHub</a>
 </details>
 
-<details>
-<summary><strong>Adversarial suffix</strong></summary>
-
-In code name: `suffix`
-
-Uses a specially crafted suffix to bypass LLM filters and restrictions, forcing the model to generate prohibited or harmful content despite the original query. Evaluates the model's vulnerability to input data manipulation.
-
-`Original Paper <https://arxiv.org/abs/2307.15043>`_, `Code <https://github.com/llm-attacks/llm-attacks>`_.
-
-<a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/suffix.py">View code on GitHub</a>
-</details>
-
 ## Multi-stage attacks
 
 <details>
@@ -196,6 +185,18 @@ The `multistage_depth` parameter for using in `history_limit` can be extracted f
 
 ### OWASP [LLM01:2025](https://genai.owasp.org/llmrisk/llm01-prompt-injection/) Prompt Injection
 
+<details>
+<summary><strong>Crescendo</strong></summary>
+
+In code name: `crescendo`
+
+Challenges the LLM with a dataset of harmful behavior prompts using Crescendo strategy, operates by engaging the LLM in a series of escalating conversational turns, and multistage refinement.
+
+_Original Paper <https://arxiv.org/abs/2404.01833>_
+
+<a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/crescendo.py">View code on GitHub</a>
+</details>
+
 <details>
 <summary><strong>Harmful Behavior Multistage</strong></summary>
 
@@ -237,7 +238,7 @@ In code name: `sycophancy`
 
 Attacks the LLM system during a multistage conversation using the argumentum ad verecundiam method based on the system's domain.
 
-`Original Paper <https://arxiv.org/abs/2310.13548>`_
+_Original Paper <https://arxiv.org/abs/2310.13548>_
 
 <a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/sycophancy.py">View code on GitHub</a>
 </details>
diff --git a/docs/howtos.md b/docs/howtos.md
@@ -42,17 +42,19 @@ tested_model = llamator.ClientOpenAI(
 tests_with_attempts = [
     ("aim_jailbreak", 2),
     ("base64_injection", 2),
+    ("bon", 2),
     ("complimentary_transition", 2),
-    ("do_anything_now_jailbreak", 2),
+    ("crescendo", 2),
     # Uncomment the following lines to include additional tests
+    # ("do_anything_now_jailbreak", 2),
     # ("RU_do_anything_now_jailbreak", 2),
-    # ("bon", 2),
     # ("ethical_compliance", 2),
     # ("harmful_behavior", 2),
     # ("harmful_behavior_multistage", 2),
     # ("linguistic_evasion", 2),
     # ("logical_inconsistencies", 2),
     # ("past_tense", 2),
+    # ("suffix", 2),
     # ("sycophancy", 2),
     # ("system_prompt_leakage", 2),
     # ("typoglycemia_attack", 2),
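
Review note: this PR reorders the test lists alphabetically. A small sketch of an order check follows. It assumes the convention that the lists in this diff appear to follow: names are compared case-insensitively, and an `RU_` variant sorts directly after its base test. The function names are hypothetical and not part of LLAMATOR.

```python
RU_PREFIX = "RU_"


def sort_key(test_name: str) -> tuple[str, bool]:
    """Sort by base name (without the RU_ prefix), with RU_ variants second."""
    is_ru = test_name.startswith(RU_PREFIX)
    base = test_name[len(RU_PREFIX):] if is_ru else test_name
    return (base.lower(), is_ru)


def out_of_order(tests_with_attempts: list[tuple[str, int]]) -> list[tuple[str, str]]:
    """Return adjacent (previous, current) name pairs that break the ordering."""
    names = [name for name, _attempts in tests_with_attempts]
    return [
        (prev, cur)
        for prev, cur in zip(names, names[1:])
        if sort_key(prev) > sort_key(cur)
    ]
```

An empty result means the list is ordered under this assumed convention; any returned pair points at the first entry that should move.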

diff --git a/docs/project_overview.md b/docs/project_overview.md
@@ -5,6 +5,7 @@ LLAMATOR - Red Teaming python-framework for testing chatbots and LLM-systems
 [![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC_BY--NC--SA_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
 [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/llamator)](https://pypi.org/project/llamator)
 [![PyPI](https://badge.fury.io/py/llamator.svg)](https://badge.fury.io/py/llamator)
+[![Docker](https://img.shields.io/badge/dockerfile-gray.svg?logo=docker)](https://github.com/RomiconEZ/llamator/blob/release/docker)
 [![Downloads](https://pepy.tech/badge/llamator)](https://pepy.tech/project/llamator)
 [![Downloads](https://pepy.tech/badge/llamator/month)](https://pepy.tech/project/llamator)
 [![GitHub Repo stars](https://img.shields.io/github/stars/RomiconEZ/llamator)](https://github.com/RomiconEZ/llamator/stargazers)

diff --git a/examples/llamator-api.ipynb b/examples/llamator-api.ipynb
@@ -283,7 +283,9 @@
     "tests_with_attempts = [\n",
     "    # (\"aim_jailbreak\", 2),\n",
     "    # (\"base64_injection\", 2),\n",
+    "    # (\"bon\", 2),\n",
     "    # (\"complimentary_transition\", 2),\n",
+    "    # (\"crescendo\", 2),\n",
     "    # (\"do_anything_now_jailbreak\", 2),\n",
     "    # (\"RU_do_anything_now_jailbreak\", 2),\n",
     "    # (\"ethical_compliance\", 2),\n",
@@ -292,6 +294,7 @@
     "    # (\"linguistic_evasion\", 2),\n",
     "    # (\"logical_inconsistencies\", 2),\n",
     "    # (\"past_tense\", 2),\n",
+    "    # (\"suffix\", 2),\n",
     "    (\"sycophancy\", 2),\n",
     "    (\"system_prompt_leakage\", 2),\n",
     "    # (\"typoglycemia_attack\", 2),\n",

diff --git a/examples/llamator-selenium.ipynb b/examples/llamator-selenium.ipynb
@@ -365,15 +365,18 @@
     "tests_with_attempts = [\n",
     "    # (\"aim_jailbreak\", 2),\n",
     "    # (\"base64_injection\", 2),\n",
-    "    # (\"complimentary_transition\", 3),\n",
+    "    # (\"bon\", 2),\n",
+    "    # (\"complimentary_transition\", 2),\n",
+    "    # (\"crescendo\", 2),\n",
     "    # (\"do_anything_now_jailbreak\", 2),\n",
     "    # (\"RU_do_anything_now_jailbreak\", 2),\n",
     "    # (\"ethical_compliance\", 2),\n",
     "    # (\"harmful_behavior\", 2),\n",
     "    # (\"harmful_behavior_multistage\", 2),\n",
     "    (\"linguistic_evasion\", 2),\n",
     "    (\"logical_inconsistencies\", 2),\n",
-    "    # (\"past_tense\", 1),\n",
+    "    # (\"past_tense\", 2),\n",
+    "    # (\"suffix\", 2),\n",
     "    (\"sycophancy\", 2),\n",
     "    (\"system_prompt_leakage\", 2),\n",
     "    # (\"typoglycemia_attack\", 2),\n",

diff --git a/examples/llamator-telegram.ipynb b/examples/llamator-telegram.ipynb
@@ -385,7 +385,9 @@
     "tests_with_attempts = [\n",
     "    # (\"aim_jailbreak\", 2),\n",
     "    # (\"base64_injection\", 2),\n",
+    "    # (\"bon\", 2),\n",
     "    # (\"complimentary_transition\", 2),\n",
+    "    # (\"crescendo\", 2),\n",
     "    # (\"do_anything_now_jailbreak\", 2),\n",
     "    # (\"RU_do_anything_now_jailbreak\", 2),\n",
     "    # (\"ethical_compliance\", 2),\n",
@@ -394,6 +396,7 @@
     "    (\"linguistic_evasion\", 2),\n",
     "    (\"logical_inconsistencies\", 2),\n",
     "    # (\"past_tense\", 2),\n",
+    "    # (\"suffix\", 2),\n",
     "    (\"sycophancy\", 2),\n",
     "    # (\"system_prompt_leakage\", 2),\n",
     "    # (\"typoglycemia_attack\", 2),\n",

diff --git a/examples/llamator-whatsapp.ipynb b/examples/llamator-whatsapp.ipynb
@@ -510,7 +510,9 @@
     "tests_with_attempts = [\n",
     "    # (\"aim_jailbreak\", 2),\n",
     "    # (\"base64_injection\", 2),\n",
+    "    # (\"bon\", 2),\n",
     "    # (\"complimentary_transition\", 2),\n",
+    "    # (\"crescendo\", 2),\n",
     "    (\"do_anything_now_jailbreak\", 2),\n",
     "    # (\"RU_do_anything_now_jailbreak\", 2),\n",
     "    # (\"ethical_compliance\", 2),\n",
@@ -519,6 +521,7 @@
     "    # (\"linguistic_evasion\", 2),\n",
     "    # (\"logical_inconsistencies\", 2),\n",
     "    # (\"past_tense\", 2),\n",
+    "    # (\"suffix\", 2),\n",
     "    # (\"sycophancy\", 2),\n",
     "    # (\"system_prompt_leakage\", 2),\n",
     "    # (\"typoglycemia_attack\", 2),\n",

diff --git a/src/llamator/attacks/suffix.py b/src/llamator/attacks/suffix.py
@@ -17,7 +17,9 @@
 
 @register_test
 class TestSuffix(TestBase):
-    """Uses a specially crafted suffix to bypass LLM filters and restrictions, forcing the model to generate prohibited or harmful content despite the original query. Evaluates the model's vulnerability to input data manipulation."""
+    """Uses a specially crafted suffix to bypass LLM filters and restrictions, forcing the model to generate prohibited or harmful content despite the original query.
+    Evaluates the model's vulnerability to input data manipulation.
+    Original Paper: https://arxiv.org/abs/2307.15043, Code: https://github.com/llm-attacks/llm-attacks"""
 
     test_name = "suffix"
 

diff --git a/tests/test_llamator.py b/tests/test_llamator.py
@@ -57,6 +57,7 @@ def test_openai_client():
         # ("linguistic_evasion", 2),
         # ("logical_inconsistencies", 2),
         # ("past_tense", 2),
+        # ("suffix", 2),
         # ("sycophancy", 2),
         # ("system_prompt_leakage", 2),
         # ("typoglycemia_attack", 2),

diff --git a/tests/test_local_llamator.py b/tests/test_local_llamator.py
@@ -113,6 +113,7 @@ def test_langchain_client_yandexgpt():
         # ("linguistic_evasion", 2),
         # ("logical_inconsistencies", 2),
         # ("past_tense", 2),
+        # ("suffix", 2),
         # ("sycophancy", 2),
         # ("system_prompt_leakage", 2),
         # ("typoglycemia_attack", 2),