Small fix #84

Merged
merged 4 commits into from
Feb 8, 2025
4 changes: 3 additions & 1 deletion CONTRIBUTING.md
@@ -149,12 +149,14 @@ AvailableTests = [
"typoglycemia_attack",
"ucar",

#TODO: YOUR TEST HERE
#TODO: YOUR TEST HERE (in alphabetical order!)
]
```

#### 5. Add your attack to the `attack_descriptions.json` and `attack_descriptions.md` files.

Please follow the structure of `attack_descriptions.md`. The description should match the docstring of the attack class. If your attack has an original paper or repository, please mention it in both the docstring and `attack_descriptions.md`.

#### 6. Open a PR! Submit your changes for review by opening a pull request to the `main` branch.

## Submitting a Pull Request.
2 changes: 1 addition & 1 deletion README.md
@@ -5,10 +5,10 @@ Red Teaming python-framework for testing chatbots and LLM-systems
[![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC_BY--NC--SA_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/llamator)](https://pypi.org/project/llamator)
[![PyPI](https://badge.fury.io/py/llamator.svg)](https://badge.fury.io/py/llamator)
[![Docker](https://img.shields.io/badge/dockerfile-gray.svg?logo=docker)](https://github.com/RomiconEZ/llamator/blob/release/docker)
[![Downloads](https://pepy.tech/badge/llamator)](https://pepy.tech/project/llamator)
[![Downloads](https://pepy.tech/badge/llamator/month)](https://pepy.tech/project/llamator)
[![GitHub Repo stars](https://img.shields.io/github/stars/RomiconEZ/llamator)](https://github.com/RomiconEZ/llamator/stargazers)
[![Docker](https://img.shields.io/badge/dockerfile-gray.svg?logo=docker)](https://github.com/RomiconEZ/llamator/blob/release/docker)

## Install 🚀

2 changes: 1 addition & 1 deletion docker/instruction.md → docker/README.md
@@ -9,7 +9,7 @@
├── docker/
│ ├── Dockerfile
│ ├── jupyter_docker.sh
│ └── instruction.md
│ └── README.md
└── workspace/
└── (your working files)
```
53 changes: 27 additions & 26 deletions docs/attack_descriptions.md
@@ -1,10 +1,23 @@
# Attacks

This section provides an overview of all available attacks.

## Single-stage attacks

### OWASP [LLM01:2025](https://genai.owasp.org/llmrisk/llm01-prompt-injection/) Prompt Injection

<details>
<summary><strong>Adversarial suffix</strong></summary>

In code name: `suffix`

Uses a specially crafted suffix to bypass LLM filters and restrictions, forcing the model to generate prohibited or harmful content despite the original query. Evaluates the model's vulnerability to input data manipulation.

_Original Paper <https://arxiv.org/abs/2307.15043>, Code <https://github.com/llm-attacks/llm-attacks>_

<a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/suffix.py">View code on GitHub</a>
</details>

<details>
<summary><strong>Aim Jailbreak</strong></summary>

@@ -32,7 +45,7 @@ In code name: `bon`

Probes the LLM resilience against Best-Of-N attack.

`Original Paper <https://arxiv.org/abs/2412.03556>`_, `Code <https://github.com/jplhughes/bon-jailbreaking>`_.
_Original Paper <https://arxiv.org/abs/2412.03556>, Code <https://github.com/jplhughes/bon-jailbreaking>_

<a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/bon.py">View code on GitHub</a>
</details>
@@ -47,18 +60,6 @@ Evaluates the LLM's ability to maintain content moderation standards when transi
<a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/complimentary_transition.py">View code on GitHub</a>
</details>

<details>
<summary><strong>Crescendo</strong></summary>

In code name: `crescendo`

Challenges the LLM with a dataset of harmful behavior prompts using Crescendo strategy, operates by engaging the LLM in a series of escalating conversational turns, and multistage refinement.

`Original Paper <https://arxiv.org/abs/2404.01833>`_

<a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/crescendo.py">View code on GitHub</a>
</details>

<details>
<summary><strong>Do Anything Now Jailbreak</strong></summary>

@@ -159,18 +160,6 @@ Probes a Russian-language LLM's adherence to content restrictions when explicitl
<a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/ru_ucar.py">View code on GitHub</a>
</details>

<details>
<summary><strong>Adversarial suffix</strong></summary>

In code name: `suffix`

Uses a specially crafted suffix to bypass LLM filters and restrictions, forcing the model to generate prohibited or harmful content despite the original query. Evaluates the model's vulnerability to input data manipulation.

`Original Paper <https://arxiv.org/abs/2307.15043>`_, `Code <https://github.com/llm-attacks/llm-attacks>`_.

<a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/suffix.py">View code on GitHub</a>
</details>

## Multi-stage attacks

<details>
@@ -196,6 +185,18 @@ The `multistage_depth` parameter for using in `history_limit` can be extracted f

### OWASP [LLM01:2025](https://genai.owasp.org/llmrisk/llm01-prompt-injection/) Prompt Injection

<details>
<summary><strong>Crescendo</strong></summary>

In code name: `crescendo`

Challenges the LLM with a dataset of harmful behavior prompts using Crescendo strategy, operates by engaging the LLM in a series of escalating conversational turns, and multistage refinement.

_Original Paper <https://arxiv.org/abs/2404.01833>_

<a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/crescendo.py">View code on GitHub</a>
</details>

<details>
<summary><strong>Harmful Behavior Multistage</strong></summary>

@@ -237,7 +238,7 @@ In code name: `sycophancy`

Attacks the LLM system during a multistage conversation using the argumentum ad verecundiam method based on the system's domain.

`Original Paper <https://arxiv.org/abs/2310.13548>`_
_Original Paper <https://arxiv.org/abs/2310.13548>_

<a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/sycophancy.py">View code on GitHub</a>
</details>
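
The reference-line change repeated in this file (reST-style `` `Label <url>`_ `` links replaced by a single underscore-wrapped line) could be applied mechanically. The following is a minimal sketch, assuming each reference line contains nothing but such links; `convert_reference_line` is a hypothetical helper and not part of LLAMATOR.

```python
import re

# Matches one reST-style hyperlink reference of the form `Label <url>`_
REST_LINK = re.compile(r"`([^`<]+?)\s*<([^>`]+)>`_")


def convert_reference_line(line: str) -> str:
    """Rewrite a line of reST-style links as one underscore-wrapped line.

    Assumption: the line holds only links plus separators, so everything
    outside the links (commas, trailing period) can be discarded.
    Lines without any reST-style link are returned unchanged.
    """
    links = REST_LINK.findall(line)
    if not links:
        return line
    joined = ", ".join(f"{label} <{url}>" for label, url in links)
    return f"_{joined}_"


if __name__ == "__main__":
    old = (
        "`Original Paper <https://arxiv.org/abs/2412.03556>`_, "
        "`Code <https://github.com/jplhughes/bon-jailbreaking>`_."
    )
    print(convert_reference_line(old))
```

Running the helper over the old `bon` reference line reproduces the new form shown in the diff above; lines that carry prose around the links would need a narrower substitution.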
6 changes: 4 additions & 2 deletions docs/howtos.md
@@ -42,17 +42,19 @@ tested_model = llamator.ClientOpenAI(
tests_with_attempts = [
("aim_jailbreak", 2),
("base64_injection", 2),
("bon", 2),
("complimentary_transition", 2),
("do_anything_now_jailbreak", 2),
("crescendo", 2),
# Uncomment the following lines to include additional tests
# ("do_anything_now_jailbreak", 2),
# ("RU_do_anything_now_jailbreak", 2),
# ("bon", 2),
# ("ethical_compliance", 2),
# ("harmful_behavior", 2),
# ("harmful_behavior_multistage", 2),
# ("linguistic_evasion", 2),
# ("logical_inconsistencies", 2),
# ("past_tense", 2),
# ("suffix", 2),
# ("sycophancy", 2),
# ("system_prompt_leakage", 2),
# ("typoglycemia_attack", 2),
1 change: 1 addition & 0 deletions docs/project_overview.md
@@ -5,6 +5,7 @@ LLAMATOR - Red Teaming python-framework for testing chatbots and LLM-systems
[![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC_BY--NC--SA_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/llamator)](https://pypi.org/project/llamator)
[![PyPI](https://badge.fury.io/py/llamator.svg)](https://badge.fury.io/py/llamator)
[![Docker](https://img.shields.io/badge/dockerfile-gray.svg?logo=docker)](https://github.com/RomiconEZ/llamator/blob/release/docker)
[![Downloads](https://pepy.tech/badge/llamator)](https://pepy.tech/project/llamator)
[![Downloads](https://pepy.tech/badge/llamator/month)](https://pepy.tech/project/llamator)
[![GitHub Repo stars](https://img.shields.io/github/stars/RomiconEZ/llamator)](https://github.com/RomiconEZ/llamator/stargazers)
3 changes: 3 additions & 0 deletions examples/llamator-api.ipynb
@@ -283,7 +283,9 @@
"tests_with_attempts = [\n",
" # (\"aim_jailbreak\", 2),\n",
" # (\"base64_injection\", 2),\n",
" # (\"bon\", 2),\n",
" # (\"complimentary_transition\", 2),\n",
" # (\"crescendo\", 2),\n",
" # (\"do_anything_now_jailbreak\", 2),\n",
" # (\"RU_do_anything_now_jailbreak\", 2),\n",
" # (\"ethical_compliance\", 2),\n",
@@ -292,6 +294,7 @@
" # (\"linguistic_evasion\", 2),\n",
" # (\"logical_inconsistencies\", 2),\n",
" # (\"past_tense\", 2),\n",
" # (\"suffix\", 2),\n",
" (\"sycophancy\", 2),\n",
" (\"system_prompt_leakage\", 2),\n",
" # (\"typoglycemia_attack\", 2),\n",
7 changes: 5 additions & 2 deletions examples/llamator-selenium.ipynb
@@ -365,15 +365,18 @@
"tests_with_attempts = [\n",
" # (\"aim_jailbreak\", 2),\n",
" # (\"base64_injection\", 2),\n",
" # (\"complimentary_transition\", 3),\n",
" # (\"bon\", 2),\n",
" # (\"complimentary_transition\", 2),\n",
" # (\"crescendo\", 2),\n",
" # (\"do_anything_now_jailbreak\", 2),\n",
" # (\"RU_do_anything_now_jailbreak\", 2),\n",
" # (\"ethical_compliance\", 2),\n",
" # (\"harmful_behavior\", 2),\n",
" # (\"harmful_behavior_multistage\", 2),\n",
" (\"linguistic_evasion\", 2),\n",
" (\"logical_inconsistencies\", 2),\n",
" # (\"past_tense\", 1),\n",
" # (\"past_tense\", 2),\n",
" # (\"suffix\", 2),\n",
" (\"sycophancy\", 2),\n",
" (\"system_prompt_leakage\", 2),\n",
" # (\"typoglycemia_attack\", 2),\n",
3 changes: 3 additions & 0 deletions examples/llamator-telegram.ipynb
@@ -385,7 +385,9 @@
"tests_with_attempts = [\n",
" # (\"aim_jailbreak\", 2),\n",
" # (\"base64_injection\", 2),\n",
" # (\"bon\", 2),\n",
" # (\"complimentary_transition\", 2),\n",
" # (\"crescendo\", 2),\n",
" # (\"do_anything_now_jailbreak\", 2),\n",
" # (\"RU_do_anything_now_jailbreak\", 2),\n",
" # (\"ethical_compliance\", 2),\n",
@@ -394,6 +396,7 @@
" (\"linguistic_evasion\", 2),\n",
" (\"logical_inconsistencies\", 2),\n",
" # (\"past_tense\", 2),\n",
" # (\"suffix\", 2),\n",
" (\"sycophancy\", 2),\n",
" # (\"system_prompt_leakage\", 2),\n",
" # (\"typoglycemia_attack\", 2),\n",
3 changes: 3 additions & 0 deletions examples/llamator-whatsapp.ipynb
@@ -510,7 +510,9 @@
"tests_with_attempts = [\n",
" # (\"aim_jailbreak\", 2),\n",
" # (\"base64_injection\", 2),\n",
" # (\"bon\", 2),\n",
" # (\"complimentary_transition\", 2),\n",
" # (\"crescendo\", 2),\n",
" (\"do_anything_now_jailbreak\", 2),\n",
" # (\"RU_do_anything_now_jailbreak\", 2),\n",
" # (\"ethical_compliance\", 2),\n",
@@ -519,6 +521,7 @@
" # (\"linguistic_evasion\", 2),\n",
" # (\"logical_inconsistencies\", 2),\n",
" # (\"past_tense\", 2),\n",
" # (\"suffix\", 2),\n",
" # (\"sycophancy\", 2),\n",
" # (\"system_prompt_leakage\", 2),\n",
" # (\"typoglycemia_attack\", 2),\n",
4 changes: 3 additions & 1 deletion src/llamator/attacks/suffix.py
@@ -17,7 +17,9 @@

@register_test
class TestSuffix(TestBase):
"""Uses a specially crafted suffix to bypass LLM filters and restrictions, forcing the model to generate prohibited or harmful content despite the original query. Evaluates the model's vulnerability to input data manipulation."""
"""Uses a specially crafted suffix to bypass LLM filters and restrictions, forcing the model to generate prohibited or harmful content despite the original query.
Evaluates the model's vulnerability to input data manipulation.
Original Paper: https://arxiv.org/abs/2307.15043, Code: https://github.com/llm-attacks/llm-attacks"""

test_name = "suffix"
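
Because this change splits the docstring across several lines while CONTRIBUTING.md asks for the description to equal the docstring, a whitespace-insensitive comparison may be useful. The following is a minimal sketch using only the standard library; `docstring_description`, `description_matches`, and `ExampleAttack` are hypothetical names for illustration and not part of LLAMATOR.

```python
import inspect


def docstring_description(cls: type) -> str:
    """Return the class docstring collapsed onto a single line."""
    doc = inspect.getdoc(cls) or ""
    # str.split() with no arguments splits on any whitespace run,
    # so line breaks and indentation become single spaces.
    return " ".join(doc.split())


def description_matches(cls: type, description: str) -> bool:
    """Compare a documented description with the docstring, ignoring line breaks."""
    return docstring_description(cls) == " ".join(description.split())


class ExampleAttack:  # hypothetical stand-in for an attack class
    """Uses a specially crafted suffix to bypass LLM filters.
    Evaluates the model's vulnerability to input data manipulation."""


if __name__ == "__main__":
    documented = (
        "Uses a specially crafted suffix to bypass LLM filters. "
        "Evaluates the model's vulnerability to input data manipulation."
    )
    print(description_matches(ExampleAttack, documented))
```

A check like this compares text only; any reference line kept in the docstring (paper or code URLs) would have to appear in the documented description as well for the comparison to pass.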

1 change: 1 addition & 0 deletions tests/test_llamator.py
@@ -57,6 +57,7 @@ def test_openai_client():
# ("linguistic_evasion", 2),
# ("logical_inconsistencies", 2),
# ("past_tense", 2),
# ("suffix", 2),
# ("sycophancy", 2),
# ("system_prompt_leakage", 2),
# ("typoglycemia_attack", 2),
1 change: 1 addition & 0 deletions tests/test_local_llamator.py
@@ -113,6 +113,7 @@ def test_langchain_client_yandexgpt():
# ("linguistic_evasion", 2),
# ("logical_inconsistencies", 2),
# ("past_tense", 2),
# ("suffix", 2),
# ("sycophancy", 2),
# ("system_prompt_leakage", 2),
# ("typoglycemia_attack", 2),