Simple BoN Jailbreaking for Ollama

This project is based on the references linked below. Since the original source code from the paper is difficult to use, I created a simple Python program for local testing.

https://x.com/AnthropicAI/status/1867608917595107443
https://jplhughes.github.io/bon-jailbreaking/
https://github.com/jplhughes/bon-jailbreaking

The main source file is bon.py, which borrows code from bon-jailbreaking, including functions such as FALSE_POSITIVE_PHRASES, apply_word_scrambling, apply_random_capitalization, and apply_ascii_noising for text augmentation.

To determine whether a response is harmful, this program uses the OpenAI moderation API.

In bon.py, the model llama3.2 is hardcoded for testing purposes. You can replace it with any Ollama-supported model.

For an example of test result, see candidate.txt.

How to run

Install Ollama
Download Ollama from the following link:
- https://ollama.com/download
Install an Ollama Model
Use the following command to install the llama3.2 model:

ollama run llama3.2

For more information, refer to https://ollama.com/library/llama3.2

Install Ollama Python Library Install the required Python library with:

pip install ollama

For details, see https://github.com/ollama/ollama-python

Run this program

Execute the program using:

python bon.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Simple BoN Jailbreaking for Ollama

How to run

Files

README.md

Latest commit

History

README.md

File metadata and controls

Simple BoN Jailbreaking for Ollama

How to run