This project is based on the references linked below. Since the original source code from the paper is difficult to use, I created a simple Python program for local testing.
- https://x.com/AnthropicAI/status/1867608917595107443
- https://jplhughes.github.io/bon-jailbreaking/
- https://github.com/jplhughes/bon-jailbreaking
The main source file is `bon.py`, which borrows code from bon-jailbreaking, including `FALSE_POSITIVE_PHRASES` and the text-augmentation functions `apply_word_scrambling`, `apply_random_capitalization`, and `apply_ascii_noising`.
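The exact implementations of these augmentations live in the bon-jailbreaking repository; the sketch below only illustrates the general idea behind two of them, and the probability parameter `p`, its default value, and the scrambling rule are illustrative assumptions rather than the original code.

```python
import random

def apply_random_capitalization(text: str, p: float = 0.6) -> str:
    """Illustrative sketch: flip the case of each letter with probability p."""
    return "".join(
        ch.swapcase() if ch.isalpha() and random.random() < p else ch
        for ch in text
    )

def apply_word_scrambling(text: str, p: float = 0.6) -> str:
    """Illustrative sketch: shuffle the interior letters of words longer than three characters."""
    words = []
    for word in text.split():
        if len(word) > 3 and random.random() < p:
            middle = list(word[1:-1])
            random.shuffle(middle)
            word = word[0] + "".join(middle) + word[-1]
        words.append(word)
    return " ".join(words)

# Example: each call produces a differently perturbed variant of the same prompt.
print(apply_random_capitalization(apply_word_scrambling("please summarize this document for me")))
```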
To determine whether a response is harmful, this program uses the OpenAI moderation API.
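A minimal sketch of such a check with the official OpenAI Python SDK is shown below; the helper name `is_harmful`, the choice of `omni-moderation-latest`, and the flagged-only decision rule are assumptions for illustration, not necessarily how `bon.py` structures the call.

```python
from openai import OpenAI  # pip install openai; the client reads OPENAI_API_KEY from the environment

client = OpenAI()

def is_harmful(response_text: str) -> bool:
    """Return True if the OpenAI moderation endpoint flags the text."""
    result = client.moderations.create(
        model="omni-moderation-latest",  # assumed model choice
        input=response_text,
    )
    return result.results[0].flagged
```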
In `bon.py`, the model `llama3.2` is hardcoded for testing purposes. You can replace it with any Ollama-supported model. For an example of test results, see `candidate.txt`.
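The snippet below shows roughly how the hardcoded model can be swapped when querying Ollama from Python; the function name and message layout are illustrative, not copied from `bon.py`.

```python
import ollama  # pip install ollama

MODEL = "llama3.2"  # replace with any model available in your local Ollama install

def query_model(prompt: str) -> str:
    """Send one (possibly augmented) prompt to the local Ollama model and return its reply."""
    response = ollama.chat(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]
```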
- Install Ollama

  Download Ollama from the official website: https://ollama.com

- Install an Ollama Model

  Use the following command to install the `llama3.2` model:

  `ollama run llama3.2`

  For more information, refer to https://ollama.com/library/llama3.2
- Install the Ollama Python Library

  Install the required Python library with:

  `pip install ollama`

  For details, see https://github.com/ollama/ollama-python. A quick smoke test is shown after this list.
- Run this program

  Execute the program using:

  `python bon.py`
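As the quick smoke test referenced in the setup list above, the following one-off snippet can be run before `bon.py`; it assumes a local Ollama server is running and the `llama3.2` model has been pulled.

```python
import ollama

# Smoke test: ask the locally pulled model for a short reply.
# If this prints without errors, the Ollama server, the llama3.2 model,
# and the ollama Python library are all working.
resp = ollama.generate(model="llama3.2", prompt="Reply with the single word: ready")
print(resp["response"])
```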