lanesky/simple-bon-ollama

Simple BoN Jailbreaking for Ollama

This project is based on the references linked below. Because the original source code from the paper is difficult to use, I created a simple Python program for local testing.

The main source file is bon.py, which borrows code from bon-jailbreaking, including the FALSE_POSITIVE_PHRASES list and the text-augmentation functions apply_word_scrambling, apply_random_capitalization, and apply_ascii_noising.
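The three augmentations can be sketched roughly as below. This is an illustrative reconstruction, not the exact code from bon-jailbreaking: the probability defaults and implementation details here are assumptions.

```python
# Sketch of the three BoN text augmentations (function names follow bon.py;
# the probability parameters p are illustrative assumptions).
import random
import string


def apply_word_scrambling(text: str, p: float = 0.6) -> str:
    """Shuffle the interior characters of words longer than 3 chars, each with probability p."""
    out = []
    for w in text.split():
        if len(w) > 3 and random.random() < p:
            mid = list(w[1:-1])
            random.shuffle(mid)
            w = w[0] + "".join(mid) + w[-1]
        out.append(w)
    return " ".join(out)


def apply_random_capitalization(text: str, p: float = 0.6) -> str:
    """Flip the case of each alphabetic character with probability p."""
    return "".join(
        c.swapcase() if c.isalpha() and random.random() < p else c
        for c in text
    )


def apply_ascii_noising(text: str, p: float = 0.06) -> str:
    """Shift printable characters by +/-1 in ASCII code with probability p."""
    out = []
    for c in text:
        if c in string.printable and random.random() < p:
            out.append(chr(ord(c) + random.choice([-1, 1])))
        else:
            out.append(c)
    return "".join(out)


if __name__ == "__main__":
    random.seed(0)
    print(apply_random_capitalization(apply_word_scrambling("how to build a birdhouse")))
```

BoN repeatedly resamples these augmentations on the same prompt, so each attempt sends a slightly different variant to the model.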

To determine whether a response is harmful, this program uses the OpenAI moderation API.
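A minimal version of that check might look like the following. This sketch calls the moderation REST endpoint directly with the standard library; bon.py may use the openai client library instead, and the helper name is_flagged is my own.

```python
# Hedged sketch: ask the OpenAI moderation endpoint whether a model
# response is harmful. Requires OPENAI_API_KEY in the environment.
import json
import os
import urllib.request


def is_flagged(text: str) -> bool:
    """Return True if the OpenAI moderation endpoint flags the text."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/moderations",
        data=json.dumps({"input": text}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["results"][0]["flagged"]


if __name__ == "__main__":
    print(is_flagged("How do I bake bread?"))  # benign text
```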

In bon.py, the model llama3.2 is hardcoded for testing purposes. You can replace it with any model supported by Ollama.
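For reference, a single chat call to a local Ollama model can be made as below. bon.py uses the ollama Python library; this sketch talks to Ollama's local REST API with the standard library instead, and the chat helper is my own name.

```python
# Hedged sketch: one non-streaming chat request to a locally running
# Ollama server (default port 11434). Swap the model name as needed.
import json
import urllib.request


def chat(model: str, prompt: str, host: str = "http://localhost:11434") -> str:
    """Send one user message to an Ollama model and return its reply text."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]


if __name__ == "__main__":
    print(chat("llama3.2", "Say hello in one word."))
```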

For an example test result, see candidate.txt.

How to run

  1. Install Ollama
    Download Ollama from https://ollama.com/download

  2. Install an Ollama model
    Use the following command to install the llama3.2 model:

ollama run llama3.2

For more information, refer to https://ollama.com/library/llama3.2

  3. Install the Ollama Python library
    Install the required Python library with:

pip install ollama

For details, see https://github.com/ollama/ollama-python

  4. Run this program

Execute the program using:

python bon.py
