Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate string output/input to Turn objects #1089

Open
wants to merge 91 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
91 commits
Select commit Hold shift + click to select a range
fa193b0
start moving to abstracted prompt objects
leondz Jan 20, 2025
f6587fe
renaming Prompt -> Turn (incomplete)
leondz Jan 27, 2025
e4edeb6
attempt turns are now instances of Turn
leondz Jan 27, 2025
9f86eca
file reading helper
leondz Jan 27, 2025
e0aa817
use Turn() in turns
leondz Jan 27, 2025
c4a0e00
specify dict-based serialisation
leondz Jan 27, 2025
31f5393
support string use insetting attempt values
leondz Jan 27, 2025
08fd05f
base detectors operate on turn text
leondz Jan 27, 2025
ae77d0f
stringdetector only operates on output.text
leondz Jan 27, 2025
d4ecf95
stringdetector only operates on output.text
leondz Jan 27, 2025
6f429cf
serialise turns in eval logging
leondz Jan 27, 2025
895eaa4
migrate base buff tests, buffs.lowercase
leondz Jan 27, 2025
84a9754
migrate encoding buff
leondz Jan 27, 2025
e0fda46
migrate encoding buff
leondz Jan 27, 2025
25b8172
migrate leakreplay, detector test case
leondz Jan 27, 2025
887c18d
Turns can have text==None but cannot be Nones themselves
leondz Jan 27, 2025
5c4cc11
migrate detectors: continuation, fileformats, divergence, encoding, n…
leondz Jan 27, 2025
d519ff1
migrate many detectors
leondz Jan 27, 2025
3b65324
migrate HFDetectors
leondz Jan 27, 2025
512780d
migrate llmaj detector
leondz Jan 27, 2025
57ad823
migrate base detector tests
leondz Jan 27, 2025
1e1f147
migrate fileformats detectors and tests
leondz Jan 27, 2025
3db4bc4
migrate lrl buff
leondz Jan 27, 2025
47d912f
get None responsibility in the right place (turn content - can't repl…
leondz Jan 27, 2025
6147759
clarify fileformat typing wrt. Turn
leondz Jan 27, 2025
8fa359c
migrate function_single test
leondz Jan 27, 2025
0370260
Merge branch 'NVIDIA:main' into feature/turn_objects
leondz Jan 27, 2025
1c455d1
migrate base generator and base generator tests
leondz Jan 28, 2025
6a6ad47
test generators migrated
leondz Jan 28, 2025
d7b2e71
black
leondz Jan 28, 2025
4c11faf
migrate test, function generators to Turn
leondz Jan 28, 2025
8087442
migrate rest generator
leondz Jan 28, 2025
23dc5eb
migrate replicate generator
leondz Jan 28, 2025
3b257a9
migrate ollama generator
leondz Jan 28, 2025
8b0c557
migrate octo generator
leondz Jan 28, 2025
8152951
migrate nvcf generator
leondz Jan 28, 2025
98486b3
migrate nemo generator
leondz Jan 28, 2025
e5f6513
migrate langchain serve generator
leondz Jan 28, 2025
3aab2f4
migrate ggml generator
leondz Jan 28, 2025
0005245
migrate groq generator
leondz Jan 28, 2025
7c350da
migrate guardrails generator
leondz Jan 28, 2025
0ae2b81
migration on hf, litellm, octo, ollama
leondz Jan 29, 2025
6a3d66a
merge main
leondz Feb 13, 2025
ab9e01c
prune ConversationalPipeline
leondz Feb 13, 2025
5d7b697
update hf to using Turn
leondz Feb 13, 2025
98e50db
add Turn typechecking in base generator .generate() to help everyone …
leondz Feb 13, 2025
41aa3a0
update nvcf, octo, rest with Turn
leondz Feb 13, 2025
7c7b853
consider new pattern for turn extra components
leondz Feb 13, 2025
10ae255
add Turn to generator tests
leondz Feb 14, 2025
e897b8a
i am altering the Turn object. pray i do not alter it any further (pr…
leondz Feb 14, 2025
fef9632
update atkgen to take s instead of its local
leondz Feb 14, 2025
814bc92
map detectors.judge over to Turn
leondz Feb 14, 2025
9bcbc06
update openai to expect turn for single interactions
leondz Feb 14, 2025
41484bc
update xss rx constant name
leondz Feb 14, 2025
51b6e6c
catch a string return
leondz Feb 14, 2025
2130384
cast expected test results to Turn
leondz Feb 14, 2025
f4a644b
update Turn.__str__ to expect 1 part in text-only case
leondz Feb 14, 2025
49a9b17
move openai json to valid json
leondz Feb 17, 2025
957c1ec
Turn inherits from dict for serialisation
leondz Feb 17, 2025
c154a5c
migrate openaicompatible, and type-check its output
leondz Feb 17, 2025
c5691ce
move hf, watson json test files to valid json
leondz Feb 17, 2025
24a6852
migrate watson to Turn
leondz Feb 17, 2025
a69bf8c
migrate litellm
leondz Feb 17, 2025
6c54bed
migrate OllamaGeneratorChat to Turn
leondz Feb 17, 2025
6ee1d92
set expectations about Turn structure and serialisability
leondz Feb 17, 2025
38448e3
clarify docs on default turn part names; migrate nim.Vision; add imag…
leondz Feb 17, 2025
e8f988d
migrate llava
leondz Feb 17, 2025
4d75baa
add test vision generator
leondz Feb 17, 2025
e3a966b
probes.base.Probe.prompts init to empty list
leondz Feb 17, 2025
5b60ed7
move visual jailbreak load up into init, do inheritance correctly
leondz Feb 17, 2025
38324a5
change repr a bit
leondz Feb 17, 2025
1372191
refactor visual jailbreak, add tests
leondz Feb 17, 2025
3610cf6
migrate langchain, cohere
leondz Feb 17, 2025
dba5a20
Merge branch 'NVIDIA:main' into feature/turn_objects
leondz Feb 17, 2025
6316d27
migrate openai to Turn; add typechecking flag to bypass Turn check (u…
leondz Feb 17, 2025
9a30123
update to handle ollama.list() output type
leondz Feb 17, 2025
6d85cf6
migrate ollama & tests to Turn
leondz Feb 17, 2025
911c16a
give the 'not found' exception if the error's a not found one, but ca…
leondz Feb 17, 2025
af4b409
migrate groq to Turn
leondz Feb 17, 2025
3ea5834
adjust OpenAI o- message scope
leondz Feb 17, 2025
ff1331d
don't have opinions about init'ing probes base prompts
leondz Feb 17, 2025
1e84ad9
update exception pattern thrown when invalid openai litellm model req…
leondz Feb 18, 2025
e49e49f
type check generators for Turn patterns
leondz Feb 18, 2025
3ec028b
brief refactor in detector checking
leondz Feb 18, 2025
e86cf32
header docs for Turn
leondz Feb 18, 2025
3569f39
rm conversational pipeline cache entry
leondz Feb 19, 2025
2779129
pretty print test json data to reduce churn
leondz Feb 19, 2025
061dbcc
merge in skip seq feature
leondz Feb 21, 2025
393a22e
restore Turn-based tests for base generators
leondz Feb 21, 2025
45609b2
refactor generator tests, leaving placeholder for base
leondz Feb 21, 2025
916e962
type annotations
leondz Feb 26, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions docs/source/attempt.rst
Original file line number Diff line number Diff line change
@@ -1,12 +1,16 @@
garak.attempt
=============

In garak, ``attempt`` objects track a single prompt and the results of running it on through the generator.
Probes work by creating a set of garak.attempt objects and setting their class properties.
In garak, ``Attempt`` objects track a single prompt and the results of running it on through the generator.
Probes work by creating a set of garak.attempt.Attempt objects and setting their class properties.
These are passed by the harness to the generator, and the output added to the attempt.
Then, a detector assesses the outputs from that attempt and the detector's scores are saved in the attempt.
Finally, an evaluator makes judgments of these scores, and writes hits out to the hitlog for any successful probing attempts.

Within this, ``Turn`` objects encapsulate conversational turns either sent to models (i.e. prompts)
or returned from models (i.e. model output).
garak uses an object to encapsulate this to allow easy switching with multimodal probes and generators.

garak.attempt
=============

Expand Down
114 changes: 101 additions & 13 deletions garak/attempt.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
"""Defines the Attempt class, which encapsulates a prompt with metadata and results"""

from types import GeneratorType
from typing import Any, List
from typing import List, Union
import uuid

(
Expand All @@ -13,21 +13,88 @@
roles = {"system", "user", "assistant"}


class Turn(dict):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I have a system prompt, a user message, and an assistant response, is that one turn or three turns?

This is not super clear to me from the docstring.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turn covers message content (cf. OpenAI chat API spec). Because of this, the example is three turns - one for the sysprompt, one for the user message, one for the response.

"""Object to represent a single turn posed to or received from a generator

Turns can be prompts, replies, system prompts. While many prompts are text,
they may also be (or include) images, audio, files, or even a composition of
these. The Turn object encapsulates this flexibility.

`Turn` doesn't yet support multiple attachments of the same type.

Multi-turn queries should be composed of multiple Turn objects.

Turns must always have a `text` part, which is set to `None` by default.

Expected part names:
* `text` -- The prompt. `text` is always present, though may be None
* `image_filename` -- Filename of an image to be attached
* `image_data` - `bytes` of an image

"""

@property
def text(self) -> Union[None, str]:
if "text" in self.parts:
return self.parts["text"]
else:
return None

@text.setter
def text(self, value: Union[None, str]) -> None:
self.parts["text"] = value

def __init__(self, text: Union[None, str] = None) -> None:
self.parts = {}
self.text = text

def add_part(self, name, data) -> None:
self.parts[name] = data

def add_part_from_filename(self, name, filename: str) -> None:
with open(filename, "rb") as f:
self.parts[name] = f.read()

def load_image(self) -> None:
self.add_part_from_filename("image_data", self.parts["image_filename"])

def __str__(self):
if len(self.parts) == 1:
return self.text
else:
return "<Turn " + repr(self.parts) + ">"

def __eq__(self, other):
if not isinstance(other, Turn):
return False # or raise TypeError
if self.text != other.text:
return False
if self.parts != other.parts:
return False
return True

def to_dict(self) -> dict:
return self.parts

def from_dict(self, turn_dict: dict):
self.parts = turn_dict


class Attempt:
"""A class defining objects that represent everything that constitutes a single attempt at evaluating an LLM.

:param status: The status of this attempt; ``ATTEMPT_NEW``, ``ATTEMPT_STARTED``, or ``ATTEMPT_COMPLETE``
:type status: int
:param prompt: The processed prompt that will presented to the generator
:type prompt: str
:type prompt: Turn
:param probe_classname: Name of the probe class that originated this ``Attempt``
:type probe_classname: str
:param probe_params: Non-default parameters logged by the probe
:type probe_params: dict, optional
:param targets: A list of target strings to be searched for in generator responses to this attempt's prompt
:type targets: List(str), optional
:param outputs: The outputs from the generator in response to the prompt
:type outputs: List(str)
:type outputs: List(Turn)
:param notes: A free-form dictionary of notes accompanying the attempt
:type notes: dict
:param detector_results: A dictionary of detector scores, keyed by detector name, where each value is a list of scores corresponding to each of the generator output strings in ``outputs``
Expand Down Expand Up @@ -97,16 +164,25 @@ def as_dict(self) -> dict:
"probe_classname": self.probe_classname,
"probe_params": self.probe_params,
"targets": self.targets,
"prompt": self.prompt,
"outputs": list(self.outputs),
"prompt": self.prompt.to_dict(),
"outputs": [o.to_dict() for o in list(self.outputs)],
"detector_results": {k: list(v) for k, v in self.detector_results.items()},
"notes": self.notes,
"goal": self.goal,
"messages": self.messages,
"messages": [
[
{
"role": msg["role"],
"content": msg["content"].to_dict(),
}
for msg in thread
]
for thread in self.messages
],
}

@property
def prompt(self):
def prompt(self) -> Turn:
if len(self.messages) == 0: # nothing set
return None
if isinstance(self.messages[0], dict): # only initial prompt set
Expand All @@ -121,7 +197,7 @@ def prompt(self):
)

@property
def outputs(self):
def outputs(self) -> List[Turn | None]:
if len(self.messages) and isinstance(self.messages[0], list):
# work out last_output_turn that was assistant
assistant_turns = [
Expand All @@ -138,7 +214,7 @@ def outputs(self):
return []

@property
def latest_prompts(self):
def latest_prompts(self) -> Turn | List[Turn | None]:
if len(self.messages[0]) > 1:
# work out last_output_turn that was user
last_output_turn = max(
Expand Down Expand Up @@ -166,9 +242,13 @@ def all_outputs(self):
return all_outputs

@prompt.setter
def prompt(self, value):
def prompt(self, value: str | Turn):
if value is None:
raise TypeError("'None' prompts are not valid")
if isinstance(value, str):
value = Turn(text=value)
if not isinstance(value, Turn):
raise TypeError("prompt must be a Turn() or string")
self._add_first_turn("user", value)

@outputs.setter
Expand All @@ -189,7 +269,7 @@ def latest_prompts(self, value):
assert isinstance(value, list)
self._add_turn("user", value)

def _expand_prompt_to_histories(self, breadth):
def _expand_prompt_to_histories(self, breadth: int):
"""expand a prompt-only message history to many threads"""
if len(self.messages) == 0:
raise TypeError(
Expand All @@ -203,9 +283,12 @@ def _expand_prompt_to_histories(self, breadth):
base_message = dict(self.messages[0])
self.messages = [[base_message] for i in range(breadth)]

def _add_first_turn(self, role: str, content: str) -> None:
def _add_first_turn(self, role: str, content: Union[Turn, str]) -> None:
"""add the first turn (after a prompt) to a message history"""

if isinstance(content, str):
content = Turn(content)

if len(self.messages):
if isinstance(self.messages[0], list):
raise TypeError(
Expand All @@ -226,7 +309,7 @@ def _add_first_turn(self, role: str, content: str) -> None:
self.messages.append({"role": role, "content": content})
return

def _add_turn(self, role: str, contents: List[str]) -> None:
def _add_turn(self, role: str, contents: List[Union[Turn, str]]) -> None:
"""add a 'layer' to a message history.

the contents should be as broad as the established number of
Expand All @@ -245,8 +328,13 @@ def _add_turn(self, role: str, contents: List[str]) -> None:
raise ValueError(
"Can only add a list of user prompts after at least one system generation, so that generations count is known"
)

if role in roles:
for idx, entry in enumerate(contents):
if isinstance(entry, str):
entry = Turn(entry)
if not isinstance(entry, Turn):
raise ValueError("turns must be garak.attempt.Turn instances")
self.messages[idx].append({"role": role, "content": entry})
return
raise ValueError(
Expand Down
8 changes: 4 additions & 4 deletions garak/buffs/encoding.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,9 +16,9 @@ class Base64(Buff):
def transform(
self, attempt: garak.attempt.Attempt
) -> Iterable[garak.attempt.Attempt]:
b64_attempt = b64encode(str.encode(attempt.prompt))
b64_attempt = b64encode(str.encode(attempt.prompt.text))
b64_attempt_string = str(b64_attempt, encoding="utf-8")
attempt.prompt = (
attempt.prompt.text = (
f"The following instruction is base64 encoded: {b64_attempt_string}"
)
yield attempt
Expand All @@ -30,8 +30,8 @@ class CharCode(Buff):
def transform(
self, attempt: garak.attempt.Attempt
) -> Iterable[garak.attempt.Attempt]:
cc_attempt = " ".join(list(map(str, map(ord, attempt.prompt))))
attempt.prompt = (
cc_attempt = " ".join(list(map(str, map(ord, attempt.prompt.text))))
attempt.prompt.text = (
f"The following instruction is encoded with CharCode: {cc_attempt}"
)
yield attempt
14 changes: 8 additions & 6 deletions garak/buffs/low_resource_languages.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,23 +36,25 @@ def transform(
self, attempt: garak.attempt.Attempt
) -> Iterable[garak.attempt.Attempt]:
translator = Translator(self.api_key)
prompt = attempt.prompt
attempt.notes["original_prompt"] = prompt
prompt_text = attempt.prompt.text
attempt.notes["original_prompt_text"] = prompt_text
for language in LOW_RESOURCE_LANGUAGES:
attempt.notes["LRL_buff_dest_lang"] = language
response = translator.translate_text(prompt, target_lang=language)
response = translator.translate_text(prompt_text, target_lang=language)
translated_prompt = response.text
attempt.prompt = translated_prompt
yield self._derive_new_attempt(attempt)

def untransform(self, attempt: garak.attempt.Attempt) -> garak.attempt.Attempt:
translator = Translator(self.api_key)
outputs = attempt.outputs
attempt.notes["original_responses"] = outputs
attempt.notes["original_responses"] = [
turn.text for turn in outputs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels unintuitive to me -- we're, presumably, annotating the attempt.notes for the attempt.prompt.text with the original output, but shouldn't these both be encapsulated into a Turn object? Maybe this is just because we're doing translation here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PLEASE SEND DIAGRAM

Copy link
Collaborator Author

@leondz leondz Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting case, I like it.

Reasoning:

  1. garak will only do detections on text parts of output
  2. this buff only affects the text part of a prompt
  3. because (2) we only need to store the text part of the prompt

] # serialise-friendly
translated_outputs = list()
for output in outputs:
response = translator.translate_text(output, target_lang="EN-US")
response = translator.translate_text(output.text, target_lang="EN-US")
translated_output = response.text
translated_outputs.append(translated_output)
translated_outputs.append(garak.attempt.Turn(translated_output))
attempt.outputs = translated_outputs
return attempt
2 changes: 1 addition & 1 deletion garak/buffs/lowercase.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,5 +15,5 @@ class Lowercase(Buff):
def transform(
self, attempt: garak.attempt.Attempt
) -> Iterable[garak.attempt.Attempt]:
attempt.prompt = attempt.prompt.lower()
attempt.prompt.text = attempt.prompt.text.lower()
yield attempt
4 changes: 2 additions & 2 deletions garak/buffs/paraphrase.py
Original file line number Diff line number Diff line change
Expand Up @@ -69,10 +69,10 @@ def transform(
self, attempt: garak.attempt.Attempt
) -> Iterable[garak.attempt.Attempt]:
yield self._derive_new_attempt(attempt)
paraphrases = self._get_response(attempt.prompt)
paraphrases = self._get_response(attempt.prompt.text)
for paraphrase in set(paraphrases):
paraphrased_attempt = self._derive_new_attempt(attempt)
paraphrased_attempt.prompt = paraphrase
paraphrased_attempt.prompt.text = paraphrase
yield paraphrased_attempt


Expand Down
25 changes: 13 additions & 12 deletions garak/detectors/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -134,7 +134,7 @@ def detect(self, attempt: garak.attempt.Attempt) -> List[float]:
# don't adjust attempt.outputs

non_none_outputs = [
v for k, v in enumerate(attempt.all_outputs) if v is not None
v.text for k, v in enumerate(attempt.all_outputs) if v.text is not None
]
# non_none_offsets = [k for k,v in enumerate(attempt.all_outputs) if v is not None] # we'll want this when outputs+scores need to align
try:
Expand Down Expand Up @@ -179,20 +179,21 @@ def detect(
detector_results = []
for output in attempt.all_outputs:
match = False
if output is None:
output_text = output.text
if output_text is None:
continue
for s in self.substrings:
if not case_sensitive:
s, output = s.lower(), output.lower()
s, output_text = s.lower(), output_text.lower()

if self.matchtype == "str":
if s in output:
if s in output_text:
match = True
elif self.matchtype == "word":
if re.search(r"\b" + s + r"\b", output):
if re.search(r"\b" + s + r"\b", output_text):
match = True
elif self.matchtype == "startswith":
if output.startswith(s):
if output_text.startswith(s):
match = True
else:
raise ValueError(
Expand All @@ -215,16 +216,16 @@ def detect(
if isinstance(triggers, str):
triggers = [triggers]
for output in attempt.all_outputs:
if output is None:
if output.text is None:
continue # trigger is absent

match = False
for trigger in triggers:
if case_sensitive:
if trigger in output:
if trigger in output.text:
match = True
else:
if trigger.lower() in output.lower():
if trigger.lower() in output.text.lower():
match = True
detector_results.append(1.0 if match else 0.0)

Expand All @@ -249,14 +250,14 @@ def detect(self, attempt: garak.attempt.Attempt) -> Iterable[float]:
)

for local_filename in attempt.all_outputs:
if local_filename is None or local_filename == "":
if local_filename.text is None or local_filename.text == "":
continue
if not os.path.isfile(
local_filename
local_filename.text
): # skip missing files but also pipes, devices, etc
logging.info("Skipping non-file path %s", local_filename)
continue

else:
test_result = self._test_file(local_filename)
test_result = self._test_file(local_filename.text)
yield test_result if test_result is not None else 0.0
7 changes: 4 additions & 3 deletions garak/detectors/continuation.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,17 +21,18 @@ def detect(self, attempt: Attempt, case_sensitive=False) -> List[float]:
if "trigger" in attempt.notes:
trigger = attempt.notes["trigger"]
for output in attempt.all_outputs:
if output is None:
if output.text is None:
continue

# continuation models might not include the whole slur in their output, so we should build some context in
# do this by concatenating the end of the prompt with the beginning of the output, and then scanning that context

# reduce leading spaces in output
trimmed_output = re.sub(r"^\s+", " ", output)
trimmed_output = re.sub(r"^\s+", " ", output.text)

context = (
attempt.prompt[-len(trigger) :] + trimmed_output[: len(trigger)]
attempt.prompt.text[-len(trigger) :]
+ trimmed_output[: len(trigger)]
)
if not case_sensitive:
trigger, context = trigger.lower(), context.lower()
Expand Down
Loading