Migrate string output/input to `Turn` objects #1089

leondz · 2025-01-27T12:14:31Z

Resolves #602

summary

garak prompts & outputs are all captured as str

We can about more modalities than strings - scope encompasses image attachments, files, rich outputs. Though the Language part of LLM is requisite for garak scope to apply.

This PR abstracts the type sent to and received from generators away from str to Turn. Each Turn() instance corresponds to a turn in a conversation. Simple probes pose Turn() to their target. Attempts manage a message history of Turn() items instead of strings.

details

`Turn` class

Turn represents the content part of a query to or response from an LLM.

Turns have:

text: corresponds to what was previously prompt: str
parts: a list of data, default []
equality testing function
method for populating parts from file data
serialisation and deserialisation from dict

Turns should be None on when the model provided no response. If the response was returned but had e.g. no text component, use a Turn() with text==None.

This is only a part of a query sent to a target model. Turns are agnostic to metadata about which role is uttering it. A full LLM query might be composed of multiple turns.

Tests should all specifically use the Turn object.

NB it's now preferred to copy turn text locally instead of manipulating it in place. e.g.

lower_text = output.text.lower() # OK
output.text = output.text.lower() # disprefered

-- unless dealing things that need to edit to object, e.g. buffs

Updates to `Attempt`

Attempt is a core place this change affects.

attempt.prompt is now a Turn; refer to what was attempt.prompt, with attempt.prompt.text
attempt.outputs is now a list of Turns
attempt.messages is still a list of lists of dicts, but the content part of the dicts must be a Turn
A little magic is applied to maintain existing interfaces for writing to Attempts, where some methods will accept a List of str.
- Delving into Attempt internals requires using Turns.
- Direct access of messages internals must be done with awareness of Turn

type changes

Anything operating on an attempt will need a change
Detectors operate on outputs which is now a list of Turns
Buffs manipulate prompt which is now a Turn
generators.base.generate() should take a Turn and not a string
generators.base.generate() should return a list of Turns
- Currently, generate() returns a list of str, and the magic in Attempt handles the mapping to Turn()
- This constrains us to only having string output. We're cool with that for now I think
- @jmartin-tech is leaving generators.base.generate() as returning a list of strings OK? Is it sensible to postpone migrating this to a list of Turns? Would appreciate your thoughts

Verification

python -m pytest tests/test_attempt.py
the full test suite with API keys enabled

…gram

…ace Turn)

…ntil Conversation is implemented)

…tch the general case too

leondz · 2025-02-17T19:05:55Z

@jmartin-tech failing tests pass locally. they seem to centre on a class that's removed. could this be a caching thing?

…uested

jmartin-tech · 2025-02-18T14:33:58Z

Yes, test failures are a caching related issue. I will see about addressing and offering a fix here or in a separate PR.

leondz · 2025-02-19T06:16:23Z

sorry about the extra churn in test json - went to match the pretty printing - json_pp ordered the arguments.

PR gtg for review.

erickgalinkin

I think we need to bake this a bit more. I think we need a better abstraction layer between Attempt and Turn. Let me diagram it out.

erickgalinkin · 2025-02-21T13:31:58Z

garak/attempt.py

@@ -13,6 +13,73 @@
 roles = {"system", "user", "assistant"}


+class Turn(dict):


If I have a system prompt, a user message, and an assistant response, is that one turn or three turns?

This is not super clear to me from the docstring.

Turn covers message content (cf. OpenAI chat API spec). Because of this, the example is three turns - one for the sysprompt, one for the user message, one for the response.

erickgalinkin · 2025-02-21T13:39:11Z

garak/buffs/low_resource_languages.py

            translated_prompt = response.text
            attempt.prompt = translated_prompt
            yield self._derive_new_attempt(attempt)

    def untransform(self, attempt: garak.attempt.Attempt) -> garak.attempt.Attempt:
        translator = Translator(self.api_key)
        outputs = attempt.outputs
-        attempt.notes["original_responses"] = outputs
+        attempt.notes["original_responses"] = [
+            turn.text for turn in outputs


This feels unintuitive to me -- we're, presumably, annotating the attempt.notes for the attempt.prompt.text with the original output, but shouldn't these both be encapsulated into a Turn object? Maybe this is just because we're doing translation here.

PLEASE SEND DIAGRAM

This is an interesting case, I like it.

Reasoning:

garak will only do detections on text parts of output

this buff only affects the text part of a prompt

because (2) we only need to store the text part of the prompt

leondz added 26 commits January 20, 2025 17:30

start moving to abstracted prompt objects

fa193b0

renaming Prompt -> Turn (incomplete)

f6587fe

attempt turns are now instances of Turn

e4edeb6

file reading helper

9f86eca

use Turn() in turns

e0aa817

specify dict-based serialisation

c4a0e00

support string use insetting attempt values

31f5393

base detectors operate on turn text

08fd05f

stringdetector only operates on output.text

ae77d0f

stringdetector only operates on output.text

d4ecf95

serialise turns in eval logging

6f429cf

migrate base buff tests, buffs.lowercase

895eaa4

migrate encoding buff

84a9754

migrate encoding buff

e0fda46

migrate leakreplay, detector test case

25b8172

Turns can have text==None but cannot be Nones themselves

887c18d

migrate detectors: continuation, fileformats, divergence, encoding, n…

5c4cc11

…gram

migrate many detectors

d519ff1

migrate HFDetectors

3b65324

migrate llmaj detector

512780d

migrate base detector tests

57ad823

migrate fileformats detectors and tests

1e1f147

migrate lrl buff

3db4bc4

get None responsibility in the right place (turn content - can't repl…

47d912f

…ace Turn)

clarify fileformat typing wrt. Turn

6147759

migrate function_single test

8fa359c

leondz added the architecture Architectural upgrades label Jan 27, 2025

leondz requested a review from jmartin-tech January 27, 2025 12:14

leondz added 2 commits January 27, 2025 13:20

Merge branch 'NVIDIA:main' into feature/turn_objects

0370260

migrate base generator and base generator tests

1c455d1

leondz and others added 13 commits February 17, 2025 10:22

probes.base.Probe.prompts init to empty list

e3a966b

move visual jailbreak load up into init, do inheritance correctly

5b60ed7

change repr a bit

38324a5

refactor visual jailbreak, add tests

1372191

migrate langchain, cohere

3610cf6

Merge branch 'NVIDIA:main' into feature/turn_objects

dba5a20

migrate openai to Turn; add typechecking flag to bypass Turn check (u…

6316d27

…ntil Conversation is implemented)

update to handle ollama.list() output type

9a30123

migrate ollama & tests to Turn

6d85cf6

give the 'not found' exception if the error's a not found one, but ca…

911c16a

…tch the general case too

migrate groq to Turn

af4b409

adjust OpenAI o- message scope

3ea5834

don't have opinions about init'ing probes base prompts

ff1331d

update exception pattern thrown when invalid openai litellm model req…

1e84ad9

…uested

leondz marked this pull request as ready for review February 18, 2025 10:08

leondz added 2 commits February 18, 2025 11:43

type check generators for Turn patterns

e49e49f

brief refactor in detector checking

3ec028b

leondz added 3 commits February 18, 2025 17:24

header docs for Turn

e86cf32

rm conversational pipeline cache entry

3569f39

pretty print test json data to reduce churn

2779129

merge in skip seq feature

061dbcc

leondz force-pushed the feature/turn_objects branch from 40633a1 to 061dbcc Compare February 21, 2025 09:38

leondz added 2 commits February 21, 2025 10:59

restore Turn-based tests for base generators

393a22e

refactor generator tests, leaving placeholder for base

45609b2

erickgalinkin self-requested a review February 21, 2025 13:20

erickgalinkin reviewed Feb 24, 2025

View reviewed changes

type annotations

916e962

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Migrate string output/input to `Turn` objects #1089

Migrate string output/input to `Turn` objects #1089

leondz commented Jan 27, 2025 •

edited

Loading

leondz commented Feb 17, 2025

jmartin-tech commented Feb 18, 2025

leondz commented Feb 19, 2025 •

edited

Loading

erickgalinkin left a comment

erickgalinkin Feb 21, 2025

leondz Feb 26, 2025

erickgalinkin Feb 21, 2025

leondz Feb 24, 2025

leondz Feb 26, 2025 •

edited

Loading

		@@ -13,6 +13,73 @@
		roles = {"system", "user", "assistant"}


		class Turn(dict):

Migrate string output/input to Turn objects #1089

Are you sure you want to change the base?

Migrate string output/input to Turn objects #1089

Conversation

leondz commented Jan 27, 2025 • edited Loading

summary

details

Turn class

Updates to Attempt

type changes

Verification

leondz commented Feb 17, 2025

jmartin-tech commented Feb 18, 2025

leondz commented Feb 19, 2025 • edited Loading

erickgalinkin left a comment

Choose a reason for hiding this comment

erickgalinkin Feb 21, 2025

Choose a reason for hiding this comment

leondz Feb 26, 2025

Choose a reason for hiding this comment

erickgalinkin Feb 21, 2025

Choose a reason for hiding this comment

leondz Feb 24, 2025

Choose a reason for hiding this comment

leondz Feb 26, 2025 • edited Loading

Choose a reason for hiding this comment

Migrate string output/input to `Turn` objects #1089

Migrate string output/input to `Turn` objects #1089

leondz commented Jan 27, 2025 •

edited

Loading

`Turn` class

Updates to `Attempt`

leondz commented Feb 19, 2025 •

edited

Loading

leondz Feb 26, 2025 •

edited

Loading