Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

payload/probe separation #870

Merged
merged 36 commits into from
Sep 18, 2024
Merged

payload/probe separation #870

merged 36 commits into from
Sep 18, 2024

Conversation

leondz
Copy link
Owner

@leondz leondz commented Sep 2, 2024

Create structure for separating payloads from probes, to allow easier transfer.

Payload definition:

  • Content-based
  • Intended for inclusion in prompts
  • For each payload there should be a clear qualitative idea of how to detect this in one single inference output in isolation
  • Not intrinsically characterisable as a behavior, but rather a content type
  • The question “This output contains [x]” or "This prompt contains [x]" should make sense and be answerable for each x in the typology
  • "Payloads" are defined in contrast to "Behaviours"

Docs:

  • Added a slash-separated typology for payloads, garak/resources/typology_payloads.tsv. This covers content and not behaviour.
  • Payloads are JSON files with the following entries:
    • name, a string they're summoned by, keyed with garak_payload_name, a magic field used to identify the JSON as a payload
    • types, a list of strings corresponding to entries and implicit parent entries in the typology. can be empty
    • payloads, a list of strings
    • detector_name, an optional string referring to an appropriate detector for this payload group
    • detector_config, an optional dict configuring the above detector
  • The above object corresponds to a garak.payloads.PayloadGroup
  • Payloads are managed by a Manager, which recursively enumerates payloads in both package and XDG data resources/payloads/
  • Loadmaster().search() searches the loaded payload metadata cache
  • Loadmaster().load(name) loads a payload specified by the string parameter and returns an instance of a PayloadGroup

Possible extensions beyond scope of this PR:

  • The mechanism for selecting payloads has potential to grow to a complex descriptor-based format. There's a tiny amount of support for this based on types right now. Working out how this can best look (or if it makes sense) is a separate engineering task that I'd prefer to be driven by concrete examples.

Notes:

  • There are so many possible patterns here; it's difficult to anticipate everything
  • How could the payload use in probes.grandma.Win10 look? We have almost too many options here
  • ✅ Testing of the selection mechanism is due, but the mechanism isn't used right now
  • It's easy to imagine moving detector spec down to prompt-level, but not now.
  • The detector_name recommendation indicates an interesting responsibility conflicts between probes and payloads. Maybe the payload is overspecified.
  • ✅ Perhaps Loadmaster.load() can come out and become module-level, as with _plugins.load_plugin()
  • Now that payloads can be edited outside of prompt code, do we want to go one step further and make payload names configurable in probes?

@leondz leondz added enhancement Architectural upgrades probes Content & activity of LLM probes labels Sep 2, 2024
@leondz leondz marked this pull request as ready for review September 2, 2024 15:49
@leondz leondz added this to the 24.10 milestone Sep 4, 2024
Copy link
Collaborator

@erickgalinkin erickgalinkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a bit more refinement to do here but this looks like a good first pass.

The biggest (philosophical) question is whether we should include some kind of information in the payload that informs the detector.

garak/exception.py Show resolved Hide resolved
garak/payloads.py Show resolved Hide resolved
garak/payloads.py Outdated Show resolved Hide resolved
garak/payloads.py Outdated Show resolved Hide resolved
garak/payloads.py Outdated Show resolved Hide resolved
garak/payloads.py Outdated Show resolved Hide resolved
garak/payloads.py Outdated Show resolved Hide resolved
garak/resources/payloads/prodkey_win10.json Outdated Show resolved Hide resolved
Copy link
Collaborator

@jmartin-tech jmartin-tech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial items, still working thru the logic here to connect all the dots.

One primary concern is here is the file enumeration needs to account for collisions and set precedence for files in the data_dir to fully supersede the same file if also found in the package_dir.

This should also come with at least tests that show the intended use of search(). The current tests look to only validate type restrictions on output.

garak/payloads.py Outdated Show resolved Hide resolved
garak/payloads.py Show resolved Hide resolved
garak/payloads.py Outdated Show resolved Hide resolved
garak/payloads.py Outdated Show resolved Hide resolved
Comment on lines 25 to 27
PAYLOAD_NAMES = list(
garak.payloads.Manager().search()
) # default includes local custom payloads to help test them
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code will be executed on import, this may result in brittle or missed results.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. I haven't found a fixture-based solution to this yet. There's also something a bit circular about using the Manager class to get valid inputs for the Manager class, but that's a secondary point to your comment.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sunk enough time into this, didn't find a resolution using a fixture - would like to have a fixture detemine the value at session scope, and then use this fixture to iterate, but did not succeed.

@leondz leondz linked an issue Sep 5, 2024 that may be closed by this pull request
3 tasks
garak/payloads.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@jmartin-tech jmartin-tech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the lack of utilization for Loadmaster is masking the need for informative data that is missing from this design. Currently there seems to be no way to select only payloads that would meet the needs of the selected example probe when other probes in the selected module are converted to use PayloadGroup.

garak/probes/grandma.py Outdated Show resolved Hide resolved
@leondz
Copy link
Owner Author

leondz commented Sep 10, 2024 via email

@leondz
Copy link
Owner Author

leondz commented Sep 18, 2024

@jmartin-tech changes complete, I want to land this - any further blockers?

@jmartin-tech
Copy link
Collaborator

Initial concerns are addressed, I am not completely sold on how this works yet. Will get it merged for now and iterate further.

@jmartin-tech jmartin-tech merged commit 2593777 into main Sep 18, 2024
10 checks passed
@jmartin-tech jmartin-tech deleted the feature/payloads branch September 18, 2024 19:08
@github-actions github-actions bot locked and limited conversation to collaborators Sep 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement Architectural upgrades probes Content & activity of LLM probes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Payloads and probes should be decoupled
3 participants