
feature: lightweight probe defaults #1116

Draft · leondz wants to merge 20 commits into main
Conversation

@leondz (Collaborator) commented Feb 26, 2025

Resolves #1032

  • Cap probes at a certain number of requests for standard version, full version can be present but inactive
  • Make lightweight probes the default, moving larger probes out to -Full versions
  • Add a config value to set a suggested cap on number of prompts per default probe
  • Migrate many probes to use random shuffling + this cap when reducing probe count (shuffling preferred because a. we're not a benchmark, we're about discovery; b. getting variance is good, we don't want to overfit to subsets of the test cases)
  • Add fixers for renames

(NB: this also picks up some black formatting churn.)
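The shuffle-plus-cap approach described above can be sketched as follows. `cap_prompts` is an illustrative helper, not the PR's actual implementation; random subselection is preferred over a fixed head slice for the reasons given (discovery rather than benchmarking, and variance to avoid overfitting to a fixed subset):

```python
import random

def cap_prompts(prompts, cap, seed=None):
    """Return at most `cap` prompts, chosen by random sampling.

    If `cap` is None or the list is already small enough, the prompts
    are returned unchanged.
    """
    if cap is None or len(prompts) <= cap:
        return list(prompts)
    rng = random.Random(seed)  # seedable for reproducible test runs
    return rng.sample(prompts, cap)
```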

Verification

  • garak --list_probes reveals no -Mini or -80 (etc) probe names
  • garak --list_probes shows all -Full probes are inactive
  • garak -m test -g 1 shows no probe having over 256 prompts (config.run.soft_probe_prompt_cap) - calculations for 256 available on request
  • altering config.run.soft_probe_prompt_cap changes #prompts for affected probes
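For the last check, the cap can be lowered in a run config; this fragment assumes the `run:` YAML layout used by garak's core config (lowering the value should shrink the prompt count for affected probes):

```yaml
run:
  # lower the suggested per-probe prompt cap from the default 256
  soft_probe_prompt_cap: 64
```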

@leondz leondz added the probes Content & activity of LLM probes label Feb 26, 2025
@leondz leondz added this to the 25.02 Efficiency milestone Feb 26, 2025
@leondz leondz changed the title lightweight probe defaults feature: lightweight probe defaults Feb 26, 2025
@leondz leondz marked this pull request as draft February 26, 2025 21:59
for _, klass in inspect.getmembers(mod, inspect.isclass)
if klass.__module__.startswith(mod.__name__) and Migration in klass.__bases__
]
migrations = sorted(
@jmartin-tech (Collaborator), Feb 26, 2025:

The classes in a single fixer should likely be idempotent for a module. Sorting would require accessing class name from the object.
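One way to make the ordering explicit is to sort on class name, as suggested. A sketch, where `ordered_migrations` is a hypothetical helper and garak's actual collection code may differ:

```python
import inspect

def ordered_migrations(mod, base):
    """Collect subclasses of `base` defined in `mod`, ordered by class name.

    A deterministic key such as __name__ makes the ordering explicit
    instead of relying on definition order within the module.
    """
    classes = [
        klass
        for _, klass in inspect.getmembers(mod, inspect.isclass)
        if klass.__module__.startswith(mod.__name__) and base in klass.__bases__
    ]
    return sorted(classes, key=lambda klass: klass.__name__)
```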

@leondz (Collaborator, Author):

Order of operations is crucial here. As an alternative, splitting the fixer ops into multiple files seems unintuitive.

Collaborator:

Modules already provide ordering; the fact that class order matters as well is somewhat concerning. Should some of the actions be combined, to consolidate and reduce reliance on ordering at the class level?

I am not against adding that requirement, just concerned it may indicate edge cases that will make maintaining these more difficult.

@leondz (Collaborator, Author):

The rename order is important, and because the fixers I'd seen returned the migration, I read this as one change per class. Would be happy to manage two renames per class - but I don't know how to do it.
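Two ordered renames can live in one class if the fixer applies them sequentially. This is a toy sketch only: `Migration`, `rename`, and the `apply` signature below are assumptions for illustration, operating on a comma-separated probe spec string rather than garak's real config objects:

```python
class Migration:
    """Stand-in for garak's fixer base class (assumed)."""

def rename(spec: str, old: str, new: str) -> str:
    """Toy rename over a comma-separated probe spec (illustrative only)."""
    return ",".join(new if p == old else p for p in spec.split(","))

class RenameContinuation(Migration):
    """Hypothetical fixer performing two ordered renames in one class."""

    @staticmethod
    def apply(spec: str) -> str:
        # order matters: move the old class to -Full before the Mini
        # variant takes over the short name
        spec = rename(spec, "continuation.ContinueSlursReclaimedSlurs",
                      "continuation.ContinueSlursReclaimedSlursFull")
        spec = rename(spec, "continuation.ContinueSlursReclaimedSlursMini",
                      "continuation.ContinueSlursReclaimedSlurs")
        return spec
```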

Collaborator:

I'm not sure I'm clear on why the rename order is important, but agreed with @jmartin-tech that this is likely indicative of something brittle that may become an issue down the line.

@erickgalinkin (Collaborator) left a comment:

Largely looks good. I think this is good to merge but would like to resolve the discussion @jmartin-tech brought up and my minor quibble with the DanInTheWildMini docstring.

@@ -7,7 +7,7 @@ run:
generations: 5

plugins:
-  probe_spec: continuation,dan,encoding.InjectBase64,encoding.InjectHex,goodside,av_spam_scanning,leakreplay,lmrc,malwaregen.SubFunctions,malwaregen.TopLevel,packagehallucination,realtoxicityprompts.RTPIdentity_Attack,realtoxicityprompts.RTPProfanity,realtoxicityprompts.RTPSexually_Explicit,realtoxicityprompts.RTPThreat,snowball,xss
+  probe_spec: ansiescape.AnsiRaw,continuation,dan,encoding.InjectBase64,encoding.InjectHex,goodside,av_spam_scanning,leakreplay,lmrc,malwaregen.SubFunctions,malwaregen.TopLevel,packagehallucination,realtoxicityprompts.RTPIdentity_Attack,realtoxicityprompts.RTPProfanity,realtoxicityprompts.RTPSexually_Explicit,realtoxicityprompts.RTPThreat,snowball,xss
Collaborator:

Do we still want av_spam_scanning in the default fast config? It's largely useless for model-only evaluation.


from garak import _config
from garak.attempt import Attempt
from garak.data import path as data_path
from garak.probes.base import Probe


-class ContinueSlursReclaimedSlurs(Probe):
+class ContinueSlursReclaimedSlursFull(Probe):
Collaborator:

I like this change a lot.

Comment on lines +653 to 657
"""A subset of jailbreaks from the DanInTheWild jailbreak library.

-Tries a set of 100 known jailbreaks, in an attempt to get a model to disregard
+Tries a set of known jailbreaks, in an attempt to get a model to disregard
alignment/system prompt. Success is measured by not getting a safety mitigaton
message."""
Collaborator:

Suggest mentioning the configurability by changing the soft_probe_prompt_cap setting.

"""A subset of jailbreaks from the DanInTheWild jailbreak library.
Tries a set of known jailbreaks, in an attempt to get a model to disregard
alignment/system prompt. Success is measured by not getting a safety mitigation
message. The number of prompts in the subset can be configured via the 
`soft_probe_prompt_cap` run parameter."""

Comment on lines +66 to +73
if self.follow_prompt_cap and cap is not None:
num_ids_to_delete = max(0, len(self.prompts) - cap)
ids_to_rm = random.sample(range(len(self.prompts)), num_ids_to_delete)
# delete in descending order
ids_to_rm = sorted(ids_to_rm, reverse=True)
for id in ids_to_rm:
del self.prompts[id]
del self.triggers[id]
Collaborator:

Why are we deleting instead of subselecting? I don't oppose it, but it is curious to me!

Collaborator:

I think I answered my own question by looking at the subclassing.
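For comparison, the same cap can be applied by subselecting into new lists rather than deleting in place. A sketch (`cap_paired` is a hypothetical helper, equivalent in effect to the PR's descending-index deletion); either way the prompt/trigger pairing at each index must be preserved:

```python
import random

def cap_paired(prompts, triggers, cap, seed=None):
    """Subselect prompts and their paired triggers together.

    Builds new lists instead of mutating in place; indices are kept
    aligned so each prompt retains its matching trigger.
    """
    if cap is None or len(prompts) <= cap:
        return list(prompts), list(triggers)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(prompts)), cap))
    return [prompts[i] for i in keep], [triggers[i] for i in keep]
```

In-place deletion keeps working when subclasses have already bound `self.prompts`, which is likely why the PR deletes rather than rebinds.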


@@ -15,6 +15,7 @@ run:
generations: 5
probe_tags:
user_agent: "garak/{version} (LLM vulnerability scanner https://garak.ai)"
+soft_probe_prompt_cap: 256
Collaborator:

It brings me great joy that this is a power of two.

@jmartin-tech (Collaborator) left a comment:

This PR assumes config_root is always the global module instance _config. An enhancement to enable _config.run items to be distributed in a consistent way for implementers of Configurable is planned and will be needed for this PR.

Comment on lines +80 to +82
num_ids_to_delete = max(
0, len(self.prompts) - config_root.run.soft_probe_prompt_cap
)
Collaborator:

Cannot assume config_root is the global module _config.
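A defensive access pattern avoids that assumption. This is a sketch of the idea only (`get_prompt_cap` is hypothetical, and garak's planned `Configurable` distribution of `_config.run` items may solve this differently):

```python
def get_prompt_cap(config_root):
    """Read the soft prompt cap from whatever config_root was passed in,
    without assuming it is garak's global _config module.

    Returns None (no cap) when the attribute chain is absent.
    """
    run = getattr(config_root, "run", None)
    return getattr(run, "soft_probe_prompt_cap", None) if run else None
```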

Successfully merging this pull request may close these issues: align prompt count per probe

3 participants