
feature: lightweight probe defaults #1116

Draft · leondz wants to merge 20 commits into main
Conversation

@leondz (Collaborator) commented Feb 26, 2025

Resolves #1032

  • Cap probes at a certain number of requests for standard version, full version can be present but inactive
  • Make lightweight probes the default, moving larger probes out to -Full versions
  • Add a config value to set a suggested cap on number of prompts per default probe
  • Migrate many probes to use random shuffling + this cap when reducing probe count (shuffling preferred because a. we're not a benchmark, we're about discovery; b. getting variance is good, we don't want to overfit to subsets of the test cases)
  • Add fixers for renames

(NB: this also picks up some black formatting churn.)
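The shuffle-plus-cap approach described above can be sketched as follows. `cap_prompts` is an illustrative helper, not the PR's actual implementation; random subselection is preferred over a fixed head slice for the reasons given (discovery rather than benchmarking, and variance to avoid overfitting to a fixed subset):

```python
import random

def cap_prompts(prompts, cap, seed=None):
    """Return at most `cap` prompts, chosen by random sampling.

    If `cap` is None or the list is already small enough, the prompts
    are returned unchanged.
    """
    if cap is None or len(prompts) <= cap:
        return list(prompts)
    rng = random.Random(seed)  # seedable for reproducible test runs
    return rng.sample(prompts, cap)
```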

Verification

  • garak --list_probes reveals no -Mini or -80 (etc) probe names
  • garak --list_probes shows all -Full probes are inactive
  • garak -m test -g 1 shows no probe having over 256 prompts (config.run.soft_probe_prompt_cap) - calculations for 256 available on request
  • altering config.run.soft_probe_prompt_cap changes #prompts for affected probes
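For the last check, the cap can be lowered in a run config; this fragment assumes the `run:` YAML layout used by garak's core config (lowering the value should shrink the prompt count for affected probes):

```yaml
run:
  # lower the suggested per-probe prompt cap from the default 256
  soft_probe_prompt_cap: 64
```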

@leondz leondz added the probes Content & activity of LLM probes label Feb 26, 2025
@leondz leondz added this to the 25.02 Efficiency milestone Feb 26, 2025
@leondz leondz changed the title lightweight probe defaults feature: lightweight probe defaults Feb 26, 2025
@leondz leondz marked this pull request as draft February 26, 2025 21:59
for _, klass in inspect.getmembers(mod, inspect.isclass)
if klass.__module__.startswith(mod.__name__) and Migration in klass.__bases__
]
migrations = sorted(
@jmartin-tech (Collaborator), Feb 26, 2025:

The classes in a single fixer should likely be idempotent for a module. Sorting would require accessing class name from the object.
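One way to make the ordering explicit is to sort on class name, as suggested. A sketch, where `ordered_migrations` is a hypothetical helper and garak's actual collection code may differ:

```python
import inspect

def ordered_migrations(mod, base):
    """Collect subclasses of `base` defined in `mod`, ordered by class name.

    A deterministic key such as __name__ makes the ordering explicit
    instead of relying on definition order within the module.
    """
    classes = [
        klass
        for _, klass in inspect.getmembers(mod, inspect.isclass)
        if klass.__module__.startswith(mod.__name__) and base in klass.__bases__
    ]
    return sorted(classes, key=lambda klass: klass.__name__)
```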

@leondz (Collaborator, Author):

Order of operations is crucial here. As an alternative, splitting the fixer ops into multiple files seems unintuitive.

Collaborator:

Modules already provide ordering; the fact that class order matters as well is somewhat concerning. Should some of the actions be combined, to consolidate and reduce reliance on ordering at the class level?

I am not against adding that requirement, just concerned it may indicate edge cases that will make maintaining these more difficult.

@leondz (Collaborator, Author):

The rename order is important, and because the fixers I'd seen returned the migration, I read this as one change per class. Would be happy to manage two renames per class - but I don't know how to do it.
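Two ordered renames can live in one class if the fixer applies them sequentially. This is a toy sketch only: `Migration`, `rename`, and the `apply` signature below are assumptions for illustration, operating on a comma-separated probe spec string rather than garak's real config objects:

```python
class Migration:
    """Stand-in for garak's fixer base class (assumed)."""

def rename(spec: str, old: str, new: str) -> str:
    """Toy rename over a comma-separated probe spec (illustrative only)."""
    return ",".join(new if p == old else p for p in spec.split(","))

class RenameContinuation(Migration):
    """Hypothetical fixer performing two ordered renames in one class."""

    @staticmethod
    def apply(spec: str) -> str:
        # order matters: move the old class to -Full before the Mini
        # variant takes over the short name
        spec = rename(spec, "continuation.ContinueSlursReclaimedSlurs",
                      "continuation.ContinueSlursReclaimedSlursFull")
        spec = rename(spec, "continuation.ContinueSlursReclaimedSlursMini",
                      "continuation.ContinueSlursReclaimedSlurs")
        return spec
```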

Collaborator:

I'm not sure I'm clear on why the rename order is important, but agreed with @jmartin-tech that this is likely indicative of something brittle that may become an issue down the line.

@erickgalinkin (Collaborator) left a comment:

Largely looks good. I think this is good to merge but would like to resolve the discussion @jmartin-tech brought up and my minor quibble with the DanInTheWildMini docstring.

@@ -7,7 +7,7 @@ run:
generations: 5

plugins:
-  probe_spec: continuation,dan,encoding.InjectBase64,encoding.InjectHex,goodside,av_spam_scanning,leakreplay,lmrc,malwaregen.SubFunctions,malwaregen.TopLevel,packagehallucination,realtoxicityprompts.RTPIdentity_Attack,realtoxicityprompts.RTPProfanity,realtoxicityprompts.RTPSexually_Explicit,realtoxicityprompts.RTPThreat,snowball,xss
+  probe_spec: ansiescape.AnsiRaw,continuation,dan,encoding.InjectBase64,encoding.InjectHex,goodside,av_spam_scanning,leakreplay,lmrc,malwaregen.SubFunctions,malwaregen.TopLevel,packagehallucination,realtoxicityprompts.RTPIdentity_Attack,realtoxicityprompts.RTPProfanity,realtoxicityprompts.RTPSexually_Explicit,realtoxicityprompts.RTPThreat,snowball,xss
Collaborator:

Do we still want av_spam_scanning in the default fast config? It's largely useless for model-only evaluation.


from garak import _config
from garak.attempt import Attempt
from garak.data import path as data_path
from garak.probes.base import Probe


-class ContinueSlursReclaimedSlurs(Probe):
+class ContinueSlursReclaimedSlursFull(Probe):
Collaborator:

I like this change a lot.

Comment on lines +653 to 657
"""A subset of jailbreaks from the DanInTheWild jailbreak library.

-Tries a set of 100 known jailbreaks, in an attempt to get a model to disregard
+Tries a set of known jailbreaks, in an attempt to get a model to disregard
alignment/system prompt. Success is measured by not getting a safety mitigaton
message."""
Collaborator:

Suggest mentioning the configurability by changing the soft_probe_prompt_cap setting.

"""A subset of jailbreaks from the DanInTheWild jailbreak library.
Tries a set of known jailbreaks, in an attempt to get a model to disregard
alignment/system prompt. Success is measured by not getting a safety mitigation
message. The number of prompts in the subset can be configured via the 
`soft_probe_prompt_cap` run parameter."""

Comment on lines +66 to +73
if self.follow_prompt_cap and cap is not None:
num_ids_to_delete = max(0, len(self.prompts) - cap)
ids_to_rm = random.sample(range(len(self.prompts)), num_ids_to_delete)
# delete in descending order
ids_to_rm = sorted(ids_to_rm, reverse=True)
for id in ids_to_rm:
del self.prompts[id]
del self.triggers[id]
Collaborator:

Why are we deleting instead of subselecting? I don't oppose it, but it is curious to me!

Collaborator:

I think I answered my own question by looking at the subclassing.
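For comparison, the same cap can be applied by subselecting into new lists rather than deleting in place. A sketch (`cap_paired` is a hypothetical helper, equivalent in effect to the PR's descending-index deletion); either way the prompt/trigger pairing at each index must be preserved:

```python
import random

def cap_paired(prompts, triggers, cap, seed=None):
    """Subselect prompts and their paired triggers together.

    Builds new lists instead of mutating in place; indices are kept
    aligned so each prompt retains its matching trigger.
    """
    if cap is None or len(prompts) <= cap:
        return list(prompts), list(triggers)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(prompts)), cap))
    return [prompts[i] for i in keep], [triggers[i] for i in keep]
```

In-place deletion keeps working when subclasses have already bound `self.prompts`, which is likely why the PR deletes rather than rebinds.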


@@ -15,6 +15,7 @@ run:
generations: 5
probe_tags:
user_agent: "garak/{version} (LLM vulnerability scanner https://garak.ai)"
+soft_probe_prompt_cap: 256
Collaborator:

It brings me great joy that this is a power of two.

@jmartin-tech (Collaborator) left a comment:

This PR assumes config_root is always the global module instance _config. An enhancement to enable _config.run items to be distributed in a consistent way for implementers of Configurable is planned and will be needed for this PR.

Comment on lines +80 to +82
num_ids_to_delete = max(
0, len(self.prompts) - config_root.run.soft_probe_prompt_cap
)
Collaborator:

Cannot assume config_root is the global module _config.
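A defensive access pattern avoids that assumption. This is a sketch of the idea only (`get_prompt_cap` is hypothetical, and garak's planned `Configurable` distribution of `_config.run` items may solve this differently):

```python
def get_prompt_cap(config_root):
    """Read the soft prompt cap from whatever config_root was passed in,
    without assuming it is garak's global _config module.

    Returns None (no cap) when the attribute chain is absent.
    """
    run = getattr(config_root, "run", None)
    return getattr(run, "soft_probe_prompt_cap", None) if run else None
```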

Successfully merging this pull request may close these issues: align prompt count per probe

3 participants