-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: lightweight probe defaults #1116
base: main
Are you sure you want to change the base?
Conversation
…ull versions with special names
for _, klass in inspect.getmembers(mod, inspect.isclass) | ||
if klass.__module__.startswith(mod.__name__) and Migration in klass.__bases__ | ||
] | ||
migrations = sorted( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The classes in a single fixer should likely be idempotent for a module. Sorting would require accessing class name from the object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Order of operations is crucial here. As an alternative, splitting the fixer ops into multiple files seems unintuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modules already provide order, the fact that class order matters as well is somewhat concerning, should some of the actions be combined to consolidate and reduce reliance on ordering a the class level?
I am not against adding that requirement, just concerned it may indicate edge cases that will make maintaining these more difficult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rename order is important, and because the fixers I'd seen returned
the migration, I read this as one change per class. Would be happy to manage two renames per class - but I don't know how to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I'm clear on why the rename order is important, but agreed with @jmartin-tech that this is likely indicative of something brittle that may become an issue down the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Largely looks good. I think this is good to merge but would like to resolve the discussion @jmartin-tech brought up and my minor quibble with the DanInTheWildMini
docstring.
@@ -7,7 +7,7 @@ run: | |||
generations: 5 | |||
|
|||
plugins: | |||
probe_spec: continuation,dan,encoding.InjectBase64,encoding.InjectHex,goodside,av_spam_scanning,leakreplay,lmrc,malwaregen.SubFunctions,malwaregen.TopLevel,packagehallucination,realtoxicityprompts.RTPIdentity_Attack,realtoxicityprompts.RTPProfanity,realtoxicityprompts.RTPSexually_Explicit,realtoxicityprompts.RTPThreat,snowball,xss | |||
probe_spec: ansiescape.AnsiRaw,continuation,dan,encoding.InjectBase64,encoding.InjectHex,goodside,av_spam_scanning,leakreplay,lmrc,malwaregen.SubFunctions,malwaregen.TopLevel,packagehallucination,realtoxicityprompts.RTPIdentity_Attack,realtoxicityprompts.RTPProfanity,realtoxicityprompts.RTPSexually_Explicit,realtoxicityprompts.RTPThreat,snowball,xss |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still want av_spam_scanning
in the default fast config? It's largely useless for model-only evaluation.
|
||
from garak import _config | ||
from garak.attempt import Attempt | ||
from garak.data import path as data_path | ||
from garak.probes.base import Probe | ||
|
||
|
||
class ContinueSlursReclaimedSlurs(Probe): | ||
class ContinueSlursReclaimedSlursFull(Probe): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change a lot.
"""A subset of jailbreaks from the DanInTheWild jailbreak library. | ||
|
||
Tries a set of 100 known jailbreaks, in an attempt to get a model to disregard | ||
Tries a set of known jailbreaks, in an attempt to get a model to disregard | ||
alignment/system prompt. Success is measured by not getting a safety mitigaton | ||
message.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest mentioning the configurability by changing the soft_probe_prompt_cap
setting.
"""A subset of jailbreak from the DanInTheWild jailbreak library.
Tries a set of known jailbreaks, in an attempt to get a model to disregard
alignment/system prompt. Success is measured by not getting a safety mitigation
message. The number of prompts in the subset can be configured via the
`soft_probe_prompt_cap` run parameter."""
if self.follow_prompt_cap and cap is not None: | ||
num_ids_to_delete = max(0, len(self.prompts) - cap) | ||
ids_to_rm = random.sample(range(len(self.prompts)), num_ids_to_delete) | ||
# delete in descending order | ||
ids_to_rm = sorted(ids_to_rm, reverse=True) | ||
for id in ids_to_rm: | ||
del self.prompts[id] | ||
del self.triggers[id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we deleting instead of subselecting? I don't oppose it, but it is curious to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I answered my own question by looking at the subclassing.
for _, klass in inspect.getmembers(mod, inspect.isclass) | ||
if klass.__module__.startswith(mod.__name__) and Migration in klass.__bases__ | ||
] | ||
migrations = sorted( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I'm clear on why the rename order is important, but agreed with @jmartin-tech that this is likely indicative of something brittle that may become an issue down the line.
@@ -15,6 +15,7 @@ run: | |||
generations: 5 | |||
probe_tags: | |||
user_agent: "garak/{version} (LLM vulnerability scanner https://garak.ai)" | |||
soft_probe_prompt_cap: 256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It brings me great joy that this is a power of two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR assumes config_root
is always the global module instance _config
. An enhancement to enable _config.run
items to be distributed in a consistent way for implementers of Configurable
is planned and will be needed for this PR.
num_ids_to_delete = max( | ||
0, len(self.prompts) - config_root.run.soft_probe_prompt_cap | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cannot assume config_root
is the global module _config
.
Resolves #1032
Full
versions(NB Getting some
black
churn)Verification
garak --list_probes
reveals no -Mini
or -80 (etc) probe namesgarak --list_probes
shows all -Full
probes are inactivegarak -m test -g 1
shows no probe having over 256 prompts (config.run.soft_probe_prompt_cap
) - calculations for 256 available on requestconfig.run.soft_probe_prompt_cap
changes #prompts for affected probes