From 4f34211b0084f0dd3cd2cdaf6a18c05d53b9b3fa Mon Sep 17 00:00:00 2001
From: dltemple
Date: Sun, 19 Jan 2025 09:31:07 -0600
Subject: [PATCH] pedantic spelling

---
 CONTRIBUTING.md |  4 ++--
 FAQ.md          |  2 +-
 README.md       | 44 ++++++++++++++++++++++----------------------
 3 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 322ac2e0e..a9b8c654f 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -29,7 +29,7 @@ And if you like the project, but just don't have time to contribute, that's fine
 
 ## I Have a Question
 
-If you want to ask a question, good places to check first are the [garak quick start docs](https://docs.garak.ai) and, if its a coding question, the [garak reference](https://reference.garak.ai/).
+If you want to ask a question, good places to check first are the [garak quick start docs](https://docs.garak.ai) and, if it's a coding question, the [garak reference](https://reference.garak.ai/).
 
 Before you ask a question, it is best to search for existing [Issues](https://github.com/NVIDIA/garak/issues) that might help you. In case you have found a suitable issue and still need clarification, you can write your question in this issue. It is also advisable to search the internet for answers first. You can also often find helpful people on the garak [Discord](https://discord.gg/uVch4puUCs).
 
@@ -70,7 +70,7 @@ A good bug report shouldn't leave others needing to chase you up for more inform
 
 #### How Do I Submit a Good Bug Report?
 
-You should never report security related issues, vulnerabilities or bugs including sensitive information to the issue tracker, or elsewhere in public. Instead sensitive bugs must be sent by email to security@garak.ai.
+You should never report security related issues, vulnerabilities or bugs including sensitive information to the issue tracker, or elsewhere in public. Instead, sensitive bugs must be sent by email to security@garak.ai.
 
 We use GitHub issues to track bugs and errors. If you run into an issue with the project:
 
diff --git a/FAQ.md b/FAQ.md
index 0aa163920..80243f72c 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -79,7 +79,7 @@ No, if the model is the same, you should get the same results - though there are
 
 ## How can I scan a RAG pipeline with garak?
 
-Currently the major attack we hear about in RAG systems is indirect prompt injection, and garak already scans for a few of those.
+Currently, the major attack we hear about in RAG systems is indirect prompt injection, and garak already scans for a few of those.
 
 ## There are so many probes in garak, I was trying to scan a model for all probes, but it took hours and I eventually had to kill that scan. What is the recommended practice on scanning a model? Which typical probes are recommended?
 
diff --git a/README.md b/README.md
index 03d9e941d..b9d9f50fd 100644
--- a/README.md
+++ b/README.md
@@ -228,28 +228,28 @@ For testing. This generator repeats back the prompt it received.
 
 ## Intro to probes
 
-| Probe | Description |
-| --- | --- |
-| blank | A simple probe that always sends an empty prompt. |
-| atkgen | Automated Attack Generation. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. Prototype, mostly stateless, for now uses a simple GPT-2 [fine-tuned](https://huggingface.co/garak-llm/artgpt2tox) on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported for now). |
-| av_spam_scanning | Probes that attempt to make the model output malicious content signatures |
-| continuation | Probes that test if the model will continue a probably undesirable word |
-| dan | Various [DAN](https://adguard.com/en/blog/chatgpt-dan-prompt-abuse.html) and DAN-like attacks |
-| donotanswer | Prompts to which responsible language models should not answer. |
-| encoding | Prompt injection through text encoding |
-| gcg | Disrupt a system prompt by appending an adversarial suffix. |
-| glitch | Probe model for glitch tokens that provoke unusual behavior. |
-| grandma | Appeal to be reminded of one's grandmother. |
-| goodside | Implementations of Riley Goodside attacks. |
-| leakerplay | Evaluate if a model will replay training data. |
-| lmrc | Subsample of the [Language Model Risk Cards](https://arxiv.org/abs/2303.18190) probes |
-| malwaregen | Attempts to have the model generate code for building malware |
-| misleading | Attempts to make a model support misleading and false claims |
-| packagehallucination | Trying to get code generations that specify non-existent (and therefore insecure) packages. |
-| promptinject | Implementation of the Agency Enterprise [PromptInject](https://github.com/agencyenterprise/PromptInject/tree/main/promptinject) work (best paper awards @ NeurIPS ML Safety Workshop 2022) |
-| realtoxicityprompts | Subset of the RealToxicityPrompts work (data constrained because the full test will take so long to run) |
-| snowball | [Snowballed Hallucination](https://ofir.io/snowballed_hallucination.pdf) probes designed to make a model give a wrong answer to questions too complex for it to process |
-| xss | Look for vulnerabilities the permit or enact cross-site attacks, such as private data exfiltration. |
+| Probe                | Description |
+|----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| blank                | A simple probe that always sends an empty prompt. |
+| atkgen               | Automated Attack Generation. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. Prototype, mostly stateless, for now uses a simple GPT-2 [fine-tuned](https://huggingface.co/garak-llm/artgpt2tox) on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported for now). |
+| av_spam_scanning     | Probes that attempt to make the model output malicious content signatures |
+| continuation         | Probes that test if the model will continue a probably undesirable word |
+| dan                  | Various [DAN](https://adguard.com/en/blog/chatgpt-dan-prompt-abuse.html) and DAN-like attacks |
+| donotanswer          | Prompts to which responsible language models should not answer. |
+| encoding             | Prompt injection through text encoding |
+| gcg                  | Disrupt a system prompt by appending an adversarial suffix. |
+| glitch               | Probe model for glitch tokens that provoke unusual behavior. |
+| grandma              | Appeal to be reminded of one's grandmother. |
+| goodside             | Implementations of Riley Goodside attacks. |
+| leakerplay           | Evaluate if a model will replay training data. |
+| lmrc                 | Subsample of the [Language Model Risk Cards](https://arxiv.org/abs/2303.18190) probes |
+| malwaregen           | Attempts to have the model generate code for building malware |
+| misleading           | Attempts to make a model support misleading and false claims |
+| packagehallucination | Trying to get code generations that specify non-existent (and therefore insecure) packages. |
+| promptinject         | Implementation of the Agency Enterprise [PromptInject](https://github.com/agencyenterprise/PromptInject/tree/main/promptinject) work (best paper awards @ NeurIPS ML Safety Workshop 2022) |
+| realtoxicityprompts  | Subset of the RealToxicityPrompts work (data constrained because the full test will take so long to run) |
+| snowball             | [Snowballed Hallucination](https://ofir.io/snowballed_hallucination.pdf) probes designed to make a model give a wrong answer to questions too complex for it to process |
+| xss                  | Look for vulnerabilities the permit or enact cross-site attacks, such as private data exfiltration. |
 
 ## Logging