
Extend LLM-04: RAG poisoning with glitch tokens causes DoS #283

Open
13 tasks done
mhupfauer opened this issue Apr 10, 2024 · 6 comments
Assignees
Labels
enhancement Changes/additions to the Top 10; eg. clarifications, examples, links to external resources, etc llm-04 Relates to LLM Top-10 entry #4

Comments

@mhupfauer
Contributor

mhupfauer commented Apr 10, 2024

Remember, an issue is not the place to ask questions. You can use our Slack channel for that, or you may want to start a discussion on the Discussion Board.

When reporting an issue, please be sure to include the following:

  • Before you open an issue, please check if a similar issue already exists or has been closed before.
  • A descriptive title, with the specific LLM-01–10 label applied for the relevant entry. See our available labels.
  • A description of the problem you're trying to solve, including why you think this is a problem
  • If the enhancement changes current behavior, reasons why your solution is better
  • What artifact and version of the project you're referencing, and the location (e.g. OWASP site, llmtop10.com, repo)
  • The behavior you expect to see, and the actual behavior

Steps to Reproduce


  1. NA

What happens?


2_0_vulns/LLM04_ModelDoS.md does not cover DoS through glitch tokens injected into the prompt from a RAG solution. If a malicious actor is able to introduce broadly relevant information containing glitch tokens for the model in use into the RAG database, they can effectively run a denial-of-service attack against all users of the LLM.

The main issue is that there is no clear indication of which token caused the model to glitch, so there is no obvious way to remediate such issues automatically. Enterprises build large RAG databases from (mostly) user-generated content (e.g. Confluence, SharePoint, ...) and update this content frequently. A malicious actor could therefore easily introduce new content into the RAG database, including but not limited to glitch tokens, effectively causing a denial-of-service situation for all end users of the application.

Any other comments?


A "normal" glitch-token attack doesn't pose a significant threat, as it only renders the current user session/context unusable. Through a poisoned RAG database, however, a malicious user can inject these tokens into some, many, or even all conversations, causing a broad service outage.

Sources talking about the issue


  1. https://www.youtube.com/watch?v=WO2X3oZEJOA
  2. https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation
  3. https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology

@GangGreenTemperTatum GangGreenTemperTatum added enhancement Changes/additions to the Top 10; eg. clarifications, examples, links to external resources, etc llm-04 Relates to LLM Top-10 entry #4 labels Apr 10, 2024
@GangGreenTemperTatum
Collaborator

@kenhuangus , you able to take this? tyia!

@kenhuangus
Collaborator

Yes, I will investigate and, if needed, incorporate this vector of attack using glitch tokens for RAG-based LLM apps.

@kenhuangus
Collaborator

@mhupfauer Thanks Markus Hupfauer for the contribution; I will incorporate the following text from Markus.

Description
[...]
An additional Denial of Service method involves glitch tokens — unique, problematic strings of characters that disrupt model processing, resulting in partial or complete failure to produce coherent responses. This vulnerability is magnified as RAGs increasingly source data from dynamic internal resources like collaboration tools and document management systems. Attackers can exploit this by inserting glitch tokens into these sources, thus triggering a Denial of Service by compromising the model's functionality.
Common Examples of Vulnerability
[...]
7. Glitch-token RAG poisoning: The attacker introduces glitch tokens into the data sources of the RAG's vector database, thereby injecting these malicious tokens into the model's context window through the retrieval process and causing the model to produce (partially) incoherent results.
Prevention and Mitigation Strategies
[...]
8. Build lists of known glitch tokens and scan RAG output before adding it to the model’s context window.
Example Attack Scenarios
[...]
8. An attacker adds glitch tokens to existing documents, or creates new documents containing such tokens, in a collaboration or document management tool. If the RAG's vector database is updated automatically, these malicious tokens are added to its information store. Upon retrieval into the LLM's context, these tokens glitch the inference process, potentially causing the LLM to generate incoherent output.
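Mitigation 8 above (scanning RAG output against a list of known glitch tokens before it reaches the context window) could be sketched roughly as follows. This is a minimal illustration, not a vetted implementation: the denylist entries are example strings drawn from the SolidGoldMagikarp research, and a real deployment would maintain its own list for the specific tokenizer/model in use, plus logging and review of quarantined content.

```python
# Hypothetical denylist of known glitch tokens for the target model.
# These example entries come from the SolidGoldMagikarp write-ups; the
# leading space matters because that is how the tokens appear in text.
KNOWN_GLITCH_TOKENS = {
    " SolidGoldMagikarp",
    " petertodd",
    " davidjl",
}


def filter_retrieved_chunks(chunks):
    """Screen retrieved RAG chunks before adding them to the context window.

    Returns (clean, quarantined) so flagged chunks can be logged and
    reviewed by an operator instead of being silently discarded.
    """
    clean, quarantined = [], []
    for chunk in chunks:
        if any(tok in chunk for tok in KNOWN_GLITCH_TOKENS):
            quarantined.append(chunk)
        else:
            clean.append(chunk)
    return clean, quarantined


# Example: one poisoned chunk among the retrieval results.
retrieved = [
    "Quarterly revenue grew 12% year over year.",
    "Onboarding checklist: SolidGoldMagikarp step 3 ...",
]
clean, quarantined = filter_retrieved_chunks(retrieved)
```

A plain substring denylist like this only catches tokens you already know about; it does not address the reporter's point that there is often no clear indication which token caused a glitch, so it complements (rather than replaces) monitoring for incoherent model output.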

@mhupfauer
Contributor Author

@kenhuangus Thanks for merging my proposal!

@kenhuangus
Collaborator

Thank you as well.

@mhupfauer
Contributor Author

@kenhuangus: There was a slight copy-paste error, I think: the "Example Attack Scenarios" headline now appears twice in the document. Not the entire chapter, just the headline :)

@mhupfauer mhupfauer reopened this Apr 12, 2024