From d7a9c483e7544e93a990b3e4bbe43149b7fb7ae6 Mon Sep 17 00:00:00 2001
From: Wei-Lin Chiang
Date: Fri, 13 Sep 2024 09:54:48 -0700
Subject: [PATCH] Update 2024-09-13-redteam-arena.md

Add link
---
 blog/2024-09-13-redteam-arena.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blog/2024-09-13-redteam-arena.md b/blog/2024-09-13-redteam-arena.md
index d0626dc5..bf669856 100644
--- a/blog/2024-09-13-redteam-arena.md
+++ b/blog/2024-09-13-redteam-arena.md
@@ -12,7 +12,7 @@ We are excited to launch [RedTeam Arena](https://redarena.ai), a community-drive

Figure 1: RedTeam Arena with Bad Words at redarena.ai

-RedTeam Arena is an open-source red-teaming platform for LLMs. Our plan is to provide games that people can play to have fun, while sharpening their red-teaming skills. The first game we created is called *[Bad Words](https://redarena.ai)*, challenging players to convince models to say target "bad words”. It already has strong community adoption, with thousands of users participating and competing for the top spot on the jailbreaker leaderboard.
+RedTeam Arena is an [open-source](https://github.com/redteaming-arena/redteam-arena) red-teaming platform for LLMs. Our plan is to provide games that people can play to have fun, while sharpening their red-teaming skills. The first game we created is called *[Bad Words](https://redarena.ai)*, challenging players to convince models to say target "bad words”. It already has strong community adoption, with thousands of users participating and competing for the top spot on the jailbreaker leaderboard.
We plan to open the data after a short responsible disclosure delay. We hope this data will help the community determine the boundaries of AI models—how they can be controlled and convinced.