From f8b9bbf01c268ceda637614d1289d74ca18e9e21 Mon Sep 17 00:00:00 2001 From: Wei-Lin Chiang Date: Mon, 20 May 2024 10:16:31 -0700 Subject: [PATCH] update --- blog/2024-05-17-category-hard.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/2024-05-17-category-hard.md b/blog/2024-05-17-category-hard.md index 7de00bad..aff2b4ef 100644 --- a/blog/2024-05-17-category-hard.md +++ b/blog/2024-05-17-category-hard.md @@ -82,7 +82,7 @@ We are commited to continuously enhance the Chatbot Arena leaderboard and share ### Note: Enhancing Quality Through De-duplication -To improve the overall quality of prompts in Chatbot Arena, we also implement a de-duplication pipeline. This new pipeline aims to remove overly redundant user prompts that might skew the distribution and affect the accuracy of our leaderboard. During our analysis, we noticed that many first-time users tend to ask similar greeting prompts, such as "hello," leading to an over-representation of these types of queries. To address this, we down-sample the top 0.01% most common prompts (approximately 1000 prompts, mostly greetings in different languages) to the 99.9% percentile frequency (25 occurrences). After this process, about 8.6% of the votes are removed. We believe this helps maintain a diverse and high-quality set of prompts for evaluation. We hope to encourage users to submit more unique & fresh prompts to reduce the risk of contamination. +To improve the overall quality of prompts in Chatbot Arena, we also implement a de-duplication pipeline. This new pipeline aims to remove overly redundant user prompts that might skew the distribution and affect the accuracy of our leaderboard. During our analysis, we noticed that many first-time users tend to ask similar greeting prompts, such as "hello," leading to an over-representation of these types of queries. To address this, we down-sample the top 0.1% most common prompts (approximately 1000 prompts, mostly greetings in different languages) to the 99.9% percentile frequency (25 occurrences). After this process, about 8.6% of the votes are removed. We believe this helps maintain a diverse and high-quality set of prompts for evaluation. We hope to encourage users to submit more unique & fresh prompts to reduce the risk of contamination. We have also open-sourced this de-duplication script on [Github](https://github.com/lm-sys/FastChat/tree/main/fastchat/serve/monitor) and publish the vote data with de-duplication tags in the [notebook](https://colab.research.google.com/drive/1KdwokPjirkTmpO_P1WByFNFiqxWQquwH#scrollTo=CP35mjnHfpfN). We will continue to monitor the impact of this de-duplication process on the leaderboard and make adjustments as necessary to ensure the diversity and quality of our dataset.