Skip to content

Commit

Permalink
update
Browse files Browse the repository at this point in the history
  • Loading branch information
infwinston committed May 20, 2024
1 parent 330cf68 commit 0b84ea3
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion blog/2024-05-17-category-hard.md
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@ We are commited to continuously enhance the Chatbot Arena leaderboard and share

### Note: Enhancing Quality Through De-duplication

To improve the overall quality of prompts in Chatbot Arena, we also implement a de-duplication pipeline. This new pipeline aims to remove overly redundant user prompts that might skew the distribution and affect the accuracy of our leaderboard. During our analysis, we noticed that many first-time users tend to ask similar greeting prompts, such as "hello," leading to an over-representation of these types of queries. To address this, we down-sample the top 0.01% most common prompts (approximately 100 prompts, mostly greetings in different languages) to the 99.99% percentile frequency (approximately 150 occurrences). After this process, about 6% of the votes are removed. We believe this helps maintain a diverse and high-quality set of prompts for evaluation.
To improve the overall quality of prompts in Chatbot Arena, we also implement a de-duplication pipeline. This new pipeline aims to remove overly redundant user prompts that might skew the distribution and affect the accuracy of our leaderboard. During our analysis, we noticed that many first-time users tend to ask similar greeting prompts, such as "hello," leading to an over-representation of these types of queries. To address this, we down-sample the top 0.01% most common prompts (approximately 1000 prompts, mostly greetings in different languages) to the 99.9% percentile frequency (25 occurrences). After this process, about 8.6% of the votes are removed. We believe this helps maintain a diverse and high-quality set of prompts for evaluation. We hope to encourage users to submit more unique & fresh prompts to reduce the risk of contamination.

We have also open-sourced this de-duplication script on [Github](https://github.com/lm-sys/FastChat/tree/main/fastchat/serve/monitor) and publish the vote data with de-duplication tags in the [notebook](https://colab.research.google.com/drive/1KdwokPjirkTmpO_P1WByFNFiqxWQquwH#scrollTo=CP35mjnHfpfN). We will continue to monitor the impact of this de-duplication process on the leaderboard and make adjustments as necessary to ensure the diversity and quality of our dataset.

Expand Down

0 comments on commit 0b84ea3

Please sign in to comment.