Commit

update
infwinston committed May 20, 2024
1 parent f8b9bbf commit 0da2a99
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion blog/2024-05-17-category-hard.md
@@ -59,7 +59,7 @@ A few weeks ago, we introduce the [Arena-Hard](https://lmsys.org/blog/2024-04-19
</tr>
</table>

-We employ Meta's **Llama-3-70B-Instruct** as the judge model to help us label over 1 million Arena battles. Figure 3 shows the criteria breakdown (i.e., how many prompts satisfy each criteria). We observe the most common criteria are Specificity, Domain Knowledge, and Real-world Application, while the relatively rare criteria are Problem-Solving and Complexity.
+We employ Meta's **Llama-3-70B-Instruct** to help us label over 1 million Arena prompts on whether certain critieria are met. Note that we use human preference votes to rank models rather than LLM judges. Figure 3 shows the criteria breakdown (i.e., how many prompts satisfy each criteria). We observe the most common criteria are Specificity, Domain Knowledge, and Real-world Application, while the relatively rare criteria are Problem-Solving and Complexity.

<img src="/images/blog/category_hard/key_criteria_breakdown.png" style="display:block; margin-top: auto; margin-left: auto; margin-right: auto; margin-bottom: auto; width: 85%"></img>
<p style="color:gray; text-align: center;">Figure 3. The percentage of each criteria within 1 million Chatbot Arena data.</p>
