Commit

Merge branch 'llama3' of github.com:lm-sys/lm-sys.github.io into llama3
Lisa Dunlap authored and Lisa Dunlap committed May 8, 2024
2 parents 4901981 + 246007f commit 54322bf
Showing 1 changed file with 4 additions and 5 deletions.
9 changes: 4 additions & 5 deletions blog/2024-05-01-llama3.md
@@ -117,20 +117,19 @@ td {text-align: left}
<table style="display: flex; justify-content: center;">
<tbody>
<tr>
<th>Model</th> <th>Battles</th> <th>Unique Judges</th> <th>Mean Votes per Judge</th> <th>Median Votes per Judge</th> <th>Max Votes per Judge</th> <th>Mean Judge Age</th> <th>Median Judge Age</th> <th>Max Judge Age</th>
<th>Model</th> <th>Battles</th> <th>Unique Judges</th> <th>Mean Votes per Judge</th> <th>Median Votes per Judge</th> <th>Max Votes per Judge</th>
</tr>
<tr>
<td>Llama 3-70B-Instruct</td> <td>12,719</td> <td>7,591</td> <td>1.68</td> <td>1</td> <td>65</td> <td>1 hr 10 min</td> <td>2 hr 12 min</td> <td>2 days</td>
<td>Llama 3-70B-Instruct</td> <td>12,719</td> <td>7,591</td> <td>1.68</td> <td>1</td> <td>65</td>
</tr>
<tr>
<td>Claude-3-opus-20240229</td> <td>68,656</td> <td>48,570</td> <td>1.41</td> <td>1</td> <td>73</td> <td>1 hr 57 min</td> <td>1 hr 55 min</td> <td>3 days</td>
<td>Claude-3-opus-20240229</td> <td>68,656</td> <td>48,570</td> <td>1.41</td> <td>1</td> <td>73</td>
</tr>
<tr>
<td>All Models All Time</td> <td>749,205</td> <td>316,372</td> <td>2.37</td> <td>1</td> <td>591</td> <td>8 hr 27 min</td> <td>2 hr 23 min</td> <td>295 days</td>
<td>All Models All Time</td> <td>749,205</td> <td>316,372</td> <td>2.37</td> <td>1</td> <td>591</td>
</tr>
</tbody>
</table>
<p>*Judge age is defined as rating_date - min_rating_date(ip_address)</p>


To limit the impact of users who vote many times, we can take the mean of each judge’s win rate, thereby bounding the impact of each individual judge. In this case, the stratified win rate shown in Table 3 is still very similar to the original win rate, suggesting that very active judges are not skewing the result.
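
The stratification described above can be sketched as follows. This is a minimal illustration, not the code used for the blog post: the `(judge_id, won)` vote format and the function name `win_rates` are assumptions, and each vote is assumed to be a simple win or loss for the model of interest (ties are not modeled here).

```python
from collections import defaultdict


def win_rates(votes):
    """Return (raw, stratified) win rates for one model.

    votes: iterable of (judge_id, won) pairs, where `won` is True when the
    model of interest won the battle. Both names are hypothetical.
    """
    per_judge = defaultdict(list)
    for judge_id, won in votes:
        per_judge[judge_id].append(1.0 if won else 0.0)

    # Raw win rate: every vote counts equally, so prolific judges dominate.
    total_votes = sum(len(v) for v in per_judge.values())
    raw = sum(sum(v) for v in per_judge.values()) / total_votes

    # Stratified win rate: average each judge's own win rate first, so every
    # judge contributes exactly one value regardless of how often they voted.
    stratified = sum(sum(v) / len(v) for v in per_judge.values()) / len(per_judge)
    return raw, stratified


# Toy data: judge "a" votes 10 times, judges "b" and "c" vote once each.
votes = [("a", True)] * 8 + [("a", False)] * 2 + [("b", False), ("c", True)]
raw, stratified = win_rates(votes)
```

In this toy data the heavy voter pulls the raw rate above the stratified one; when the two values are close, as the post reports, very active judges are unlikely to be driving the result.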