
Commit

table cleanup (#80)
efrick2002 committed May 8, 2024
1 parent 770f570 commit 1d8a179
Showing 1 changed file with 4 additions and 5 deletions.
9 changes: 4 additions & 5 deletions blog/2024-05-01-llama3.md
@@ -117,20 +117,19 @@ td {text-align: left}
<table style="display: flex; justify-content: center;">
<tbody>
<tr>
-<th>Model</th> <th>Battles</th> <th>Unique Judges</th> <th>Mean Votes per Judge</th> <th>Median Votes per Judge</th> <th>Max Votes per Judge</th> <th>Mean Judge Age</th> <th>Median Judge Age</th> <th>Max Judge Age</th>
+<th>Model</th> <th>Battles</th> <th>Unique Judges</th> <th>Mean Votes per Judge</th> <th>Median Votes per Judge</th> <th>Max Votes per Judge</th>
</tr>
<tr>
-<td>Llama 3-70B-Instruct</td> <td>12,719</td> <td>7,591</td> <td>1.68</td> <td>1</td> <td>65</td> <td>1 hr 10 min</td> <td>2 hr 12 min</td> <td>2 days</td>
+<td>Llama 3-70B-Instruct</td> <td>12,719</td> <td>7,591</td> <td>1.68</td> <td>1</td> <td>65</td>
</tr>
<tr>
-<td>Claude-3-opus-20240229</td> <td>68,656</td> <td>48,570</td> <td>1.41</td> <td>1</td> <td>73</td> <td>1 hr 57 min</td> <td>1 hr 55 min</td> <td>3 days</td>
+<td>Claude-3-opus-20240229</td> <td>68,656</td> <td>48,570</td> <td>1.41</td> <td>1</td> <td>73</td>
</tr>
<tr>
-<td>All Models All Time</td> <td>749,205</td> <td>316,372</td> <td>2.37</td> <td>1</td> <td>591</td> <td>8 hr 27 min</td> <td>2 hr 23 min</td> <td>295 days</td>
+<td>All Models All Time</td> <td>749,205</td> <td>316,372</td> <td>2.37</td> <td>1</td> <td>591</td>
</tr>
</tbody>
</table>
-<p>*Judge age is defined as rating_date - min_rating_date(ip_address)</p>


To limit the impact of users who vote many times, we can take the mean of each judge’s win rate, thereby bounding the impact of each individual judge. In this case, we find that the stratified win rate shown in Table 3 is still very similar to the original win rate, suggesting that very active judges are not skewing the result.
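The vote-count columns in the table summarize how many battles each judge voted on. The post does not publish the code behind them, so the following is only a minimal sketch. It assumes a hypothetical input with one judge identifier per battle. How a judge is identified is also an assumption here; the removed footnote keys judges by `ip_address`. The function name `judge_vote_stats` is illustrative and does not come from the post.

```python
from collections import Counter
from statistics import mean, median

def judge_vote_stats(judge_ids):
    """Summarize votes per judge, given one judge id per battle (hypothetical schema)."""
    votes_per_judge = Counter(judge_ids)  # judge_id -> number of battles that judge voted on
    counts = list(votes_per_judge.values())
    return {
        "battles": len(judge_ids),
        "unique_judges": len(votes_per_judge),
        "mean_votes_per_judge": mean(counts),
        "median_votes_per_judge": median(counts),
        "max_votes_per_judge": max(counts),
    }

# Hypothetical toy data: judge "a" votes three times, the others once each.
stats = judge_vote_stats(["a", "a", "b", "c", "a", "d"])
print(stats)
# {'battles': 6, 'unique_judges': 4, 'mean_votes_per_judge': 1.5, 'median_votes_per_judge': 1.0, 'max_votes_per_judge': 3}
```

Under this definition the mean column equals Battles divided by Unique Judges. That agrees with the table: 12,719 / 7,591 ≈ 1.68, 68,656 / 48,570 ≈ 1.41, and 749,205 / 316,372 ≈ 2.37. A median of 1 means at least half of the judges cast a single vote.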
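The paragraph in the diff contrasts two quantities. The pooled win rate counts every vote equally. The stratified win rate averages each judge's own win rate, so no single judge carries more than one unit of weight. Below is a minimal sketch that assumes hypothetical `(judge_id, won)` records, where `won` is 1 for a win and 0 otherwise. The diff does not say how ties are scored; scoring a tie as 0.5 would be a further assumption. The function names are illustrative only.

```python
from collections import defaultdict

def pooled_win_rate(votes):
    """Share of all votes won by the model; every vote has equal weight."""
    return sum(won for _, won in votes) / len(votes)

def stratified_win_rate(votes):
    """Mean of per-judge win rates; every judge has equal weight."""
    per_judge = defaultdict(list)  # judge_id -> list of outcomes (1 win, 0 otherwise)
    for judge_id, won in votes:
        per_judge[judge_id].append(won)
    judge_rates = [sum(outcomes) / len(outcomes) for outcomes in per_judge.values()]
    return sum(judge_rates) / len(judge_rates)

# Hypothetical toy data: one very active judge plus three one-time judges.
votes = [("heavy", 1)] * 8 + [("heavy", 0)] * 2 + [("a", 0), ("b", 1), ("c", 0)]
print(round(pooled_win_rate(votes), 3))      # 0.692
print(round(stratified_win_rate(votes), 3))  # 0.45
```

In the toy data, one judge casts 10 of the 13 votes and wins 8 of them. The pooled rate is therefore 9/13 ≈ 0.692, while the stratified rate is (0.8 + 0 + 1 + 0) / 4 = 0.45. A gap this large would indicate that heavy voters skew the result. On the real data, the post reports that the two rates are very similar.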
