Commit

fixed (#86)
Co-authored-by: Evan Frick <[email protected]>
efrick2002 and Evan Frick committed May 13, 2024
1 parent a640eef commit 2b994c0
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion blog/2024-05-08-llama3.md
@@ -73,7 +73,7 @@ We can further analyze which types of prompts affect win rate by fitting a decision tree
<img src="/images/blog/llama3/dtree.svg" style="display:block; margin-top: auto; margin-left: auto; margin-right: auto; margin-bottom: auto; width: 100%"></img>
<p style="color:gray; text-align: center;">Figure 4. Llama 3-70b-Instruct's win rate conditioned on hierarchical prompt criteria subsets as fitted using a standard decision tree algorithm.</p>

- The first thing to notice is that “Specificity” is the root node of the tree, suggesting that this criteria already divides Llama 3-70b-Instruct’s performance into its strengths and weaknesses. It supports our initial findings above that Llama 3-70b-Instruct is stronger on open-ended prompts (not specific) rather than more objective tasks. We can traverse further down the tree and see that Llama 3-70b-Instruct is quite strong on open-ended creative prompts (see the blue path), reaching around a 60% win rate against these top models. Following the orange path, we notice that Llama 3-70b-Instruct has a much lower win rate against top models when answering specific reasoning-based prompts.
+ The first thing to notice is that “Specificity” is the root node of the tree, suggesting that this criterion most immediately divides Llama 3-70b-Instruct’s performance into its strengths and weaknesses. It supports our initial findings above that Llama 3-70b-Instruct is stronger on open-ended tasks than on more closed-ended tasks. We can traverse further down the tree and see that Llama 3-70b-Instruct is quite strong on open-ended creative questions (see the blue path), reaching around a 60% win rate against these top models. Empirically, these tend to be writing and brainstorming style questions. For example, two prompts where Llama 3-70b-Instruct won are: "Write the first chapter of a novel." and "Could you provide two story suggestions for children that promote altruism?". On the other hand, following the orange path, we notice that Llama 3-70b-Instruct has a lower win rate against top models when answering close-ended, non-real-world, reasoning-based questions. These questions are often logic puzzles and math word problems. Two examples where Llama 3-70b-Instruct lost are: "123x = -4x * 2 - 65" and "There are two ducks in front of a duck, two ducks behind a duck, and a duck in the middle. How many ducks are there?"
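
(As an aside, here is a minimal sketch of how a tree like the one in Figure 4 could be fit, using scikit-learn's `DecisionTreeClassifier` on boolean prompt-criteria labels. The criteria names and battle rows below are hypothetical placeholders, not the actual Arena labels or data.)

```python
# Illustrative only: synthetic battles with hypothetical boolean criteria labels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

criteria = ["specificity", "creativity", "real_world", "reasoning"]

# Each row is one battle against a top model; 1 = the criterion applies to the prompt.
X = np.array([
    [0, 1, 0, 0],   # open-ended creative prompt
    [0, 1, 1, 0],   # open-ended creative, real-world prompt
    [1, 0, 0, 1],   # specific, close-ended reasoning prompt
    [1, 0, 0, 1],
])
y = np.array([1, 1, 0, 0])  # 1 = Llama 3-70b-Instruct won the battle

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=criteria))

# The win rate shown along each path in Figure 4 is the fraction of
# y == 1 battles falling into the corresponding leaf.
```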

## The effect of overrepresented prompts and judges

