Commit

fixed (#86)
Co-authored-by: Evan Frick <[email protected]>
efrick2002 and Evan Frick committed May 13, 2024
1 parent a640eef commit 2b994c0
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion blog/2024-05-08-llama3.md
@@ -73,7 +73,7 @@ We can further analyze which types of prompts affect win rate by fitting a decision tree
<img src="/images/blog/llama3/dtree.svg" style="display:block; margin-top: auto; margin-left: auto; margin-right: auto; margin-bottom: auto; width: 100%"></img>
<p style="color:gray; text-align: center;">Figure 4. Llama 3-70b-Instruct's win rate conditioned on hierarchical prompt criteria subsets as fitted using a standard decision tree algorithm.</p>

- The first thing to notice is that “Specificity” is the root node of the tree, suggesting that this criteria already divides Llama 3-70b-Instruct’s performance into its strengths and weaknesses. It supports our initial findings above that Llama 3-70b-Instruct is stronger on open-ended prompts (not specific) rather than more objective tasks. We can traverse further down the tree and see that Llama 3-70b-Instruct is quite strong on open-ended creative prompts (see the blue path), reaching around a 60% win rate against these top models. Following the orange path, we notice that Llama 3-70b-Instruct has a much lower win rate against top models when answering specific reasoning-based prompts.
+ The first thing to notice is that “Specificity” is the root node of the tree, suggesting that this criterion most immediately divides Llama 3-70b-Instruct’s performance into its strengths and weaknesses. It supports our initial findings above that Llama 3-70b-Instruct is stronger on open-ended tasks than on more closed-ended tasks. We can traverse further down the tree and see that Llama 3-70b-Instruct is quite strong on open-ended creative questions (see the blue path), reaching around a 60% win rate against these top models. Empirically, these tend to be writing and brainstorming style questions. For example, two prompts where Llama 3-70b-Instruct won are: "Write the first chapter of a novel." and "Could you provide two story suggestions for children that promote altruism?". On the other hand, following the orange path, we notice that Llama 3-70b-Instruct has a lower win rate against top models when answering close-ended, non-real-world, reasoning-based questions. These questions are often logic puzzles and math word problems. Two examples where Llama 3-70b-Instruct lost are: "123x = -4x * 2 - 65" and "There are two ducks in front of a duck, two ducks behind a duck, and a duck in the middle. How many ducks are there?"
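
(As an aside, here is a minimal sketch of how a tree like the one in Figure 4 could be fit, using scikit-learn's `DecisionTreeClassifier` on boolean prompt-criteria labels. The criteria names and battle rows below are hypothetical placeholders, not the actual Arena labels or data.)

```python
# Illustrative only: synthetic battles with hypothetical boolean criteria labels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

criteria = ["specificity", "creativity", "real_world", "reasoning"]

# Each row is one battle against a top model; 1 = the criterion applies to the prompt.
X = np.array([
    [0, 1, 0, 0],   # open-ended creative prompt
    [0, 1, 1, 0],   # open-ended creative, real-world prompt
    [1, 0, 0, 1],   # specific, close-ended reasoning prompt
    [1, 0, 0, 1],
])
y = np.array([1, 1, 0, 0])  # 1 = Llama 3-70b-Instruct won the battle

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=criteria))

# The win rate shown along each path in Figure 4 is the fraction of
# y == 1 battles falling into the corresponding leaf.
```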

## The effect of overrepresented prompts and judges

