Contents

- Overview
- Problem statement
- Datasets
- Bibliography

Overview

Chain-of-Thought

It has been noticed that LLMs exhibit the Chain-of-Thought prompting property when trained long enough. For instance, if one asks a model to perform a reasoning task, it might fail:

Model Input:
Q: Roger has 5 tennis balls. He buys 2 more cans of
tennis balls. Each can has 3 tennis balls. How many
tennis balls does he have now?
A: The answer is 11.
Q: The cafeteria had 23 apples. If they used 20 to
make lunch and bought 6 more, how many apples
do they have?

Model Output:
A: The answer is 27.
But with an additional Chain-of-Thought prompt, the model is able to predict the correct answer:
Model Input:
Q: Roger has 5 tennis balls. He buys 2 more cans of
tennis balls. Each can has 3 tennis balls. How many
tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls
each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
Q: The cafeteria had 23 apples. If they used 20 to
make lunch and bought 6 more, how many apples
do they have?

Model Output:
A: The cafeteria had 23 apples originally. They used
20 to make lunch. So they had 23 - 20 = 3. They
bought 6 more apples, so they have 3 + 6 = 9. The
answer is 9.
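
As a concrete illustration, a prompt like the one above can be fed to any causal language model. Below is a minimal sketch using the Hugging Face transformers API; the model name is a placeholder (CoT behaviour is typically only observed in much larger models), and the generation settings are illustrative assumptions, not part of the original experiments.

```python
# Minimal CoT prompting sketch (model name and settings are illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; CoT usually emerges only in far larger models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# One worked exemplar (the chain of thought) followed by the target question.
prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
    "5 + 6 = 11. The answer is 11.\n"
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, "
    "how many apples do they have?\n"
    "A:"
)

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Print only the newly generated continuation.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```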
Feature extraction
Neural networks extract low-level features close to the input of a network and high-level features close to the output layer. Moreover, it has been shown that it is possible to reconstruct a syntax tree from BERT's features, indicating that BERT extracts some "higher-level" knowledge from the training data. Unlike BERT, today's LLMs are trained to predict the next token, but it is reasonable to expect them to exhibit a similar property.
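
To make this framing concrete, the per-layer hidden states of a decoder-only LM can be collected and probed layer by layer, in the same spirit as the BERT probing work. A small sketch follows; the model name is a placeholder, and any transformers causal LM exposing output_hidden_states would do.

```python
# Sketch: collect per-layer hidden states from a causal LM for probing.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

text = "The cafeteria had 23 apples. They used 20 to make lunch."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple: the embedding output plus one tensor per layer,
# each of shape (batch, seq_len, hidden_size). A probe (e.g. a linear classifier)
# can then be trained on each layer to see where a given property becomes decodable.
for layer_idx, h in enumerate(out.hidden_states):
    print(layer_idx, h.shape)
```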
Problem Statement
Given a CoT prompt P, a model generates a sequence of steps or facts (usually, the model expresses a single fact as one sentence):
P ==> F_1
F_1 ==> F_2
F_2 ==> F_3
...
F_(n-1) ==> F_n
Given that each fact depends on the previous one, can we see a similar dependency in the activations of the network? Namely, can we see that in most cases the weights for fact F_i are activated deeper in the network than the weights for fact F_j, when j < i?
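
The write-up does not fix how "activated deeper" would be measured. One possible operationalization, purely as an assumption for illustration: pool the hidden states over the token span of each fact F_i and record the earliest layer whose pooled representation is already close to the final layer's. The function name, the mean-pooling, and the cosine-similarity threshold below are all hypothetical choices; hidden_states is the tuple returned with output_hidden_states=True, as in the previous sketch.

```python
# Hypothetical proxy for the "depth" at which a fact is represented.
import torch.nn.functional as F

def fact_depth(hidden_states, span, threshold=0.9):
    """hidden_states: tuple of (1, seq_len, hidden) tensors, one per layer.
    span: (start, end) token indices of a fact F_i in the sequence."""
    start, end = span
    # Mean-pool the fact's tokens at every layer.
    pooled = [h[0, start:end].mean(dim=0) for h in hidden_states]
    final = pooled[-1]
    for layer, vec in enumerate(pooled):
        if F.cosine_similarity(vec, final, dim=0) >= threshold:
            return layer  # earliest layer already close to the final representation
    return len(pooled) - 1

# Comparing fact_depth across facts: the hypothesis predicts the depth of F_i
# tends to grow (or at least not shrink) with i.
```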
Datasets

StrategyQA Dataset

StrategyQA is a question-answering benchmark focusing on open-domain questions where the required reasoning steps are implicit in the question and should be inferred using a strategy. StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs.
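
Assuming the officially released JSON files are used (the file and field names below are assumptions based on the public release and should be checked against the actual download), a single example can be inspected like this:

```python
# Sketch: read the StrategyQA training file and inspect one example.
import json

with open("strategyqa_train.json") as f:
    data = json.load(f)

example = data[0]
print(example["question"])       # the strategy question
print(example["answer"])         # yes/no answer
print(example["decomposition"])  # implicit reasoning steps as sub-questions
```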
Bibliography
Wei et al. -- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chi et al. -- Finding Universal Grammatical Relations in Multilingual BERT