Dear @XiangLi1999 and @ari-holtzman,
If I understand the paper correctly, Section 3.4 states that the amateur (student) model is conditioned on a context window that starts from the last token of the prompt. I cannot find any trace of this choice in the code: for instance, here and here the whole input is passed to the amateur model, exactly as the expert sees it.
I cannot find the corresponding study in the ablation script either.
Am I missing some argument/logic that sets the amateur's context window somewhere else in the code?
Best,
Marco
Great, thanks!
Just one last clarification: I might be misunderstanding the code, but it seems that the function feeds only the last generated token to the amateur, so the amateur is computing $p(x_i \mid x_{i-1})$. Can you confirm this?
Section 3.4 of the paper, on the other hand, seems to state that the amateur is conditioned on the last token of the prompt plus all the generated tokens.
I think the code does what Section 3.4 states: it conditions the amateur on the last token of the prompt plus the generated tokens. You can verify this by printing the past_key_values argument. This works because of the caching implementation in Hugging Face: once a token has been processed, its keys and values are stored in past_key_values to save redundant computation, so only the newest token needs to be passed as input at each step.
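To illustrate why passing only the newest token is not the same as conditioning on that token alone, here is a minimal sketch of key/value caching. It is a toy single-head attention layer in pure Python, not the Hugging Face implementation or this repository's code; `ToyAttention`, `full_forward`, `cached_step`, and the `cache` dictionary are hypothetical names chosen for the example. It compares the output at the last position when the whole context is recomputed against the output when tokens are fed one at a time with a cache.

```python
import math
import random


def project(matrix, vec):
    """Multiply a square weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def attend(query, keys, values):
    """Softmax attention of a single query over all given keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * value[d] for w, value in zip(weights, values)) / total
        for d in range(len(values[0]))
    ]


class ToyAttention:
    """Hypothetical one-layer, one-head attention used only for illustration."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def rand_matrix():
            return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

        self.wq, self.wk, self.wv = rand_matrix(), rand_matrix(), rand_matrix()

    def full_forward(self, context):
        """Recompute keys/values for the whole context; return the last position's output."""
        keys = [project(self.wk, vec) for vec in context]
        values = [project(self.wv, vec) for vec in context]
        return attend(project(self.wq, context[-1]), keys, values)

    def cached_step(self, new_token, cache):
        """Take only the newest token as input; earlier tokens are read from the cache."""
        cache["keys"].append(project(self.wk, new_token))
        cache["values"].append(project(self.wv, new_token))
        return attend(project(self.wq, new_token), cache["keys"], cache["values"])


rng = random.Random(1)
dim = 4
# Stand-in for the amateur's context: last prompt token plus two generated tokens.
context = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]

layer = ToyAttention(dim)
cache = {"keys": [], "values": []}
for token in context:
    cached_out = layer.cached_step(token, cache)  # one token per call

full_out = layer.full_forward(context)
same = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for a, b in zip(cached_out, full_out)
)
print("cache length:", len(cache["keys"]))
print("cached step matches full recomputation:", same)
```

In this sketch the final call receives a single token, yet its output depends on every cached position, which is the behaviour described above for past_key_values.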