Hi,
Thanks for the repo! Could you please point out which lines of code implement the "rank classification" idea used for evaluating the multiple-choice-style tasks?
The paper describes it like this on page 6:
For tasks that involve choosing the correct completion from several options (e.g. multiple choice question answering), we follow Brown et al. (2020) and use rank classification to evaluate our model: we compute the log-likelihood of each of the target options under the fine-tuned model and select the option with the highest log-likelihood as the prediction. For simplicity, we do not apply length normalization to the log-likelihoods of the target options.
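To make sure I am reading this correctly, here is a minimal sketch of my understanding (generic PyTorch / Hugging Face code, not taken from this repo; `model` is assumed to be an encoder-decoder LM and each example is scored one at a time, without padding):

```python
import torch
import torch.nn.functional as F

def option_score(model, input_ids, target_ids):
    # Sum of the log-likelihoods of the target tokens given the input
    # (no length normalization, as described in the paper).
    with torch.no_grad():
        logits = model(input_ids=input_ids, labels=target_ids).logits  # (1, T, vocab)
    log_probs = F.log_softmax(logits, dim=-1)
    # Pick out the log-prob assigned to each gold target token and sum them.
    return log_probs.gather(-1, target_ids.unsqueeze(-1)).sum().item()

def rank_classify(model, input_ids, all_option_target_ids):
    # Prediction = index of the option whose target sequence gets the highest total log-likelihood.
    scores = [option_score(model, input_ids, tgt) for tgt in all_option_target_ids]
    return max(range(len(scores)), key=lambda i: scores[i])
```

Is this roughly what the evaluation code does, and if so, where does it live in the repo?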
Thank you!
As a follow-up, could you also give a short tutorial on how to use the same idea to evaluate other LMs (say, a fine-tuned BART), to make sure the comparisons are fair?
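For instance, would something along these lines be a fair way to do it? (Just a sketch of what I have in mind, assuming a fine-tuned BART checkpoint loaded through transformers; the checkpoint name, prompt, and options below are placeholders, not from this repo.)

```python
import torch
import torch.nn.functional as F
from transformers import BartForConditionalGeneration, BartTokenizer

# Placeholder checkpoint; in practice this would be the fine-tuned model being compared.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base").eval()

def option_log_likelihood(prompt, option):
    # Total log-likelihood of `option` given `prompt`, with no length normalization.
    inputs = tokenizer(prompt, return_tensors="pt")
    targets = tokenizer(option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(**inputs, labels=targets).logits  # (1, T, vocab)
    log_probs = F.log_softmax(logits, dim=-1)
    return log_probs.gather(-1, targets.unsqueeze(-1)).sum().item()

prompt = "The movie was great. Is this review positive or negative?"
options = ["positive", "negative"]
prediction = max(options, key=lambda o: option_log_likelihood(prompt, o))
print(prediction)
```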