Add XLSum evaluation / unify eval script #12

Open
haileyschoelkopf wants to merge 166 commits into base: sentence_retrieval_eval

haileyschoelkopf
Collaborator

Submitting this PR from a fork because I may not have edit access to this repo.

In this PR: added adapters_eval.py, a script that evaluates on XLSum or XNLI depending on the 'dataset' flag.
Also working on adding DeepSpeed compatibility via the Hugging Face Trainer / command line.
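
For illustration, here is a minimal sketch of how a single dataset flag could select the evaluation task. The `--dataset` spelling, the `TASKS` registry, its fields, and the function names are all hypothetical; none of them are taken from adapters_eval.py.

```python
import argparse

# Hypothetical registry mapping each dataset to the kind of evaluation it needs.
# The keys and values are assumptions for illustration only.
TASKS = {
    "xnli": {"task_type": "classification", "metric": "accuracy"},
    "xlsum": {"task_type": "summarization", "metric": "rouge"},
}


def parse_args(argv=None):
    """Parse the command line; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Sketch of a unified eval entry point")
    parser.add_argument("--dataset", choices=sorted(TASKS), required=True)
    return parser.parse_args(argv)


def select_task(argv=None):
    """Return the task configuration chosen by the --dataset flag."""
    args = parse_args(argv)
    return TASKS[args.dataset]


# Example: pick the summarization path explicitly.
task = select_task(["--dataset", "xlsum"])
print(task["task_type"], task["metric"])
```

Keeping the per-dataset differences in one table means the rest of the script can branch on `task_type` rather than on the dataset name.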

TODO/needs checking:

  • The ROUGE compute_metrics function could be wrong. I will try to check this.
  • Make sure the logic within load_model for setting adapters to train / adding adapters is correct.
  • Has the FIXME in adapters_xnli_de.py been dealt with?
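
One possible way to sanity-check a ROUGE compute_metrics function is to compare it against a hand-rolled ROUGE-1 F1 on a few tiny examples. The sketch below is an assumption-laden simplification: `rouge1_f1` is a hypothetical helper, and it tokenizes on whitespace with lowercasing only, so its scores can differ from a full ROUGE implementation, especially for languages that are not whitespace-delimited.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between a prediction and a single reference.

    Simplification (assumption): whitespace tokenization and lowercasing only,
    with no stemming or language-specific segmentation.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Identical strings should score 1.0; disjoint strings should score 0.0.
print(rouge1_f1("the cat sat", "the cat sat"))
print(rouge1_f1("a b", "c d"))
```

If the real metric disagrees sharply with this on trivially identical or disjoint inputs, that points to a problem in how predictions and labels are decoded before scoring.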

@yongzx
Collaborator

yongzx commented May 11, 2022

Thanks Hailey!

(Referring to #11) Will resolve this PR once Vassilina and I have finalized our evaluation script for XNLI. Apologies for the delay.

@yongzx
Collaborator

yongzx commented Jul 7, 2022

@haileyschoelkopf Can you help review b0a23c5? Thank you!
I've tested it, and both training and evaluation (on baseline BLOOM and GPT2 models) are working. The only minor issue is that evaluation using model.generate takes quite a long time (even with num_beams = 1).

@haileyschoelkopf
Collaborator Author

Yes, I can! I might only get to it tomorrow, though.

4 participants