
QOL changes for generations #166

Merged: maxmatical merged 21 commits into main from max/save-intermediate-gen on Jan 8, 2024
Conversation

maxmatical (Collaborator) commented on Nov 15, 2023

  1. Append task_name for multiple tasks.
  2. Fix the n_tasks logic that caused a dataset index-out-of-bounds error.

The next two changes relate to #58 (a sketch of the intended flow follows this list):

  4. Save intermediate generations with --save_every_k_samples.
  5. Resume generation from intermediate generations with --load_generations_intermediate_paths.
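
For context, here is a minimal sketch of how the two options above could fit together. The flag names come from this PR; every function and variable name in the sketch is illustrative, not the harness's actual API.

```python
# Illustrative sketch only: resuming from previously saved generations and
# checkpointing every k samples. Names such as generate_one, load_intermediate
# and generate_with_resume are hypothetical, not bigcode-eval internals.
import json


def load_intermediate(paths):
    """Load and concatenate previously saved generations, if any."""
    generations = []
    for path in paths or []:
        with open(path) as fp:
            generations.extend(json.load(fp))
    return generations


def generate_with_resume(problems, generate_one, save_every_k, save_path,
                         intermediate_paths=None):
    intermediate = load_intermediate(intermediate_paths)
    start = len(intermediate)  # resume right after what was already generated
    new_generations = []
    for idx in range(start, len(problems)):
        new_generations.append(generate_one(problems[idx]))
        if save_every_k and (idx + 1 - start) % save_every_k == 0:
            # Persist everything generated so far; list concatenation builds a
            # new list, so neither input list is mutated.
            with open(save_path, "w") as fp:
                json.dump(intermediate + new_generations, fp)
    return intermediate + new_generations
```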

Tested on HumanEval, saving every 50 samples and then loading from the intermediate generations:

len(intermediate_generations) = 150
should be generating 14 new samples for new_generations
curr_sample_idx = 150
number of problems for this task is 14
len(dataloader) = 14
len(code_gens) = 14

len(new_generations) = 14
len(generations) after concatenating = 164
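
The resume bookkeeping above is plain arithmetic over HumanEval's 164 problems; a quick sanity check with the numbers from the log (a sketch, not harness code):

```python
# Sanity check of the resume arithmetic above (HumanEval has 164 problems).
n_problems = 164
intermediate = 150                      # len(intermediate_generations)
curr_sample_idx = intermediate          # resume index
n_new = n_problems - curr_sample_idx    # problems left for this run
assert n_new == 14                      # matches len(dataloader) and len(code_gens)
assert intermediate + n_new == 164      # len(generations) after concatenating
```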

Verified:

  • Loading from intermediate generations produces the same output as running with --limit_start 150 on HumanEval.
  • Saved generations match the final generations.
  • Eval metrics are unchanged.

Additionally (minor): added some typing and linting.

maxmatical marked this pull request as ready for review on November 16, 2023, 17:39.
RaymondLi0 (Contributor) left a comment:

Nice work, thank you!
I am wondering whether there would be an issue with several restarts (see comment)

bigcode_eval/evaluator.py: 2 resolved review threads
```diff
@@ -332,7 +334,8 @@ def complete_code(
         gen_token_dict,
     )
     with open(intermediate_save_generations_path, "w") as fp:
         json.dump(code_gens, fp)
+    intermediate_generations.extend(code_gens)
```
RaymondLi0 (Contributor) commented on this diff:

If there are multiple saving steps, I think we'll add the same generations several times into intermediate_generations. This list is also extended in evaluator.py:

generations.extend(new_generations)

maxmatical (Collaborator, Author) replied:

You're right. I made two changes in the new commit:

  1. Instead of extending intermediate_generations when saving (which ended up duplicating the new generations), extend a deepcopy instead, which prevents the duplication.
  2. Instead of extending in the evaluator, complete_code now extends intermediate_generations with code_gens and returns it, which fits the new logic better: since intermediate_generations is never mutated when saving, the returned list contains no duplicates. (A sketch of this pattern follows below.)
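
A minimal sketch of the non-mutating save described in point 1; the helper names are illustrative, not the actual complete_code internals.

```python
import json
from copy import deepcopy


def save_snapshot(intermediate_generations, code_gens, path):
    """Write old + new generations to disk without mutating
    intermediate_generations, so repeated intermediate saves don't
    accumulate duplicates."""
    snapshot = deepcopy(intermediate_generations)
    snapshot.extend(code_gens)
    with open(path, "w") as fp:
        json.dump(snapshot, fp)


def finish(intermediate_generations, code_gens):
    """Called once at the end of generation: extend and return the combined
    list, so the caller (the evaluator) does not extend it a second time."""
    intermediate_generations.extend(code_gens)
    return intermediate_generations
```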

RaymondLi0 (Contributor) left a comment:

One small suggestion. Then good to merge! Thank you

bigcode_eval/utils.py: resolved review thread (outdated)
maxmatical merged commit 199eeec into main on Jan 8, 2024 (1 check passed), then deleted the max/save-intermediate-gen branch (January 8, 2024, 15:56).
phuonglvh pushed a commit to phuonglvh/bigcode-evaluation-harness referencing this pull request on Nov 15, 2024: "QOL changes for generations" (…intermediate-gen).