Add test and train sets to in-loop oe-eval (for ladder work) #748

liujch1998 · 2024-11-18T23:23:13Z

Standardizing what we should eval for the ladder work:

- 10 tasks from OLMES
- val sets, and (when applicable) test and/or train sets
- No subsampling
- rc_5shot, mc_5shot, and their bpb versions (for MMLU, I kept an rc_var version)
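For readers unfamiliar with the bpb (bits-per-byte) variants, here is a minimal sketch of the usual definition. It assumes the metric is the negative natural-log likelihood of the continuation, converted to bits and divided by the continuation's UTF-8 byte length. The function name and signature are hypothetical and are not taken from `olmo/eval/downstream.py`.

```python
import math


def bits_per_byte(continuation_logprob: float, continuation: str) -> float:
    """Hypothetical helper: bits per byte of a continuation.

    continuation_logprob is the summed natural-log probability the model
    assigns to the continuation tokens (a non-positive number).
    """
    num_bytes = len(continuation.encode("utf-8"))
    # Dividing by ln(2) converts nats to bits; dividing by num_bytes normalizes.
    return -continuation_logprob / (num_bytes * math.log(2))


# Illustrative values only: a 5-byte answer with a total log-probability of -3.0 nats.
score = bits_per_byte(-3.0, "Paris")
print(f"{score:.3f} bits per byte")
```

Lower is better under this definition. Unlike accuracy, the value changes smoothly as the model improves, which is one reason such metrics are attractive for small ladder models.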

Stats:

liujch1998 · 2024-11-18T23:33:58Z

olmo/eval/downstream.py

+    "arc_easy_train_rc_5shot": (
+        OEEvalTask,
+        {"dataset_path": "arc_easy", "dataset_name": "train_rc_5shot", "metric_type": "len_norm"},
+    ),  # this used to be acc


Can we safely change acc to len_norm here? I want to match what arc_challenge is doing.
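For context, a minimal sketch of why the choice between `acc` and `len_norm` can matter. It assumes `acc` picks the answer choice with the highest summed log-likelihood, and that `len_norm` first divides each sum by the choice's length in characters. The helper `pick_answer` and the numbers are hypothetical, not the code in `olmo/eval/downstream.py`.

```python
def pick_answer(logprobs, choices, metric_type="acc"):
    """Return the index of the predicted choice (hypothetical helper).

    logprobs[i] is the summed log-probability of choices[i] as a continuation.
    """
    if metric_type == "acc":
        scores = list(logprobs)
    elif metric_type == "len_norm":
        # Assumption: normalize by character length of the continuation.
        scores = [lp / len(choice) for lp, choice in zip(logprobs, choices)]
    else:
        raise ValueError(f"unknown metric_type: {metric_type}")
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up scores: the longer choice has a lower total log-probability
# simply because it contains more tokens.
choices = ["no", "a considerably longer answer"]
logprobs = [-4.0, -9.0]

print(pick_answer(logprobs, choices, "acc"))       # 0
print(pick_answer(logprobs, choices, "len_norm"))  # 1
```

When answer choices differ a lot in length, the two metrics can disagree on the same model outputs, so switching a task's `metric_type` changes what its historical numbers mean.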

OyvindTafjord

LGTM. I looked through the tasks and didn't spot anything suspicious. (The number of tasks is getting pretty unwieldy at this point; it would be good to streamline this somehow in the future.)

liujch1998 added 2 commits November 18, 2024 23:17

Add test and train sets to in-loop oe-eval (for ladder work)

5368d86

Add comment

e3efe9e

liujch1998 marked this pull request as ready for review November 18, 2024 23:23

liujch1998 requested a review from OyvindTafjord November 18, 2024 23:23

Update changelog

964062e

OyvindTafjord approved these changes Nov 18, 2024

Change boolq rc back to acc

ee99d57

liujch1998 merged commit 9c677c9 into main Nov 19, 2024
12 checks passed

liujch1998 deleted the oeeval-ladder-testtrain branch November 19, 2024 00:26
