From the README in /scripts:
datasets_mixer:
  dataset_1: 0.5   # Use 50% of the training examples
  dataset_2: 0.66  # Use 66% of the training examples
  dataset_3: 0.10  # Use 10% of the training examples
dataset_splits:
- train_xxx  # The training splits to mix
- test_xxx   # The test splits to mix
From the comments, it looks like ONLY the training samples from dataset_1, dataset_2, and dataset_3 are considered. There is no explanation of how each dataset contributes to the test_xxx split.
However, the actual implementation seems to look for the test_xxx split in every dataset specified:
alignment-handbook/src/alignment/data.py
Lines 225 to 230 in 70769f9
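To make the question concrete, here is a minimal sketch of the behavior I think I am seeing. It is my own reading, not the library's code: `mix_sketch`, the in-memory `datasets` dict, and the rule that test splits are taken in full are all assumptions for illustration. Each dataset is a plain dict of lists, so nothing is loaded from the Hub.

```python
from typing import Dict, List


def mix_sketch(
    datasets: Dict[str, Dict[str, List[str]]],
    mixer: Dict[str, float],
    splits: List[str],
) -> Dict[str, List[str]]:
    """Hypothetical mixer: subsample train splits, keep test splits whole."""
    mixed: Dict[str, List[str]] = {"train": [], "test": []}
    for name, frac in mixer.items():
        for split in splits:
            rows = datasets[name][split]
            if "train" in split:
                # Assumption: the fraction applies only to training splits.
                mixed["train"].extend(rows[: int(frac * len(rows))])
            elif "test" in split:
                # Assumption: every dataset contributes its full test split.
                mixed["test"].extend(rows)
            else:
                raise ValueError(f"Split {split!r} is neither train nor test.")
    return mixed


# Toy data standing in for real datasets (names are made up).
toy = {
    "dataset_a": {"train_xxx": ["a0", "a1", "a2", "a3"], "test_xxx": ["ta0", "ta1"]},
    "dataset_b": {"train_xxx": [f"b{i}" for i in range(8)], "test_xxx": ["tb0", "tb1", "tb2"]},
}
result = mix_sketch(toy, {"dataset_a": 0.5, "dataset_b": 0.25}, ["train_xxx", "test_xxx"])
# Under these assumptions: 2 + 2 training rows, and all 2 + 3 test rows.
```

If this matches the real behavior, the fractions in datasets_mixer would have no effect on the test set, which is the part the README does not spell out.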
Could you please explain the relationships between multiple datasets and splits?
Thank you.