[rank5]: Traceback (most recent call last):
[rank5]: File "run_sft.py", line 251, in <module>
[rank5]: main()
[rank5]: File "run_sft.py", line 86, in main
[rank5]: raw_datasets = get_datasets(
[rank5]: File "miniconda3/envs/handbook/lib/python3.10/site-packages/alignment/data.py", line 169, in get_datasets
[rank5]: raw_datasets = mix_datasets(
[rank5]: File "miniconda3/envs/handbook/lib/python3.10/site-packages/alignment/data.py", line 218, in mix_datasets
[rank5]: dataset = load_dataset(ds, ds_config, split=split)
[rank5]: File "miniconda3/envs/handbook/lib/python3.10/site-packages/datasets/load.py", line 2570, in load_dataset
[rank5]: raise ValueError(
[rank5]: ValueError: You are trying to load a dataset that was saved using `save_to_disk`. Please use `load_from_disk` instead.
Hi here @ganler, thanks for reporting! Do you want to open a PR to fix the data loading handling? Otherwise, feel free to ping us and we can have a look at it, but as you're pointing out, the following should do the work:
More interesting is that if we use "train" or "test" as splits, `load_dataset` can load data that was saved with `save_to_disk`, but it loads it incorrectly. So switching the caught exception to `ValueError` is just a temporary solution. Any suggestions on a better way to handle this problem?
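One possible direction, as a rough sketch rather than a tested fix: decide which loader to call by inspecting the path first, instead of relying on which exception `load_dataset` happens to raise. The helper name `saved_to_disk_kind` is hypothetical, and the marker filenames (`state.json` for a single dataset, `dataset_dict.json` for a dataset dict) are an assumption about what `save_to_disk` writes in recent `datasets` releases, so they should be checked against the installed version.

```python
import os


def saved_to_disk_kind(path):
    """Guess whether `path` is a local directory written by `save_to_disk`.

    Returns "dataset_dict", "dataset", or None. The marker filenames are an
    assumption about the on-disk layout, not a documented contract.
    """
    if not os.path.isdir(path):
        # Not a local directory, e.g. a Hub repo id: let load_dataset handle it.
        return None
    if os.path.isfile(os.path.join(path, "dataset_dict.json")):
        # Assumed marker for a saved DatasetDict (one sub-directory per split).
        return "dataset_dict"
    if os.path.isfile(os.path.join(path, "state.json")):
        # Assumed marker for a single saved Dataset.
        return "dataset"
    return None
```

A caller could then use `load_from_disk` whenever the result is not `None` and `load_dataset` otherwise, which would also avoid the case where a "train" or "test" split silently loads the wrong thing.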
e.g., alignment-handbook/src/alignment/data.py, lines 216 to 221 in 606d2e9
Actual exception is `ValueError`, as in the traceback above.
Dataset version:
Also tried the latest 2.19.2 and got the same error. The set of exceptions being caught needs to be broadened.
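A minimal sketch of what broadening the caught exceptions could look like. It is not the repository's actual code: the function names are hypothetical, and the fallback to `load_from_disk(os.path.join(ds, split))` assumes each split was saved in its own sub-directory. The loaders are passed in as arguments so the fallback logic can be tried with stubs.

```python
import os


def load_split_with_fallback(ds, ds_config, split, hub_loader, disk_loader,
                             fallback_errors=(ValueError,)):
    """Try `hub_loader` first; on a listed error, fall back to `disk_loader`."""
    try:
        return hub_loader(ds, ds_config, split=split)
    except fallback_errors:
        # Assumption: the split lives in a sub-directory named after it.
        return disk_loader(os.path.join(ds, split))


def load_split(ds, ds_config, split):
    """Wire the helper above to the real `datasets` loaders."""
    # Imported lazily so the helper can be exercised without `datasets`.
    from datasets import load_dataset, load_from_disk
    from datasets.builder import DatasetGenerationError

    return load_split_with_fallback(
        ds, ds_config, split,
        hub_loader=load_dataset,
        disk_loader=load_from_disk,
        # ValueError is the exception shown in the traceback above.
        fallback_errors=(DatasetGenerationError, ValueError),
    )
```

Catching `ValueError` is broad and may hide unrelated failures from `load_dataset`, so this is probably best treated as a stopgap.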