Wrong exception handling when loading dataset from local disk #173

Open
@ganler

Description

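In mix_datasets (alignment/data.py), the fallback to a local dataset only catches DatasetGenerationError:

try:
    # Try first if dataset on a Hub repo
    dataset = load_dataset(ds, ds_config, split=split)
except DatasetGenerationError:
    # If not, check local dataset
    dataset = load_from_disk(os.path.join(ds, split))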

However, the exception actually raised is a ValueError, so the fallback is never reached:

[rank5]: Traceback (most recent call last):
[rank5]:   File "run_sft.py", line 251, in <module>
[rank5]:     main()
[rank5]:   File "run_sft.py", line 86, in main
[rank5]:     raw_datasets = get_datasets(
[rank5]:   File "miniconda3/envs/handbook/lib/python3.10/site-packages/alignment/data.py", line 169, in get_datasets
[rank5]:     raw_datasets = mix_datasets(
[rank5]:   File "miniconda3/envs/handbook/lib/python3.10/site-packages/alignment/data.py", line 218, in mix_datasets
[rank5]:     dataset = load_dataset(ds, ds_config, split=split)
[rank5]:   File "miniconda3/envs/handbook/lib/python3.10/site-packages/datasets/load.py", line 2570, in load_dataset
[rank5]:     raise ValueError(
[rank5]: ValueError: You are trying to load a dataset that was saved using `save_to_disk`. Please use `load_from_disk` instead.

datasets version:

❯ pip show datasets
Name: datasets
Version: 2.19.1
Summary: HuggingFace community-driven open-source library of datasets
Home-page: https://github.com/huggingface/datasets
Author: HuggingFace Inc.
Author-email: [email protected]
License: Apache 2.0
Location: /home/ec2-user/miniconda3/envs/handbook/lib/python3.10/site-packages
Requires: aiohttp, dill, filelock, fsspec, huggingface-hub, multiprocess, numpy, packaging, pandas, pyarrow, pyarrow-hotfix, pyyaml, requests, tqdm, xxhash
Required-by: alignment-handbook, evaluate, trl

I also tried the latest release, 2.19.2, and got the same error. The except clause needs to be broadened so that it also catches ValueError.
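One possible way to broaden the handler, as a minimal sketch rather than the actual patch: the helper name `load_split_with_fallback` and its parameters are hypothetical, and the two loaders are passed in as arguments so the snippet runs without `datasets` installed.

```python
import os


def load_split_with_fallback(ds, ds_config, split, hub_loader, disk_loader,
                             fallback_errors=(ValueError,)):
    """Try the Hub-style loader first, then fall back to a local on-disk dataset.

    hub_loader and disk_loader are hypothetical injection points standing in
    for datasets.load_dataset and datasets.load_from_disk.
    fallback_errors lists the exception types that trigger the fallback;
    any other exception propagates unchanged.
    """
    try:
        # Try first if dataset is on a Hub repo
        return hub_loader(ds, ds_config, split=split)
    except fallback_errors:
        # If not, check the local dataset saved with save_to_disk
        return disk_loader(os.path.join(ds, split))
```

In data.py this would roughly correspond to passing `load_dataset`, `load_from_disk`, and `fallback_errors=(DatasetGenerationError, ValueError)`. Note that catching ValueError this broadly would also route unrelated ValueErrors from load_dataset to the local fallback, so a narrower check may be preferable.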
