These instructions describe how to obtain and preprocess the datasets used in the benchmark, grouped by the dataset used for training.
The training and i.i.d. test datasets are downloaded automatically.
CIFAR-10-C and CIFAR-100-C must be downloaded manually from Zenodo using these (or similar) commands:
cd $DATASET_ROOT_DIR
mkdir corrupt_cifar10
cd corrupt_cifar10
wget https://zenodo.org/record/2535967/files/CIFAR-10-C.tar
tar xf CIFAR-10-C.tar
mv CIFAR-10-C CIFAR-C
cd $DATASET_ROOT_DIR
mkdir corrupt_cifar100
cd corrupt_cifar100
wget https://zenodo.org/record/3555552/files/CIFAR-100-C.tar
tar xf CIFAR-100-C.tar
mv CIFAR-100-C CIFAR-C
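As an optional sanity check after extraction, a small helper along these lines can confirm that a `CIFAR-C` folder contains the expected NumPy arrays. This is only a sketch: the function name `check_cifar_c` is hypothetical, and it assumes the archives unpack to `.npy` files that include `labels.npy`.

```shell
# Hypothetical helper: verify that <dir>/CIFAR-C looks like an extracted archive.
# Assumption: the archive unpacks to .npy arrays, one of which is labels.npy.
check_cifar_c() {
    dir="$1/CIFAR-C"
    if [ ! -f "$dir/labels.npy" ]; then
        echo "missing: $dir/labels.npy" >&2
        return 1
    fi
    # Count the .npy files directly inside the folder.
    count=$(find "$dir" -maxdepth 1 -name '*.npy' | wc -l)
    echo "ok: $dir ($((count)) .npy files)"
}
```

For example, `check_cifar_c "$DATASET_ROOT_DIR/corrupt_cifar10"` and `check_cifar_c "$DATASET_ROOT_DIR/corrupt_cifar100"` would check both folders created above.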
You can download Tiny ImageNet from the ImageNet website after requesting access. Preprocess the data with the following commands:
unzip tiny-imagenet-200.zip -d $DATASET_ROOT_DIR
cd $DATASET_ROOT_DIR
mv tiny-imagenet-200 tinyimagenet
cd tinyimagenet
rm -rf train test
cat val/val_annotations.txt | cut --output-delimiter=$'\n' -f 1,2 | xargs --verbose -n2 bash -c 'mkdir -p test/$1/images && mv val/images/$0 test/$1/images'
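The last command above is dense: it reads the first two tab-separated fields of `val/val_annotations.txt` (image file name and class ID) and moves each validation image into `test/<class ID>/images`. A more readable equivalent might look like the sketch below. The function name `sort_val_images` is hypothetical, and the loop assumes the annotation file is tab-separated, as the `cut -f 1,2` call implies.

```shell
# Sketch: move each validation image into test/<wnid>/images/.
# Pass the tinyimagenet directory as $1, or run from inside it.
sort_val_images() {
    root="${1:-.}"
    tab=$(printf '\t')
    # Each line of val_annotations.txt: <file> TAB <wnid> TAB <remaining fields>
    while IFS="$tab" read -r file wnid _rest; do
        mkdir -p "$root/test/$wnid/images"
        mv "$root/val/images/$file" "$root/test/$wnid/images/"
    done < "$root/val/val_annotations.txt"
}
```

Running `sort_val_images "$DATASET_ROOT_DIR/tinyimagenet"` in place of the `cat | cut | xargs` pipeline should produce the same directory layout.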
You can download the dataset using these (or similar) commands:
cd $DATASET_ROOT_DIR
mkdir tinyimagenet_resize
cd tinyimagenet_resize
wget https://www.dropbox.com/s/kp3my3412u5k9rl/Imagenet_resize.tar.gz
tar --strip-components=1 -xzf Imagenet_resize.tar.gz
You need to download the ImageNet ILSVRC2012 Task 1 & 2 dataset from their website and move or symlink it to $DATASET_ROOT_DIR/breeds/ILSVRC.
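If ImageNet already exists elsewhere on disk, symlinking avoids a second copy. A minimal sketch follows, assuming you know where your ILSVRC directory lives; the function name `link_ilsvrc` is hypothetical.

```shell
# Sketch: link an existing ILSVRC directory into the expected location.
#   $1: path to your existing ILSVRC directory
#   $2: dataset root, e.g. "$DATASET_ROOT_DIR"
link_ilsvrc() {
    src="$1"
    root="$2"
    mkdir -p "$root/breeds"
    # Creates $root/breeds/ILSVRC as a symbolic link pointing at $src.
    ln -s "$src" "$root/breeds/ILSVRC"
}
```

A call would look like `link_ilsvrc /path/to/ILSVRC "$DATASET_ROOT_DIR"`, where `/path/to/ILSVRC` is a placeholder for your own download location.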
These datasets will be downloaded automatically.
You can download the individual datasets from their respective websites:
- PH2 (extract to $DATASET_ROOT_DIR/ph2)
- HAM10000 (extract to $DATASET_ROOT_DIR/ham10000)
- derm7pt (extract to $DATASET_ROOT_DIR/d7p)
- isic2020 (extract to $DATASET_ROOT_DIR/isic_2020)
Unpack them into their respective folders, then run the data preprocessing:
fd_shifts prepare --dataset dermoscopy
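Since the preprocessing step expects every source dataset in its own folder, it can help to confirm the folders exist before running the command above. The following is a sketch only; `check_dataset_dirs` is a hypothetical helper, not part of `fd_shifts`.

```shell
# Hypothetical helper: report which dataset folders exist under a root.
#   $1: dataset root; remaining arguments: folder names to look for.
# Returns non-zero if any folder is missing.
check_dataset_dirs() {
    root="$1"; shift
    status=0
    for name in "$@"; do
        if [ -d "$root/$name" ]; then
            echo "found:   $root/$name"
        else
            echo "missing: $root/$name"
            status=1
        fi
    done
    return "$status"
}
```

For the dermoscopy datasets this could be used as `check_dataset_dirs "$DATASET_ROOT_DIR" ph2 ham10000 d7p isic_2020 && fd_shifts prepare --dataset dermoscopy`. The same helper applies to any other group of folders by changing the names.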
Download the dataset from its website and extract it to $DATASET_ROOT_DIR/rxrx1.
Then run the data preprocessing:
fd_shifts prepare --dataset microscopy
You can download the individual datasets from their respective websites:
- CheXpert (extract to $DATASET_ROOT_DIR/chexpert)
- MIMIC (extract to $DATASET_ROOT_DIR/mimic)
- NIH14 (extract to $DATASET_ROOT_DIR/nih14)
Note: MIMIC requires you to apply for credentialed access.
Unpack them into their respective folders, then run the data preprocessing:
fd_shifts prepare --dataset xray
Download the dataset from its website and extract it to $DATASET_ROOT_DIR/lidc_idri.
Prepare the dataset by following the instructions in the separate LIDC readme.
Finally, run the data preprocessing:
fd_shifts prepare --dataset lung_ct
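Once all manual downloads are in place, the individual `fd_shifts prepare --dataset ...` calls shown in this document can be chained. The wrapper below is a sketch; `prepare_datasets` is a hypothetical name, and it only uses the `prepare --dataset` invocation already shown above. It stops at the first dataset whose preparation fails.

```shell
# Sketch: run `fd_shifts prepare` for each dataset name given as an argument.
prepare_datasets() {
    for name in "$@"; do
        echo "Preparing $name ..."
        fd_shifts prepare --dataset "$name" || {
            echo "Preparation failed for $name" >&2
            return 1
        }
    done
}
```

For example: `prepare_datasets dermoscopy microscopy xray lung_ct`. The `lung_ct` step still requires the separate LIDC preparation to have been completed first.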