docs(datasets) Add recommended datasets list to index (#4627)
Co-authored-by: jafermarq <[email protected]>
1 parent 93d6ae4, commit d8b00b3
Showing 3 changed files with 169 additions and 153 deletions.
@@ -0,0 +1,153 @@

Image Datasets
~~~~~~~~~~~~~~

.. list-table:: Image Datasets
   :widths: 40 40 20
   :header-rows: 1

   * - Name
     - Size
     - Image Shape
   * - `ylecun/mnist <https://huggingface.co/datasets/ylecun/mnist>`_
     - train 60k;
       test 10k
     - 28x28
   * - `uoft-cs/cifar10 <https://huggingface.co/datasets/uoft-cs/cifar10>`_
     - train 50k;
       test 10k
     - 32x32x3
   * - `uoft-cs/cifar100 <https://huggingface.co/datasets/uoft-cs/cifar100>`_
     - train 50k;
       test 10k
     - 32x32x3
   * - `zalando-datasets/fashion_mnist <https://huggingface.co/datasets/zalando-datasets/fashion_mnist>`_
     - train 60k;
       test 10k
     - 28x28
   * - `flwrlabs/femnist <https://huggingface.co/datasets/flwrlabs/femnist>`_
     - train 814k
     - 28x28
   * - `zh-plus/tiny-imagenet <https://huggingface.co/datasets/zh-plus/tiny-imagenet>`_
     - train 100k;
       valid 10k
     - 64x64x3
   * - `flwrlabs/usps <https://huggingface.co/datasets/flwrlabs/usps>`_
     - train 7.3k;
       test 2k
     - 16x16
   * - `flwrlabs/pacs <https://huggingface.co/datasets/flwrlabs/pacs>`_
     - train 10k
     - 227x227
   * - `flwrlabs/cinic10 <https://huggingface.co/datasets/flwrlabs/cinic10>`_
     - train 90k;
       valid 90k;
       test 90k
     - 32x32x3
   * - `flwrlabs/caltech101 <https://huggingface.co/datasets/flwrlabs/caltech101>`_
     - train 8.7k
     - varies
   * - `flwrlabs/office-home <https://huggingface.co/datasets/flwrlabs/office-home>`_
     - train 15.6k
     - varies
   * - `flwrlabs/fed-isic2019 <https://huggingface.co/datasets/flwrlabs/fed-isic2019>`_
     - train 18.6k;
       test 4.7k
     - varies
   * - `ufldl-stanford/svhn <https://huggingface.co/datasets/ufldl-stanford/svhn>`_
     - train 73.3k;
       test 26k;
       extra 531k
     - 32x32x3
   * - `sasha/dog-food <https://huggingface.co/datasets/sasha/dog-food>`_
     - train 2.1k;
       test 0.9k
     - varies
   * - `Mike0307/MNIST-M <https://huggingface.co/datasets/Mike0307/MNIST-M>`_
     - train 59k;
       test 9k
     - 32x32

Audio Datasets
~~~~~~~~~~~~~~

.. list-table:: Audio Datasets
   :widths: 35 30 15
   :header-rows: 1

   * - Name
     - Size
     - Subset
   * - `google/speech_commands <https://huggingface.co/datasets/google/speech_commands>`_
     - train 64.7k
     - v0.01
   * - `google/speech_commands <https://huggingface.co/datasets/google/speech_commands>`_
     - train 105.8k
     - v0.02
   * - `flwrlabs/ambient-acoustic-context <https://huggingface.co/datasets/flwrlabs/ambient-acoustic-context>`_
     - train 70.3k
     -
   * - `fixie-ai/common_voice_17_0 <https://huggingface.co/datasets/fixie-ai/common_voice_17_0>`_
     - varies
     - 14 versions
   * - `fixie-ai/librispeech_asr <https://huggingface.co/datasets/fixie-ai/librispeech_asr>`_
     - varies
     - clean/other

Tabular Datasets
~~~~~~~~~~~~~~~~

.. list-table:: Tabular Datasets
   :widths: 35 30
   :header-rows: 1

   * - Name
     - Size
   * - `scikit-learn/adult-census-income <https://huggingface.co/datasets/scikit-learn/adult-census-income>`_
     - train 32.6k
   * - `jlh/uci-mushrooms <https://huggingface.co/datasets/jlh/uci-mushrooms>`_
     - train 8.1k
   * - `scikit-learn/iris <https://huggingface.co/datasets/scikit-learn/iris>`_
     - train 150

Text Datasets
~~~~~~~~~~~~~

.. list-table:: Text Datasets
   :widths: 40 30 30
   :header-rows: 1

   * - Name
     - Size
     - Category
   * - `sentiment140 <https://huggingface.co/datasets/sentiment140>`_
     - train 1.6M;
       test 0.5k
     - Sentiment
   * - `google-research-datasets/mbpp <https://huggingface.co/datasets/google-research-datasets/mbpp>`_
     - full 974; sanitized 427
     - General
   * - `openai/openai_humaneval <https://huggingface.co/datasets/openai/openai_humaneval>`_
     - test 164
     - General
   * - `lukaemon/mmlu <https://huggingface.co/datasets/lukaemon/mmlu>`_
     - varies
     - General
   * - `takala/financial_phrasebank <https://huggingface.co/datasets/takala/financial_phrasebank>`_
     - train 4.8k
     - Financial
   * - `pauri32/fiqa-2018 <https://huggingface.co/datasets/pauri32/fiqa-2018>`_
     - train 0.9k; validation 0.1k; test 0.2k
     - Financial
   * - `zeroshot/twitter-financial-news-sentiment <https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment>`_
     - train 9.5k; validation 2.4k
     - Financial
   * - `bigbio/pubmed_qa <https://huggingface.co/datasets/bigbio/pubmed_qa>`_
     - train 2M; validation 11k
     - Medical
   * - `openlifescienceai/medmcqa <https://huggingface.co/datasets/openlifescienceai/medmcqa>`_
     - train 183k; validation 4.3k; test 6.2k
     - Medical
   * - `bigbio/med_qa <https://huggingface.co/datasets/bigbio/med_qa>`_
     - train 10.1k; test 1.3k; validation 1.3k
     - Medical
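
The tables above only list dataset names, sizes, and subsets; they do not show how a row is used. As a rough illustration (not part of this commit), the sketch below maps a row's Name and optional Subset to keyword arguments for ``FederatedDataset`` from the ``flwr_datasets`` package. The helper names ``federated_dataset_kwargs`` and ``load_train_partition`` are hypothetical, the choice of the ``train`` split and 10 partitions is an assumption, and actually loading a dataset requires ``flwr-datasets`` to be installed and network access to the Hugging Face Hub.

```python
from typing import Any, Dict, Optional


def federated_dataset_kwargs(
    name: str, num_partitions: int, subset: Optional[str] = None
) -> Dict[str, Any]:
    """Map a table row (Name, optional Subset) to FederatedDataset keyword arguments.

    Hypothetical helper for illustration; it only builds a dict.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    kwargs: Dict[str, Any] = {
        "dataset": name,
        # Passing an int requests that many IID partitions of the given split.
        "partitioners": {"train": num_partitions},
    }
    if subset:
        # Only set for rows that list a Subset, e.g. "v0.02".
        kwargs["subset"] = subset
    return kwargs


def load_train_partition(
    name: str,
    partition_id: int,
    num_partitions: int = 10,
    subset: Optional[str] = None,
):
    """Download the dataset and return one partition of its train split."""
    # Imported lazily so the helper above works without flwr-datasets installed.
    from flwr_datasets import FederatedDataset

    fds = FederatedDataset(**federated_dataset_kwargs(name, num_partitions, subset))
    return fds.load_partition(partition_id, "train")
```

For example, ``load_train_partition("ylecun/mnist", 0)`` would request the first of 10 partitions of the MNIST train split. Rows with a Subset entry would pass it through ``subset``; whether a given dataset needs further loading options is not covered here.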