docs(datasets) Add recommended datasets list to index (#4627)

Co-authored-by: jafermarq <[email protected]>
adap · Dec 4, 2024 · d8b00b3
1 parent 93d6ae4

Showing 3 changed files with 169 additions and 153 deletions.
diff --git a/datasets/doc/source/index.rst b/datasets/doc/source/index.rst
@@ -154,6 +154,20 @@ The Flower Community is growing quickly - we're a friendly group of researchers,
 
     Join us on Slack
 
+Recommended FL Datasets
+-----------------------
+
+Below we present a list of recommended datasets for federated learning research, which can be
+used with Flower Datasets ``flwr-datasets``.
+
+.. note::
+
+    All datasets from `HuggingFace Hub <https://huggingface.co/datasets>`_ can be used with our library. This page presents just a set of datasets we collected that you might find useful.
+
+For more information about any dataset, visit its page by clicking the dataset name. 
+
+.. include:: recommended-fl-datasets-tables.rst
+
 .. _demo:
 Demo
 ----

diff --git a/datasets/doc/source/recommended-fl-datasets-tables.rst b/datasets/doc/source/recommended-fl-datasets-tables.rst
@@ -0,0 +1,153 @@
+Image Datasets
+~~~~~~~~~~~~~~
+
+.. list-table:: Image Datasets
+   :widths: 40 40 20
+   :header-rows: 1
+
+   * - Name
+     - Size
+     - Image Shape
+   * - `ylecun/mnist <https://huggingface.co/datasets/ylecun/mnist>`_
+     - train 60k;  
+       test 10k
+     - 28x28
+   * - `uoft-cs/cifar10 <https://huggingface.co/datasets/uoft-cs/cifar10>`_
+     - train 50k;  
+       test 10k
+     - 32x32x3
+   * - `uoft-cs/cifar100 <https://huggingface.co/datasets/uoft-cs/cifar100>`_
+     - train 50k;  
+       test 10k
+     - 32x32x3
+   * - `zalando-datasets/fashion_mnist <https://huggingface.co/datasets/zalando-datasets/fashion_mnist>`_
+     - train 60k;  
+       test 10k
+     - 28x28
+   * - `flwrlabs/femnist <https://huggingface.co/datasets/flwrlabs/femnist>`_
+     - train 814k
+     - 28x28
+   * - `zh-plus/tiny-imagenet <https://huggingface.co/datasets/zh-plus/tiny-imagenet>`_
+     - train 100k;  
+       valid 10k
+     - 64x64x3
+   * - `flwrlabs/usps <https://huggingface.co/datasets/flwrlabs/usps>`_
+     - train 7.3k;  
+       test 2k
+     - 16x16
+   * - `flwrlabs/pacs <https://huggingface.co/datasets/flwrlabs/pacs>`_
+     - train 10k
+     - 227x227
+   * - `flwrlabs/cinic10 <https://huggingface.co/datasets/flwrlabs/cinic10>`_
+     - train 90k;  
+       valid 90k;  
+       test 90k
+     - 32x32x3
+   * - `flwrlabs/caltech101 <https://huggingface.co/datasets/flwrlabs/caltech101>`_
+     - train 8.7k
+     - varies
+   * - `flwrlabs/office-home <https://huggingface.co/datasets/flwrlabs/office-home>`_
+     - train 15.6k
+     - varies
+   * - `flwrlabs/fed-isic2019 <https://huggingface.co/datasets/flwrlabs/fed-isic2019>`_
+     - train 18.6k;  
+       test 4.7k
+     - varies
+   * - `ufldl-stanford/svhn <https://huggingface.co/datasets/ufldl-stanford/svhn>`_
+     - train 73.3k;  
+       test 26k;  
+       extra 531k
+     - 32x32x3
+   * - `sasha/dog-food <https://huggingface.co/datasets/sasha/dog-food>`_
+     - train 2.1k;  
+       test 0.9k
+     - varies
+   * - `Mike0307/MNIST-M <https://huggingface.co/datasets/Mike0307/MNIST-M>`_
+     - train 59k;  
+       test 9k
+     - 32x32
+
+Audio Datasets
+~~~~~~~~~~~~~~
+
+.. list-table:: Audio Datasets
+   :widths: 35 30 15
+   :header-rows: 1
+
+   * - Name
+     - Size
+     - Subset
+   * - `google/speech_commands <https://huggingface.co/datasets/google/speech_commands>`_
+     - train 64.7k
+     - v0.01
+   * - `google/speech_commands <https://huggingface.co/datasets/google/speech_commands>`_
+     - train 105.8k
+     - v0.02
+   * - `flwrlabs/ambient-acoustic-context <https://huggingface.co/datasets/flwrlabs/ambient-acoustic-context>`_
+     - train 70.3k
+     - 
+   * - `fixie-ai/common_voice_17_0 <https://huggingface.co/datasets/fixie-ai/common_voice_17_0>`_
+     - varies
+     - 14 versions
+   * - `fixie-ai/librispeech_asr <https://huggingface.co/datasets/fixie-ai/librispeech_asr>`_
+     - varies
+     - clean/other
+
+Tabular Datasets
+~~~~~~~~~~~~~~~~
+
+
+.. list-table:: Tabular Datasets
+   :widths: 35 30
+   :header-rows: 1
+
+   * - Name
+     - Size
+   * - `scikit-learn/adult-census-income <https://huggingface.co/datasets/scikit-learn/adult-census-income>`_
+     - train 32.6k
+   * - `jlh/uci-mushrooms <https://huggingface.co/datasets/jlh/uci-mushrooms>`_
+     - train 8.1k
+   * - `scikit-learn/iris <https://huggingface.co/datasets/scikit-learn/iris>`_
+     - train 150
+
+Text Datasets
+~~~~~~~~~~~~~
+
+.. list-table:: Text Datasets
+   :widths: 40 30 30
+   :header-rows: 1
+
+   * - Name
+     - Size
+     - Category
+   * - `sentiment140 <https://huggingface.co/datasets/sentiment140>`_
+     - train 1.6M;  
+       test 0.5k
+     - Sentiment
+   * - `google-research-datasets/mbpp <https://huggingface.co/datasets/google-research-datasets/mbpp>`_
+     - full 974; sanitized 427
+     - General
+   * - `openai/openai_humaneval <https://huggingface.co/datasets/openai/openai_humaneval>`_
+     - test 164
+     - General
+   * - `lukaemon/mmlu <https://huggingface.co/datasets/lukaemon/mmlu>`_
+     - varies
+     - General
+   * - `takala/financial_phrasebank <https://huggingface.co/datasets/takala/financial_phrasebank>`_
+     - train 4.8k
+     - Financial
+   * - `pauri32/fiqa-2018 <https://huggingface.co/datasets/pauri32/fiqa-2018>`_
+     - train 0.9k; validation 0.1k; test 0.2k
+     - Financial
+   * - `zeroshot/twitter-financial-news-sentiment <https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment>`_
+     - train 9.5k; validation 2.4k
+     - Financial
+   * - `bigbio/pubmed_qa <https://huggingface.co/datasets/bigbio/pubmed_qa>`_
+     - train 2M; validation 11k
+     - Medical
+   * - `openlifescienceai/medmcqa <https://huggingface.co/datasets/openlifescienceai/medmcqa>`_
+     - train 183k; validation 4.3k; test 6.2k
+     - Medical
+   * - `bigbio/med_qa <https://huggingface.co/datasets/bigbio/med_qa>`_
+     - train 10.1k; test 1.3k; validation 1.3k
+     - Medical
diff --git a/datasets/doc/source/recommended-fl-datasets.rst b/datasets/doc/source/recommended-fl-datasets.rst
@@ -11,157 +11,6 @@ see the full FL example with Flower and Flower Datasets open the `quickstart-pyt
 
     All datasets from `HuggingFace Hub <https://huggingface.co/datasets>`_ can be used with our library. This page presents just a set of datasets we collected that you might find useful.
 
-For more information about any dataset, visit its page by clicking the dataset name. For more information how to use the
+For more information about any dataset, visit its page by clicking the dataset name. 
 
-Image Datasets
---------------
-
-.. list-table:: Image Datasets
-   :widths: 40 40 20
-   :header-rows: 1
-
-   * - Name
-     - Size
-     - Image Shape
-   * - `ylecun/mnist <https://huggingface.co/datasets/ylecun/mnist>`_
-     - train 60k;  
-       test 10k
-     - 28x28
-   * - `uoft-cs/cifar10 <https://huggingface.co/datasets/uoft-cs/cifar10>`_
-     - train 50k;  
-       test 10k
-     - 32x32x3
-   * - `uoft-cs/cifar100 <https://huggingface.co/datasets/uoft-cs/cifar100>`_
-     - train 50k;  
-       test 10k
-     - 32x32x3
-   * - `zalando-datasets/fashion_mnist <https://huggingface.co/datasets/zalando-datasets/fashion_mnist>`_
-     - train 60k;  
-       test 10k
-     - 28x28
-   * - `flwrlabs/femnist <https://huggingface.co/datasets/flwrlabs/femnist>`_
-     - train 814k
-     - 28x28
-   * - `zh-plus/tiny-imagenet <https://huggingface.co/datasets/zh-plus/tiny-imagenet>`_
-     - train 100k;  
-       valid 10k
-     - 64x64x3
-   * - `flwrlabs/usps <https://huggingface.co/datasets/flwrlabs/usps>`_
-     - train 7.3k;  
-       test 2k
-     - 16x16
-   * - `flwrlabs/pacs <https://huggingface.co/datasets/flwrlabs/pacs>`_
-     - train 10k
-     - 227x227
-   * - `flwrlabs/cinic10 <https://huggingface.co/datasets/flwrlabs/cinic10>`_
-     - train 90k;  
-       valid 90k;  
-       test 90k
-     - 32x32x3
-   * - `flwrlabs/caltech101 <https://huggingface.co/datasets/flwrlabs/caltech101>`_
-     - train 8.7k
-     - varies
-   * - `flwrlabs/office-home <https://huggingface.co/datasets/flwrlabs/office-home>`_
-     - train 15.6k
-     - varies
-   * - `flwrlabs/fed-isic2019 <https://huggingface.co/datasets/flwrlabs/fed-isic2019>`_
-     - train 18.6k;  
-       test 4.7k
-     - varies
-   * - `ufldl-stanford/svhn <https://huggingface.co/datasets/ufldl-stanford/svhn>`_
-     - train 73.3k;  
-       test 26k;  
-       extra 531k
-     - 32x32x3
-   * - `sasha/dog-food <https://huggingface.co/datasets/sasha/dog-food>`_
-     - train 2.1k;  
-       test 0.9k
-     - varies
-   * - `Mike0307/MNIST-M <https://huggingface.co/datasets/Mike0307/MNIST-M>`_
-     - train 59k;  
-       test 9k
-     - 32x32
-
-Audio Datasets
---------------
-
-.. list-table:: Audio Datasets
-   :widths: 35 30 15
-   :header-rows: 1
-
-   * - Name
-     - Size
-     - Subset
-   * - `google/speech_commands <https://huggingface.co/datasets/google/speech_commands>`_
-     - train 64.7k
-     - v0.01
-   * - `google/speech_commands <https://huggingface.co/datasets/google/speech_commands>`_
-     - train 105.8k
-     - v0.02
-   * - `flwrlabs/ambient-acoustic-context <https://huggingface.co/datasets/flwrlabs/ambient-acoustic-context>`_
-     - train 70.3k
-     - 
-   * - `fixie-ai/common_voice_17_0 <https://huggingface.co/datasets/fixie-ai/common_voice_17_0>`_
-     - varies
-     - 14 versions
-   * - `fixie-ai/librispeech_asr <https://huggingface.co/datasets/fixie-ai/librispeech_asr>`_
-     - varies
-     - clean/other
-
-Tabular Datasets
-----------------
-
-.. list-table:: Tabular Datasets
-   :widths: 35 30
-   :header-rows: 1
-
-   * - Name
-     - Size
-   * - `scikit-learn/adult-census-income <https://huggingface.co/datasets/scikit-learn/adult-census-income>`_
-     - train 32.6k
-   * - `jlh/uci-mushrooms <https://huggingface.co/datasets/jlh/uci-mushrooms>`_
-     - train 8.1k
-   * - `scikit-learn/iris <https://huggingface.co/datasets/scikit-learn/iris>`_
-     - train 150
-
-Text Datasets
--------------
-
-.. list-table:: Text Datasets
-   :widths: 40 30 30
-   :header-rows: 1
-
-   * - Name
-     - Size
-     - Category
-   * - `sentiment140 <https://huggingface.co/datasets/sentiment140>`_
-     - train 1.6M;  
-       test 0.5k
-     - Sentiment
-   * - `google-research-datasets/mbpp <https://huggingface.co/datasets/google-research-datasets/mbpp>`_
-     - full 974; sanitized 427
-     - General
-   * - `openai/openai_humaneval <https://huggingface.co/datasets/openai/openai_humaneval>`_
-     - test 164
-     - General
-   * - `lukaemon/mmlu <https://huggingface.co/datasets/lukaemon/mmlu>`_
-     - varies
-     - General
-   * - `takala/financial_phrasebank <https://huggingface.co/datasets/takala/financial_phrasebank>`_
-     - train 4.8k
-     - Financial
-   * - `pauri32/fiqa-2018 <https://huggingface.co/datasets/pauri32/fiqa-2018>`_
-     - train 0.9k; validation 0.1k; test 0.2k
-     - Financial
-   * - `zeroshot/twitter-financial-news-sentiment <https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment>`_
-     - train 9.5k; validation 2.4k
-     - Financial
-   * - `bigbio/pubmed_qa <https://huggingface.co/datasets/bigbio/pubmed_qa>`_
-     - train 2M; validation 11k
-     - Medical
-   * - `openlifescienceai/medmcqa <https://huggingface.co/datasets/openlifescienceai/medmcqa>`_
-     - train 183k; validation 4.3k; test 6.2k
-     - Medical
-   * - `bigbio/med_qa <https://huggingface.co/datasets/bigbio/med_qa>`_
-     - train 10.1k; test 1.3k; validation 1.3k
-     - Medical
+.. include:: recommended-fl-datasets-tables.rst
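
Usage sketch (not part of this commit): the tables moved by this diff list Hugging Face dataset names with split sizes written in shorthand such as ``train 60k; test 10k``. The snippet below is a minimal illustration, assuming the ``flwr-datasets`` package is installed and that its ``FederatedDataset`` and ``IidPartitioner`` classes behave as in the library's published quickstart. The helpers ``parse_size`` and ``load_client_partition`` are hypothetical names introduced here for illustration only.

```python
"""Illustrative helpers for the recommended-datasets tables (hypothetical names)."""

# Multipliers for the shorthand used in the "Size" column.
_SUFFIX = {"k": 1_000, "M": 1_000_000}


def parse_size(cell: str) -> dict[str, int]:
    """Turn a Size cell such as 'train 60k; test 10k' into sample counts.

    Only 'split count' pairs are handled; cells such as 'varies' raise
    ValueError because they carry no count.
    """
    sizes = {}
    for part in cell.split(";"):
        fields = part.split()
        if len(fields) != 2:
            raise ValueError(f"cannot parse size entry: {part!r}")
        split_name, amount = fields
        if amount[-1] in _SUFFIX:
            count = float(amount[:-1]) * _SUFFIX[amount[-1]]
        else:
            count = float(amount)
        sizes[split_name] = round(count)
    return sizes


def load_client_partition(dataset_name: str, num_partitions: int, partition_id: int):
    """Load one IID partition of the 'train' split of a listed dataset.

    Assumption: flwr-datasets is installed and the Hugging Face Hub is
    reachable. The import is local so the parser above works without it.
    """
    from flwr_datasets import FederatedDataset
    from flwr_datasets.partitioner import IidPartitioner

    fds = FederatedDataset(
        dataset=dataset_name,
        partitioners={"train": IidPartitioner(num_partitions=num_partitions)},
    )
    return fds.load_partition(partition_id, "train")


if __name__ == "__main__":
    sizes = parse_size("train 60k;  test 10k")  # the ylecun/mnist row
    print(sizes)
    # With 100 IID partitions, each client holds about this many samples.
    print(sizes["train"] // 100)
    # Downloading is left to the caller, for example:
    # partition = load_client_partition("ylecun/mnist", 100, 0)
```

The download call is left commented out so the sketch runs offline; IID partitioning is only one choice, picked here because it is the simplest to reason about.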