Merge branch 'main' into clarify-the-int-partitioners-fds-param

adap · Jul 25, 2024 · 3df9364
2 parents 4881e7a + 42f5c8d
commit 3df9364

Showing 9 changed files with 107 additions and 133 deletions.
diff --git a/datasets/doc/source/conf.py b/datasets/doc/source/conf.py
@@ -110,9 +110,9 @@ def find_test_modules(package_path):
 
 # Sphinx redirects, implemented after the doc filename changes.
 # To prevent 404 errors and redirect to the new pages.
-# redirects = {
-# }
-
+redirects = {
+    "how-to-visualize-label-distribution.html": "tutorial-visualize-label-distribution.html",
+}
 
 # -- Options for HTML output -------------------------------------------------
 
@@ -180,3 +180,7 @@ def find_test_modules(package_path):
 # -- Options for MyST config  -------------------------------------
 # Enable this option to link to headers (`#`, `##`, or `###`)
 myst_heading_anchors = 3
+
+# -- Options for sphinx_copybutton -------------------------------------
+copybutton_exclude = '.linenos, .gp, .go'
+copybutton_prompt_text = ">>> "
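The two `sphinx_copybutton` options above control what a reader gets from the copy button. `copybutton_exclude` lists CSS classes whose elements are left out of the copied text: `.linenos` for line numbers and, as we understand Pygments' class names, `.gp` for prompts and `.go` for output. `copybutton_prompt_text` sets the prompt to strip. The snippet below is a rough, hypothetical approximation of that prompt handling in plain Python. It assumes the extension's default of copying only prompt lines when a prompt is present. `copied_text` is an illustrative name, not part of sphinx-copybutton.

```python
def copied_text(block: str, prompt: str = ">>> ") -> str:
    """Approximate what a copy button yields for `block` (hypothetical helper)."""
    lines = block.splitlines()
    # No prompt anywhere: assume the block is copied unchanged.
    if not any(line.startswith(prompt) for line in lines):
        return block
    # Otherwise keep only the prompt lines, with the prompt removed;
    # output lines (which carry no prompt) are dropped.
    return "\n".join(line[len(prompt):] for line in lines if line.startswith(prompt))


example = ">>> x = 1 + 1\n>>> print(x)\n2"
print(copied_text(example))
```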
diff --git a/datasets/doc/source/index.rst b/datasets/doc/source/index.rst
@@ -1,8 +1,7 @@
 Flower Datasets
 ===============
 
-Flower Datasets (``flwr-datasets``) is a library to quickly and easily create datasets for federated
-learning/analytics/evaluation. It is created by the ``Flower Labs`` team that also created `Flower <https://flower.ai>`_ - a Friendly Federated Learning Framework.
+Flower Datasets (``flwr-datasets``) is a library for quickly and easily creating datasets for federated learning/analytics/evaluation. It supports simulating heterogeneity (non-IID data) and dividing datasets by IDs that are already present in the data. The library was created by the ``Flower Labs`` team that also created `Flower <https://flower.ai>`_: A Friendly Federated Learning Framework.
 
 Flower Datasets Framework
 -------------------------
@@ -26,6 +25,8 @@ A learning-oriented series of tutorials is the best place to start.
    :caption: Tutorial
 
    tutorial-quickstart
+   tutorial-use-partitioners
+   tutorial-visualize-label-distribution
 
 How-to guides
 ~~~~~~~~~~~~~
@@ -41,7 +42,6 @@ Problem-oriented how-to guides show step-by-step how to achieve a specific goal.
    how-to-use-with-tensorflow
    how-to-use-with-numpy
    how-to-use-with-local-data
-   how-to-visualize-label-distribution
    how-to-disable-enable-progress-bar
 
 References
@@ -67,17 +67,20 @@ Main features
 -------------
 Flower Datasets library supports:
 
-- **downloading datasets** - choose the dataset from Hugging Face's ``dataset`` (`link <https://huggingface.co/datasets>`_)
-- **partitioning datasets** - choose one of the implemented partitioning scheme or create your own.
-- **creating centralized datasets** - leave parts of the dataset unpartitioned (e.g. for centralized evaluation)
-- **visualization of the partitioned datasets** - visualize the label distribution of the partitioned dataset (and compare the results on different parameters of the same partitioning schemes, different datasets, different partitioning schemes, or any mix of them)
+- **Downloading datasets** - choose the dataset from the Hugging Face Datasets Hub (`link <https://huggingface.co/datasets>`_)(*)
+- **Partitioning datasets** - choose one of the implemented partitioning schemes or create your own.
+- **Creating centralized datasets** - leave parts of the dataset unpartitioned (e.g. for centralized evaluation)
+- **Visualization of the partitioned datasets** - visualize the label distribution of the partitioned dataset (and compare the results across different parameters of the same partitioning scheme, different datasets, different partitioning schemes, or any mix of them)
+
+.. note::
+
+  (*) Once the dataset is available on the Hugging Face Hub, it can be **immediately** used in ``Flower Datasets`` (no approval from the Flower team and no custom code needed).
 
 
 .. image:: ./_static/readme/comparison_of_partitioning_schemes.png
   :align: center
   :alt: Comparison of Partitioning Schemes on CIFAR10
 
-
 Because Hugging Face's ``datasets`` is used under the hood, Flower Datasets integrates with the following popular formats/frameworks:
 
 - Hugging Face

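The main features above mention partitioning schemes and simulating heterogeneity (non-IID data). As a rough illustration of a label-skewed split, the sketch below assigns each class's sample indices to partitions in proportions drawn from a Dirichlet distribution, sampled as normalized Gamma draws. A smaller `alpha` gives more skewed partitions. It is a simplified, standalone example that uses only the standard library. `dirichlet_partition` is a hypothetical name, and this is not the implementation of any Flower Datasets partitioner.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_partitions, alpha, seed=42):
    """Split sample indices into label-skewed partitions (hypothetical helper)."""
    rng = random.Random(seed)
    # Group sample indices by their label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    partitions = [[] for _ in range(num_partitions)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet(alpha, ..., alpha) sample is a vector of Gamma(alpha, 1)
        # draws normalized to sum to one (requires alpha > 0).
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(num_partitions)]
        total = sum(weights)
        # Turn the proportions into cut points over this class's indices;
        # the last partition takes whatever remains so nothing is lost.
        start, cumulative = 0, 0.0
        for pid, weight in enumerate(weights):
            cumulative += weight / total
            if pid == num_partitions - 1:
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            partitions[pid].extend(indices[start:end])
            start = end
    return partitions


labels = [i % 10 for i in range(1000)]  # toy labels: 10 balanced classes
parts = dirichlet_partition(labels, num_partitions=5, alpha=0.5)
print([len(p) for p in parts])
```

Every index ends up in exactly one partition, so the split can be checked by flattening the partitions and comparing against the original index range.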
diff --git a/datasets/doc/source/tutorial-quickstart.ipynb b/datasets/doc/source/tutorial-quickstart.ipynb
@@ -15,7 +15,7 @@
    "id": "e0f34a29f74b13cb",
    "metadata": {},
    "source": [
-    "# Install Flower Datasets"
+    "## Install Flower Datasets"
    ]
   },
   {
@@ -45,7 +45,7 @@
    "id": "499dd2f0d23d871e",
    "metadata": {},
    "source": [
-    "# Choose the dataset\n",
+    "## Choose the dataset\n",
     "\n",
     "To choose the dataset, go to Hugging Face [Datasets Hub](https://huggingface.co/datasets) and search for your dataset by name. You will pass that name to the `dataset` parameter of `FederatedDataset`. Note that the name is case-sensitive.\n",
     "\n",
@@ -79,7 +79,7 @@
    "id": "e0c146753048fb2a",
    "metadata": {},
    "source": [
-    "# Partition the dataset\n",
+    "## Partition the dataset\n",
     "\n",
     "To partition a dataset (in a basic scenario), you need to choose two things:\n",
     "1) A dataset (identified by a name),\n",
@@ -99,7 +99,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": null,
    "id": "a759c5b6f25c9dd4",
    "metadata": {},
    "outputs": [],
@@ -131,15 +131,15 @@
    "id": "efa7dbb120505f1f",
    "metadata": {},
    "source": [
-    "# Investigate the partition"
+    "## Investigate the partition"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "bf986a1a9f0284cd",
    "metadata": {},
    "source": [
-    "## Features\n",
+    "### Features\n",
     "\n",
     "Now we will determine the names of the features of your dataset (you can alternatively do that directly on the Hugging Face\n",
     "website). The names can vary across different datasets, e.g. \"img\" or \"image\", \"label\" or \"labels\". Additionally, if the label column is of [ClassLabel](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.ClassLabel) type, we will also see the label names."
@@ -164,7 +164,7 @@
     }
    ],
    "source": [
-    "# Note this dataset has \n",
+    "# Note this dataset has\n",
     "partition.features"
    ]
   },
@@ -173,7 +173,7 @@
    "id": "2e69ed05193a098a",
    "metadata": {},
    "source": [
-    "## Indexing\n",
+    "### Indexing\n",
     "\n",
     "To see the first sample of the partition, we can index it like a Python list."
    ]
@@ -388,7 +388,7 @@
    "id": "b5e683cfaddf92f",
    "metadata": {},
    "source": [
-    "# Use with PyTorch/NumPy/TensorFlow\n",
+    "## Use with PyTorch/NumPy/TensorFlow\n",
     "\n",
     "For more detailed instructions, go to:\n",
     "* [how-to-use-with-pytorch](https://flower.ai/docs/datasets/how-to-use-with-pytorch.html)\n",
@@ -401,7 +401,7 @@
    "id": "de14f09f0ee4f6ac",
    "metadata": {},
    "source": [
-    "## PyTorch\n",
+    "### PyTorch\n",
     "\n",
     "Transform the `Dataset` into the `DataLoader`, use the `PyTorch transforms` (`Compose` and all the others are possible)."
    ]
@@ -444,7 +444,7 @@
    "id": "71531613",
    "metadata": {},
    "source": [
-    "## NumPy\n",
+    "### NumPy\n",
     "\n",
     "NumPy can be used as input to the TensorFlow and scikit-learn models. The transformation is very simple."
    ]
@@ -465,7 +465,7 @@
    "id": "e4867834",
    "metadata": {},
    "source": [
-    "## TensorFlow Dataset\n",
+    "### TensorFlow Dataset\n",
     "\n",
     "Transformation to TensorFlow Dataset is a one-liner."
    ]
@@ -497,32 +497,23 @@
    "id": "61fd797c",
    "metadata": {},
    "source": [
-    "# Final remarks"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "91ad1252",
-   "metadata": {},
-   "source": [
+    "## Final remarks\n",
+    "\n",
     "Congratulations, you now know the basics of Flower Datasets and are ready to perform basic dataset preparation for Federated Learning."
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "ade71d23",
-   "metadata": {},
-   "source": [
-    "# Next Steps"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f54d8031",
+   "id": "cbdfe1b5",
    "metadata": {},
    "source": [
+    "## Next Steps\n",
+    "\n",
     "This is the first quickstart tutorial from the Flower Datasets series. See other tutorials:\n",
-    "* [Visualize Label Distribution](https://flower.ai/docs/datasets/how-to-visualize-label-distribution.html)"
+    "\n",
+    "* [Use Partitioners](https://flower.ai/docs/datasets/tutorial-use-partitioners.html)\n",
+    "\n",
+    "* [Visualize Label Distribution](https://flower.ai/docs/datasets/tutorial-visualize-label-distribution.html)"
    ]
   }
  ],
@@ -531,18 +522,6 @@
    "display_name": "flwr",
    "language": "python",
    "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.10.12"
   }
  },
  "nbformat": 4,
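The quickstart partitions a dataset and then investigates a single partition, and the follow-up tutorials visualize label distributions. As a standalone, hypothetical illustration of both steps on a toy list of labels, the sketch below shuffles sample indices into near-equal IID partitions and counts the labels in each one. `iid_partition` and `label_counts` are illustrative names and do not reproduce how `FederatedDataset` or its partitioners are implemented.

```python
import random
from collections import Counter


def iid_partition(num_samples, num_partitions, seed=42):
    """Shuffle indices and cut them into near-equal shards (hypothetical helper)."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so partition sizes differ by at most one.
    base, extra = divmod(num_samples, num_partitions)
    partitions, start = [], 0
    for pid in range(num_partitions):
        size = base + (1 if pid < extra else 0)
        partitions.append(indices[start:start + size])
        start += size
    return partitions


def label_counts(partitions, labels):
    """Count how often each label occurs in every partition."""
    return [Counter(labels[i] for i in part) for part in partitions]


labels = [i % 3 for i in range(10)]  # toy labels for 10 samples
parts = iid_partition(len(labels), num_partitions=3)
for pid, counts in enumerate(label_counts(parts, labels)):
    print(pid, dict(sorted(counts.items())))
```

Summing the per-partition counters recovers the label counts of the full dataset, which is a quick sanity check that no sample was dropped or duplicated.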