Merge branch 'main' into clarify-the-int-partitioners-fds-param
jafermarq authored Jul 25, 2024
2 parents 4881e7a + 42f5c8d commit 3df9364
Showing 9 changed files with 107 additions and 133 deletions.
10 changes: 7 additions & 3 deletions datasets/doc/source/conf.py
@@ -110,9 +110,9 @@ def find_test_modules(package_path):

# Sphinx redirects, implemented after the doc filename changes.
# To prevent 404 errors and redirect to the new pages.
# redirects = {
# }

redirects = {
"how-to-visualize-label-distribution.html": "tutorial-visualize-label-distribution.html",
}

# -- Options for HTML output -------------------------------------------------

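The `redirects` mapping above sends the old page name to the renamed tutorial page, so existing links do not return a 404. This diff does not show which Sphinx extension consumes the mapping. The sketch below assumes the common approach of emitting one small stub page per old name that forwards with an HTML meta refresh. `make_redirect_stub` is a hypothetical helper, not the extension's code.

```python
# Hypothetical illustration of a redirect mapping like the one in conf.py:
# each old page name gets a tiny HTML page that immediately forwards to the new one.
redirects = {
    "how-to-visualize-label-distribution.html": "tutorial-visualize-label-distribution.html",
}


def make_redirect_stub(target: str) -> str:
    """Return a minimal HTML page that forwards the browser to `target`."""
    return (
        "<!DOCTYPE html>\n"
        "<html><head>"
        f'<meta http-equiv="refresh" content="0; url={target}">'
        "</head></html>\n"
    )


# One stub per old filename, keyed by the path that must keep resolving.
stubs = {old: make_redirect_stub(new) for old, new in redirects.items()}
for old, html in stubs.items():
    print(old, "->", html.strip())
```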
@@ -180,3 +180,7 @@ def find_test_modules(package_path):
# -- Options for MyST config -------------------------------------
# Enable this option to link to headers (`#`, `##`, or `###`)
myst_heading_anchors = 3

# -- Options for sphinx_copybutton -------------------------------------
copybutton_exclude = '.linenos, .gp, .go'
copybutton_prompt_text = ">>> "
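The two `sphinx_copybutton` options above control what the copy button puts on the clipboard. `copybutton_exclude` lists CSS classes whose text is skipped: line numbers (`.linenos`) and Pygments' generic prompt (`.gp`) and generic output (`.go`). `copybutton_prompt_text` is the prompt to strip. The pure-Python sketch below approximates the extension's default behaviour on an interactive session. Lines carrying the `>>> ` prompt are kept with the prompt removed, and output lines are dropped. It is an illustration under that assumption, not the extension's implementation. `copy_text` is a hypothetical name.

```python
PROMPT = ">>> "  # same value as copybutton_prompt_text in conf.py


def copy_text(block: str, prompt: str = PROMPT) -> str:
    """Approximate the text copied from a rendered code block.

    If any line starts with the prompt, keep only the prompted lines and strip
    the prompt from them; otherwise return the block unchanged.
    """
    lines = block.splitlines()
    if not any(line.startswith(prompt) for line in lines):
        return block
    return "\n".join(line[len(prompt):] for line in lines if line.startswith(prompt))


session = """>>> x = 1 + 1
>>> print(x)
2
"""
print(copy_text(session))
```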
19 changes: 11 additions & 8 deletions datasets/doc/source/index.rst
@@ -1,8 +1,7 @@
Flower Datasets
===============

Flower Datasets (``flwr-datasets``) is a library to quickly and easily create datasets for federated
learning/analytics/evaluation. It is created by the ``Flower Labs`` team that also created `Flower <https://flower.ai>`_ - a Friendly Federated Learning Framework.
Flower Datasets (``flower-datasets ``) is a library that enables the quick and easy creation of datasets for federated learning/analytics/evaluation. It enables heterogeneity (non-iidness) simulation and division of datasets with the preexisting notion of IDs. The library was created by the ``Flower Labs`` team that also created `Flower <https://flower.ai>`_ : A Friendly Federated Learning Framework.

Flower Datasets Framework
-------------------------
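The new introduction says the library supports "division of datasets with the preexisting notion of IDs". In that setting, each sample already records which user or device produced it, so each distinct ID can become one partition. The pure-Python sketch below shows only that grouping idea on a list of dicts. The `writer_id` column and the `partition_by_id` helper are hypothetical. Flower Datasets performs partitioning on Hugging Face datasets with its own partitioners, not with this code.

```python
from collections import defaultdict


def partition_by_id(rows, id_key):
    """Group rows into one partition per distinct value of `id_key`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[id_key]].append(row)
    return dict(partitions)


# Hypothetical samples that already carry the ID of the client that produced them.
rows = [
    {"writer_id": "a", "text": "hi"},
    {"writer_id": "b", "text": "hello"},
    {"writer_id": "a", "text": "bye"},
]
parts = partition_by_id(rows, "writer_id")
for client, samples in parts.items():
    print(client, len(samples))
```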
@@ -26,6 +25,8 @@ A learning-oriented series of tutorials is the best place to start.
:caption: Tutorial

tutorial-quickstart
tutorial-use-partitioners
tutorial-visualize-label-distribution

How-to guides
~~~~~~~~~~~~~
@@ -41,7 +42,6 @@ Problem-oriented how-to guides show step-by-step how to achieve a specific goal.
how-to-use-with-tensorflow
how-to-use-with-numpy
how-to-use-with-local-data
how-to-visualize-label-distribution
how-to-disable-enable-progress-bar

References
@@ -67,17 +67,20 @@ Main features
-------------
Flower Datasets library supports:

- **downloading datasets** - choose the dataset from Hugging Face's ``dataset`` (`link <https://huggingface.co/datasets>`_)
- **partitioning datasets** - choose one of the implemented partitioning scheme or create your own.
- **creating centralized datasets** - leave parts of the dataset unpartitioned (e.g. for centralized evaluation)
- **visualization of the partitioned datasets** - visualize the label distribution of the partitioned dataset (and compare the results on different parameters of the same partitioning schemes, different datasets, different partitioning schemes, or any mix of them)
- **Downloading datasets** - choose the dataset from Hugging Face's ``dataset`` (`link <https://huggingface.co/datasets>`_)(*)
- **Partitioning datasets** - choose one of the implemented partitioning scheme or create your own.
- **Creating centralized datasets** - leave parts of the dataset unpartitioned (e.g. for centralized evaluation)
- **Visualization of the partitioned datasets** - visualize the label distribution of the partitioned dataset (and compare the results on different parameters of the same partitioning schemes, different datasets, different partitioning schemes, or any mix of them)

.. note::

(*) Once the dataset is available on HuggingFace Hub it can be **immediately** used in ``Flower Datasets`` (no approval from the Flower team needed, no custom code needed).


.. image:: ./_static/readme/comparison_of_partitioning_schemes.png
:align: center
:alt: Comparison of Partitioning Schemes on CIFAR10


Thanks to using Hugging Face's ``datasets`` used under the hood, Flower Datasets integrates with the following popular formats/frameworks:

- Hugging Face
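The feature list in index.rst names three related steps: partitioning a dataset, leaving part of it unpartitioned for centralized evaluation, and visualizing the label distribution per partition. The toy sketch below walks through all three in pure Python. It uses a simple shuffle-and-deal IID scheme and hypothetical data (100 samples, 10 classes). `iid_partition` is an illustrative helper, not the library's partitioner. The per-partition `Counter`s hold the numbers a label-distribution plot would draw.

```python
import random
from collections import Counter


def iid_partition(indices, num_partitions, seed=42):
    """Shuffle indices and deal them round-robin into near-equal shards."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::num_partitions] for i in range(num_partitions)]


# Hypothetical labelled dataset: sample i has label i % 10.
labels = [i % 10 for i in range(100)]

# Keep the last 20 samples unpartitioned, e.g. for centralized evaluation.
centralized = list(range(80, 100))
partitions = iid_partition(range(80), num_partitions=4)

# Label counts per partition: the data behind a label-distribution plot.
label_counts = [Counter(labels[i] for i in part) for part in partitions]
for pid, counts in enumerate(label_counts):
    print(pid, dict(sorted(counts.items())))
```

Swapping `iid_partition` for a non-IID rule (for example, one that skews labels per client) and comparing the resulting counts is the kind of comparison the CIFAR10 figure illustrates.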
63 changes: 21 additions & 42 deletions datasets/doc/source/tutorial-quickstart.ipynb
@@ -15,7 +15,7 @@
"id": "e0f34a29f74b13cb",
"metadata": {},
"source": [
"# Install Flower Datasets"
"## Install Flower Datasets"
]
},
{
@@ -45,7 +45,7 @@
"id": "499dd2f0d23d871e",
"metadata": {},
"source": [
"# Choose the dataset\n",
"## Choose the dataset\n",
"\n",
"To choose the dataset, go to Hugging Face [Datasets Hub](https://huggingface.co/datasets) and search for your dataset by name. You will pass that names to the `dataset` parameter of `FederatedDataset`. Note that the name is case-sensitive.\n",
"\n",
@@ -79,7 +79,7 @@
"id": "e0c146753048fb2a",
"metadata": {},
"source": [
"# Partition the dataset\n",
"## Partition the dataset\n",
"\n",
"To partition a dataset (in a basic scenario), you need to choose two things:\n",
"1) A dataset (identified by a name),\n",
@@ -99,7 +99,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"id": "a759c5b6f25c9dd4",
"metadata": {},
"outputs": [],
@@ -131,15 +131,15 @@
"id": "efa7dbb120505f1f",
"metadata": {},
"source": [
"# Investigate the partition"
"## Investigate the partition"
]
},
{
"cell_type": "markdown",
"id": "bf986a1a9f0284cd",
"metadata": {},
"source": [
"## Features\n",
"### Features\n",
"\n",
"Now we will determine the names of the features of your dataset (you can alternatively do that directly on the Hugging Face\n",
"website). The names can vary along different datasets e.g. \"img\" or \"image\", \"label\" or \"labels\". Additionally, if the label column is of [ClassLabel](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.ClassLabel) type, we will also see the names of labels."
@@ -164,7 +164,7 @@
}
],
"source": [
"# Note this dataset has \n",
"# Note this dataset has\n",
"partition.features"
]
},
@@ -173,7 +173,7 @@
"id": "2e69ed05193a098a",
"metadata": {},
"source": [
"## Indexing\n",
"### Indexing\n",
"\n",
"To see the first sample of the partition, we can index it like a Python list."
]
@@ -388,7 +388,7 @@
"id": "b5e683cfaddf92f",
"metadata": {},
"source": [
"# Use with PyTorch/NumPy/TensorFlow\n",
"## Use with PyTorch/NumPy/TensorFlow\n",
"\n",
"For more detailed instructions, go to:\n",
"* [how-to-use-with-pytorch](https://flower.ai/docs/datasets/how-to-use-with-pytorch.html)\n",
@@ -401,7 +401,7 @@
"id": "de14f09f0ee4f6ac",
"metadata": {},
"source": [
"## PyTorch\n",
"### PyTorch\n",
"\n",
"Transform the `Dataset` into the `DataLoader`, use the `PyTorch transforms` (`Compose` and all the others are possible)."
]
@@ -444,7 +444,7 @@
"id": "71531613",
"metadata": {},
"source": [
"## NumPy\n",
"### NumPy\n",
"\n",
"NumPy can be used as input to the TensorFlow and scikit-learn models. The transformation is very simple."
]
@@ -465,7 +465,7 @@
"id": "e4867834",
"metadata": {},
"source": [
"## TensorFlow Dataset\n",
"### TensorFlow Dataset\n",
"\n",
"Transformation to TensorFlow Dataset is a one-liner."
]
@@ -497,32 +497,23 @@
"id": "61fd797c",
"metadata": {},
"source": [
"# Final remarks"
]
},
{
"cell_type": "markdown",
"id": "91ad1252",
"metadata": {},
"source": [
"## Final remarks\n",
"\n",
"Congratulations, you now know the basics of Flower Datasets and are ready to perform basic dataset preparation for Federated Learning."
]
},
{
"cell_type": "markdown",
"id": "ade71d23",
"metadata": {},
"source": [
"# Next Steps"
]
},
{
"cell_type": "markdown",
"id": "f54d8031",
"id": "cbdfe1b5",
"metadata": {},
"source": [
"## Next \n",
"\n",
"This is the first quickstart tutorial from the Flower Datasets series. See other tutorials:\n",
"* [Visualize Label Distribution](https://flower.ai/docs/datasets/how-to-visualize-label-distribution.html)"
"\n",
"* [Use Partitioners](https://flower.ai/docs/datasets/tutorial-use-partitioners.html)\n",
"\n",
"* [Visualize Label Distribution](https://flower.ai/docs/datasets/tutorial-visualize-label-distribution.html)"
]
}
],
@@ -531,18 +522,6 @@
"display_name": "flwr",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
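The notebook's "Features" cell notes that column names vary between datasets, e.g. "img" or "image", "label" or "labels". Code that should work across datasets can therefore look the names up rather than hard-code them. The sketch below assumes only that `partition.features` behaves like a dict keyed by column name. The `features` dict here is a hypothetical stand-in, and `find_column` is an illustrative helper, not part of Flower Datasets.

```python
def find_column(features, candidates):
    """Return the first name in `candidates` present in `features`, else None."""
    for name in candidates:
        if name in features:
            return name
    return None


# Hypothetical stand-in for `partition.features` (a dict-like mapping of column names).
features = {"img": "Image()", "label": "ClassLabel(...)"}

image_key = find_column(features, ["image", "img"])
label_key = find_column(features, ["label", "labels"])
print(image_key, label_key)
```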