diff --git a/.github/workflows/e2e.yml b/.github/workflows/e2e.yml index 241ec8057c7c..f5ed1d99012a 100644 --- a/.github/workflows/e2e.yml +++ b/.github/workflows/e2e.yml @@ -213,7 +213,7 @@ jobs: needs: wheel strategy: matrix: - framework: ["numpy", "pytorch", "tensorflow", "hf", "jax", "sklearn"] + framework: ["numpy", "pytorch", "tensorflow", "huggingface", "jax", "sklearn"] name: Template / ${{ matrix.framework }} diff --git a/datasets/doc/source/_static/tutorial-quickstart/choose-hf-dataset.png b/datasets/doc/source/_static/tutorial-quickstart/choose-hf-dataset.png new file mode 100644 index 000000000000..ffce2008e178 Binary files /dev/null and b/datasets/doc/source/_static/tutorial-quickstart/choose-hf-dataset.png differ diff --git a/datasets/doc/source/_static/tutorial-quickstart/copy-dataset-name.png b/datasets/doc/source/_static/tutorial-quickstart/copy-dataset-name.png new file mode 100644 index 000000000000..df6deb7cc997 Binary files /dev/null and b/datasets/doc/source/_static/tutorial-quickstart/copy-dataset-name.png differ diff --git a/datasets/doc/source/_static/tutorial-quickstart/partitioner-flexibility.png b/datasets/doc/source/_static/tutorial-quickstart/partitioner-flexibility.png new file mode 100644 index 000000000000..53148d6360f8 Binary files /dev/null and b/datasets/doc/source/_static/tutorial-quickstart/partitioner-flexibility.png differ diff --git a/datasets/doc/source/how-to-use-partitioners.ipynb b/datasets/doc/source/how-to-use-partitioners.ipynb new file mode 100644 index 000000000000..4621fdee15ea --- /dev/null +++ b/datasets/doc/source/how-to-use-partitioners.ipynb @@ -0,0 +1,373 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# How to use `Partitioner`s\n", + "\n", + "The aim of this tutorial is to familiarize you with the `Partitioner`s that `Flower Datasets` provides out of the box." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Install" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "! pip install -q \"flwr-datasets[vision]\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# What is a `Partitioner`?\n", + "\n", + "A `Partitioner` is an object responsible for dividing a dataset according to a chosen strategy. There are many `Partitioner`s that you can use (see the full list [here](https://flower.ai/docs/datasets/ref-api/flwr_datasets.partitioner.html)), and all of them inherit from the `Partitioner` object, an abstract class providing the basic structure and methods that any new `Partitioner` must implement to integrate with the rest of the `Flower Datasets` code. Different `Partitioner`s are created differently, but they all behave the same way: they produce the same type of objects.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `IidPartitioner` Creation\n", + "\n", + "Let's create (instantiate) the most basic partitioner, [`IidPartitioner`](https://flower.ai/docs/datasets/ref-api/flwr_datasets.partitioner.IidPartitioner.html#flwr_datasets.partitioner.IidPartitioner), and learn how it interacts with `FederatedDataset`."
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "from flwr_datasets.partitioner import IidPartitioner\n", + "\n", + "# Set the partitioner to create 10 partitions\n", + "partitioner = IidPartitioner(num_partitions=10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Right now the partitioner does not have access to any data, so it has nothing to partition. `FederatedDataset` is responsible for assigning data to the partitioner(s).\n", + "\n", + "What **part** of the data is assigned to the partitioner?\n", + "\n", + "In centralized (traditional) ML, there exists a strong concept of dataset splits. Typically you hear about train/valid/test splits. In FL research, if we don't have an already divided dataset (e.g. by `user_id`), we simulate such a division using a centralized dataset. The goal of that operation is to simulate an FL scenario where the data is spread across clients. In Flower Datasets, you decide which split of the dataset will be partitioned. You can also re-split the dataset to create a custom split, or merge the whole train and test splits into a single dataset. That's not part of this tutorial (if you are curious how to do that, see the [Divider docs](https://flower.ai/docs/datasets/ref-api/flwr_datasets.preprocessor.Divider.html), the [Merger docs](https://flower.ai/docs/datasets/ref-api/flwr_datasets.preprocessor.Merger.html), and the `preprocessor` parameter docs of [FederatedDataset](https://flower.ai/docs/datasets/ref-api/flwr_datasets.FederatedDataset.html)).\n", + "\n", + "Let's see how you specify the split for partitioning." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## How do you specify the split to partition?\n", + "\n", + "The specification of the split happens as you specify the `partitioners` argument for `FederatedDataset`. It maps the split name (`split: str`) to the partitioner that will be used for that split of the data. In the example below, we partition the `train` split of the `cifar10` dataset.\n", + "\n", + "> If you're unsure why/how we chose the name of the `dataset` and how to customize it, see the [first tutorial](https://flower.ai/docs/datasets/quickstart-tutorial.html).\n",
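+ "\n", + "As a minimal sketch (an assumption for illustration: we rely on `uoft-cs/cifar10` providing its usual `train` and `test` splits), you can also pass one partitioner per split in a single mapping, and each split is then partitioned independently:\n", + "\n", + "```python\n", + "from flwr_datasets import FederatedDataset\n", + "from flwr_datasets.partitioner import IidPartitioner\n", + "\n", + "# One partitioner per split\n", + "fds = FederatedDataset(\n", + "    dataset=\"uoft-cs/cifar10\",\n", + "    partitioners={\n", + "        \"train\": IidPartitioner(num_partitions=10),\n", + "        \"test\": IidPartitioner(num_partitions=10),\n", + "    },\n", + ")\n", + "\n", + "# With more than one partitioner, pass the split name explicitly\n", + "train_partition = fds.load_partition(0, \"train\")\n", + "test_partition = fds.load_partition(0, \"test\")\n", + "```"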
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Dataset({\n", + " features: ['img', 'label'],\n", + " num_rows: 5000\n", + "})" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from flwr_datasets import FederatedDataset\n", + "\n", + "# Create the federated dataset passing the partitioner\n", + "fds = FederatedDataset(dataset=\"uoft-cs/cifar10\", partitioners={\"train\": partitioner})\n", + "\n", + "# Load the first partition\n", + "iid_partition = fds.load_partition(partition_id=0)\n", + "iid_partition" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'img': [<PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>,\n", + " <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>,\n", + " <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>],\n", + " 'label': [1, 2, 6]}" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Let's take a look at the first three samples\n", + "iid_partition[:3]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use Different `Partitioner`s" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Why would you need to use different `Partitioner`s?**\n", + "\n", + "There are a few ways that data partitioning is simulated in the literature. `Flower Datasets` lets you work with many different approaches that have been proposed so far. It enables you to simulate partitions with different properties and different levels of heterogeneity, and to use those settings to evaluate your Federated Learning algorithms.\n", + "\n", + "\n", + "**How do you use different `Partitioner`s?**\n", + "\n", + "To use a different `Partitioner`, you just need to create a different object (note that it typically has different parameters that you need to specify). Then you pass it to the `FederatedDataset` as before.\n", + "\n",
\n", + " \"Partitioner\n", + "
\n", + "See the only changing part in yellow.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Creating non-IID partitions: Use ``PathologicalPartitioner``\n", + "\n", + "Now, we are going to create partitions that have only a subset of labels in each partition by using [`PathologicalPartitioner`](https://flower.ai/docs/datasets/ref-api/flwr_datasets.partitioner.PathologicalPartitioner.html#flwr_datasets.partitioner.PathologicalPartitioner). In this scenario we have the exact control about the number of unique labels on each partition. The smaller the number is the more heterogenous the division gets. Let's have a look at how it works with `num_classes_per_partition=2`." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Dataset({\n", + " features: ['img', 'label'],\n", + " num_rows: 2501\n", + "})" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from flwr_datasets.partitioner import PathologicalPartitioner\n", + "\n", + "# Set the partitioner to create 10 partitions with 2 classes per partition\n", + "# Partition using column \"label\" (a column in the huggingface representation of CIFAR-10)\n", + "pathological_partitioner = PathologicalPartitioner(\n", + " num_partitions=10, partition_by=\"label\", num_classes_per_partition=2\n", + ")\n", + "\n", + "# Create the federated dataset passing the partitioner\n", + "fds = FederatedDataset(\n", + " dataset=\"uoft-cs/cifar10\", partitioners={\"train\": pathological_partitioner}\n", + ")\n", + "\n", + "# Load the first partition\n", + "partition_pathological = fds.load_partition(partition_id=0)\n", + "partition_pathological" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'img': [,\n", + " ,\n", + " ],\n", + " 'label': [0, 0, 7]}" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Let's take a look at the first three samples\n", + "partition_pathological[:3]" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([0, 7])" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "\n", + "# We can use `np.unique` to get a list of the unique labels that are present\n", + "# in this data partition. As expected, there are just two labels. This means\n", + "# that this partition has only images with numbers 0 and 7.\n", + "np.unique(partition_pathological[\"label\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Creating non-IID partitions: Use ``DirichletPartitioner``\n", + "\n", + "With the [`DirichletParitioner`](https://flower.ai/docs/datasets/ref-api/flwr_datasets.partitioner.DirichletPartitioner.html#flwr_datasets.partitioner.DirichletPartitioner), the primary tool for controlling heterogeneity is the `alpha` parameter; the smaller the value gets, the more heterogeneous the federated datasets are. Instead of choosing the exact number of classes on each partition, here we sample the probability distribution from the Dirichlet distribution, which tells how the samples associated with each class will be divided." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Dataset({\n", + " features: ['img', 'label'],\n", + " num_rows: 5433\n", + "})" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from flwr_datasets.partitioner import DirichletPartitioner\n", + "\n", + "# Set the partitioner to create 10 partitions with alpha 0.1 (so fairly non-IID)\n", + "# Partition using column \"label\" (a column in the huggingface representation of CIFAR-10)\n", + "dirichlet_partitioner = DirichletPartitioner(num_partitions=10, alpha=0.1, partition_by=\"label\")\n", + "\n", + "# Create the federated dataset passing the partitioner\n", + "fds = FederatedDataset(\n", + " dataset=\"uoft-cs/cifar10\", partitioners={\"train\": dirichlet_partitioner}\n", + ")\n", + "\n", + "# Load the first partition\n", + "partition_from_dirichlet = fds.load_partition(partition_id=0)\n", + "partition_from_dirichlet" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'img': [<PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>,\n", + " <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>,\n", + " <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>,\n", + " <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>,\n", + " <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>],\n", + " 'label': [4, 4, 0, 1, 4]}" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Let's take a look at the first five samples\n", + "partition_from_dirichlet[:5]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Final remarks\n", + "Congratulations, you now know how to use different `Partitioner`s with `FederatedDataset` in Flower Datasets.\n", + "\n", + "# Next Steps\n", + "This is the second quickstart tutorial from the Flower Datasets series. See the next tutorials:\n", + "\n", + "* [Visualize Label Distribution](https://flower.ai/docs/datasets/how-to-visualize-label-distribution.html)\n", + "\n", + "Previous tutorials:\n", + "* [Quickstart Basics](https://flower.ai/docs/datasets/quickstart-tutorial.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "flwr", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/datasets/doc/source/tutorial-quickstart.ipynb b/datasets/doc/source/tutorial-quickstart.ipynb new file mode 100644 index 000000000000..d8bc49102a7a --- /dev/null +++ b/datasets/doc/source/tutorial-quickstart.ipynb @@ -0,0 +1,550 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "79c178bc47ac1b2f", + "metadata": {}, + "source": [ + "# Quickstart\n", + "\n", + "Get started with `Flower Datasets` as fast as possible by learning the essentials." + ] + }, + { + "cell_type": "markdown", + "id": "e0f34a29f74b13cb", + "metadata": {}, + "source": [ + "# Install Flower Datasets" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "initial_id", + "metadata": {}, + "outputs": [], + "source": [ + "! pip install -q \"flwr-datasets[vision]\"" + ] + }, + { + "cell_type": "markdown", + "id": "f19a191a", + "metadata": {}, + "source": [ + "If you want to use audio datasets, install:\n", + "\n", + "```bash\n", + "! 
pip install -q \"flwr-datasets[audio]\"\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "499dd2f0d23d871e", + "metadata": {}, + "source": [ + "# Choose the dataset\n", + "\n", + "To choose the dataset, go to Hugging Face [Datasets Hub](https://huggingface.co/datasets) and search for your dataset by name. You will pass that names to the `dataset` parameter of `FederatedDataset`. Note that the name is case-sensitive.\n", + "\n", + "
\n", + " \"Choose\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "a9d449e6", + "metadata": {}, + "source": [ + "Note that once the dataset is available on HuggingFace Hub it can be immediately used in `Flower Datasets` (no approval from Flower team is needed, no custom code needed). \n", + "\n", + "Here is how it looks for `CIFAR10` dataset." + ] + }, + { + "cell_type": "markdown", + "id": "b7d66b23efb1a289", + "metadata": {}, + "source": [ + "
\n", + " \"Choose\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "e0c146753048fb2a", + "metadata": {}, + "source": [ + "# Partition the dataset\n", + "\n", + "To partition a dataset (in a basic scenario), you need to choose two things:\n", + "1) A dataset (identified by a name),\n", + "2) A partitioning scheme (by selecting one of the supported partitioning schemes, [see all of them here](https://flower.ai/docs/datasets/ref-api/flwr_datasets.partitioner.html), or creating a custom partitioning scheme).\n", + "\n", + "\n", + "\n", + "**1) Dataset choice**\n", + "\n", + "We will pass the name of the dataset to `FederatedDataset(dataset=\"some-name\", other-parameters)`. In this example it will be: `FederatedDataset(dataset=\"uoft-cs/cifar10\", other-parameters)`\n", + "\n", + "**2) Partitioner choice**\n", + "\n", + "We will partition the dataset in an IID manner using `IidPartitioner` ([link to the docs](https://flower.ai/docs/datasets/ref-api/flwr_datasets.partitioner.IidPartitioner.html#flwr_datasets.partitioner.IidPartitioner)). \n", + "Only the train split of the dataset will be processed. In general, we do `FederatedDataset(dataset=\"some-name\", partitioners={\"split-name\": partitioning_scheme})`, which for this example looks like:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "a759c5b6f25c9dd4", + "metadata": {}, + "outputs": [], + "source": [ + "from flwr_datasets import FederatedDataset\n", + "from flwr_datasets.partitioner import IidPartitioner\n", + "\n", + "fds = FederatedDataset(\n", + " dataset=\"uoft-cs/cifar10\", partitioners={\"train\": IidPartitioner(num_partitions=10)}\n", + ")\n", + "\n", + "# Load the first partition of the \"train\" split\n", + "partition = fds.load_partition(0, \"train\")\n", + "# You can access the whole \"test\" split of the base dataset (it hasn't been partitioned)\n", + "centralized_dataset = fds.load_split(\"test\")" + ] + }, + { + "cell_type": "markdown", + "id": "de75d15c3f5b2383", + "metadata": {}, + "source": [ + "Now we have 10 partitions created from the train split of the CIFAR10 dataset and the test split\n", + "for the centralized evaluation. Later we will convert the type of the dataset from Hugging Face's `Dataset` type to the format required by PyTorch/TensorFlow frameworks." + ] + }, + { + "cell_type": "markdown", + "id": "efa7dbb120505f1f", + "metadata": {}, + "source": [ + "# Investigate the partition" + ] + }, + { + "cell_type": "markdown", + "id": "bf986a1a9f0284cd", + "metadata": {}, + "source": [ + "## Features\n", + "\n", + "Now we will determine the names of the features of your dataset (you can alternatively do that directly on the Hugging Face\n", + "website). The names can vary along different datasets e.g. \"img\" or \"image\", \"label\" or \"labels\". Additionally, if the label column is of [ClassLabel](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.ClassLabel) type, we will also see the names of labels." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f7ff7cecdda8a931", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'img': Image(mode=None, decode=True, id=None),\n", + " 'label': ClassLabel(names=['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'], id=None)}" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Note this dataset has \"img\" and \"label\" columns\n", + "partition.features" + ] + }, + { + "cell_type": "markdown", + "id": "2e69ed05193a098a", + "metadata": {}, + "source": [ + "## Indexing\n", + "\n", + "To see the first sample of the partition, we can index it like a Python list." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2f2097d4c5121a1b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'img': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>,\n", + " 'label': 1}" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "partition[0]" + ] + }, + { + "cell_type": "markdown", + "id": "a10ad2b97c4dd92a", + "metadata": {}, + "source": [ + "Then we can additionally choose the specific column." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aa7f0e2e29841f54", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "partition[0][\"label\"]" + ] + }, + { + "cell_type": "markdown", + "id": "3fe1cef9a121dbc5", + "metadata": {}, + "source": [ + "We can also use slicing (take a few samples). Let's take the first 3 samples of the first partition:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "779818b365682c60", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'img': [<PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>,\n", + " <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>,\n", + " <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>],\n", + " 'label': [1, 2, 6]}" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "partition[:3]" + ] + }, + { + "cell_type": "markdown", + "id": "a354aa36fc586438", + "metadata": {}, + "source": [ + "We get a dictionary where the keys are the names of the columns and the values are lists of the corresponding values from each row of the dataset. So, to take the first 3 labels, we can do:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25fca62a8f2fbe51", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2, 6]" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "partition[:3][\"label\"]" + ] + }, + { + "cell_type": "markdown", + "id": "4e4790671ffe2142", + "metadata": {}, + "source": [ + "Note that indexing by column first is also possible but discouraged because the whole column will be loaded into memory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7836fe6d65c673b2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2, 6]" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "partition[\"label\"][:3]" + ] + }, + { + "cell_type": "markdown", + "id": "c3c46099625437fc", + "metadata": {}, + "source": [ + "You can also select a subset of the dataset and keep the same type (`datasets.Dataset`) instead of receiving a dictionary of values."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "708abab74de3d5a1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Dataset({\n", + " features: ['img', 'label'],\n", + " num_rows: 3\n", + "})" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "partition.select([0, 1, 2])" + ] + }, + { + "cell_type": "markdown", + "id": "462f707b4f078a8d", + "metadata": {}, + "source": [ + "And this dataset contains the same samples as we saw before." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "19d2e3cc74d93c4d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'img': [<PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>,\n", + " <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>,\n", + " <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>],\n", + " 'label': [1, 2, 6]}" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "partition.select([0, 1, 2])[:]" + ] + }, + { + "cell_type": "markdown", + "id": "b5e683cfaddf92f", + "metadata": {}, + "source": [ + "# Use with PyTorch/NumPy/TensorFlow\n", + "\n", + "For more detailed instructions, go to:\n", + "* [how-to-use-with-pytorch](https://flower.ai/docs/datasets/how-to-use-with-pytorch.html)\n", + "* [how-to-use-with-numpy](https://flower.ai/docs/datasets/how-to-use-with-numpy.html)\n", + "* [how-to-use-with-tensorflow](https://flower.ai/docs/datasets/how-to-use-with-tensorflow.html)" + ] + }, + { + "cell_type": "markdown", + "id": "de14f09f0ee4f6ac", + "metadata": {}, + "source": [ + "## PyTorch\n", + "\n", + "Transform the `Dataset` into a `DataLoader` and use the PyTorch transforms (`Compose` and all the others are also possible)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a94321ee", + "metadata": {}, + "outputs": [], + "source": [ + "! pip install -q torch torchvision" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "544c0e73054f3445", + "metadata": {}, + "outputs": [], + "source": [ + "from torch.utils.data import DataLoader\n", + "from torchvision.transforms import ToTensor\n", + "\n", + "transforms = ToTensor()\n", + "\n", + "\n", + "def apply_transforms(batch):\n", + " # For CIFAR-10 the \"img\" column contains the images we want to apply the transforms to\n", + " batch[\"img\"] = [transforms(img) for img in batch[\"img\"]]\n", + " return batch\n", + "\n", + "\n", + "partition_torch = partition.with_transform(apply_transforms)\n", + "dataloader = DataLoader(partition_torch, batch_size=64)" + ] + }, + { + "cell_type": "markdown", + "id": "71531613", + "metadata": {}, + "source": [ + "## NumPy\n", + "\n", + "NumPy can be used as input to the TensorFlow and scikit-learn models. The transformation is very simple." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6b98b3e1", + "metadata": {}, + "outputs": [], + "source": [ + "partition_np = partition.with_format(\"numpy\")\n", + "X_train, y_train = partition_np[\"img\"], partition_np[\"label\"]" + ] + }, + { + "cell_type": "markdown", + "id": "e4867834", + "metadata": {}, + "source": [ + "## TensorFlow Dataset\n", + "\n", + "Transformation to TensorFlow Dataset is a one-liner." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a69ce677", + "metadata": {}, + "outputs": [], + "source": [ + "! 
pip install -q tensorflow" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "db86f1aa", + "metadata": {}, + "outputs": [], + "source": [ + "tf_dataset = partition.to_tf_dataset(\n", + " columns=\"img\", label_cols=\"label\", batch_size=64, shuffle=True\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "61fd797c", + "metadata": {}, + "source": [ + "# Final remarks" + ] + }, + { + "cell_type": "markdown", + "id": "91ad1252", + "metadata": {}, + "source": [ + "Congratulations, you now know the basics of Flower Datasets and are ready to perform basic dataset preparation for Federated Learning." + ] + }, + { + "cell_type": "markdown", + "id": "ade71d23", + "metadata": {}, + "source": [ + "# Next Steps" + ] + }, + { + "cell_type": "markdown", + "id": "f54d8031", + "metadata": {}, + "source": [ + "This is the first quickstart tutorial from the Flower Datasets series. See other tutorials:\n", + "* [Visualize Label Distribution](https://flower.ai/docs/datasets/how-to-visualize-label-distribution.html)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "flwr", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/datasets/doc/source/tutorial-quickstart.rst b/datasets/doc/source/tutorial-quickstart.rst deleted file mode 100644 index e820e116fc61..000000000000 --- a/datasets/doc/source/tutorial-quickstart.rst +++ /dev/null @@ -1,99 +0,0 @@ -Quickstart -========== - -Run Flower Datasets as fast as possible by learning only the essentials. - -Install Federated Datasets --------------------------- -On the command line, run - -.. code-block:: bash - - python -m pip install "flwr-datasets[vision]" - -Install the ML framework ------------------------- -TensorFlow - -.. code-block:: bash - - pip install tensorflow - -PyTorch - -.. code-block:: bash - - pip install torch torchvision - -Choose the dataset ------------------- -Choose the dataset by going to Hugging Face `Datasets Hub `_ and searching for your -dataset by name that you will pass to the `dataset` parameter of `FederatedDataset`. Note that the name is case sensitive. - -Partition the dataset ---------------------- -To iid partition your dataset, choose the split you want to partition and the number of partitions:: - - from flwr_datasets import FederatedDataset - - fds = FederatedDataset(dataset="cifar10", partitioners={"train": 10}) - partition = fds.load_partition(0, "train") - centralized_dataset = fds.load_split("test") - -Now you're ready to go. You have ten partitions created from the train split of the CIFAR10 dataset and the test split -for the centralized evaluation. We will convert the type of the dataset from Hugging Face's `Dataset` type to the one -supported by your framework. - -Display the features --------------------- -Determine the names of the features of your dataset (you can alternatively do that directly on the Hugging Face -website). The names can vary along different datasets e.g. "img" or "image", "label" or "labels". You will also see -the names of label categories. Type:: - - partition.features - -In case of CIFAR10, you should see the following output. - -.. 
code-block:: none - - {'img': Image(decode=True, id=None), - 'label': ClassLabel(names=['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', - 'frog', 'horse', 'ship', 'truck'], id=None)} - -Note that the image is denoted by "img" which is crucial for the next steps (conversion you the ML -framework of your choice). - -Conversion ---------- -For more detailed instructions, go to :doc:`how-to-use-with-pytorch`, :doc:`how-to-use-with-numpy`, or -:doc:`how-to-use-with-tensorflow`. - -PyTorch DataLoader -^^^^^^^^^^^^^^^^^^ -Transform the Dataset into the DataLoader, use the PyTorch transforms (`Compose` and all the others are also -possible):: - - from torch.utils.data import DataLoader - from torchvision.transforms import ToTensor - - transforms = ToTensor() - def apply_transforms(batch): - batch["img"] = [transforms(img) for img in batch["img"]] - return batch - partition_torch = partition.with_transform(apply_transforms) - dataloader = DataLoader(partition_torch, batch_size=64) - -NumPy -^^^^^ -NumPy can be used as input to the TensorFlow and scikit-learn models and it is very straightforward:: - - partition_np = partition.with_format("numpy") - X_train, y_train = partition_np["img"], partition_np["label"] - -TensorFlow Dataset -^^^^^^^^^^^^^^^^^^ -Transformation to TensorFlow Dataset is a one-liner:: - - tf_dataset = partition.to_tf_dataset(columns="img", label_cols="label", batch_size=64, - shuffle=True) - diff --git a/doc/locales/fr/LC_MESSAGES/framework-docs.po b/doc/locales/fr/LC_MESSAGES/framework-docs.po index 6624d91f9e64..efa10a69531c 100644 --- a/doc/locales/fr/LC_MESSAGES/framework-docs.po +++ b/doc/locales/fr/LC_MESSAGES/framework-docs.po @@ -650,8 +650,8 @@ msgid "``pip install -U --pre flwr`` (without extras)" msgstr "``pip install -U --pre flwr`` (sans les extras)" #: ../../source/contributor-how-to-install-development-versions.rst:33 -msgid "``pip install -U --pre flwr[simulation]`` (with extras)" -msgstr "``pip install -U --pre flwr[simulation]`` (avec les extras)" +msgid "``pip install -U --pre 'flwr[simulation]'`` (with extras)" +msgstr "``pip install -U --pre 'flwr[simulation]'`` (avec les extras)" #: ../../source/contributor-how-to-install-development-versions.rst:35 msgid "" @@ -676,10 +676,10 @@ msgstr "" #: ../../source/contributor-how-to-install-development-versions.rst:40 msgid "" -"``pip install flwr[simulation]@git+https://github.com/adap/flower.git`` " +"``pip install 'flwr[simulation]@git+https://github.com/adap/flower.git'`` " "(with extras)" msgstr "" -"``pip install flwr[simulation]@git+https://github.com/adap/flower.git`` " +"``pip install 'flwr[simulation]@git+https://github.com/adap/flower.git'`` " "(avec les extras)" #: ../../source/contributor-how-to-install-development-versions.rst:42 msgid "" @@ -698,11 +698,11 @@ msgstr "" #: ../../source/contributor-how-to-install-development-versions.rst:45 msgid "" -"``pip install flwr[simulation]@git+https://github.com/adap/flower.git" -"@branch-name`` (with extras)" +"``pip install 'flwr[simulation]@git+https://github.com/adap/flower.git" +"@branch-name'`` (with extras)" msgstr "" -"``pip install flwr[simulation]@git+https://github.com/adap/flower.git" -"@nom-de-branche`` (avec des extras)" +"``pip install 'flwr[simulation]@git+https://github.com/adap/flower.git" +"@nom-de-la-branche'`` (avec des extras)" #: ../../source/contributor-how-to-install-development-versions.rst:49 msgid "Open Jupyter Notebooks on Google Colab" @@ -6963,10 +6963,10 @@ msgstr "" #: ../../source/how-to-upgrade-to-flower-1.0.rst:15 msgid
"" -"``python -m pip install -U flwr[simulation]`` (when using " +"``python -m pip install -U 'flwr[simulation]'`` (when using " "``start_simulation``)" msgstr "" -"``python -m pip install -U flwr[simulation]`` (lors de l'utilisation de " +"``python -m pip install -U 'flwr[simulation]'`` (lors de l'utilisation de " "``start_simulation``)" #: ../../source/how-to-upgrade-to-flower-1.0.rst:17 @@ -20007,12 +20007,12 @@ msgid "" "Simulations (using the Virtual Client Engine through `start_simulation`) " "now work more smoothly on Jupyter Notebooks (incl. Google Colab) after " "installing Flower with the `simulation` extra (`pip install " -"flwr[simulation]`)." +"'flwr[simulation]'`)." msgstr "" "Les simulations (utilisant le moteur de client virtuel via " "`start_simulation`) fonctionnent maintenant plus facilement sur les " "Notebooks Jupyter (y compris Google Colab) après avoir installé Flower " -"avec l'option `simulation` (`pip install flwr[simulation]`)." +"avec l'option `simulation` (`pip install 'flwr[simulation]'`)." #: ../../source/ref-changelog.md:887 msgid "" diff --git a/doc/locales/ko/LC_MESSAGES/framework-docs.po b/doc/locales/ko/LC_MESSAGES/framework-docs.po index f01f9eaf7bd9..d0ba4f6ed5a1 100644 --- a/doc/locales/ko/LC_MESSAGES/framework-docs.po +++ b/doc/locales/ko/LC_MESSAGES/framework-docs.po @@ -666,8 +666,8 @@ msgid "``pip install -U --pre flwr`` (without extras)" msgstr "``pip install -U --pre flwr`` (extras 제외)" #: ../../source/contributor-how-to-install-development-versions.rst:33 -msgid "``pip install -U --pre flwr[simulation]`` (with extras)" -msgstr "``pip install -U --pre flwr[simulation]`` (extras 포함)" +msgid "``pip install -U --pre 'flwr[simulation]'`` (with extras)" +msgstr "``pip install -U --pre 'flwr[simulation]'`` (extras 포함)" #: ../../source/contributor-how-to-install-development-versions.rst:35 msgid "" @@ -689,10 +689,10 @@ msgstr "" #: ../../source/contributor-how-to-install-development-versions.rst:40 msgid "" -"``pip install flwr[simulation]@git+https://github.com/adap/flower.git`` " +"``pip install 'flwr[simulation]@git+https://github.com/adap/flower.git'`` " "(with extras)" msgstr "" -"``pip install flwr[simulation]@git+https://github.com/adap/flower.git`` " +"``pip install 'flwr[simulation]@git+https://github.com/adap/flower.git'`` " "(extras 포함)" #: ../../source/contributor-how-to-install-development-versions.rst:42 @@ -709,11 +709,11 @@ msgstr "" #: ../../source/contributor-how-to-install-development-versions.rst:45 msgid "" -"``pip install flwr[simulation]@git+https://github.com/adap/flower.git@branch-" -"name`` (with extras)" +"``pip install 'flwr[simulation]@git+https://github.com/adap/flower.git@branch-" +"name'`` (with extras)" msgstr "" -"``pip install flwr[simulation]@git+https://github.com/adap/flower.git@branch-" -"name`` (extras 포함)" +"``pip install 'flwr[simulation]@git+https://github.com/adap/flower.git@branch-" +"name'`` (extras 포함)" #: ../../source/contributor-how-to-install-development-versions.rst:49 msgid "Open Jupyter Notebooks on Google Colab" @@ -7021,10 +7021,10 @@ msgstr "" #: ../../source/how-to-upgrade-to-flower-1.0.rst:15 msgid "" -"``python -m pip install -U flwr[simulation]`` (when using " +"``python -m pip install -U 'flwr[simulation]'`` (when using " "``start_simulation``)" msgstr "" -"``python -m pip install -U flwr[simulation]``(``start_simulation`` 사용 시)" +"``python -m pip install -U 'flwr[simulation]'``(``start_simulation`` 사용 시)" #: ../../source/how-to-upgrade-to-flower-1.0.rst:17 msgid "" @@ -18594,7 +18594,7 @@ 
msgid "" "Simulations (using the Virtual Client Engine through `start_simulation`) now " "work more smoothly on Jupyter Notebooks (incl. Google Colab) after " "installing Flower with the `simulation` extra (`pip install " -"flwr[simulation]`)." +"'flwr[simulation]'`)." msgstr "" #: ../../source/ref-changelog.md:887 diff --git a/doc/locales/pt_BR/LC_MESSAGES/framework-docs.po b/doc/locales/pt_BR/LC_MESSAGES/framework-docs.po index 49bb01908421..e50c290432cc 100644 --- a/doc/locales/pt_BR/LC_MESSAGES/framework-docs.po +++ b/doc/locales/pt_BR/LC_MESSAGES/framework-docs.po @@ -674,7 +674,7 @@ msgid "``pip install -U --pre flwr`` (without extras)" msgstr "" #: ../../source/contributor-how-to-install-development-versions.rst:33 -msgid "``pip install -U --pre flwr[simulation]`` (with extras)" +msgid "``pip install -U --pre 'flwr[simulation]'`` (with extras)" msgstr "" #: ../../source/contributor-how-to-install-development-versions.rst:35 @@ -695,7 +695,7 @@ msgstr "" #: ../../source/contributor-how-to-install-development-versions.rst:40 msgid "" -"``pip install flwr[simulation]@git+https://github.com/adap/flower.git`` " +"``pip install 'flwr[simulation]@git+https://github.com/adap/flower.git'`` " "(with extras)" msgstr "" @@ -711,8 +711,8 @@ msgstr "" #: ../../source/contributor-how-to-install-development-versions.rst:45 msgid "" -"``pip install flwr[simulation]@git+https://github.com/adap/flower.git" -"@branch-name`` (with extras)" +"``pip install 'flwr[simulation]@git+https://github.com/adap/flower.git" +"@branch-name'`` (with extras)" msgstr "" #: ../../source/contributor-how-to-install-development-versions.rst:49 @@ -5758,7 +5758,7 @@ msgstr "" #: ../../source/how-to-upgrade-to-flower-1.0.rst:15 msgid "" -"``python -m pip install -U flwr[simulation]`` (when using " +"``python -m pip install -U 'flwr[simulation]'`` (when using " "``start_simulation``)" msgstr "" @@ -17221,7 +17221,7 @@ msgid "" "Simulations (using the Virtual Client Engine through `start_simulation`) " "now work more smoothly on Jupyter Notebooks (incl. Google Colab) after " "installing Flower with the `simulation` extra (`pip install " -"flwr[simulation]`)." +"'flwr[simulation]'`)." 
msgstr "" #: ../../source/ref-changelog.md:887 diff --git a/doc/locales/zh_Hans/LC_MESSAGES/framework-docs.po b/doc/locales/zh_Hans/LC_MESSAGES/framework-docs.po index d07217ea35f7..e9279db19043 100644 --- a/doc/locales/zh_Hans/LC_MESSAGES/framework-docs.po +++ b/doc/locales/zh_Hans/LC_MESSAGES/framework-docs.po @@ -664,11 +664,11 @@ msgstr "从 PyPI 安装 ``flwr`` 预发行版:" #: ../../source/contributor-how-to-install-development-versions.rst:32 msgid "``pip install -U --pre flwr`` (without extras)" -msgstr "`pip install -U -pre flwr``(不含额外功能)" +msgstr "``pip install -U -pre flwr``(不含额外功能)" #: ../../source/contributor-how-to-install-development-versions.rst:33 -msgid "``pip install -U --pre flwr[simulation]`` (with extras)" -msgstr "`pip install -U -pre flwr[simulation]``(包含额外功能)" +msgid "``pip install -U --pre 'flwr[simulation]'`` (with extras)" +msgstr "``pip install -U -pre 'flwr[simulation]'``(包含额外功能)" #: ../../source/contributor-how-to-install-development-versions.rst:35 msgid "" @@ -684,15 +684,15 @@ msgstr "从 GitHub 的默认分支 (``main`) 安装 ``flwr``:" msgid "" "``pip install flwr@git+https://github.com/adap/flower.git`` (without " "extras)" -msgstr "`pip install flwr@git+https://github.com/adap/flower.git`` (不含额外功能)" +msgstr "``pip install flwr@git+https://github.com/adap/flower.git`` (不含额外功能)" #: ../../source/contributor-how-to-install-development-versions.rst:40 msgid "" -"``pip install flwr[simulation]@git+https://github.com/adap/flower.git`` " +"``pip install 'flwr[simulation]@git+https://github.com/adap/flower.git'`` " "(with extras)" msgstr "" -"`pip install " -"flwr[simulation]@git+https://github.com/adap/flower.git``(带附加功能)" +"``pip install " +"'flwr[simulation]@git+https://github.com/adap/flower.git'``(带附加功能)" #: ../../source/contributor-how-to-install-development-versions.rst:42 msgid "Install ``flwr`` from a specific GitHub branch (``branch-name``):" @@ -703,14 +703,14 @@ msgid "" "``pip install flwr@git+https://github.com/adap/flower.git@branch-name`` " "(without extras)" msgstr "" -"`pip install flwr@git+https://github.com/adap/flower.git@branch-name`` " +"``pip install flwr@git+https://github.com/adap/flower.git@branch-name`` " "(不含附加功能)" #: ../../source/contributor-how-to-install-development-versions.rst:45 msgid "" -"``pip install flwr[simulation]@git+https://github.com/adap/flower.git" -"@branch-name`` (with extras)" -msgstr "`pip安装flwr[模拟]@git+https://github.com/adap/flower.git@分支名``(带附加功能)" +"``pip install 'flwr[simulation]@git+https://github.com/adap/flower.git" +"@branch-name'`` (with extras)" +msgstr "``pip install 'flwr[simulation]@git+https://github.com/adap/flower.git@分支名'``(带附加功能)" #: ../../source/contributor-how-to-install-development-versions.rst:49 msgid "Open Jupyter Notebooks on Google Colab" @@ -6561,9 +6561,9 @@ msgstr "`python -m pip install -U flwr``(当使用`start_server`和`start_clie #: ../../source/how-to-upgrade-to-flower-1.0.rst:15 msgid "" -"``python -m pip install -U flwr[simulation]`` (when using " +"``python -m pip install -U 'flwr[simulation]'`` (when using " "``start_simulation``)" -msgstr "`python -m pip install -U flwr[simulation]``(当使用`start_simulation``时)" +msgstr "``python -m pip install -U 'flwr[simulation]'``(当使用`start_simulation``时)" #: ../../source/how-to-upgrade-to-flower-1.0.rst:17 msgid "" @@ -21363,10 +21363,10 @@ msgid "" "Simulations (using the Virtual Client Engine through `start_simulation`) " "now work more smoothly on Jupyter Notebooks (incl. 
Google Colab) after " "installing Flower with the `simulation` extra (`pip install " -"flwr[simulation]`)." +"'flwr[simulation]'`)." msgstr "" "通过 `start_simulation` 在 Jupyter 笔记本(包括 Google Colab)上安装 Flower 并附加 " -"`simulation` (`pip install flwr[simulation]`)后,模拟(通过 `start_simulation` " +"`simulation` (`pip install 'flwr[simulation]'`)后,模拟(通过 `start_simulation` " "使用虚拟客户端引擎)现在可以更流畅地运行。" #: ../../source/ref-changelog.md:887 diff --git a/doc/source/contributor-how-to-install-development-versions.rst b/doc/source/contributor-how-to-install-development-versions.rst index 15e2939ef138..0f0773c85e73 100644 --- a/doc/source/contributor-how-to-install-development-versions.rst +++ b/doc/source/contributor-how-to-install-development-versions.rst @@ -30,19 +30,19 @@ Using pip (recommended on Colab) Install a ``flwr`` pre-release from PyPI: - ``pip install -U --pre flwr`` (without extras) -- ``pip install -U --pre flwr[simulation]`` (with extras) +- ``pip install -U --pre 'flwr[simulation]'`` (with extras) Python packages can be installed from git repositories. Use one of the following commands to install the Flower directly from GitHub. Install ``flwr`` from the default GitHub branch (``main``): - ``pip install flwr@git+https://github.com/adap/flower.git`` (without extras) -- ``pip install flwr[simulation]@git+https://github.com/adap/flower.git`` (with extras) +- ``pip install 'flwr[simulation]@git+https://github.com/adap/flower.git'`` (with extras) Install ``flwr`` from a specific GitHub branch (``branch-name``): - ``pip install flwr@git+https://github.com/adap/flower.git@branch-name`` (without extras) -- ``pip install flwr[simulation]@git+https://github.com/adap/flower.git@branch-name`` (with extras) +- ``pip install 'flwr[simulation]@git+https://github.com/adap/flower.git@branch-name'`` (with extras) Open Jupyter Notebooks on Google Colab diff --git a/doc/source/how-to-install-flower.rst b/doc/source/how-to-install-flower.rst index b00e2ae803ab..b9107995c226 100644 --- a/doc/source/how-to-install-flower.rst +++ b/doc/source/how-to-install-flower.rst @@ -68,7 +68,7 @@ New (possibly unstable) versions of Flower are sometimes available as pre-releas For simulations that use the Virtual Client Engine, ``flwr`` pre-releases should be installed with the ``simulation`` extra:: - python -m pip install -U --pre flwr[simulation] + python -m pip install -U --pre 'flwr[simulation]' Install nightly release ~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/doc/source/how-to-upgrade-to-flower-1.0.rst b/doc/source/how-to-upgrade-to-flower-1.0.rst index 3a55a1a953f5..c0721b0f3736 100644 --- a/doc/source/how-to-upgrade-to-flower-1.0.rst +++ b/doc/source/how-to-upgrade-to-flower-1.0.rst @@ -12,7 +12,7 @@ Here's how to update an existing installation to Flower 1.0 using either pip or - pip: add ``-U`` when installing. - ``python -m pip install -U flwr`` (when using ``start_server`` and ``start_client``) - - ``python -m pip install -U flwr[simulation]`` (when using ``start_simulation``) + - ``python -m pip install -U 'flwr[simulation]'`` (when using ``start_simulation``) - Poetry: update the ``flwr`` dependency in ``pyproject.toml`` and then reinstall (don't forget to delete ``poetry.lock`` via ``rm poetry.lock`` before running ``poetry install``). 
diff --git a/doc/source/how-to-upgrade-to-flower-next.rst b/doc/source/how-to-upgrade-to-flower-next.rst index a17756247566..e1e94f095b60 100644 --- a/doc/source/how-to-upgrade-to-flower-next.rst +++ b/doc/source/how-to-upgrade-to-flower-next.rst @@ -55,7 +55,7 @@ or if you need Flower Next with simulation: .. code-block:: bash - $ python -m pip install -U flwr[simulation] + $ python -m pip install -U "flwr[simulation]" Ensure you set the following version constraint in your ``requirements.txt`` diff --git a/doc/source/ref-changelog.md b/doc/source/ref-changelog.md index 58fc8b4f69b1..d28446c4dd06 100644 --- a/doc/source/ref-changelog.md +++ b/doc/source/ref-changelog.md @@ -882,7 +882,7 @@ We would like to give our **special thanks** to all the contributors who made Fl - **Improved Virtual Client Engine compatibility with Jupyter Notebook / Google Colab** ([#866](https://github.com/adap/flower/pull/866), [#872](https://github.com/adap/flower/pull/872), [#833](https://github.com/adap/flower/pull/833), [#1036](https://github.com/adap/flower/pull/1036)) - Simulations (using the Virtual Client Engine through `start_simulation`) now work more smoothly on Jupyter Notebooks (incl. Google Colab) after installing Flower with the `simulation` extra (`pip install flwr[simulation]`). + Simulations (using the Virtual Client Engine through `start_simulation`) now work more smoothly on Jupyter Notebooks (incl. Google Colab) after installing Flower with the `simulation` extra (`pip install 'flwr[simulation]'`). - **New Jupyter Notebook code example** ([#833](https://github.com/adap/flower/pull/833)) diff --git a/src/docker/base/alpine/Dockerfile b/src/docker/base/alpine/Dockerfile index 9e58d82e3bda..441e0fdd9b85 100644 --- a/src/docker/base/alpine/Dockerfile +++ b/src/docker/base/alpine/Dockerfile @@ -26,7 +26,7 @@ ARG PYTHON_VERSION=3.11 ARG DISTRO=alpine ARG DISTRO_VERSION=3.19 -FROM python:${PYTHON_VERSION}-${DISTRO}${DISTRO_VERSION} as compile +FROM python:${PYTHON_VERSION}-${DISTRO}${DISTRO_VERSION} AS compile # Install system dependencies RUN apk add --no-cache \ @@ -49,7 +49,11 @@ RUN pip install -U --no-cache-dir \ setuptools==${SETUPTOOLS_VERSION} \ ${FLWR_PACKAGE}==${FLWR_VERSION} -FROM python:${PYTHON_VERSION}-${DISTRO}${DISTRO_VERSION} as base +FROM python:${PYTHON_VERSION}-${DISTRO}${DISTRO_VERSION} AS base + +# Upgrade system Python pip and setuptools +# hadolint ignore=DL3013 +RUN pip install -U --no-cache-dir pip setuptools # required by the grpc package RUN apk add --no-cache \ diff --git a/src/docker/base/ubuntu/Dockerfile b/src/docker/base/ubuntu/Dockerfile index 960ed07edf96..31cc8381b7c5 100644 --- a/src/docker/base/ubuntu/Dockerfile +++ b/src/docker/base/ubuntu/Dockerfile @@ -16,7 +16,7 @@ # hadolint global ignore=DL3008 ARG DISTRO=ubuntu ARG DISTRO_VERSION=22.04 -FROM $DISTRO:$DISTRO_VERSION as python +FROM $DISTRO:$DISTRO_VERSION AS python ENV DEBIAN_FRONTEND=noninteractive @@ -50,9 +50,12 @@ RUN LATEST=$(pyenv latest -k ${PYTHON_VERSION}) \ ENV PATH=/usr/local/bin/python/bin:$PATH -# Use a virtual environment to ensure that Python packages are installed in the same location -# regardless of whether the subsequent image build is run with the app or the root user -RUN python -m venv /python/venv +# Upgrade system Python pip and setuptools +# hadolint ignore=DL3013 +RUN pip install -U --no-cache-dir pip setuptools \ + # Use a virtual environment to ensure that Python packages are installed in the same location + # regardless of whether the subsequent image build is run with the app 
or the root user + && python -m venv /python/venv ENV PATH=/python/venv/bin:$PATH ARG PIP_VERSION @@ -64,7 +67,7 @@ RUN pip install -U --no-cache-dir \ setuptools==${SETUPTOOLS_VERSION} \ ${FLWR_PACKAGE}==${FLWR_VERSION} -FROM $DISTRO:$DISTRO_VERSION as base +FROM $DISTRO:$DISTRO_VERSION AS base COPY --from=python /usr/local/bin/python /usr/local/bin/python diff --git a/src/py/flwr/cli/config_utils.py b/src/py/flwr/cli/config_utils.py index d150e1b5f53d..74eda81f5c16 100644 --- a/src/py/flwr/cli/config_utils.py +++ b/src/py/flwr/cli/config_utils.py @@ -77,6 +77,9 @@ def load_and_validate( A tuple with the optional config in case it exists and is valid and associated errors and warnings. """ + if path is None: + path = Path.cwd() / "pyproject.toml" + config = load(path) if config is None: @@ -86,7 +89,7 @@ def load_and_validate( ] return (None, errors, []) - is_valid, errors, warnings = validate(config, check_module) + is_valid, errors, warnings = validate(config, check_module, path.parent) if not is_valid: return (None, errors, warnings) @@ -94,14 +97,8 @@ def load_and_validate( return (config, errors, warnings) -def load(path: Optional[Path] = None) -> Optional[Dict[str, Any]]: +def load(toml_path: Path) -> Optional[Dict[str, Any]]: """Load pyproject.toml and return as dict.""" - if path is None: - cur_dir = Path.cwd() - toml_path = cur_dir / "pyproject.toml" - else: - toml_path = path - if not toml_path.is_file(): return None @@ -167,7 +164,9 @@ def validate_fields(config: Dict[str, Any]) -> Tuple[bool, List[str], List[str]] def validate( - config: Dict[str, Any], check_module: bool = True + config: Dict[str, Any], + check_module: bool = True, + project_dir: Optional[Union[str, Path]] = None, ) -> Tuple[bool, List[str], List[str]]: """Validate pyproject.toml.""" is_valid, errors, warnings = validate_fields(config) @@ -176,16 +175,15 @@ def validate( return False, errors, warnings # Validate serverapp - is_valid, reason = object_ref.validate( - config["tool"]["flwr"]["app"]["components"]["serverapp"], check_module - ) + serverapp_ref = config["tool"]["flwr"]["app"]["components"]["serverapp"] + is_valid, reason = object_ref.validate(serverapp_ref, check_module, project_dir) + if not is_valid and isinstance(reason, str): return False, [reason], [] # Validate clientapp - is_valid, reason = object_ref.validate( - config["tool"]["flwr"]["app"]["components"]["clientapp"], check_module - ) + clientapp_ref = config["tool"]["flwr"]["app"]["components"]["clientapp"] + is_valid, reason = object_ref.validate(clientapp_ref, check_module, project_dir) if not is_valid and isinstance(reason, str): return False, [reason], [] diff --git a/src/py/flwr/cli/config_utils_test.py b/src/py/flwr/cli/config_utils_test.py index 077f254fb914..cad6714521e3 100644 --- a/src/py/flwr/cli/config_utils_test.py +++ b/src/py/flwr/cli/config_utils_test.py @@ -79,7 +79,7 @@ def test_load_pyproject_toml_load_from_cwd(tmp_path: Path) -> None: f.write(textwrap.dedent(pyproject_toml_content)) # Execute - config = load() + config = load(toml_path=Path.cwd() / "pyproject.toml") # Assert assert config == expected_config @@ -135,7 +135,7 @@ def test_load_pyproject_toml_from_path(tmp_path: Path) -> None: } # Current directory - origin = os.getcwd() + origin = Path.cwd() try: # Change into the temporary directory @@ -144,7 +144,7 @@ def test_load_pyproject_toml_from_path(tmp_path: Path) -> None: f.write(textwrap.dedent(pyproject_toml_content)) # Execute - config = load(path=tmp_path / "pyproject.toml") + config = 
load(toml_path=tmp_path / "pyproject.toml") # Assert assert config == expected_config diff --git a/src/py/flwr/cli/new/new.py b/src/py/flwr/cli/new/new.py index 306f20efccfa..237b8847e193 100644 --- a/src/py/flwr/cli/new/new.py +++ b/src/py/flwr/cli/new/new.py @@ -38,7 +38,7 @@ class MlFramework(str, Enum): PYTORCH = "PyTorch" TENSORFLOW = "TensorFlow" JAX = "JAX" - HUGGINGFACE = "HF" + HUGGINGFACE = "HuggingFace" MLX = "MLX" SKLEARN = "sklearn" FLOWERTUNE = "FlowerTune" diff --git a/src/py/flwr/cli/new/templates/app/code/client.hf.py.tpl b/src/py/flwr/cli/new/templates/app/code/client.huggingface.py.tpl similarity index 93% rename from src/py/flwr/cli/new/templates/app/code/client.hf.py.tpl rename to src/py/flwr/cli/new/templates/app/code/client.huggingface.py.tpl index e79952ea09ae..5a2037897d4d 100644 --- a/src/py/flwr/cli/new/templates/app/code/client.hf.py.tpl +++ b/src/py/flwr/cli/new/templates/app/code/client.huggingface.py.tpl @@ -50,8 +50,8 @@ def client_fn(context: Context): CHECKPOINT, num_labels=2 ).to(DEVICE) - partition_id = int(context.node_config["partition-id"]) - num_partitions = int(context.node_config["num-partitions"]) + partition_id = context.node_config["partition-id"] + num_partitions = context.node_config["num-partitions"] trainloader, valloader = load_data(partition_id, num_partitions) local_epochs = context.run_config["local-epochs"] diff --git a/src/py/flwr/cli/new/templates/app/code/client.mlx.py.tpl b/src/py/flwr/cli/new/templates/app/code/client.mlx.py.tpl index 3817b325f5e3..a28c59eda232 100644 --- a/src/py/flwr/cli/new/templates/app/code/client.mlx.py.tpl +++ b/src/py/flwr/cli/new/templates/app/code/client.mlx.py.tpl @@ -70,8 +70,8 @@ class FlowerClient(NumPyClient): def client_fn(context: Context): - partition_id = int(context.node_config["partition-id"]) - num_partitions = int(context.node_config["num-partitions"]) + partition_id = context.node_config["partition-id"] + num_partitions = context.node_config["num-partitions"] data = load_data(partition_id, num_partitions) num_layers = context.run_config["num-layers"] diff --git a/src/py/flwr/cli/new/templates/app/code/client.pytorch.py.tpl b/src/py/flwr/cli/new/templates/app/code/client.pytorch.py.tpl index 6fb201e20b28..ec7f5ffd9b00 100644 --- a/src/py/flwr/cli/new/templates/app/code/client.pytorch.py.tpl +++ b/src/py/flwr/cli/new/templates/app/code/client.pytorch.py.tpl @@ -42,8 +42,8 @@ class FlowerClient(NumPyClient): def client_fn(context: Context): # Load model and data net = Net().to(DEVICE) - partition_id = int(context.node_config["partition-id"]) - num_partitions = int(context.node_config["num-partitions"]) + partition_id = context.node_config["partition-id"] + num_partitions = context.node_config["num-partitions"] trainloader, valloader = load_data(partition_id, num_partitions) local_epochs = context.run_config["local-epochs"] diff --git a/src/py/flwr/cli/new/templates/app/code/client.sklearn.py.tpl b/src/py/flwr/cli/new/templates/app/code/client.sklearn.py.tpl index 9642ae490155..af1cdb512bee 100644 --- a/src/py/flwr/cli/new/templates/app/code/client.sklearn.py.tpl +++ b/src/py/flwr/cli/new/templates/app/code/client.sklearn.py.tpl @@ -69,8 +69,8 @@ class FlowerClient(NumPyClient): def client_fn(context: Context): - partition_id = int(context.node_config["partition-id"]) - num_partitions = int(context.node_config["num-partitions"]) + partition_id = context.node_config["partition-id"] + num_partitions = context.node_config["num-partitions"] fds = FederatedDataset(dataset="mnist", 
partitioners={"train": num_partitions}) dataset = fds.load_partition(partition_id, "train").with_format("numpy") diff --git a/src/py/flwr/cli/new/templates/app/code/client.tensorflow.py.tpl b/src/py/flwr/cli/new/templates/app/code/client.tensorflow.py.tpl index 54a7c28dedf9..2e1d55d82aa0 100644 --- a/src/py/flwr/cli/new/templates/app/code/client.tensorflow.py.tpl +++ b/src/py/flwr/cli/new/templates/app/code/client.tensorflow.py.tpl @@ -44,8 +44,8 @@ def client_fn(context: Context): # Load model and data net = load_model() - partition_id = int(context.node_config["partition-id"]) - num_partitions = int(context.node_config["num-partitions"]) + partition_id = context.node_config["partition-id"] + num_partitions = context.node_config["num-partitions"] x_train, y_train, x_test, y_test = load_data(partition_id, num_partitions) epochs = context.run_config["local-epochs"] batch_size = context.run_config["batch-size"] diff --git a/src/py/flwr/cli/new/templates/app/code/flwr_tune/app.py.tpl b/src/py/flwr/cli/new/templates/app/code/flwr_tune/app.py.tpl index a0f781df04a1..637658c5b23c 100644 --- a/src/py/flwr/cli/new/templates/app/code/flwr_tune/app.py.tpl +++ b/src/py/flwr/cli/new/templates/app/code/flwr_tune/app.py.tpl @@ -9,8 +9,8 @@ from hydra import compose, initialize from hydra.utils import instantiate from flwr.client import ClientApp -from flwr.common import ndarrays_to_parameters -from flwr.server import ServerApp, ServerConfig +from flwr.common import Context, ndarrays_to_parameters +from flwr.server import ServerApp, ServerAppComponents, ServerConfig from $import_name.client_app import gen_client_fn, get_parameters from $import_name.dataset import get_tokenizer_and_data_collator_and_propt_formatting @@ -67,20 +67,23 @@ init_model = get_model(cfg.model) init_model_parameters = get_parameters(init_model) init_model_parameters = ndarrays_to_parameters(init_model_parameters) -# Instantiate strategy according to config. Here we pass other arguments -# that are only defined at runtime. -strategy = instantiate( - cfg.strategy, - on_fit_config_fn=get_on_fit_config(), - fit_metrics_aggregation_fn=fit_weighted_average, - initial_parameters=init_model_parameters, - evaluate_fn=get_evaluate_fn( - cfg.model, cfg.train.save_every_round, cfg_static.num_rounds, save_path - ), -) +def server_fn(context: Context): + # Instantiate strategy according to config. Here we pass other arguments + # that are only defined at runtime. 
+ strategy = instantiate( + cfg.strategy, + on_fit_config_fn=get_on_fit_config(), + fit_metrics_aggregation_fn=fit_weighted_average, + initial_parameters=init_model_parameters, + evaluate_fn=get_evaluate_fn( + cfg.model, cfg.train.save_every_round, cfg_static.num_rounds, save_path + ), + ) + + config = ServerConfig(num_rounds=cfg_static.num_rounds) + + return ServerAppComponents(strategy=strategy, config=config) + # ServerApp for Flower Next -server = ServerApp( - config=ServerConfig(num_rounds=cfg_static.num_rounds), - strategy=strategy, -) +server = ServerApp(server_fn=server_fn) diff --git a/src/py/flwr/cli/new/templates/app/code/flwr_tune/client.py.tpl b/src/py/flwr/cli/new/templates/app/code/flwr_tune/client.py.tpl index c0d5842964fd..2472e23ece44 100644 --- a/src/py/flwr/cli/new/templates/app/code/flwr_tune/client.py.tpl +++ b/src/py/flwr/cli/new/templates/app/code/flwr_tune/client.py.tpl @@ -10,6 +10,7 @@ from transformers import TrainingArguments from trl import SFTTrainer from flwr.client import NumPyClient +from flwr.common import Context from flwr.common.typing import NDArrays, Scalar from $import_name.dataset import reformat from $import_name.models import cosine_annealing, get_model @@ -102,13 +103,14 @@ def gen_client_fn( model_cfg: DictConfig, train_cfg: DictConfig, save_path: str, -) -> Callable[[str], FlowerClient]: # pylint: disable=too-many-arguments +) -> Callable[[Context], FlowerClient]: # pylint: disable=too-many-arguments """Generate the client function that creates the Flower Clients.""" - def client_fn(cid: str) -> FlowerClient: + def client_fn(context: Context) -> FlowerClient: """Create a Flower client representing a single organization.""" # Let's get the partition corresponding to the i-th client - client_trainset = fds.load_partition(int(cid), "train") + partition_id = context.node_config["partition-id"] + client_trainset = fds.load_partition(partition_id, "train") client_trainset = reformat(client_trainset, llm_task="$llm_challenge_str") return FlowerClient( diff --git a/src/py/flwr/cli/new/templates/app/code/server.hf.py.tpl b/src/py/flwr/cli/new/templates/app/code/server.huggingface.py.tpl similarity index 100% rename from src/py/flwr/cli/new/templates/app/code/server.hf.py.tpl rename to src/py/flwr/cli/new/templates/app/code/server.huggingface.py.tpl diff --git a/src/py/flwr/cli/new/templates/app/code/task.hf.py.tpl b/src/py/flwr/cli/new/templates/app/code/task.huggingface.py.tpl similarity index 88% rename from src/py/flwr/cli/new/templates/app/code/task.hf.py.tpl rename to src/py/flwr/cli/new/templates/app/code/task.huggingface.py.tpl index eb43acfce976..51a21dd17418 100644 --- a/src/py/flwr/cli/new/templates/app/code/task.hf.py.tpl +++ b/src/py/flwr/cli/new/templates/app/code/task.huggingface.py.tpl @@ -10,15 +10,27 @@ from torch.utils.data import DataLoader from transformers import AutoTokenizer, DataCollatorWithPadding from flwr_datasets import FederatedDataset +from flwr_datasets.partitioner import IidPartitioner + warnings.filterwarnings("ignore", category=UserWarning) DEVICE = torch.device("cpu") CHECKPOINT = "distilbert-base-uncased" # transformer model checkpoint +fds = None # Cache FederatedDataset + + def load_data(partition_id: int, num_partitions: int): """Load IMDB data (training and eval)""" - fds = FederatedDataset(dataset="imdb", partitioners={"train": num_partitions}) + # Only initialize `FederatedDataset` once + global fds + if fds is None: + partitioner = IidPartitioner(num_partitions=num_partitions) + fds = FederatedDataset( + 
dataset="stanfordnlp/imdb", + partitioners={"train": partitioner}, + ) partition = fds.load_partition(partition_id) # Divide data: 80% train, 20% test partition_train_test = partition.train_test_split(test_size=0.2, seed=42) diff --git a/src/py/flwr/cli/new/templates/app/code/task.mlx.py.tpl b/src/py/flwr/cli/new/templates/app/code/task.mlx.py.tpl index 88053b0cd590..1759fe8c0b42 100644 --- a/src/py/flwr/cli/new/templates/app/code/task.mlx.py.tpl +++ b/src/py/flwr/cli/new/templates/app/code/task.mlx.py.tpl @@ -5,10 +5,12 @@ import mlx.nn as nn import numpy as np from datasets.utils.logging import disable_progress_bar from flwr_datasets import FederatedDataset +from flwr_datasets.partitioner import IidPartitioner disable_progress_bar() + class MLP(nn.Module): """A simple MLP.""" @@ -43,8 +45,19 @@ def batch_iterate(batch_size, X, y): yield X[ids], y[ids] +fds = None # Cache FederatedDataset + + def load_data(partition_id: int, num_partitions: int): - fds = FederatedDataset(dataset="mnist", partitioners={"train": num_partitions}) + # Only initialize `FederatedDataset` once + global fds + if fds is None: + partitioner = IidPartitioner(num_partitions=num_partitions) + fds = FederatedDataset( + dataset="ylecun/mnist", + partitioners={"train": partitioner}, + trust_remote_code=True, + ) partition = fds.load_partition(partition_id) partition_splits = partition.train_test_split(test_size=0.2, seed=42) diff --git a/src/py/flwr/cli/new/templates/app/code/task.pytorch.py.tpl b/src/py/flwr/cli/new/templates/app/code/task.pytorch.py.tpl index d5971ffb6ce5..bd2fad5be589 100644 --- a/src/py/flwr/cli/new/templates/app/code/task.pytorch.py.tpl +++ b/src/py/flwr/cli/new/templates/app/code/task.pytorch.py.tpl @@ -6,9 +6,10 @@ import torch import torch.nn as nn import torch.nn.functional as F from torch.utils.data import DataLoader -from torchvision.datasets import CIFAR10 from torchvision.transforms import Compose, Normalize, ToTensor from flwr_datasets import FederatedDataset +from flwr_datasets.partitioner import IidPartitioner + DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") @@ -34,9 +35,19 @@ class Net(nn.Module): return self.fc3(x) +fds = None # Cache FederatedDataset + + def load_data(partition_id: int, num_partitions: int): """Load partition CIFAR10 data.""" - fds = FederatedDataset(dataset="cifar10", partitioners={"train": num_partitions}) + # Only initialize `FederatedDataset` once + global fds + if fds is None: + partitioner = IidPartitioner(num_partitions=num_partitions) + fds = FederatedDataset( + dataset="uoft-cs/cifar10", + partitioners={"train": partitioner}, + ) partition = fds.load_partition(partition_id) # Divide data on each node: 80% train, 20% test partition_train_test = partition.train_test_split(test_size=0.2, seed=42) diff --git a/src/py/flwr/cli/new/templates/app/code/task.tensorflow.py.tpl b/src/py/flwr/cli/new/templates/app/code/task.tensorflow.py.tpl index fa07f93713ed..c495774ffeb3 100644 --- a/src/py/flwr/cli/new/templates/app/code/task.tensorflow.py.tpl +++ b/src/py/flwr/cli/new/templates/app/code/task.tensorflow.py.tpl @@ -4,11 +4,13 @@ import os import tensorflow as tf from flwr_datasets import FederatedDataset +from flwr_datasets.partitioner import IidPartitioner # Make TensorFlow log less verbose os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3" + def load_model(): # Load model and data (MobileNetV2, CIFAR-10) model = tf.keras.applications.MobileNetV2((32, 32, 3), classes=10, weights=None) @@ -16,9 +18,19 @@ def load_model(): return model +fds = None # Cache 
FederatedDataset + + def load_data(partition_id, num_partitions): # Download and partition dataset - fds = FederatedDataset(dataset="cifar10", partitioners={"train": num_partitions}) + # Only initialize `FederatedDataset` once + global fds + if fds is None: + partitioner = IidPartitioner(num_partitions=num_partitions) + fds = FederatedDataset( + dataset="uoft-cs/cifar10", + partitioners={"train": partitioner}, + ) partition = fds.load_partition(partition_id, "train") partition.set_format("numpy") diff --git a/src/py/flwr/cli/new/templates/app/pyproject.hf.toml.tpl b/src/py/flwr/cli/new/templates/app/pyproject.huggingface.toml.tpl similarity index 100% rename from src/py/flwr/cli/new/templates/app/pyproject.hf.toml.tpl rename to src/py/flwr/cli/new/templates/app/pyproject.huggingface.toml.tpl diff --git a/src/py/flwr/cli/run/run.py b/src/py/flwr/cli/run/run.py index 04949d255647..ae49981d765e 100644 --- a/src/py/flwr/cli/run/run.py +++ b/src/py/flwr/cli/run/run.py @@ -37,18 +37,21 @@ def run( app_dir: Annotated[ Path, - typer.Argument(help="Path of the Flower project to run"), + typer.Argument(help="Path of the Flower project to run."), ] = Path("."), - federation_name: Annotated[ + federation: Annotated[ Optional[str], - typer.Argument(help="Name of the federation to run the app on"), + typer.Argument(help="Name of the federation to run the app on."), ] = None, config_overrides: Annotated[ Optional[List[str]], typer.Option( "--run-config", "-c", - help="Override configuration key-value pairs", + help="Override configuration key-value pairs, should be of the format:\n\n" + "`--run-config key1=value1,key2=value2 --run-config key3=value3`\n\n" + "Note that `key1`, `key2`, and `key3` in this example need to exist " + "inside the `pyproject.toml` in order to be properly overridden.", ), ] = None, ) -> None: @@ -78,11 +81,9 @@ def run( typer.secho("Success", fg=typer.colors.GREEN) - federation_name = federation_name or config["tool"]["flwr"]["federations"].get( - "default" - ) + federation = federation or config["tool"]["flwr"]["federations"].get("default") - if federation_name is None: + if federation is None: typer.secho( "❌ No federation name was provided and the project's `pyproject.toml` " "doesn't declare a default federation (with a SuperExec address or an " @@ -93,13 +94,13 @@ def run( raise typer.Exit(code=1) # Validate the federation exists in the configuration - federation = config["tool"]["flwr"]["federations"].get(federation_name) - if federation is None: + federation_config = config["tool"]["flwr"]["federations"].get(federation) + if federation_config is None: available_feds = { fed for fed in config["tool"]["flwr"]["federations"] if fed != "default" } typer.secho( - f"❌ There is no `{federation_name}` federation declared in the " + f"❌ There is no `{federation}` federation declared in " "`pyproject.toml`.\n The following federations were found:\n\n" + "\n".join(available_feds), fg=typer.colors.RED, @@ -107,14 +108,14 @@ ) raise typer.Exit(code=1) - if "address" in federation: - _run_with_superexec(federation, app_dir, config_overrides) + if "address" in federation_config: + _run_with_superexec(federation_config, app_dir, config_overrides) else: - _run_without_superexec(app_dir, federation, federation_name, config_overrides) + _run_without_superexec(app_dir, federation_config, federation, config_overrides) def _run_with_superexec( - federation: Dict[str, Any], + federation_config: Dict[str, Any], app_dir: Optional[Path], config_overrides: Optional[List[str]], ) -> None: @@
-123,8 +124,8 @@ def on_channel_state_change(channel_connectivity: str) -> None: """Log channel connectivity.""" log(DEBUG, channel_connectivity) - insecure_str = federation.get("insecure") - if root_certificates := federation.get("root-certificates"): + insecure_str = federation_config.get("insecure") + if root_certificates := federation_config.get("root-certificates"): root_certificates_bytes = Path(root_certificates).read_bytes() if insecure := bool(insecure_str): typer.secho( @@ -152,7 +153,7 @@ def on_channel_state_change(channel_connectivity: str) -> None: raise typer.Exit(code=1) channel = create_channel( - server_address=federation["address"], + server_address=federation_config["address"], insecure=insecure, root_certificates=root_certificates_bytes, max_message_length=GRPC_MAX_MESSAGE_LENGTH, @@ -168,7 +169,9 @@ def on_channel_state_change(channel_connectivity: str) -> None: override_config=user_config_to_proto( parse_config_args(config_overrides, separator=",") ), - federation_config=user_config_to_proto(flatten_dict(federation.get("options"))), + federation_config=user_config_to_proto( + flatten_dict(federation_config.get("options")) + ), ) res = stub.StartRun(req) typer.secho(f"🎊 Successfully started run {res.run_id}", fg=typer.colors.GREEN) @@ -176,18 +179,18 @@ def on_channel_state_change(channel_connectivity: str) -> None: def _run_without_superexec( app_path: Optional[Path], - federation: Dict[str, Any], - federation_name: str, + federation_config: Dict[str, Any], + federation: str, config_overrides: Optional[List[str]], ) -> None: try: - num_supernodes = federation["options"]["num-supernodes"] + num_supernodes = federation_config["options"]["num-supernodes"] except KeyError as err: typer.secho( "❌ The project's `pyproject.toml` needs to declare the number of" " SuperNodes in the simulation. To simulate 10 SuperNodes," " use the following notation:\n\n" - f"[tool.flwr.federations.{federation_name}]\n" + f"[tool.flwr.federations.{federation}]\n" "options.num-supernodes = 10\n", fg=typer.colors.RED, bold=True, diff --git a/src/py/flwr/client/supernode/app.py b/src/py/flwr/client/supernode/app.py index f3fb0e97805a..2c60f803f960 100644 --- a/src/py/flwr/client/supernode/app.py +++ b/src/py/flwr/client/supernode/app.py @@ -62,8 +62,8 @@ def run_supernode() -> None: root_certificates = _get_certificates(args) load_fn = _get_load_client_app_fn( default_app_ref=getattr(args, "client-app"), - dir_arg=args.dir, - flwr_dir_arg=args.flwr_dir, + project_dir=args.dir, + flwr_dir=args.flwr_dir, multi_app=True, ) authentication_keys = _try_setup_client_authentication(args) @@ -100,7 +100,7 @@ def run_client_app() -> None: root_certificates = _get_certificates(args) load_fn = _get_load_client_app_fn( default_app_ref=getattr(args, "client-app"), - dir_arg=args.dir, + project_dir=args.dir, multi_app=False, ) authentication_keys = _try_setup_client_authentication(args) @@ -176,9 +176,9 @@ def _get_certificates(args: argparse.Namespace) -> Optional[bytes]: def _get_load_client_app_fn( default_app_ref: str, - dir_arg: str, + project_dir: str, multi_app: bool, - flwr_dir_arg: Optional[str] = None, + flwr_dir: Optional[str] = None, ) -> Callable[[str, str], ClientApp]: """Get the load_client_app_fn function. @@ -189,38 +189,21 @@ def _get_load_client_app_fn( If `multi_app` is False, it ignores `fab_id` and `fab_version` and loads a default ClientApp. 
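The `fds = None  # Cache FederatedDataset` pattern repeated across the task templates above is a module-level singleton: `load_data` runs on every `client_fn` invocation, but the dataset is downloaded and partitioned only once per process. A condensed sketch of the pattern, using the dataset name from the PyTorch template:

```python
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import IidPartitioner

fds = None  # Cache FederatedDataset at module level


def load_data(partition_id: int, num_partitions: int):
    # Only initialize `FederatedDataset` once per process
    global fds
    if fds is None:
        partitioner = IidPartitioner(num_partitions=num_partitions)
        fds = FederatedDataset(
            dataset="uoft-cs/cifar10",
            partitioners={"train": partitioner},
        )
    return fds.load_partition(partition_id)
```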
""" - # Find the Flower directory containing Flower Apps (only for multi-app) - if not multi_app: - flwr_dir = Path("") - else: - if flwr_dir_arg is None: - flwr_dir = get_flwr_dir() - else: - flwr_dir = Path(flwr_dir_arg).absolute() - - inserted_path = None - if not multi_app: log( DEBUG, "Flower SuperNode will load and validate ClientApp `%s`", default_app_ref, ) - # Insert sys.path - dir_path = Path(dir_arg).absolute() - sys.path.insert(0, str(dir_path)) - inserted_path = str(dir_path) - valid, error_msg = validate(default_app_ref) + valid, error_msg = validate(default_app_ref, project_dir=project_dir) if not valid and error_msg: raise LoadClientAppError(error_msg) from None def _load(fab_id: str, fab_version: str) -> ClientApp: + runtime_project_dir = Path(project_dir).absolute() # If multi-app feature is disabled if not multi_app: - # Get sys path to be inserted - dir_path = Path(dir_arg).absolute() - # Set app reference client_app_ref = default_app_ref # If multi-app feature is enabled but the fab id is not specified @@ -231,43 +214,29 @@ def _load(fab_id: str, fab_version: str) -> ClientApp: ) from None log(WARN, "FAB ID is not provided; the default ClientApp will be loaded.") - # Get sys path to be inserted - dir_path = Path(dir_arg).absolute() # Set app reference client_app_ref = default_app_ref # If multi-app feature is enabled else: try: - project_dir = get_project_dir(fab_id, fab_version, flwr_dir) - config = get_project_config(project_dir) + runtime_project_dir = get_project_dir( + fab_id, fab_version, get_flwr_dir(flwr_dir) + ) + config = get_project_config(runtime_project_dir) except Exception as e: raise LoadClientAppError("Failed to load ClientApp") from e - # Get sys path to be inserted - dir_path = Path(project_dir).absolute() - # Set app reference client_app_ref = config["tool"]["flwr"]["app"]["components"]["clientapp"] - # Set sys.path - nonlocal inserted_path - if inserted_path != str(dir_path): - # Remove the previously inserted path - if inserted_path is not None: - sys.path.remove(inserted_path) - # Insert the new path - sys.path.insert(0, str(dir_path)) - - inserted_path = str(dir_path) - # Load ClientApp log( DEBUG, "Loading ClientApp `%s`", client_app_ref, ) - client_app = load_app(client_app_ref, LoadClientAppError, dir_path) + client_app = load_app(client_app_ref, LoadClientAppError, runtime_project_dir) if not isinstance(client_app, ClientApp): raise LoadClientAppError( diff --git a/src/py/flwr/common/object_ref.py b/src/py/flwr/common/object_ref.py index ac52be160c2e..9723c14037a0 100644 --- a/src/py/flwr/common/object_ref.py +++ b/src/py/flwr/common/object_ref.py @@ -33,22 +33,41 @@ """ +_current_sys_path: Optional[str] = None + + def validate( module_attribute_str: str, check_module: bool = True, + project_dir: Optional[Union[str, Path]] = None, ) -> Tuple[bool, Optional[str]]: """Validate object reference. - The object reference string should have the form :. Valid - examples include `client:app` and `project.package.module:wrapper.app`. It must - refer to a module on the PYTHONPATH and the module needs to have the specified - attribute. + Parameters + ---------- + module_attribute_str : str + The reference to the object. It should have the form `:`. + Valid examples include `client:app` and `project.package.module:wrapper.app`. + It must refer to a module on the PYTHONPATH or in the provided `project_dir` + and the module needs to have the specified attribute. 
+ check_module : bool (default: True) + Flag indicating whether to verify the existence of the module and the + specified attribute within it. + project_dir : Optional[Union[str, Path]] (default: None) + The directory containing the module. If None, the current working directory + is used. If `check_module` is True, the `project_dir` will be inserted into + the system path, and the previously inserted `project_dir` will be removed. Returns ------- Tuple[bool, Optional[str]] A boolean indicating whether an object reference is valid and the reason why it might not be. + + Note + ---- + This function will modify `sys.path` by inserting the provided `project_dir` + and removing the previously inserted `project_dir`. """ module_str, _, attributes_str = module_attribute_str.partition(":") if not module_str: @@ -63,6 +82,9 @@ def validate( ) if check_module: + # Set the system path + _set_sys_path(project_dir) + # Load module module = find_spec(module_str) if module and module.origin: @@ -89,18 +111,40 @@ def load_app( # pylint: disable= too-many-branches ) -> Any: """Return the object specified in a module attribute string. - The module/attribute string should have the form `<module>:<attribute>`. Valid - examples include `client:app` and `project.package.module:wrapper.app`. It must - refer to a module on the PYTHONPATH, the module needs to have the specified - attribute. + Parameters + ---------- + module_attribute_str : str + The reference to the object. It should have the form `<module>:<attribute>`. + Valid examples include `client:app` and `project.package.module:wrapper.app`. + It must refer to a module on the PYTHONPATH or in the provided `project_dir` + and the module needs to have the specified attribute. + error_type : Type[Exception] + The type of exception to be raised if the provided `module_attribute_str` is + in an invalid format. + project_dir : Optional[Union[str, Path]], optional (default=None) + The directory containing the module. If None, the current working directory + is used. The `project_dir` will be inserted into the system path, and the + previously inserted `project_dir` will be removed. + + Returns + ------- + Any + The object specified by the module attribute string. + + Note + ---- + This function will modify `sys.path` by inserting the provided `project_dir` + and removing the previously inserted `project_dir`.
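With `project_dir` threaded through `validate` and `load_app`, callers such as the SuperNode no longer manipulate `sys.path` themselves; `_set_sys_path` below centralizes that. A sketch of the resulting call pattern, mirroring the signatures in this diff (the `client:app` reference and `./my-app` path are illustrative):

```python
from flwr.common.object_ref import load_app, validate

ref = "client:app"  # <module>:<attribute>

# Validation inserts ./my-app into sys.path (check_module=True by default)
valid, error_msg = validate(ref, project_dir="./my-app")
if not valid and error_msg:
    raise RuntimeError(error_msg)

# Loading reuses the same path handling and returns the `app` attribute
app = load_app(ref, RuntimeError, "./my-app")
```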
""" - valid, error_msg = validate(module_attribute_str) + valid, error_msg = validate(module_attribute_str, check_module=False) if not valid and error_msg: raise error_type(error_msg) from None module_str, _, attributes_str = module_attribute_str.partition(":") try: + _set_sys_path(project_dir) + if module_str not in sys.modules: module = importlib.import_module(module_str) # Hack: `tabnet` does not work with `importlib.reload` @@ -116,19 +160,15 @@ def load_app( # pylint: disable= too-many-branches module = sys.modules[module_str] else: module = sys.modules[module_str] + if project_dir is None: - path: Optional[str] = getattr(module, "__file__", None) - if path is not None: - project_dir = str(Path(path).parent) - else: - project_dir = str(Path(project_dir).absolute()) + project_dir = Path.cwd() # Reload cached modules in the project directory - if project_dir is not None: - for m in list(sys.modules.values()): - path = getattr(m, "__file__", None) - if path is not None and path.startswith(project_dir): - importlib.reload(m) + for m in list(sys.modules.values()): + path: Optional[str] = getattr(m, "__file__", None) + if path is not None and path.startswith(str(project_dir)): + importlib.reload(m) except ModuleNotFoundError as err: raise error_type( @@ -140,15 +180,38 @@ def load_app( # pylint: disable= too-many-branches try: for attribute_str in attributes_str.split("."): attribute = getattr(attribute, attribute_str) - except AttributeError: + except AttributeError as err: raise error_type( f"Unable to load attribute {attributes_str} from module {module_str}" f"{OBJECT_REF_HELP_STR}", - ) from None + ) from err return attribute +def _set_sys_path(directory: Optional[Union[str, Path]]) -> None: + """Set the system path.""" + if directory is None: + directory = Path.cwd() + else: + directory = Path(directory).absolute() + + # If the directory has already been added to `sys.path`, return + if str(directory) in sys.path: + return + + # Remove the old path if it exists and is not `""`. + global _current_sys_path # pylint: disable=global-statement + if _current_sys_path is not None: + sys.path.remove(_current_sys_path) + + # Add the new path to sys.path + sys.path.insert(0, str(directory)) + + # Update the current_sys_path + _current_sys_path = str(directory) + + def _find_attribute_in_module(file_path: str, attribute_name: str) -> bool: """Check if attribute_name exists in module's abstract symbolic tree.""" with open(file_path, encoding="utf-8") as file: diff --git a/src/py/flwr/server/run_serverapp.py b/src/py/flwr/server/run_serverapp.py index b6baca0dff54..3f062351e48d 100644 --- a/src/py/flwr/server/run_serverapp.py +++ b/src/py/flwr/server/run_serverapp.py @@ -57,9 +57,6 @@ def run( "but not both." 
) - if server_app_dir is not None: - sys.path.insert(0, str(Path(server_app_dir).absolute())) - # Load ServerApp if needed def _load() -> ServerApp: if server_app_attr: diff --git a/src/py/flwr/server/superlink/fleet/vce/backend/__init__.py b/src/py/flwr/server/superlink/fleet/vce/backend/__init__.py index d751cf4bcae1..a8c671810a51 100644 --- a/src/py/flwr/server/superlink/fleet/vce/backend/__init__.py +++ b/src/py/flwr/server/superlink/fleet/vce/backend/__init__.py @@ -38,7 +38,7 @@ To install the necessary dependencies, install `flwr` with the `simulation` extra: - pip install -U flwr["simulation"] + pip install -U "flwr[simulation]" """ diff --git a/src/py/flwr/server/superlink/fleet/vce/vce_api.py b/src/py/flwr/server/superlink/fleet/vce/vce_api.py index 320f839e9e01..f11576d63396 100644 --- a/src/py/flwr/server/superlink/fleet/vce/vce_api.py +++ b/src/py/flwr/server/superlink/fleet/vce/vce_api.py @@ -72,8 +72,8 @@ def _register_node_states( node_states[node_id] = NodeState( node_id=node_id, node_config={ - PARTITION_ID_KEY: str(partition_id), - NUM_PARTITIONS_KEY: str(num_partitions), + PARTITION_ID_KEY: partition_id, + NUM_PARTITIONS_KEY: num_partitions, }, ) @@ -347,8 +347,8 @@ def _load() -> ClientApp: if client_app_attr: app = _get_load_client_app_fn( default_app_ref=client_app_attr, - dir_arg=app_dir, - flwr_dir_arg=flwr_dir, + project_dir=app_dir, + flwr_dir=flwr_dir, multi_app=True, )(run.fab_id, run.fab_version) diff --git a/src/py/flwr/simulation/__init__.py b/src/py/flwr/simulation/__init__.py index 5db90a352e3f..a171347b1507 100644 --- a/src/py/flwr/simulation/__init__.py +++ b/src/py/flwr/simulation/__init__.py @@ -28,7 +28,7 @@ To install the necessary dependencies, install `flwr` with the `simulation` extra: - pip install -U flwr["simulation"] + pip install -U "flwr[simulation]" """ def start_simulation(*args, **kwargs): # type: ignore diff --git a/src/py/flwr/simulation/run_simulation.py b/src/py/flwr/simulation/run_simulation.py index 7cebb90451d6..51799074ef6f 100644 --- a/src/py/flwr/simulation/run_simulation.py +++ b/src/py/flwr/simulation/run_simulation.py @@ -32,7 +32,11 @@ from flwr.common import EventType, event, log from flwr.common.config import get_fused_config_from_dir, parse_config_args from flwr.common.constant import RUN_ID_NUM_BYTES -from flwr.common.logger import set_logger_propagation, update_console_handler +from flwr.common.logger import ( + set_logger_propagation, + update_console_handler, + warn_deprecated_feature_with_example, +) from flwr.common.typing import Run, UserConfig from flwr.server.driver import Driver, InMemoryDriver from flwr.server.run_serverapp import run as run_server_app @@ -93,6 +97,14 @@ def run_simulation_from_cli() -> None: """Run Simulation Engine from the CLI.""" args = _parse_args_run_simulation().parse_args() + if args.enable_tf_gpu_growth: + warn_deprecated_feature_with_example( + "Passing `--enable-tf-gpu-growth` is deprecated.", + example_message="Instead, set the `TF_FORCE_GPU_ALLOW_GROWTH` environment " + "variable to true.", + code_example='TF_FORCE_GPU_ALLOW_GROWTH="true" flower-simulation <...>', + ) + # We are supporting two modes for the CLI entrypoint: # 1) Running an app dir containing a `pyproject.toml` # 2) Running any ClientApp and ServerApp w/o pyproject.toml being present @@ -223,6 +235,15 @@ def run_simulation( When disabled, only INFO, WARNING and ERROR log messages will be shown. If enabled, DEBUG-level logs will be displayed.
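The deprecation notices added in `run_simulation.py` point users at the environment variable instead of the flag. Per the `code_example` strings in these hunks, the replacement is to set it before TensorFlow initializes:

```python
import os

# Must be set before TensorFlow creates its GPU context
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

from flwr.simulation import run_simulation  # noqa: E402

# ...then call run_simulation(...) as usual, without enable_tf_gpu_growth=True
```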
""" + if enable_tf_gpu_growth: + warn_deprecated_feature_with_example( + "Passing `enable_tf_gpu_growth=True` is deprecated.", + example_message="Instead, set the `TF_FORCE_GPU_ALLOW_GROWTH` environmnet " + "variable to true.", + code_example='import os;os.environ["TF_FORCE_GPU_ALLOW_GROWTH"]="true"' + "\n\tflwr.simulation.run_simulationt(...)", + ) + _run_simulation( num_supernodes=num_supernodes, client_app=client_app, @@ -264,7 +285,7 @@ def server_th_with_start_checks( """ try: if tf_gpu_growth: - log(INFO, "Enabling GPU growth for Tensorflow on the main thread.") + log(INFO, "Enabling GPU growth for Tensorflow on the server thread.") enable_gpu_growth() # Run ServerApp @@ -475,6 +496,14 @@ def _run_simulation( if "init_args" not in backend_config: backend_config["init_args"] = {} + # Set default client_resources if not passed + if "client_resources" not in backend_config: + backend_config["client_resources"] = {"num_cpus": 2, "num_gpus": 0} + + # Initialization of backend config to enable GPU growth globally when set + if "actor" not in backend_config: + backend_config["actor"] = {"tensorflow": 0} + # Set logging level logger = logging.getLogger("flwr") if verbose_logging: @@ -580,8 +609,7 @@ def _parse_args_run_simulation() -> argparse.ArgumentParser: parser.add_argument( "--backend-config", type=str, - default='{"client_resources": {"num_cpus":2, "num_gpus":0.0},' - '"actor": {"tensorflow": 0}}', + default="{}", help='A JSON formatted stream, e.g \'{"":, "":}\' to ' "configure a backend. Values supported in are those included by " "`flwr.common.typing.ConfigsRecordValues`. ", diff --git a/src/py/flwr/superexec/app.py b/src/py/flwr/superexec/app.py index 9f1753ce041b..2ad5f12d227f 100644 --- a/src/py/flwr/superexec/app.py +++ b/src/py/flwr/superexec/app.py @@ -93,7 +93,9 @@ def _parse_args_run_superexec() -> argparse.ArgumentParser: ) parser.add_argument( "--executor-config", - help="Key-value pairs for the executor config, separated by commas.", + help="Key-value pairs for the executor config, separated by commas. 
" + 'For example:\n\n`--executor-config superlink="superlink:9091",' + 'root-certificates="certificates/superlink-ca.crt"`', ) parser.add_argument( "--insecure", @@ -163,11 +165,8 @@ def _load_executor( args: argparse.Namespace, ) -> Executor: """Get the executor plugin.""" - if args.executor_dir is not None: - sys.path.insert(0, args.executor_dir) - executor_ref: str = args.executor - valid, error_msg = validate(executor_ref) + valid, error_msg = validate(executor_ref, project_dir=args.executor_dir) if not valid and error_msg: raise LoadExecutorError(error_msg) from None diff --git a/src/py/flwr/superexec/deployment.py b/src/py/flwr/superexec/deployment.py index bd27d6b21017..fd09b512a52c 100644 --- a/src/py/flwr/superexec/deployment.py +++ b/src/py/flwr/superexec/deployment.py @@ -168,8 +168,6 @@ def start_run( # Execute the command proc = subprocess.Popen( # pylint: disable=consider-using-with command, - stdout=subprocess.PIPE, - stderr=subprocess.PIPE, text=True, ) log(INFO, "Started run %s", str(run_id)) diff --git a/src/py/flwr/superexec/simulation.py b/src/py/flwr/superexec/simulation.py index be49c83be716..d4cc489e24ab 100644 --- a/src/py/flwr/superexec/simulation.py +++ b/src/py/flwr/superexec/simulation.py @@ -32,6 +32,25 @@ from .executor import Executor, RunTracker +def _user_config_to_str(user_config: UserConfig) -> str: + """Convert override user config to string.""" + user_config_list_str = [] + for key, value in user_config.items(): + if isinstance(value, bool): + user_config_list_str.append(f"{key}={str(value).lower()}") + elif isinstance(value, (int, float)): + user_config_list_str.append(f"{key}={value}") + elif isinstance(value, str): + user_config_list_str.append(f'{key}="{value}"') + else: + raise ValueError( + "Only types `bool`, `float`, `int` and `str` are supported" + ) + + user_config_str = ",".join(user_config_list_str) + return user_config_str + + class SimulationEngine(Executor): """Simulation engine executor. @@ -62,13 +81,11 @@ def set_config( - "num-supernodes": int Number of nodes to register for the simulation. """ - if not config: - return if num_supernodes := config.get("num-supernodes"): if not isinstance(num_supernodes, int): raise ValueError("The `num-supernodes` value should be of type `int`.") self.num_supernodes = num_supernodes - else: + elif self.num_supernodes is None: log( ERROR, "To start a run with the simulation plugin, please specify " @@ -88,6 +105,16 @@ def start_run( federation_config: UserConfig, ) -> Optional[RunTracker]: """Start run using the Flower Simulation Engine.""" + if self.num_supernodes is None: + raise ValueError( + "Error in `SuperExec` (`SimulationEngine` executor):\n\n" + "`num-supernodes` must not be `None`, it must be a valid " + "positive integer. In order to start this simulation executor " + "with a specified number of `SuperNodes`, you can either provide " + "a `--executor` that has been initialized with a number of nodes " + "to the `flower-superexec` CLI, or `--executor-config num-supernodes=N`" + "to the `flower-superexec` CLI." 
+ ) try: # Install FAB to flwr dir @@ -129,12 +156,12 @@ def start_run( ] if override_config: - command.extend(["--run-config", f"{override_config}"]) + override_config_str = _user_config_to_str(override_config) + command.extend(["--run-config", f"{override_config_str}"]) # Start Simulation - proc = subprocess.run( # pylint: disable=consider-using-with + proc = subprocess.Popen( # pylint: disable=consider-using-with command, - check=True, text=True, ) @@ -142,7 +169,7 @@ def start_run( return RunTracker( run_id=run_id, - proc=proc, # type:ignore + proc=proc, ) # pylint: disable-next=broad-except
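The switch from `subprocess.run` (with `check=True`) to `subprocess.Popen` in `start_run` makes launching a simulation non-blocking: the executor keeps the process handle in a `RunTracker` instead of waiting for the run to finish. A sketch of the difference, with a placeholder command:

```python
import subprocess

command = ["python", "-c", "print('simulated run')"]  # placeholder command

# Before: blocks until the process exits, raising on a non-zero return code
subprocess.run(command, check=True, text=True)

# After: returns immediately; the handle can be polled or waited on later
proc = subprocess.Popen(command, text=True)
print("Started run, pid:", proc.pid)
proc.wait()  # a tracker would call proc.poll() periodically instead
```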