diff --git a/doc/source/how-to-federated-analytics.rst b/doc/source/how-to-federated-analytics.rst new file mode 100644 index 000000000000..f51b82790c16 --- /dev/null +++ b/doc/source/how-to-federated-analytics.rst @@ -0,0 +1,12 @@ +.. _federated-analytics: + + +Federated Analytics +=================== + +.. meta:: + :description: Check out this how-to for using Flower to perform Federated Analytics. + +Let's build a federated analytics system using Flower! + +Please refer to the `full code example `_ to learn more. diff --git a/doc/source/index.rst b/doc/source/index.rst index 5df591d6ce05..1a0cbc22ed83 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -50,7 +50,6 @@ A learning-oriented series of federated learning tutorials, the best place to st tutorial-quickstart-tensorflow tutorial-quickstart-huggingface tutorial-quickstart-jax - tutorial-quickstart-pandas tutorial-quickstart-fastai tutorial-quickstart-pytorch-lightning tutorial-quickstart-mxnet @@ -93,6 +92,7 @@ Problem-oriented how-to guides show step-by-step how to achieve a specific goal. how-to-upgrade-to-flower-1.0 how-to-use-built-in-middleware-layers how-to-run-flower-using-docker + how-to-federated-analytics .. toctree:: :maxdepth: 1 diff --git a/doc/source/tutorial-quickstart-pandas.rst b/doc/source/tutorial-quickstart-pandas.rst deleted file mode 100644 index bb9cb1b28b54..000000000000 --- a/doc/source/tutorial-quickstart-pandas.rst +++ /dev/null @@ -1,12 +0,0 @@ -.. _quickstart-pandas: - - -Quickstart Pandas -================= - -.. meta:: - :description: Check out this Federated Learning quickstart tutorial for using Flower with Pandas to perform Federated Analytics. - -Let's build a federated analytics system using Pandas and Flower! - -Please refer to the `full code example `_ to learn more. 
diff --git a/examples/federated-analytics/README.md b/examples/federated-analytics/README.md new file mode 100644 index 000000000000..7544a433e709 --- /dev/null +++ b/examples/federated-analytics/README.md @@ -0,0 +1,86 @@ +# Flower Federated Analytics Example + +This introductory Flower example demonstrates a Federated Analytics application. It will help you understand how to adapt Flower to your Federated Analytics use cases through a custom strategy. This example uses [Flower Datasets](https://flower.dev/docs/datasets/) to +download, partition and preprocess the dataset. + +In this example, we use the Iris dataset split between two clients. The subset of each client contains only the features sepal length and sepal width. Then, a federated analytics task computes a 10-bin histogram for each client and each feature; those values are aggregated to obtain a global histogram for sepal length and for sepal width. + +To learn more about Federated Analytics you can check [this article](https://ai.googleblog.com/2020/05/federated-analytics-collaborative-data.html) by Google. There is also a previous Flower blog post about [this example](https://flower.dev/blog/2023-01-24-federated-analytics-pandas/). + +Running this example in itself is quite easy. + +## Project Setup + +Start by cloning the example project. We prepared a single-line command that you can copy into your shell which will check out the example for you: + +```shell +$ git clone --depth=1 https://github.com/adap/flower.git _tmp && mv _tmp/examples/federated-analytics . && rm -rf _tmp && cd federated-analytics +``` + +This will create a new directory called `federated-analytics` containing the following files: + +```shell +-- pyproject.toml +-- requirements.txt +-- client.py +-- server.py +-- run.sh +-- README.md +``` + +### Installing Dependencies + +Project dependencies (such as `flwr`) are defined in `pyproject.toml` and `requirements.txt`. 
We recommend [Poetry](https://python-poetry.org/docs/) to install those dependencies and manage your virtual environment ([Poetry installation](https://python-poetry.org/docs/#installation)) or [pip](https://pip.pypa.io/en/latest/development/), but feel free to use a different way of installing dependencies and managing virtual environments if you have other preferences. + +#### Poetry + +```shell +poetry install +poetry shell +``` + +Poetry will install all your dependencies in a newly created virtual environment. To verify that everything works correctly you can run the following command: + +```shell +poetry run python3 -c "import flwr" +``` + +If you don't see any errors you're good to go! + +#### pip + +Write the command below in your terminal to install the dependencies according to the configuration file requirements.txt. + +```shell +pip install -r requirements.txt +``` + +## Run Federated Analytics with Flower + +After all dependencies are installed, you are ready to run this example with the `run.sh` script. + +```shell +$ ./run.sh +``` + +If you don't plan on using the `run.sh` script that automates the run, you can simply start the server in a terminal as follows: + +```shell +$ python3 server.py +``` + +Now you are ready to start the Flower clients which will participate in the analysis. To do so simply open two more terminal windows and run the following commands. + +Start client 1 in the first terminal: + +```shell +$ python3 client.py --node-id 0 +``` + +Start client 2 in the second terminal: + +```shell +$ python3 client.py --node-id 1 +``` + +You will see that the server is printing aggregated statistics about the dataset distributed amongst clients. Have a look at the [Flower Quickstarter documentation](https://flower.dev/docs/quickstart-pandas.html) for a detailed explanation. 
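Note on the README above: it describes per-client 10-bin histograms that are aggregated into a global histogram. The sketch below illustrates that idea with plain NumPy, outside of Flower. It is not the example's `server.py` strategy, and the names `N_BINS`, `VALUE_RANGE`, `local_histogram` and `aggregate_histograms` are hypothetical. One assumption is made explicit: `np.histogram` with default arguments derives its bin edges from each array's own minimum and maximum, so the sketch fixes a shared range to make the per-bin counts from different clients comparable before summing them.

```python
from typing import List

import numpy as np

# Assumed shared bin settings (not taken from the example): every client uses
# the same edges so that per-bin counts can be added meaningfully.
N_BINS = 10
VALUE_RANGE = (0.0, 10.0)


def local_histogram(values: np.ndarray) -> np.ndarray:
    """Count the values of one feature into N_BINS fixed-width bins."""
    freqs, _ = np.histogram(values, bins=N_BINS, range=VALUE_RANGE)
    return freqs


def aggregate_histograms(client_hists: List[np.ndarray]) -> np.ndarray:
    """Sum per-client bin counts element-wise to get a global histogram."""
    return np.sum(np.stack(client_hists), axis=0)


# Two hypothetical clients holding sepal-length values
client_a = np.array([5.1, 4.9, 6.3])
client_b = np.array([5.8, 7.1])
global_hist = aggregate_histograms(
    [local_histogram(client_a), local_histogram(client_b)]
)
```

With a shared range, the global histogram equals the histogram that would be computed on the pooled data, which is the property that makes a simple element-wise sum a valid aggregation here.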
diff --git a/examples/quickstart-pandas/client.py b/examples/federated-analytics/client.py similarity index 74% rename from examples/quickstart-pandas/client.py rename to examples/federated-analytics/client.py index 8585922e4572..02d05253ee7a 100644 --- a/examples/quickstart-pandas/client.py +++ b/examples/federated-analytics/client.py @@ -2,7 +2,6 @@ from typing import Dict, List, Tuple import numpy as np -import pandas as pd import flwr as fl @@ -12,14 +11,14 @@ column_names = ["sepal_length", "sepal_width"] -def compute_hist(df: pd.DataFrame, col_name: str) -> np.ndarray: - freqs, _ = np.histogram(df[col_name]) +def compute_hist(column: List[np.ndarray]) -> np.ndarray: + freqs, _ = np.histogram(column) return freqs # Define Flower client class FlowerClient(fl.client.NumPyClient): - def __init__(self, X: pd.DataFrame): + def __init__(self, X: List[np.ndarray]): self.X = X def fit( @@ -27,12 +26,12 @@ def fit( ) -> Tuple[List[np.ndarray], int, Dict]: hist_list = [] # Execute query locally - for c in self.X.columns: - hist = compute_hist(self.X, c) + for column in range(len(column_names)): + hist = compute_hist(self.X[column]) hist_list.append(hist) return ( hist_list, - len(self.X), + len(self.X[0]), # get the length of one column {}, ) @@ -54,9 +53,12 @@ def fit( # Load the partition data fds = FederatedDataset(dataset="hitorilabs/iris", partitioners={"train": N_CLIENTS}) - dataset = fds.load_partition(partition_id, "train").with_format("pandas")[:] + dataset = fds.load_partition(partition_id, "train") + + X = [] # Use just the specified columns - X = dataset[column_names] + for column in column_names: + X.append(dataset[column]) # Start Flower client fl.client.start_client( diff --git a/examples/quickstart-pandas/pyproject.toml b/examples/federated-analytics/pyproject.toml similarity index 77% rename from examples/quickstart-pandas/pyproject.toml rename to examples/federated-analytics/pyproject.toml index 6229210d6488..d5de5c474394 100644 --- 
a/examples/quickstart-pandas/pyproject.toml +++ b/examples/federated-analytics/pyproject.toml @@ -3,9 +3,9 @@ requires = ["poetry-core>=1.4.0"] build-backend = "poetry.core.masonry.api" [tool.poetry] -name = "quickstart-pandas" +name = "federated-analytics" version = "0.1.0" -description = "Pandas Federated Analytics Quickstart with Flower" +description = "Federated Analytics with Flower" authors = ["Ragy Haddad "] maintainers = ["The Flower Authors "] @@ -14,4 +14,3 @@ python = ">=3.8,<3.11" flwr = ">=1.0,<2.0" flwr-datasets = { extras = ["vision"], version = ">=0.0.2,<1.0.0" } numpy = "1.23.2" -pandas = "2.0.0" diff --git a/examples/quickstart-pandas/requirements.txt b/examples/federated-analytics/requirements.txt similarity index 82% rename from examples/quickstart-pandas/requirements.txt rename to examples/federated-analytics/requirements.txt index d44a3c6adab9..2e462c87c4c6 100644 --- a/examples/quickstart-pandas/requirements.txt +++ b/examples/federated-analytics/requirements.txt @@ -1,4 +1,3 @@ flwr>=1.0, <2.0 flwr-datasets[vision]>=0.0.2, <1.0.0 numpy==1.23.2 -pandas==2.0.0 diff --git a/examples/quickstart-pandas/run.sh b/examples/federated-analytics/run.sh similarity index 100% rename from examples/quickstart-pandas/run.sh rename to examples/federated-analytics/run.sh diff --git a/examples/quickstart-pandas/server.py b/examples/federated-analytics/server.py similarity index 100% rename from examples/quickstart-pandas/server.py rename to examples/federated-analytics/server.py diff --git a/examples/quickstart-pandas/README.md b/examples/quickstart-pandas/README.md deleted file mode 100644 index a25e6ea6ee36..000000000000 --- a/examples/quickstart-pandas/README.md +++ /dev/null @@ -1,82 +0,0 @@ -# Flower Example using Pandas - -This introductory example to Flower uses Pandas, but deep knowledge of Pandas is not necessarily required to run the example. However, it will help you understand how to adapt Flower to your use case. 
This example uses [Flower Datasets](https://flower.dev/docs/datasets/) to -download, partition and preprocess the dataset. -Running this example in itself is quite easy. - -## Project Setup - -Start by cloning the example project. We prepared a single-line command that you can copy into your shell which will checkout the example for you: - -```shell -$ git clone --depth=1 https://github.com/adap/flower.git _tmp && mv _tmp/examples/quickstart-pandas . && rm -rf _tmp && cd quickstart-pandas -``` - -This will create a new directory called `quickstart-pandas` containing the following files: - -```shell --- pyproject.toml --- requirements.txt --- client.py --- server.py --- start.sh --- README.md -``` - -If you don't plan on using the `run.sh` script that automates the run, you should first download the data and put it in a `data` folder, this can be done by executing: - -```shell -$ mkdir -p ./data -$ python -c "from sklearn.datasets import load_iris; load_iris(as_frame=True)['data'].to_csv('./data/client.csv')" -``` - -### Installing Dependencies - -Project dependencies (such as `pandas` and `flwr`) are defined in `pyproject.toml` and `requirements.txt`. We recommend [Poetry](https://python-poetry.org/docs/) to install those dependencies and manage your virtual environment ([Poetry installation](https://python-poetry.org/docs/#installation)) or [pip](https://pip.pypa.io/en/latest/development/), but feel free to use a different way of installing dependencies and managing virtual environments if you have other preferences. - -#### Poetry - -```shell -poetry install -poetry shell -``` - -Poetry will install all your dependencies in a newly created virtual environment. To verify that everything works correctly you can run the following command: - -```shell -poetry run python3 -c "import flwr" -``` - -If you don't see any errors you're good to go! 
- -#### pip - -Write the command below in your terminal to install the dependencies according to the configuration file requirements.txt. - -```shell -pip install -r requirements.txt -``` - -## Run Federated Analytics with Pandas and Flower - -Afterwards you are ready to start the Flower server as well as the clients. You can simply start the server in a terminal as follows: - -```shell -$ python3 server.py -``` - -Now you are ready to start the Flower clients which will participate in the learning. To do so simply open two more terminal windows and run the following commands. - -Start client 1 in the first terminal: - -```shell -$ python3 client.py --node-id 0 -``` - -Start client 2 in the second terminal: - -```shell -$ python3 client.py --node-id 1 -``` - -You will see that the server is printing aggregated statistics about the dataset distributed amongst clients. Have a look to the [Flower Quickstarter documentation](https://flower.dev/docs/quickstart-pandas.html) for a detailed explanation.