refactor(examples) Update XGBoost comprehensive example (#4234)
Co-authored-by: jafermarq <[email protected]>
yan-gao-GY and jafermarq authored Oct 6, 2024
1 parent f696628 commit 849ab1d
Showing 15 changed files with 461 additions and 859 deletions.
137 changes: 47 additions & 90 deletions examples/xgboost-comprehensive/README.md
@@ -4,23 +4,21 @@ dataset: [HIGGS]
framework: [xgboost]
---

# Federated Learning with XGBoost and Flower (Comprehensive Example)

This example demonstrates a comprehensive federated learning setup using Flower with XGBoost.
We use the [HIGGS](https://archive.ics.uci.edu/dataset/280/higgs) dataset to perform a binary classification task. This example uses [Flower Datasets](https://flower.ai/docs/datasets/) to retrieve, partition, and preprocess the data for each Flower client.
It differs from the [xgboost-quickstart](https://github.com/adap/flower/tree/main/examples/xgboost-quickstart) example in the following ways:

- Argument parsers for server and clients to select hyperparameters.
- Customised FL settings.
- Customised number of partitions.
- Customised partitioner type (uniform, linear, square, exponential); a data-loading sketch follows this list.
- Centralised/distributed evaluation.
- Bagging/cyclic training methods.
- Support for running with Flower's Simulation Engine.
- Support for a scaled learning rate.
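
For orientation, a partition under one of these non-uniform schemes can be loaded with [Flower Datasets](https://flower.ai/docs/datasets/) roughly as follows. This is a hedged sketch: the dataset identifier (`jxie/higgs`) and the argument values are illustrative assumptions, not necessarily what this example uses.

```python
# Sketch: load one client's HIGGS partition with an exponential partitioner.
# Dataset id and argument values are assumptions for illustration.
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import ExponentialPartitioner

partitioner = ExponentialPartitioner(num_partitions=20)
fds = FederatedDataset(dataset="jxie/higgs", partitioners={"train": partitioner})
partition = fds.load_partition(partition_id=0)  # this client's shard
train_test = partition.train_test_split(test_size=0.2, seed=42)
```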

## Training Strategies

This example provides two training strategies, [**bagging aggregation**](https://flower.ai/docs/framework/tutorial-quickstart-xgboost.html#tree-based-bagging-aggregation) ([docs](https://flower.ai/docs/framework/ref-api/flwr.server.strategy.FedXgbBagging.html)) and [**cyclic training**](https://flower.ai/docs/framework/tutorial-quickstart-xgboost.html#cyclic_training) ([docs](https://flower.ai/docs/framework/ref-api/flwr.server.strategy.FedXgbCyclic.html)).

### Bagging Aggregation

@@ -43,127 +41,86 @@

### Cyclic Training

Instead of aggregating updates from multiple clients, only a single client participates in training per round in the cyclic training scenario.
The trained local XGBoost trees are passed to the next client as the initialised model for the next round's boosting.
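
To make the hand-over concrete, the following is a minimal sketch of the cyclic idea in plain `xgboost` (an illustration of the mechanism, not the internals of `FedXgbCyclic`; the helper name and parameter values are assumptions):

```python
# Sketch: one cyclic step. The client resumes boosting from the serialized
# ensemble received from the previous client, then forwards the result.
from typing import Optional

import xgboost as xgb

PARAMS = {"objective": "binary:logistic", "eta": 0.1, "max_depth": 8}

def local_boost(received: Optional[bytes], dtrain: xgb.DMatrix) -> bytes:
    prev = None
    if received is not None:
        prev = xgb.Booster(params=PARAMS)
        prev.load_model(bytearray(received))
    # xgb_model=prev appends new trees to the received ensemble
    bst = xgb.train(PARAMS, dtrain, num_boost_round=1, xgb_model=prev)
    return bytes(bst.save_raw("json"))
```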

## Set up the project

### Clone the project




Start by cloning the example project:

```shell
git clone --depth=1 https://github.com/adap/flower.git _tmp \
        && mv _tmp/examples/xgboost-comprehensive . \
        && rm -rf _tmp \
        && cd xgboost-comprehensive
```

This will create a new directory called `xgboost-comprehensive` with the following structure:

```shell
xgboost-comprehensive
├── xgboost_comprehensive
│   ├── __init__.py
│   ├── client_app.py   # Defines your ClientApp
│   ├── server_app.py   # Defines your ServerApp
│   └── task.py         # Defines your model, training and data loading
├── pyproject.toml      # Project metadata like dependencies and configs
└── README.md
```
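
For orientation, `client_app.py` and `server_app.py` follow Flower's app pattern. A minimal, hypothetical skeleton is shown below; the function bodies are placeholders, and the actual strategy construction (e.g. `FedXgbBagging` or `FedXgbCyclic`) lives in this example's source files.

```python
# Hedged skeleton of the two app entry points (placeholders, not this
# example's actual implementation).
from flwr.client import ClientApp
from flwr.common import Context
from flwr.server import ServerApp, ServerAppComponents, ServerConfig

def client_fn(context: Context):
    # Build and return the XGBoost client for this node's partition
    ...

def server_fn(context: Context) -> ServerAppComponents:
    num_rounds = context.run_config["num-server-rounds"]
    # A real server_fn would also construct and return the strategy here
    return ServerAppComponents(config=ServerConfig(num_rounds=num_rounds))

app = ClientApp(client_fn=client_fn)          # lives in client_app.py
server_app = ServerApp(server_fn=server_fn)   # lives in server_app.py
```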

### Install dependencies and project

Install the dependencies defined in `pyproject.toml` as well as the `xgboost_comprehensive` package.

```bash
pip install -e .
```

## Run the project

You can run your Flower project in both _simulation_ and _deployment_ mode without making changes to the code. If you are starting with Flower, we recommend using the _simulation_ mode as it requires fewer components to be launched manually. By default, `flwr run` will make use of the Simulation Engine.

### Run with the Simulation Engine

```bash
flwr run .
```

You can also override some of the settings for your `ClientApp` and `ServerApp` defined in `pyproject.toml`. For example:

```bash
# To run bagging aggregation for 5 rounds evaluated on centralised test set
flwr run . --run-config "train-method='bagging' num-server-rounds=5 centralised-eval=true"

# To run cyclic training with linear partitioner type evaluated on centralised test set
flwr run . --run-config "train-method='cyclic' partitioner-type='linear' centralised-eval-client=true"
```
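
Internally, a run-config value such as `partitioner-type` typically selects one of the Flower Datasets partitioner classes. A sketch of such a mapping is shown below; the helper name is illustrative, and the exact wiring in this example may differ.

```python
# Sketch: map the "partitioner-type" run-config value to a partitioner class
# (class names as provided by flwr_datasets.partitioner).
from flwr_datasets.partitioner import (
    ExponentialPartitioner,
    IidPartitioner,
    LinearPartitioner,
    SquarePartitioner,
)

PARTITIONERS = {
    "uniform": IidPartitioner,
    "linear": LinearPartitioner,
    "square": SquarePartitioner,
    "exponential": ExponentialPartitioner,
}

def build_partitioner(partitioner_type: str, num_partitions: int):
    # Illustrative helper; raises KeyError on unknown types
    return PARTITIONERS[partitioner_type](num_partitions=num_partitions)
```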

> [!TIP]
> For a more detailed walk-through, check our [XGBoost tutorial](https://flower.ai/docs/framework/tutorial-quickstart-xgboost.html).
> To extend the aggregation strategy for saving, logging, or other functions, please refer to our [advanced-pytorch](https://github.com/adap/flower/tree/main/examples/advanced-pytorch) example.

### Run with the Deployment Engine

> [!NOTE]
> An update to this example will show how to run this Flower application with the Deployment Engine and TLS certificates, or with Docker.

## Expected Experimental Results

### Bagging aggregation experiment

<div style="text-align: center;">
<img src="_static/xgboost_flower_auc_bagging.png" alt="XGBoost with Flower and Bagging strategy" width="700"/>
</div>

The figure above shows the AUC performance on the centralised test set over FL rounds with the bagging aggregation strategy across 4 experimental settings.
All settings obtain a stable performance boost over FL rounds (especially noticeable at the start of training).
As expected, the uniform client distribution shows higher AUC values than the square/exponential setups.
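
For reference, an AUC of this kind can be computed on a centralised test set along the following lines (a sketch using scikit-learn for the metric; the example's actual evaluation code may differ):

```python
# Sketch: centralised AUC evaluation of a trained XGBoost ensemble.
import xgboost as xgb
from sklearn.metrics import roc_auc_score

def evaluate_auc(bst: xgb.Booster, test_dmatrix: xgb.DMatrix) -> float:
    y_pred = bst.predict(test_dmatrix)  # probabilities under binary:logistic
    return roc_auc_score(test_dmatrix.get_label(), y_pred)
```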

### Cyclic training experiment

<div style="text-align: center;">
<img src="_static/xgboost_flower_auc_cyclic.png" alt="XGBoost with Flower and Cyclic strategy" width="700"/>
</div>

This figure shows the cyclic training results on the centralised test set.
The models with cyclic training require more rounds to converge
81 changes: 0 additions & 81 deletions examples/xgboost-comprehensive/client.py

This file was deleted.

74 changes: 0 additions & 74 deletions examples/xgboost-comprehensive/dataset.py

This file was deleted.
