Merge branch 'main' into add-ci-e2e-test-containers
chongshenng committed Jun 25, 2024
2 parents 5d30d3c + f4ce64c commit 28a9470
Showing 181 changed files with 11,920 additions and 7,718 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/datasets.yml
@@ -16,6 +16,9 @@ concurrency:
group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.event.pull_request.number || github.ref }}
cancel-in-progress: true

env:
FLWR_TELEMETRY_ENABLED: 0

defaults:
run:
working-directory: datasets
3 changes: 3 additions & 0 deletions .github/workflows/e2e.yml
@@ -164,6 +164,9 @@ jobs:
- name: Run driver test with client authentication
if: ${{ matrix.directory == 'bare-client-auth' }}
run: ./../test_driver.sh bare client-auth
- name: Run reconnection test with SQLite database
if: ${{ matrix.directory == 'bare' }}
run: ./../test_reconnection.sh sqlite
- name: Cache save Python location
id: cache-save-python
uses: actions/cache/save@v4
2 changes: 2 additions & 0 deletions .github/workflows/framework.yml
@@ -31,6 +31,8 @@ jobs:

steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Bootstrap
uses: ./.github/actions/bootstrap
with:
1 change: 1 addition & 0 deletions README.md
@@ -153,6 +153,7 @@ Other [examples](https://github.com/adap/flower/tree/main/examples):
- [Flower with KaplanMeierFitter from the lifelines library](https://github.com/adap/flower/tree/main/examples/federated-kaplan-meier-fitter)
- [Sample Level Privacy with Opacus](https://github.com/adap/flower/tree/main/examples/opacus)
- [Sample Level Privacy with TensorFlow-Privacy](https://github.com/adap/flower/tree/main/examples/tensorflow-privacy)
- [Flower with a Tabular Dataset](https://github.com/adap/flower/tree/main/examples/fl-tabular)

## Community

61 changes: 61 additions & 0 deletions benchmarks/flowertune-llm/README.md
@@ -0,0 +1,61 @@
![](_static/flower_llm.jpg)

# FlowerTune LLM Leaderboard

This repository guides you through the process of federated LLM instruction tuning with a
pre-trained [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.3) model across four domains: general NLP, finance, medical, and code.

Please follow the instructions to run and evaluate the federated LLMs.

## Create a new project

As the first step, please register a Flower account on the [Flower website](https://flower.ai/login) and make sure the `flwr` package is installed on your system (see the [installation guide](https://flower.ai/docs/framework/how-to-install-flower.html)).
We provide a single-line command to create a new project directory based on your selected challenge:

```shell
flwr new --framework=flwrtune --username=your_flower_account
```

You will then be prompted to enter your project name and to choose one of the LLM challenges (general NLP, finance, medical, or code).
Type your project name, select your preferred challenge, and a new project directory will be generated automatically.

### Structure

After running `flwr new`, you will see a new directory generated with the following structure:

```bash
<project-name>
├── README.md # <- Instructions
├── pyproject.toml # <- Environment dependencies
└── <project_name>
├── app.py # <- Flower ClientApp/ServerApp build
├── client.py # <- Flower client constructor
├── server.py # <- Server-related functions
├── models.py # <- Model build
├── dataset.py # <- Dataset and tokenizer build
├── conf/config.yaml # <- User configuration
└── conf/static_config.yaml # <- Static configuration
```

This can serve as the starting point for building your own federated LLM fine-tuning methods.
Please note that any modification to the content of `conf/static_config.yaml` is strictly prohibited for those who wish to participate in the [LLM Leaderboard](https://flower.ai/benchmarks/llm-leaderboard); otherwise, the submission will not be considered.

## Run FlowerTune LLM challenges

With the new project directory created, you can run a baseline challenge as follows:

1. Navigate inside the directory that you just created.
2. Follow the `Environments setup` section of `README.md` in the project directory to install project dependencies.
3. Run the challenge as indicated in the `Running the challenge` section in the `README.md`.

## Evaluate pre-trained LLMs

After the LLM fine-tuning has finished, evaluate the performance of your fine-tuned LLMs
by following the `README.md` in the `evaluation` directory.
Binary file added benchmarks/flowertune-llm/_static/flower_llm.jpg
72 changes: 36 additions & 36 deletions datasets/README.md
@@ -7,6 +7,21 @@
[![Slack](https://img.shields.io/badge/Chat-Slack-red)](https://flower.ai/join-slack)

Flower Datasets (`flwr-datasets`) is a library to quickly and easily create datasets for federated learning, federated evaluation, and federated analytics. It was created by the `Flower Labs` team that also created Flower: A Friendly Federated Learning Framework.


> [!TIP]
> For complete documentation, including API docs, how-to guides, and tutorials, please visit the [Flower Datasets Documentation](https://flower.ai/docs/datasets/); for full FL examples, see the [Flower Examples page](https://github.com/adap/flower/tree/main/examples).

## Installation

For a complete installation guide, visit the [Flower Datasets Documentation](https://flower.ai/docs/datasets/).

```bash
pip install flwr-datasets[vision]
```

## Overview

Flower Datasets library supports:
* **downloading datasets** - choose the dataset from Hugging Face's `datasets`,
* **partitioning datasets** - customize the partitioning scheme,
@@ -21,43 +21,36 @@ Thanks to using Hugging Face's `datasets` used under the hood, Flower Datasets i
* Jax,
* Arrow.

Create **custom partitioning schemes** or choose from the **implemented partitioning schemes**:
Create **custom partitioning schemes** or choose from the **implemented [partitioning schemes](https://flower.ai/docs/datasets/ref-api/flwr_datasets.partitioner.html#module-flwr_datasets.partitioner)**:

* Partitioner (the abstract base class) `Partitioner`
* IID partitioning `IidPartitioner(num_partitions)`
* Natural ID partitioner `NaturalIdPartitioner`
* Dirichlet partitioning `DirichletPartitioner(num_partitions, partition_by, alpha)`
* InnerDirichlet partitioning `InnerDirichletPartitioner(partition_sizes, partition_by, alpha)`
* Natural ID partitioner `NaturalIdPartitioner(partition_by)`
* Size partitioner (the abstract base class for partitioners that dictate the division based on the number of samples) `SizePartitioner`
* Linear partitioner `LinearPartitioner`
* Square partitioner `SquarePartitioner`
* Exponential partitioner `ExponentialPartitioner`
* more to come in future releases.

# Installation

## With pip
* Linear partitioner `LinearPartitioner(num_partitions)`
* Square partitioner `SquarePartitioner(num_partitions)`
* Exponential partitioner `ExponentialPartitioner(num_partitions)`
* more to come in future releases (contributions are welcome).
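For intuition only — this is a toy sketch, not the library's `DirichletPartitioner` — label-skew Dirichlet partitioning can be illustrated with the standard library alone, building Dirichlet draws from normalized Gamma samples; `alpha` controls how uneven the label mix per partition is:

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_partitions, alpha, seed=42):
    """Toy sketch of Dirichlet label-skew partitioning (illustrative only)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    partitions = [[] for _ in range(num_partitions)]
    for indices in by_label.values():
        # Dirichlet(alpha) proportions via normalized Gamma(alpha, 1) draws.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_partitions)]
        total = sum(draws)
        proportions = [d / total for d in draws]
        start = 0
        for pid, p in enumerate(proportions):
            count = round(p * len(indices))
            partitions[pid].extend(indices[start:start + count])
            start += count
    return partitions


# 1000 examples with 10 balanced labels, skewed across 4 partitions.
parts = dirichlet_partition([i % 10 for i in range(1000)], num_partitions=4, alpha=0.5)
```

Smaller `alpha` concentrates each label in fewer partitions; larger `alpha` approaches an IID split.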
<p align="center">
<img src="./doc/source/_static/readme/comparison_of_partitioning_schemes.png" alt="Comparison of partitioning schemes."/>
<br>
<em>Comparison of Partitioning Schemes on CIFAR10</em>
</p>

Flower Datasets can be installed from PyPi

```bash
pip install flwr-datasets
```

Install with an extension:

* for image datasets:

```bash
pip install flwr-datasets[vision]
```

* for audio datasets:

```bash
pip install flwr-datasets[audio]
```
PS: This plot was generated using a library function (see [flwr_datasets.visualization](https://flower.ai/docs/datasets/ref-api/flwr_datasets.visualization.html) package for more).

If you plan to convert the dataset to the type used by your ML framework, make sure to have that framework installed too.

# Usage
## Usage

Flower Datasets exposes the `FederatedDataset` abstraction to represent the dataset needed for federated learning/evaluation/analytics. It has two powerful methods that let you handle the dataset preprocessing: `load_partition(partition_id, split)` and `load_split(split)`.

@@ -67,16 +67,16 @@ Here's a basic quickstart example of how to partition the MNIST dataset:
from flwr_datasets import FederatedDataset
# The train split of the MNIST dataset will be partitioned into 100 partitions
mnist_fds = FederatedDataset("mnist", partitioners={"train": 100}
fds = FederatedDataset("mnist", partitioners={"train": 100})
mnist_partition_0 = mnist_fds.load_partition(0, "train")
partition = fds.load_partition(0)
centralized_data = mnist_fds.load_split("test")
centralized_data = fds.load_split("test")
```
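To make the indexing behind such a partitioner concrete, here is a rough, self-contained sketch of a contiguous IID-style split (an illustration under that assumption, not the actual code of the library's `IidPartitioner`):

```python
def iid_partition_indices(num_examples, num_partitions, partition_id):
    """Contiguous IID-style split: partition sizes differ by at most one example.

    Illustrative sketch only, not the flwr-datasets implementation.
    """
    base, remainder = divmod(num_examples, num_partitions)
    # The first `remainder` partitions each receive one extra example.
    start = partition_id * base + min(partition_id, remainder)
    size = base + (1 if partition_id < remainder else 0)
    return range(start, start + size)


# MNIST's train split has 60,000 examples; partition 0 of 100 gets 600 of them.
indices = iid_partition_indices(60_000, 100, 0)
```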

For more details, please refer to the specific how-to guides or tutorials. They showcase customization and more advanced features.

# Future release
## Future release

Here are a few of the things that we will work on in future releases:

@@ -85,6 +85,6 @@ Here are a few of the things that we will work on in future releases:
* ✅ More out-of-the-box `Partitioner`s.
* ✅ Passing `Partitioner`s via `FederatedDataset`'s `partitioners` argument.
* ✅ Customization of the dataset splitting before the partitioning.
* Simplification of the dataset transformation to the popular frameworks/types.
* Simplification of the dataset transformation to the popular frameworks/types.
* Creation of synthetic data.
* Support for Vertical FL.
11 changes: 9 additions & 2 deletions datasets/dev/format.sh
@@ -3,9 +3,16 @@ set -e
cd "$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"/../

# Python
echo "Formatting started"
echo "Formatting started: Python"
python -m isort flwr_datasets/
python -m black -q flwr_datasets/
python -m docformatter -i -r flwr_datasets/
python -m ruff check --fix flwr_datasets/
echo "Formatting done"
echo "Formatting done: Python"

# Notebooks
echo "Formatting started: Notebooks"
python -m black --ipynb -q doc/source/*.ipynb
KEYS="metadata.celltoolbar metadata.language_info metadata.toc metadata.notify_time metadata.varInspector metadata.accelerator metadata.vscode cell.metadata.id cell.metadata.heading_collapsed cell.metadata.hidden cell.metadata.code_folding cell.metadata.tags cell.metadata.init_cell cell.metadata.vscode cell.metadata.pycharm"
python -m nbstripout --keep-output doc/source/*.ipynb --extra-keys "$KEYS"
echo "Formatting done: Notebooks"
2 changes: 1 addition & 1 deletion datasets/doc/source/conf.py
@@ -38,7 +38,7 @@
author = "The Flower Authors"

# The full version, including alpha/beta/rc tags
release = "0.1.0"
release = "0.2.0"


# -- General configuration ---------------------------------------------------
2 changes: 1 addition & 1 deletion datasets/doc/source/how-to-install-flwr-datasets.rst
@@ -42,5 +42,5 @@ If everything worked, it should print the version of Flower Datasets to the comm

.. code-block:: none
0.0.1
0.2.0
1,122 changes: 1,122 additions & 0 deletions datasets/doc/source/how-to-visualize-label-distribution.ipynb

Large diffs are not rendered by default.

56 changes: 34 additions & 22 deletions datasets/doc/source/index.rst
@@ -7,6 +7,15 @@ learning/analytics/evaluation. It is created by the ``Flower Labs`` team that al
Flower Datasets Framework
-------------------------

Install
~~~~~~~

.. code-block:: bash

  python -m pip install "flwr-datasets[vision]"

Check out all the details on how to install Flower Datasets in :doc:`how-to-install-flwr-datasets`.

Tutorials
~~~~~~~~~

@@ -32,6 +41,7 @@ Problem-oriented how-to guides show step-by-step how to achieve a specific goal.
how-to-use-with-tensorflow
how-to-use-with-numpy
how-to-use-with-local-data
how-to-visualize-label-distribution
how-to-disable-enable-progress-bar

References
@@ -47,15 +57,26 @@ Information-oriented API reference and other reference material.

flwr_datasets

.. toctree::
:maxdepth: 1
:caption: Reference docs

ref-telemetry

Main features
-------------
Flower Datasets library supports:

- **downloading datasets** - choose the dataset from Hugging Face's ``dataset``
- **partitioning datasets** - customize the partitioning scheme
- **downloading datasets** - choose the dataset from Hugging Face's ``dataset`` (`link <https://huggingface.co/datasets>`_)
- **partitioning datasets** - choose one of the implemented partitioning schemes or create your own.
- **creating centralized datasets** - leave parts of the dataset unpartitioned (e.g. for centralized evaluation)
- **visualization of the partitioned datasets** - visualize the label distribution of the partitioned dataset (and compare the results on different parameters of the same partitioning schemes, different datasets, different partitioning schemes, or any mix of them)
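The label-distribution feature can be illustrated framework-free; this toy sketch (not the ``flwr_datasets.visualization`` implementation, which also renders plots) just counts labels per partition:

```python
from collections import Counter


def label_distribution(partitions):
    """Count label occurrences per partition.

    Toy sketch of the quantity that the visualization package plots.
    """
    return [Counter(labels) for labels in partitions]


# Two tiny partitions with skewed label mixes.
dist = label_distribution([[0, 0, 1], [1, 1, 2]])
# dist[0] == Counter({0: 2, 1: 1})
```

Comparing such per-partition counts side by side is exactly what the partitioning-scheme comparison figure below visualizes.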


.. image:: ./_static/readme/comparison_of_partitioning_schemes.png
:align: center
:alt: Comparison of Partitioning Schemes on CIFAR10


Because Hugging Face's ``datasets`` is used under the hood, Flower Datasets integrates with the following popular formats/frameworks:

@@ -67,28 +88,19 @@ Thanks to using Hugging Face's ``datasets`` used under the hood, Flower Datasets
- Jax
- Arrow

Install
-------

The simplest install is

.. code-block:: bash
python -m pip install flwr-datasets
If you plan to use the image datasets

.. code-block:: bash
python -m pip install flwr-datasets[vision]
If you plan to use the audio datasets

.. code-block:: bash
Here are a few of the available ``Partitioner`` classes (for a full list, see the `partitioner API reference <ref-api/flwr_datasets.partitioner.html#module-flwr_datasets.partitioner>`_):

python -m pip install flwr-datasets[audio]
* Partitioner (the abstract base class) ``Partitioner``
* IID partitioning ``IidPartitioner(num_partitions)``
* Dirichlet partitioning ``DirichletPartitioner(num_partitions, partition_by, alpha)``
* InnerDirichlet partitioning ``InnerDirichletPartitioner(partition_sizes, partition_by, alpha)``
* Natural ID partitioner ``NaturalIdPartitioner(partition_by)``
* Size partitioner (the abstract base class for partitioners that dictate the division based on the number of samples) ``SizePartitioner``
* Linear partitioner ``LinearPartitioner(num_partitions)``
* Square partitioner ``SquarePartitioner(num_partitions)``
* Exponential partitioner ``ExponentialPartitioner(num_partitions)``
* more to come in future releases (contributions are welcome).

Check out the full installation details in :doc:`how-to-install-flwr-datasets`.

How To Use the library
----------------------