Merge branch 'main' into migrate-opacus-example
mohammadnaseri authored Oct 2, 2024
2 parents da4f4ab + fdfae1b commit daf2e2f
Showing 18 changed files with 404 additions and 131 deletions.
58 changes: 58 additions & 0 deletions .github/workflows/e2e.yml
@@ -51,6 +51,64 @@ jobs:
short_sha: ${{ steps.upload.outputs.SHORT_SHA }}
dir: ${{ steps.upload.outputs.DIR }}

superexec:
runs-on: ubuntu-22.04
timeout-minutes: 10
needs: wheel
strategy:
fail-fast: false
matrix:
python-version: ["3.9", "3.10", "3.11"]
directory: [e2e-bare-auth]
connection: [secure, insecure]
engine: [deployment-engine, simulation-engine]
authentication: [no-auth, client-auth]
exclude:
- connection: insecure
authentication: client-auth
name: |
SuperExec /
Python ${{ matrix.python-version }} /
${{ matrix.connection }} /
${{ matrix.authentication }} /
${{ matrix.engine }}
defaults:
run:
working-directory: e2e/${{ matrix.directory }}
steps:
- uses: actions/checkout@v4
- name: Bootstrap
uses: ./.github/actions/bootstrap
with:
python-version: ${{ matrix.python-version }}
poetry-skip: 'true'
- name: Install Flower from repo
if: ${{ github.repository != 'adap/flower' || github.event.pull_request.head.repo.fork || github.actor == 'dependabot[bot]' }}
working-directory: ./
run: |
if [[ "${{ matrix.engine }}" == "simulation-engine" ]]; then
python -m pip install ".[simulation]"
else
python -m pip install .
fi
- name: Download and install Flower wheel from artifact store
if: ${{ github.repository == 'adap/flower' && !github.event.pull_request.head.repo.fork && github.actor != 'dependabot[bot]' }}
run: |
# Define base URL for wheel file
WHEEL_URL="https://${{ env.ARTIFACT_BUCKET }}/py/${{ needs.wheel.outputs.dir }}/${{ needs.wheel.outputs.short_sha }}/${{ needs.wheel.outputs.whl_path }}"
if [[ "${{ matrix.engine }}" == "simulation-engine" ]]; then
python -m pip install "flwr[simulation] @ ${WHEEL_URL}"
else
python -m pip install "${WHEEL_URL}"
fi
- name: >
Run SuperExec test /
${{ matrix.connection }} /
${{ matrix.authentication }} /
${{ matrix.engine }}
working-directory: e2e/${{ matrix.directory }}
run: ./../test_superexec.sh "${{ matrix.connection }}" "${{ matrix.authentication }}" "${{ matrix.engine }}"
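The matrix above expands combinatorially: 3 Python versions × 2 connection modes × 2 engines × 2 authentication modes gives 24 combinations, and the `exclude` entry drops the 6 insecure/client-auth pairs, leaving 18 jobs. A quick sketch of that expansion (illustrative only, not part of the workflow):

```python
from itertools import product

python_versions = ["3.9", "3.10", "3.11"]
connections = ["secure", "insecure"]
engines = ["deployment-engine", "simulation-engine"]
authentications = ["no-auth", "client-auth"]

# Build every combination, then drop the excluded insecure/client-auth
# pairs, mirroring the `exclude:` entry in the workflow matrix.
jobs = [
    combo
    for combo in product(python_versions, connections, engines, authentications)
    if not (combo[1] == "insecure" and combo[3] == "client-auth")
]
print(len(jobs))  # → 18
```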
frameworks:
runs-on: ubuntu-22.04
timeout-minutes: 10
6 changes: 3 additions & 3 deletions benchmarks/flowertune-llm/README.md
@@ -13,13 +13,13 @@ As the first step, please register for a Flower account on [flower.ai/login](htt
Then, create a new Python environment and install Flower.

> [!TIP]
> We recommend using `pyenv` with the `virtualenv` plugin to create your environment. Other managers, such as Conda, will likely work as well. Check the [documentation](https://flower.ai/docs/framework/how-to-install-flower.html) for alternative ways to install Flower.
> We recommend using `pyenv` with the `virtualenv` plugin to create your environment with Python >= 3.10.0. Other managers, such as Conda, will likely work as well. Check the [documentation](https://flower.ai/docs/framework/how-to-install-flower.html) for alternative ways to install Flower.
```shell
pip install flwr
```

In the new environment, create a new Flower project using the `FlowerTune` template. You will be prompted for a name to give to your project, your username, and for your choice of LLM challenge:
In the new environment, create a new Flower project using the `FlowerTune` template. You will be prompted for a name for your app/project, your username, and your choice of LLM challenge:
```shell
flwr new --framework=FlowerTune
```
@@ -64,5 +64,5 @@ following the `README.md` in [`evaluation`](https://github.com/adap/flower/tree/


> [!NOTE]
> If you have any questions about running FlowerTune LLM challenges or evaluation, please feel free to make posts at [Flower Discuss](https://discuss.flower.ai) forum,
> If you have any questions about running FlowerTune LLM challenges or evaluation, feel free to post in our dedicated [FlowerTune Category](https://discuss.flower.ai/c/flowertune-llm-leaderboard/) on the [Flower Discuss](https://discuss.flower.ai) forum,
or join our [Slack channel](https://flower.ai/join-slack/) to ask questions in the `#flowertune-llm-leaderboard` channel.
23 changes: 14 additions & 9 deletions datasets/doc/source/index.rst
@@ -3,14 +3,7 @@ Flower Datasets

Flower Datasets (``flwr-datasets``) is a library that enables the quick and easy creation of datasets for federated learning/analytics/evaluation. It enables heterogeneity (non-iidness) simulation and division of datasets with the preexisting notion of IDs. The library was created by the ``Flower Labs`` team that also created `Flower <https://flower.ai>`_ : A Friendly Federated Learning Framework.

.. raw:: html

<script
type="module"
src="https://gradio.s3-us-west-2.amazonaws.com/4.44.0/gradio.js"
></script>

<gradio-app src="https://flwrlabs-federated-learning-datasets-by-flwr-datasets.hf.space"></gradio-app>
Try out an interactive demo to generate code and visualize heterogeneous divisions at the :ref:`bottom of this page<demo>`.

Flower Datasets Framework
-------------------------
@@ -142,7 +135,6 @@ What makes Flower Datasets stand out from other libraries?

* New custom partitioning schemes (``Partitioner`` subclasses) integrated with the whole ecosystem.


Join the Flower Community
-------------------------

@@ -153,3 +145,16 @@ The Flower Community is growing quickly - we're a friendly group of researchers,
:shadow:

Join us on Slack

.. _demo:
Demo
----

.. raw:: html

<script
type="module"
src="https://gradio.s3-us-west-2.amazonaws.com/4.44.0/gradio.js"
></script>

<gradio-app src="https://flwrlabs-federated-learning-datasets-by-flwr-datasets.hf.space"></gradio-app>
@@ -225,7 +225,7 @@ def _determine_partition_id_to_unique_labels(self) -> None:
if self._class_assignment_mode == "first-deterministic":
# if self._first_class_deterministic_assignment:
for partition_id in range(self._num_partitions):
label = partition_id % num_unique_classes
label = self._unique_labels[partition_id % num_unique_classes]
self._partition_id_to_unique_labels[partition_id].append(label)

while (
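The one-line fix in the hunk above matters whenever the label values are not simply `0..N-1` (a sparse label set, or string labels): indexing into the unique labels assigns actual label values, while the old `partition_id % num_unique_classes` produced bare indices that may not exist in the dataset. A minimal sketch of the round-robin assignment with simplified, hypothetical names:

```python
def assign_first_labels(unique_labels, num_partitions):
    """Round-robin the first label of each partition over the unique labels."""
    return {
        partition_id: unique_labels[partition_id % len(unique_labels)]
        for partition_id in range(num_partitions)
    }

# With a sparse label set, the fixed version assigns real labels ...
print(assign_first_labels([2, 5, 9], 4))  # → {0: 2, 1: 5, 2: 9, 3: 2}
# ... whereas the old modulo-index version would have assigned 0, 1, 2,
# none of which need be present in the dataset at all.
```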
@@ -18,15 +18,18 @@
import unittest

import numpy as np
from parameterized import parameterized
from parameterized import parameterized, parameterized_class

import datasets
from datasets import Dataset
from flwr_datasets.partitioner.pathological_partitioner import PathologicalPartitioner


def _dummy_dataset_setup(
num_samples: int, partition_by: str, num_unique_classes: int
num_samples: int,
partition_by: str,
num_unique_classes: int,
string_partition_by: bool = False,
) -> Dataset:
"""Create a dummy dataset for testing."""
data = {
@@ -35,6 +38,8 @@ def _dummy_dataset_setup(
)[:num_samples],
"features": np.random.randn(num_samples),
}
if string_partition_by:
data[partition_by] = data[partition_by].astype(str)
return Dataset.from_dict(data)


@@ -51,6 +56,7 @@ def _dummy_heterogeneous_dataset_setup(
return Dataset.from_dict(data)


@parameterized_class(("string_partition_by",), [(False,), (True,)])
class TestClassConstrainedPartitioner(unittest.TestCase):
"""Unit tests for PathologicalPartitioner."""

@@ -94,7 +100,8 @@ def test_first_class_deterministic_assignment(self) -> None:
Test if all the classes are used (which has to be the case, given num_partitions
>= than the number of unique classes).
"""
dataset = _dummy_dataset_setup(100, "labels", 10)
partition_by = "labels"
dataset = _dummy_dataset_setup(100, partition_by, 10)
partitioner = PathologicalPartitioner(
num_partitions=10,
partition_by="labels",
@@ -103,7 +110,12 @@
)
partitioner.dataset = dataset
partitioner.load_partition(0)
expected_classes = set(range(10))
expected_classes = set(
range(10)
# pylint: disable=unsubscriptable-object
if isinstance(dataset[partition_by][0], int)
else [str(i) for i in range(10)]
)
actual_classes = set()
for pid in range(10):
partition = partitioner.load_partition(pid)
@@ -141,6 +153,9 @@ def test_deterministic_class_assignment(
for i in range(num_classes_per_partition)
]
)
# pylint: disable=unsubscriptable-object
if isinstance(dataset["labels"][0], str):
expected_labels = [str(label) for label in expected_labels]
actual_labels = sorted(np.unique(partition["labels"]))
self.assertTrue(
np.array_equal(expected_labels, actual_labels),
@@ -166,6 +181,9 @@ def test_too_many_partitions_for_a_class(
"labels": np.array([num_unique_classes - 1] * (num_samples // 2)),
"features": np.random.randn(num_samples // 2),
}
# pylint: disable=unsubscriptable-object
if isinstance(dataset_1["labels"][0], str):
data["labels"] = data["labels"].astype(str)
dataset_2 = Dataset.from_dict(data)
dataset = datasets.concatenate_datasets([dataset_1, dataset_2])

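The test edits above all follow one pattern: when the `@parameterized_class` flag casts the label column to strings, the expected values must be cast the same way before comparison. The pattern can be sketched independently of the test suite (toy values, not the real fixtures):

```python
labels = [0, 1, 2]

def expected_for(column):
    """Match the expected labels to the dtype of the column's entries."""
    if isinstance(column[0], str):
        return [str(label) for label in labels]
    return list(labels)

print(expected_for([0, 0, 1]))        # → [0, 1, 2]
print(expected_for(["0", "0", "1"]))  # → ['0', '1', '2']
```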
1 change: 1 addition & 0 deletions e2e/e2e-bare-auth/certificate.conf
@@ -18,3 +18,4 @@ subjectAltName = @alt_names
DNS.1 = localhost
IP.1 = ::1
IP.2 = 127.0.0.1
IP.3 = 0.0.0.0
122 changes: 122 additions & 0 deletions e2e/test_superexec.sh
@@ -0,0 +1,122 @@
#!/bin/bash
set -e

# Set connectivity parameters
case "$1" in
secure)
./generate.sh
server_arg='--ssl-ca-certfile ../certificates/ca.crt
--ssl-certfile ../certificates/server.pem
--ssl-keyfile ../certificates/server.key'
client_arg='--root-certificates ../certificates/ca.crt'
# For $superexec_arg, note special ordering of single- and double-quotes
superexec_arg='--executor-config 'root-certificates=\"../certificates/ca.crt\"''
superexec_arg="$server_arg $superexec_arg"
;;
insecure)
server_arg='--insecure'
client_arg=$server_arg
superexec_arg=$server_arg
;;
esac

# Set authentication parameters
case "$2" in
client-auth)
server_auth='--auth-list-public-keys ../keys/client_public_keys.csv
--auth-superlink-private-key ../keys/server_credentials
--auth-superlink-public-key ../keys/server_credentials.pub'
client_auth_1='--auth-supernode-private-key ../keys/client_credentials_1
--auth-supernode-public-key ../keys/client_credentials_1.pub'
client_auth_2='--auth-supernode-private-key ../keys/client_credentials_2
--auth-supernode-public-key ../keys/client_credentials_2.pub'
server_address='127.0.0.1:9092'
;;
*)
server_auth=''
client_auth_1=''
client_auth_2=''
server_address='127.0.0.1:9092'
;;
esac

# Set engine
case "$3" in
deployment-engine)
superexec_engine_arg='--executor flwr.superexec.deployment:executor'
;;
simulation-engine)
superexec_engine_arg='--executor flwr.superexec.simulation:executor
--executor-config 'num-supernodes=10''
;;
esac


# Create and install Flower app
flwr new e2e-tmp-test --framework numpy --username flwrlabs
cd e2e-tmp-test
# Remove flwr dependency from `pyproject.toml`. Seems necessary so that it does
# not override the wheel dependency
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS (Darwin) system
sed -i '' '/flwr\[simulation\]/d' pyproject.toml
else
# Non-macOS system (Linux)
sed -i '/flwr\[simulation\]/d' pyproject.toml
fi
pip install -e . --no-deps

# Check if the first argument is 'insecure'
if [ "$1" == "insecure" ]; then
# If $1 is 'insecure', append the first line
echo -e $"\n[tool.flwr.federations.superexec]\naddress = \"127.0.0.1:9093\"\ninsecure = true" >> pyproject.toml
else
# Otherwise, append the second line
echo -e $"\n[tool.flwr.federations.superexec]\naddress = \"127.0.0.1:9093\"\nroot-certificates = \"../certificates/ca.crt\"" >> pyproject.toml
fi

timeout 2m flower-superlink $server_arg $server_auth &
sl_pid=$!
sleep 2

timeout 2m flower-supernode ./ $client_arg \
--superlink $server_address $client_auth_1 \
--node-config "partition-id=0 num-partitions=2" --max-retries 0 &
cl1_pid=$!
sleep 2

timeout 2m flower-supernode ./ $client_arg \
--superlink $server_address $client_auth_2 \
--node-config "partition-id=1 num-partitions=2" --max-retries 0 &
cl2_pid=$!
sleep 2

timeout 2m flower-superexec $superexec_arg $superexec_engine_arg 2>&1 | tee flwr_output.log &
# $! would give the PID of `tee` (the last command in the pipeline), so look up the process directly
se_pid=$(pgrep -f "flower-superexec")
sleep 2

timeout 1m flwr run --run-config num-server-rounds=1 ../e2e-tmp-test superexec

# Initialize a flag to track if training is successful
found_success=false
timeout=120 # Timeout after 120 seconds
elapsed=0

# Check for "Success" in a loop with a timeout
while [ "$found_success" = false ] && [ $elapsed -lt $timeout ]; do
if grep -q "Run finished" flwr_output.log; then
echo "Training worked correctly!"
found_success=true
kill $cl1_pid; kill $cl2_pid; sleep 1; kill $sl_pid; kill $se_pid;
else
echo "Waiting for training ... ($elapsed seconds elapsed)"
fi
# Sleep for a short period and increment the elapsed time
sleep 2
elapsed=$((elapsed + 2))
done

if [ "$found_success" = false ]; then
echo "Training had an issue and timed out."
kill $cl1_pid; kill $cl2_pid; kill $sl_pid; kill $se_pid;
fi
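The grep-in-a-loop check at the end of the script generalizes to any "wait for a marker in a log, with a timeout" test. A hedged Python equivalent of the same logic (illustrative only; the script itself uses `grep` on `flwr_output.log`):

```python
import time

def wait_for_marker(path, marker, timeout_s=120, poll_s=2):
    """Poll a log file until `marker` appears or the timeout elapses."""
    elapsed = 0
    while elapsed < timeout_s:
        try:
            with open(path) as log:
                if marker in log.read():
                    return True
        except FileNotFoundError:
            pass  # log not created yet; keep waiting
        time.sleep(poll_s)
        elapsed += poll_s
    return False
```

In the script, the marker is the `Run finished` line, and the SuperNode, SuperLink, and SuperExec processes are killed once it is seen (or once the 120-second budget is exhausted).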
27 changes: 27 additions & 0 deletions glossary/flower-datasets.mdx
@@ -0,0 +1,27 @@
---
title: "Flower Datasets"
description: "Flower Datasets is a library that enables the creation of datasets for federated learning by partitioning centralized datasets to exhibit heterogeneity or using naturally partitioned datasets."
date: "2024-05-24"
author:
name: "Adam Narożniak"
position: "ML Engineer at Flower Labs"
website: "https://discuss.flower.ai/u/adam.narozniak/summary"
related:
- text: "Flower Datasets documentation"
link: "https://flower.ai/docs/datasets/"
- text: "Flower Datasets GitHub page"
link: "https://github.com/adap/flower/tree/main/datasets"
---

Flower Datasets is a library that enables the creation of datasets for federated learning/analytics/evaluation by partitioning centralized datasets to exhibit heterogeneity or using naturally partitioned datasets. It was created by the Flower Labs team, which also created Flower - a Friendly Federated Learning Framework.

The key features include:
* downloading datasets (HuggingFace `datasets` are used under the hood),
* partitioning (simulate different levels of heterogeneity by using one of the implemented partitioning schemes or create your own),
* creating centralized datasets (easily utilize centralized versions of the datasets),
* reproducibility (repeat the experiments with the same results),
* visualization (display the created partitions),
* ML agnostic (easy integration with all popular ML frameworks).


It is a supplementary library to Flower, with which it integrates easily.
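Conceptually, the partitioning step turns one centralized dataset into per-client shards. A toy IID sketch of that idea (this is the concept only, not the flwr-datasets API):

```python
def iid_partition(samples, num_partitions):
    """Deal samples round-robin into `num_partitions` shards."""
    shards = [[] for _ in range(num_partitions)]
    for index, sample in enumerate(samples):
        shards[index % num_partitions].append(sample)
    return shards

print(iid_partition(list(range(10)), 3))
# → [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

The library's `Partitioner` subclasses implement non-IID variants of this split, simulating different levels of label and quantity heterogeneity across clients.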
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -73,7 +73,7 @@ protobuf = "^4.25.2"
cryptography = "^42.0.4"
pycryptodome = "^3.18.0"
iterators = "^0.0.2"
typer = { version = "^0.9.0", extras = ["all"] }
typer = "^0.12.5"
tomli = "^2.0.1"
tomli-w = "^1.0.0"
pathspec = "^0.12.1"