Merge branch 'main' into update-xgb-comprehensive-example
yan-gao-GY authored Oct 2, 2024
2 parents 7bcc7d2 + f4b2da2 commit b95498e
Showing 11 changed files with 115 additions and 26 deletions.
9 changes: 9 additions & 0 deletions .github/workflows/e2e.yml
@@ -82,6 +82,15 @@ jobs:
      with:
        python-version: ${{ matrix.python-version }}
        poetry-skip: 'true'
    - name: Install Flower from repo
      if: ${{ github.repository != 'adap/flower' || github.event.pull_request.head.repo.fork || github.actor == 'dependabot[bot]' }}
      working-directory: ./
      run: |
        if [[ "${{ matrix.engine }}" == "simulation-engine" ]]; then
          python -m pip install ".[simulation]"
        else
          python -m pip install .
        fi
    - name: Download and install Flower wheel from artifact store
      if: ${{ github.repository == 'adap/flower' && !github.event.pull_request.head.repo.fork && github.actor != 'dependabot[bot]' }}
      run: |
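The new workflow step installs Flower from the repository and selects the `simulation` extra only when the matrix engine is the simulation engine. The branch can be sketched standalone (`ENGINE` stands in for the workflow's `matrix.engine` value; this is illustrative, not part of the repo):

```shell
#!/usr/bin/env bash
# ENGINE stands in for the workflow's matrix.engine value (illustrative only).
ENGINE="simulation-engine"

# Pick the pip install target the same way the workflow step does.
if [[ "${ENGINE}" == "simulation-engine" ]]; then
  PIP_TARGET=".[simulation]"
else
  PIP_TARGET="."
fi

echo "would run: python -m pip install ${PIP_TARGET}"
```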
6 changes: 3 additions & 3 deletions benchmarks/flowertune-llm/README.md
@@ -13,13 +13,13 @@ As the first step, please register for a Flower account on [flower.ai/login](htt
Then, create a new Python environment and install Flower.

> [!TIP]
-> We recommend using `pyenv` with the `virtualenv` plugin to create your environment. Other managers, such as Conda, will likely work as well. Check the [documentation](https://flower.ai/docs/framework/how-to-install-flower.html) for alternative ways to install Flower.
+> We recommend using `pyenv` with the `virtualenv` plugin to create your environment with Python >= 3.10.0. Other managers, such as Conda, will likely work as well. Check the [documentation](https://flower.ai/docs/framework/how-to-install-flower.html) for alternative ways to install Flower.
```shell
pip install flwr
```

-In the new environment, create a new Flower project using the `FlowerTune` template. You will be prompted for a name to give to your project, your username, and for your choice of LLM challenge:
+In the new environment, create a new Flower project using the `FlowerTune` template. You will be prompted for a name to give to your app/project, your username, and for your choice of LLM challenge:
```shell
flwr new --framework=FlowerTune
```
@@ -64,5 +64,5 @@ following the `README.md` in [`evaluation`](https://github.com/adap/flower/tree/


> [!NOTE]
-> If you have any questions about running FlowerTune LLM challenges or evaluation, please feel free to make posts at [Flower Discuss](https://discuss.flower.ai) forum,
+> If you have any questions about running FlowerTune LLM challenges or evaluation, please feel free to make posts at our dedicated [FlowerTune Category](https://discuss.flower.ai/c/flowertune-llm-leaderboard/) on [Flower Discuss](https://discuss.flower.ai) forum,
or join our [Slack channel](https://flower.ai/join-slack/) to ask questions in the `#flowertune-llm-leaderboard` channel.
23 changes: 14 additions & 9 deletions datasets/doc/source/index.rst
@@ -3,14 +3,7 @@ Flower Datasets

Flower Datasets (``flwr-datasets``) is a library that enables the quick and easy creation of datasets for federated learning/analytics/evaluation. It enables heterogeneity (non-iidness) simulation and division of datasets with the preexisting notion of IDs. The library was created by the ``Flower Labs`` team that also created `Flower <https://flower.ai>`_ : A Friendly Federated Learning Framework.

.. raw:: html

<script
type="module"
src="https://gradio.s3-us-west-2.amazonaws.com/4.44.0/gradio.js"
></script>

<gradio-app src="https://flwrlabs-federated-learning-datasets-by-flwr-datasets.hf.space"></gradio-app>
Try out an interactive demo to generate code and visualize heterogeneous divisions at the :ref:`bottom of this page<demo>`.

Flower Datasets Framework
-------------------------
@@ -142,7 +135,6 @@ What makes Flower Datasets stand out from other libraries?

* New custom partitioning schemes (``Partitioner`` subclasses) integrated with the whole ecosystem.


Join the Flower Community
-------------------------

@@ -153,3 +145,16 @@ The Flower Community is growing quickly - we're a friendly group of researchers,
:shadow:

Join us on Slack

.. _demo:

Demo
----

.. raw:: html

<script
type="module"
src="https://gradio.s3-us-west-2.amazonaws.com/4.44.0/gradio.js"
></script>

<gradio-app src="https://flwrlabs-federated-learning-datasets-by-flwr-datasets.hf.space"></gradio-app>
@@ -225,7 +225,7 @@ def _determine_partition_id_to_unique_labels(self) -> None:
        if self._class_assignment_mode == "first-deterministic":
            # if self._first_class_deterministic_assignment:
            for partition_id in range(self._num_partitions):
-                label = partition_id % num_unique_classes
+                label = self._unique_labels[partition_id % num_unique_classes]
                self._partition_id_to_unique_labels[partition_id].append(label)

while (
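The one-line fix above matters when a dataset's unique labels are not the integers `0..k-1` (e.g. string labels): the cycling partition index must be mapped through the list of actual unique labels instead of being used as the label itself. A minimal sketch with a hypothetical helper (not the library's actual class):

```python
def first_deterministic_assignment(unique_labels, num_partitions):
    """Cycle through the actual unique labels to pick each partition's first class."""
    assignment = {}
    for partition_id in range(num_partitions):
        # Before the fix: label = partition_id % len(unique_labels),
        # which yields raw integers even for string-labeled datasets.
        label = unique_labels[partition_id % len(unique_labels)]
        assignment[partition_id] = [label]
    return assignment


# With string labels, each partition now receives a real label value.
print(first_deterministic_assignment(["cat", "dog", "bird"], 4))
```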
@@ -18,15 +18,18 @@
import unittest

import numpy as np
-from parameterized import parameterized
+from parameterized import parameterized, parameterized_class

import datasets
from datasets import Dataset
from flwr_datasets.partitioner.pathological_partitioner import PathologicalPartitioner


def _dummy_dataset_setup(
-    num_samples: int, partition_by: str, num_unique_classes: int
+    num_samples: int,
+    partition_by: str,
+    num_unique_classes: int,
+    string_partition_by: bool = False,
) -> Dataset:
"""Create a dummy dataset for testing."""
data = {
@@ -35,6 +38,8 @@ def _dummy_dataset_setup(
)[:num_samples],
"features": np.random.randn(num_samples),
}
if string_partition_by:
data[partition_by] = data[partition_by].astype(str)
return Dataset.from_dict(data)


@@ -51,6 +56,7 @@ def _dummy_heterogeneous_dataset_setup(
return Dataset.from_dict(data)


@parameterized_class(("string_partition_by",), [(False,), (True,)])
class TestClassConstrainedPartitioner(unittest.TestCase):
"""Unit tests for PathologicalPartitioner."""

@@ -94,7 +100,8 @@ def test_first_class_deterministic_assignment(self) -> None:
Test if all the classes are used (which has to be the case, given num_partitions
>= than the number of unique classes).
"""
-        dataset = _dummy_dataset_setup(100, "labels", 10)
+        partition_by = "labels"
+        dataset = _dummy_dataset_setup(100, partition_by, 10)
partitioner = PathologicalPartitioner(
num_partitions=10,
partition_by="labels",
@@ -103,7 +110,12 @@ )
)
partitioner.dataset = dataset
partitioner.load_partition(0)
-        expected_classes = set(range(10))
+        expected_classes = set(
+            range(10)
+            # pylint: disable=unsubscriptable-object
+            if isinstance(dataset[partition_by][0], int)
+            else [str(i) for i in range(10)]
+        )
actual_classes = set()
for pid in range(10):
partition = partitioner.load_partition(pid)
@@ -141,6 +153,9 @@ def test_deterministic_class_assignment(
for i in range(num_classes_per_partition)
]
)
# pylint: disable=unsubscriptable-object
if isinstance(dataset["labels"][0], str):
expected_labels = [str(label) for label in expected_labels]
actual_labels = sorted(np.unique(partition["labels"]))
self.assertTrue(
np.array_equal(expected_labels, actual_labels),
@@ -166,6 +181,9 @@ def test_too_many_partitions_for_a_class(
"labels": np.array([num_unique_classes - 1] * (num_samples // 2)),
"features": np.random.randn(num_samples // 2),
}
# pylint: disable=unsubscriptable-object
if isinstance(dataset_1["labels"][0], str):
data["labels"] = data["labels"].astype(str)
dataset_2 = Dataset.from_dict(data)
dataset = datasets.concatenate_datasets([dataset_1, dataset_2])

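The test changes above run the whole test class twice via `parameterized_class`, once with integer labels and once with their string form. The same both-dtypes coverage can be sketched with only the standard library's `subTest` (a simplified stand-in, not the repository's actual tests):

```python
import unittest


class TestLabelDtypes(unittest.TestCase):
    """Run the same check for integer and string label columns."""

    def test_expected_classes_for_both_dtypes(self) -> None:
        for string_labels in (False, True):
            with self.subTest(string_partition_by=string_labels):
                labels = [i % 3 for i in range(9)]
                if string_labels:
                    labels = [str(label) for label in labels]
                # As in the diff, the expected classes depend on the label dtype.
                expected = (
                    set(range(3))
                    if isinstance(labels[0], int)
                    else {str(i) for i in range(3)}
                )
                self.assertEqual(expected, set(labels))
```

`parameterized_class` instead duplicates the whole class, so every test method is exercised under both dtypes without touching each method's body.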
27 changes: 27 additions & 0 deletions glossary/flower-datasets.mdx
@@ -0,0 +1,27 @@
---
title: "Flower Datasets"
description: "Flower Datasets is a library that enables the creation of datasets for federated learning by partitioning centralized datasets to exhibit heterogeneity or using naturally partitioned datasets."
date: "2024-05-24"
author:
name: "Adam Narożniak"
position: "ML Engineer at Flower Labs"
website: "https://discuss.flower.ai/u/adam.narozniak/summary"
related:
- text: "Flower Datasets documentation"
link: "https://flower.ai/docs/datasets/"
- text: "Flower Datasets GitHub page"
link: "https://github.com/adap/flower/tree/main/datasets"
---

Flower Datasets is a library that enables the creation of datasets for federated learning/analytics/evaluation by partitioning centralized datasets to exhibit heterogeneity or using naturally partitioned datasets. It was created by the Flower Labs team, which also created Flower - a Friendly Federated Learning Framework.

The key features include:
* downloading datasets (HuggingFace `datasets` are used under the hood),
* partitioning (simulate different levels of heterogeneity by using one of the implemented partitioning schemes or create your own),
* creating centralized datasets (easily utilize centralized versions of the datasets),
* reproducibility (repeat the experiments with the same results),
* visualization (display the created partitions),
* ML agnostic (easy integration with all popular ML frameworks).


It is a supplementary library to Flower, with which it integrates easily.
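The partitioning feature described above can be illustrated with a tiny dependency-free sketch of label skew (a hypothetical helper, not the library's API — Flower Datasets implements this via `Partitioner` subclasses):

```python
import random
from collections import defaultdict


def label_skew_partition(labels, num_partitions, classes_per_partition, seed=0):
    """Assign sample indices to partitions so each partition sees only a few classes."""
    rng = random.Random(seed)
    unique = sorted(set(labels))
    # Give partition `pid` a deterministic window of classes by cycling.
    partition_classes = {
        pid: {unique[(pid + k) % len(unique)] for k in range(classes_per_partition)}
        for pid in range(num_partitions)
    }
    # Group sample indices by class, then scatter each class's samples
    # across the partitions that were assigned that class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    partitions = {pid: [] for pid in range(num_partitions)}
    for label, indices in by_class.items():
        eligible = [p for p, cls in partition_classes.items() if label in cls]
        for idx in indices:
            partitions[rng.choice(eligible)].append(idx)
    return partitions
```

The fixed seed keeps the split reproducible, mirroring the library's reproducibility goal listed above.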
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -73,7 +73,7 @@ protobuf = "^4.25.2"
cryptography = "^42.0.4"
pycryptodome = "^3.18.0"
iterators = "^0.0.2"
-typer = { version = "^0.9.0", extras = ["all"] }
+typer = "^0.12.5"
tomli = "^2.0.1"
tomli-w = "^1.0.0"
pathspec = "^0.12.1"
17 changes: 14 additions & 3 deletions src/py/flwr/client/grpc_rere_client/grpc_adapter.py
@@ -24,10 +24,14 @@

from flwr.common import log
from flwr.common.constant import (
GRPC_ADAPTER_METADATA_FLOWER_PACKAGE_NAME_KEY,
GRPC_ADAPTER_METADATA_FLOWER_PACKAGE_VERSION_KEY,
GRPC_ADAPTER_METADATA_FLOWER_VERSION_KEY,
GRPC_ADAPTER_METADATA_MESSAGE_MODULE_KEY,
GRPC_ADAPTER_METADATA_MESSAGE_QUALNAME_KEY,
GRPC_ADAPTER_METADATA_SHOULD_EXIT_KEY,
)
-from flwr.common.version import package_version
+from flwr.common.version import package_name, package_version
from flwr.proto.fab_pb2 import GetFabRequest, GetFabResponse # pylint: disable=E0611
from flwr.proto.fleet_pb2 import ( # pylint: disable=E0611
CreateNodeRequest,
@@ -62,9 +66,16 @@ def _send_and_receive(
self, request: GrpcMessage, response_type: type[T], **kwargs: Any
) -> T:
# Serialize request
+        req_cls = request.__class__
        container_req = MessageContainer(
-            metadata={GRPC_ADAPTER_METADATA_FLOWER_VERSION_KEY: package_version},
-            grpc_message_name=request.__class__.__qualname__,
+            metadata={
+                GRPC_ADAPTER_METADATA_FLOWER_PACKAGE_NAME_KEY: package_name,
+                GRPC_ADAPTER_METADATA_FLOWER_PACKAGE_VERSION_KEY: package_version,
+                GRPC_ADAPTER_METADATA_FLOWER_VERSION_KEY: package_version,
+                GRPC_ADAPTER_METADATA_MESSAGE_MODULE_KEY: req_cls.__module__,
+                GRPC_ADAPTER_METADATA_MESSAGE_QUALNAME_KEY: req_cls.__qualname__,
+            },
+            grpc_message_name=req_cls.__qualname__,
            grpc_message_content=request.SerializeToString(),
        )

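Both sides of the adapter now stamp the same metadata onto every `MessageContainer`. The dictionary they build can be sketched in isolation (stand-in constant names and a hypothetical request class with an assumed version string; the real code pulls the keys from `flwr.common.constant` and wraps protobuf messages):

```python
# Stand-in constants mirroring the metadata keys added in this commit.
FLOWER_PACKAGE_NAME_KEY = "flower-package-name"
FLOWER_PACKAGE_VERSION_KEY = "flower-package-version"
FLOWER_VERSION_KEY = "flower-version"  # kept for backward compatibility
MESSAGE_MODULE_KEY = "grpc-message-module"
MESSAGE_QUALNAME_KEY = "grpc-message-qualname"


def build_metadata(request, package_name, package_version):
    """Build adapter-style metadata for a request object (sketch)."""
    req_cls = request.__class__
    return {
        FLOWER_PACKAGE_NAME_KEY: package_name,
        FLOWER_PACKAGE_VERSION_KEY: package_version,
        FLOWER_VERSION_KEY: package_version,  # deprecated duplicate
        MESSAGE_MODULE_KEY: req_cls.__module__,
        MESSAGE_QUALNAME_KEY: req_cls.__qualname__,
    }


class PingRequest:  # hypothetical request type for demonstration
    pass


md = build_metadata(PingRequest(), "flwr", "1.12.0")
```

Carrying the message module and qualname lets the receiving side identify the wrapped protobuf type without guessing from content.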
1 change: 1 addition & 0 deletions src/py/flwr/client/supernode/app.py
@@ -79,6 +79,7 @@ def run_supernode() -> None:
node_config=parse_config_args(
[args.node_config] if args.node_config else args.node_config
),
flwr_path=args.flwr_dir,
isolation=args.isolation,
supernode_address=args.supernode_address,
)
9 changes: 6 additions & 3 deletions src/py/flwr/common/constant.py
@@ -60,8 +60,6 @@
# IDs
RUN_ID_NUM_BYTES = 8
NODE_ID_NUM_BYTES = 8
-GRPC_ADAPTER_METADATA_FLOWER_VERSION_KEY = "flower-version"
-GRPC_ADAPTER_METADATA_SHOULD_EXIT_KEY = "should-exit"

# Constants for FAB
APP_DIR = "apps"
@@ -72,8 +70,13 @@
PARTITION_ID_KEY = "partition-id"
NUM_PARTITIONS_KEY = "num-partitions"

-GRPC_ADAPTER_METADATA_FLOWER_VERSION_KEY = "flower-version"
+# Constants for keys in `metadata` of `MessageContainer` in `grpc-adapter`
+GRPC_ADAPTER_METADATA_FLOWER_PACKAGE_NAME_KEY = "flower-package-name"
+GRPC_ADAPTER_METADATA_FLOWER_PACKAGE_VERSION_KEY = "flower-package-version"
+GRPC_ADAPTER_METADATA_FLOWER_VERSION_KEY = "flower-version"  # Deprecated
GRPC_ADAPTER_METADATA_SHOULD_EXIT_KEY = "should-exit"
+GRPC_ADAPTER_METADATA_MESSAGE_MODULE_KEY = "grpc-message-module"
+GRPC_ADAPTER_METADATA_MESSAGE_QUALNAME_KEY = "grpc-message-qualname"


class MessageType:
@@ -21,7 +21,15 @@
import grpc
from google.protobuf.message import Message as GrpcMessage

from flwr.common.constant import (
GRPC_ADAPTER_METADATA_FLOWER_PACKAGE_NAME_KEY,
GRPC_ADAPTER_METADATA_FLOWER_PACKAGE_VERSION_KEY,
GRPC_ADAPTER_METADATA_FLOWER_VERSION_KEY,
GRPC_ADAPTER_METADATA_MESSAGE_MODULE_KEY,
GRPC_ADAPTER_METADATA_MESSAGE_QUALNAME_KEY,
)
from flwr.common.logger import log
from flwr.common.version import package_name, package_version
from flwr.proto import grpcadapter_pb2_grpc # pylint: disable=E0611
from flwr.proto.fab_pb2 import GetFabRequest, GetFabResponse # pylint: disable=E0611
from flwr.proto.fleet_pb2 import ( # pylint: disable=E0611
@@ -52,9 +60,16 @@ def _handle(
) -> MessageContainer:
req = request_type.FromString(msg_container.grpc_message_content)
res = handler(req)
+    res_cls = res.__class__
    return MessageContainer(
-        metadata={},
-        grpc_message_name=res.__class__.__qualname__,
+        metadata={
+            GRPC_ADAPTER_METADATA_FLOWER_PACKAGE_NAME_KEY: package_name,
+            GRPC_ADAPTER_METADATA_FLOWER_PACKAGE_VERSION_KEY: package_version,
+            GRPC_ADAPTER_METADATA_FLOWER_VERSION_KEY: package_version,
+            GRPC_ADAPTER_METADATA_MESSAGE_MODULE_KEY: res_cls.__module__,
+            GRPC_ADAPTER_METADATA_MESSAGE_QUALNAME_KEY: res_cls.__qualname__,
+        },
+        grpc_message_name=res_cls.__qualname__,
        grpc_message_content=res.SerializeToString(),
)
