Merge branch 'main' into migrate-opacus-example
mohammadnaseri authored Oct 2, 2024
2 parents da4f4ab + fdfae1b commit daf2e2f
Showing 18 changed files with 404 additions and 131 deletions.
58 changes: 58 additions & 0 deletions .github/workflows/e2e.yml
@@ -51,6 +51,64 @@ jobs:
short_sha: ${{ steps.upload.outputs.SHORT_SHA }}
dir: ${{ steps.upload.outputs.DIR }}

superexec:
runs-on: ubuntu-22.04
timeout-minutes: 10
needs: wheel
strategy:
fail-fast: false
matrix:
python-version: ["3.9", "3.10", "3.11"]
directory: [e2e-bare-auth]
connection: [secure, insecure]
engine: [deployment-engine, simulation-engine]
authentication: [no-auth, client-auth]
exclude:
- connection: insecure
authentication: client-auth
name: |
SuperExec /
Python ${{ matrix.python-version }} /
${{ matrix.connection }} /
${{ matrix.authentication }} /
${{ matrix.engine }}
defaults:
run:
working-directory: e2e/${{ matrix.directory }}
steps:
- uses: actions/checkout@v4
- name: Bootstrap
uses: ./.github/actions/bootstrap
with:
python-version: ${{ matrix.python-version }}
poetry-skip: 'true'
- name: Install Flower from repo
if: ${{ github.repository != 'adap/flower' || github.event.pull_request.head.repo.fork || github.actor == 'dependabot[bot]' }}
working-directory: ./
run: |
if [[ "${{ matrix.engine }}" == "simulation-engine" ]]; then
python -m pip install ".[simulation]"
else
python -m pip install .
fi
- name: Download and install Flower wheel from artifact store
if: ${{ github.repository == 'adap/flower' && !github.event.pull_request.head.repo.fork && github.actor != 'dependabot[bot]' }}
run: |
# Define base URL for wheel file
WHEEL_URL="https://${{ env.ARTIFACT_BUCKET }}/py/${{ needs.wheel.outputs.dir }}/${{ needs.wheel.outputs.short_sha }}/${{ needs.wheel.outputs.whl_path }}"
if [[ "${{ matrix.engine }}" == "simulation-engine" ]]; then
python -m pip install "flwr[simulation] @ ${WHEEL_URL}"
else
python -m pip install "${WHEEL_URL}"
fi
- name: >
Run SuperExec test /
${{ matrix.connection }} /
${{ matrix.authentication }} /
${{ matrix.engine }}
working-directory: e2e/${{ matrix.directory }}
run: ./../test_superexec.sh "${{ matrix.connection }}" "${{ matrix.authentication }}" "${{ matrix.engine }}"
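The matrix above expands combinatorially: 3 Python versions × 2 connection modes × 2 engines × 2 authentication modes gives 24 combinations, and the `exclude` entry drops the 6 insecure/client-auth pairs, leaving 18 jobs. A quick sketch of that expansion (illustrative only, not part of the workflow):

```python
from itertools import product

python_versions = ["3.9", "3.10", "3.11"]
connections = ["secure", "insecure"]
engines = ["deployment-engine", "simulation-engine"]
authentications = ["no-auth", "client-auth"]

# Build every combination, then drop the excluded insecure/client-auth
# pairs, mirroring the `exclude:` entry in the workflow matrix.
jobs = [
    combo
    for combo in product(python_versions, connections, engines, authentications)
    if not (combo[1] == "insecure" and combo[3] == "client-auth")
]
print(len(jobs))  # → 18
```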
frameworks:
runs-on: ubuntu-22.04
timeout-minutes: 10
6 changes: 3 additions & 3 deletions benchmarks/flowertune-llm/README.md
@@ -13,13 +13,13 @@ As the first step, please register for a Flower account on [flower.ai/login](htt
Then, create a new Python environment and install Flower.

> [!TIP]
> We recommend using `pyenv` with the `virtualenv` plugin to create your environment. Other managers, such as Conda, will likely work as well. Check the [documentation](https://flower.ai/docs/framework/how-to-install-flower.html) for alternative ways to install Flower.
> We recommend using `pyenv` with the `virtualenv` plugin to create your environment with Python >= 3.10.0. Other managers, such as Conda, will likely work as well. Check the [documentation](https://flower.ai/docs/framework/how-to-install-flower.html) for alternative ways to install Flower.
```shell
pip install flwr
```

In the new environment, create a new Flower project using the `FlowerTune` template. You will be prompted for a name to give to your project, your username, and for your choice of LLM challenge:
In the new environment, create a new Flower project using the `FlowerTune` template. You will be prompted for a name for your app/project, your username, and your choice of LLM challenge:
```shell
flwr new --framework=FlowerTune
```
@@ -64,5 +64,5 @@ following the `README.md` in [`evaluation`](https://github.com/adap/flower/tree/


> [!NOTE]
> If you have any questions about running FlowerTune LLM challenges or evaluation, please feel free to make posts at [Flower Discuss](https://discuss.flower.ai) forum,
> If you have any questions about running FlowerTune LLM challenges or evaluation, feel free to post in our dedicated [FlowerTune Category](https://discuss.flower.ai/c/flowertune-llm-leaderboard/) on the [Flower Discuss](https://discuss.flower.ai) forum,
or join our [Slack channel](https://flower.ai/join-slack/) to ask questions in the `#flowertune-llm-leaderboard` channel.
23 changes: 14 additions & 9 deletions datasets/doc/source/index.rst
@@ -3,14 +3,7 @@ Flower Datasets

Flower Datasets (``flwr-datasets``) is a library that enables the quick and easy creation of datasets for federated learning/analytics/evaluation. It enables heterogeneity (non-iidness) simulation and division of datasets with the preexisting notion of IDs. The library was created by the ``Flower Labs`` team that also created `Flower <https://flower.ai>`_ : A Friendly Federated Learning Framework.

.. raw:: html

<script
type="module"
src="https://gradio.s3-us-west-2.amazonaws.com/4.44.0/gradio.js"
></script>

<gradio-app src="https://flwrlabs-federated-learning-datasets-by-flwr-datasets.hf.space"></gradio-app>
Try out an interactive demo to generate code and visualize heterogeneous divisions at the :ref:`bottom of this page<demo>`.

Flower Datasets Framework
-------------------------
@@ -142,7 +135,6 @@ What makes Flower Datasets stand out from other libraries?

* New custom partitioning schemes (``Partitioner`` subclasses) integrated with the whole ecosystem.


Join the Flower Community
-------------------------

@@ -153,3 +145,16 @@ The Flower Community is growing quickly - we're a friendly group of researchers,
:shadow:

Join us on Slack

.. _demo:
Demo
----

.. raw:: html

<script
type="module"
src="https://gradio.s3-us-west-2.amazonaws.com/4.44.0/gradio.js"
></script>

<gradio-app src="https://flwrlabs-federated-learning-datasets-by-flwr-datasets.hf.space"></gradio-app>
@@ -225,7 +225,7 @@ def _determine_partition_id_to_unique_labels(self) -> None:
if self._class_assignment_mode == "first-deterministic":
# if self._first_class_deterministic_assignment:
for partition_id in range(self._num_partitions):
label = partition_id % num_unique_classes
label = self._unique_labels[partition_id % num_unique_classes]
self._partition_id_to_unique_labels[partition_id].append(label)

while (
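The one-line fix in the hunk above matters whenever the label values are not simply `0..N-1` (a sparse label set, or string labels): indexing into the unique labels assigns actual label values, while the old `partition_id % num_unique_classes` produced bare indices that may not exist in the dataset. A minimal sketch of the round-robin assignment with simplified, hypothetical names:

```python
def assign_first_labels(unique_labels, num_partitions):
    """Round-robin the first label of each partition over the unique labels."""
    return {
        partition_id: unique_labels[partition_id % len(unique_labels)]
        for partition_id in range(num_partitions)
    }

# With a sparse label set, the fixed version assigns real labels ...
print(assign_first_labels([2, 5, 9], 4))  # → {0: 2, 1: 5, 2: 9, 3: 2}
# ... whereas the old modulo-index version would have assigned 0, 1, 2,
# none of which need be present in the dataset at all.
```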
@@ -18,15 +18,18 @@
import unittest

import numpy as np
from parameterized import parameterized
from parameterized import parameterized, parameterized_class

import datasets
from datasets import Dataset
from flwr_datasets.partitioner.pathological_partitioner import PathologicalPartitioner


def _dummy_dataset_setup(
num_samples: int, partition_by: str, num_unique_classes: int
num_samples: int,
partition_by: str,
num_unique_classes: int,
string_partition_by: bool = False,
) -> Dataset:
"""Create a dummy dataset for testing."""
data = {
@@ -35,6 +38,8 @@ def _dummy_dataset_setup(
)[:num_samples],
"features": np.random.randn(num_samples),
}
if string_partition_by:
data[partition_by] = data[partition_by].astype(str)
return Dataset.from_dict(data)


@@ -51,6 +56,7 @@ def _dummy_heterogeneous_dataset_setup(
return Dataset.from_dict(data)


@parameterized_class(("string_partition_by",), [(False,), (True,)])
class TestClassConstrainedPartitioner(unittest.TestCase):
"""Unit tests for PathologicalPartitioner."""

@@ -94,7 +100,8 @@ def test_first_class_deterministic_assignment(self) -> None:
Test if all the classes are used (which has to be the case, given num_partitions
>= than the number of unique classes).
"""
dataset = _dummy_dataset_setup(100, "labels", 10)
partition_by = "labels"
dataset = _dummy_dataset_setup(100, partition_by, 10)
partitioner = PathologicalPartitioner(
num_partitions=10,
partition_by="labels",
@@ -103,7 +110,12 @@
)
partitioner.dataset = dataset
partitioner.load_partition(0)
expected_classes = set(range(10))
expected_classes = set(
range(10)
# pylint: disable=unsubscriptable-object
if isinstance(dataset[partition_by][0], int)
else [str(i) for i in range(10)]
)
actual_classes = set()
for pid in range(10):
partition = partitioner.load_partition(pid)
@@ -141,6 +153,9 @@ def test_deterministic_class_assignment(
for i in range(num_classes_per_partition)
]
)
# pylint: disable=unsubscriptable-object
if isinstance(dataset["labels"][0], str):
expected_labels = [str(label) for label in expected_labels]
actual_labels = sorted(np.unique(partition["labels"]))
self.assertTrue(
np.array_equal(expected_labels, actual_labels),
@@ -166,6 +181,9 @@ def test_too_many_partitions_for_a_class(
"labels": np.array([num_unique_classes - 1] * (num_samples // 2)),
"features": np.random.randn(num_samples // 2),
}
# pylint: disable=unsubscriptable-object
if isinstance(dataset_1["labels"][0], str):
data["labels"] = data["labels"].astype(str)
dataset_2 = Dataset.from_dict(data)
dataset = datasets.concatenate_datasets([dataset_1, dataset_2])

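The test edits above all follow one pattern: when the `@parameterized_class` flag casts the label column to strings, the expected values must be cast the same way before comparison. The pattern can be sketched independently of the test suite (toy values, not the real fixtures):

```python
labels = [0, 1, 2]

def expected_for(column):
    """Match the expected labels to the dtype of the column's entries."""
    if isinstance(column[0], str):
        return [str(label) for label in labels]
    return list(labels)

print(expected_for([0, 0, 1]))        # → [0, 1, 2]
print(expected_for(["0", "0", "1"]))  # → ['0', '1', '2']
```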
1 change: 1 addition & 0 deletions e2e/e2e-bare-auth/certificate.conf
@@ -18,3 +18,4 @@ subjectAltName = @alt_names
DNS.1 = localhost
IP.1 = ::1
IP.2 = 127.0.0.1
IP.3 = 0.0.0.0
122 changes: 122 additions & 0 deletions e2e/test_superexec.sh
@@ -0,0 +1,122 @@
#!/bin/bash
set -e

# Set connectivity parameters
case "$1" in
secure)
./generate.sh
server_arg='--ssl-ca-certfile ../certificates/ca.crt
--ssl-certfile ../certificates/server.pem
--ssl-keyfile ../certificates/server.key'
client_arg='--root-certificates ../certificates/ca.crt'
# For $superexec_arg, note special ordering of single- and double-quotes
superexec_arg='--executor-config 'root-certificates=\"../certificates/ca.crt\"''
superexec_arg="$server_arg $superexec_arg"
;;
insecure)
server_arg='--insecure'
client_arg=$server_arg
superexec_arg=$server_arg
;;
esac

# Set authentication parameters
case "$2" in
client-auth)
server_auth='--auth-list-public-keys ../keys/client_public_keys.csv
--auth-superlink-private-key ../keys/server_credentials
--auth-superlink-public-key ../keys/server_credentials.pub'
client_auth_1='--auth-supernode-private-key ../keys/client_credentials_1
--auth-supernode-public-key ../keys/client_credentials_1.pub'
client_auth_2='--auth-supernode-private-key ../keys/client_credentials_2
--auth-supernode-public-key ../keys/client_credentials_2.pub'
server_address='127.0.0.1:9092'
;;
*)
server_auth=''
client_auth_1=''
client_auth_2=''
server_address='127.0.0.1:9092'
;;
esac

# Set engine
case "$3" in
deployment-engine)
superexec_engine_arg='--executor flwr.superexec.deployment:executor'
;;
simulation-engine)
superexec_engine_arg='--executor flwr.superexec.simulation:executor
--executor-config 'num-supernodes=10''
;;
esac


# Create and install Flower app
flwr new e2e-tmp-test --framework numpy --username flwrlabs
cd e2e-tmp-test
# Remove flwr dependency from `pyproject.toml`. Seems necessary so that it does
# not override the wheel dependency
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS (Darwin) system
sed -i '' '/flwr\[simulation\]/d' pyproject.toml
else
# Non-macOS system (Linux)
sed -i '/flwr\[simulation\]/d' pyproject.toml
fi
pip install -e . --no-deps

# Check if the first argument is 'insecure'
if [ "$1" == "insecure" ]; then
# If $1 is 'insecure', append the first line
echo -e $"\n[tool.flwr.federations.superexec]\naddress = \"127.0.0.1:9093\"\ninsecure = true" >> pyproject.toml
else
# Otherwise, append the second line
echo -e $"\n[tool.flwr.federations.superexec]\naddress = \"127.0.0.1:9093\"\nroot-certificates = \"../certificates/ca.crt\"" >> pyproject.toml
fi

timeout 2m flower-superlink $server_arg $server_auth &
sl_pid=$!
sleep 2

timeout 2m flower-supernode ./ $client_arg \
--superlink $server_address $client_auth_1 \
--node-config "partition-id=0 num-partitions=2" --max-retries 0 &
cl1_pid=$!
sleep 2

timeout 2m flower-supernode ./ $client_arg \
--superlink $server_address $client_auth_2 \
--node-config "partition-id=1 num-partitions=2" --max-retries 0 &
cl2_pid=$!
sleep 2

timeout 2m flower-superexec $superexec_arg $superexec_engine_arg 2>&1 | tee flwr_output.log &
# $! would give the PID of `tee` (the last command in the pipeline), so look up the process directly
se_pid=$(pgrep -f "flower-superexec")
sleep 2

timeout 1m flwr run --run-config num-server-rounds=1 ../e2e-tmp-test superexec

# Initialize a flag to track if training is successful
found_success=false
timeout=120 # Timeout after 120 seconds
elapsed=0

# Check for "Success" in a loop with a timeout
while [ "$found_success" = false ] && [ $elapsed -lt $timeout ]; do
if grep -q "Run finished" flwr_output.log; then
echo "Training worked correctly!"
found_success=true
kill $cl1_pid; kill $cl2_pid; sleep 1; kill $sl_pid; kill $se_pid;
else
echo "Waiting for training ... ($elapsed seconds elapsed)"
fi
# Sleep for a short period and increment the elapsed time
sleep 2
elapsed=$((elapsed + 2))
done

if [ "$found_success" = false ]; then
echo "Training had an issue and timed out."
kill $cl1_pid; kill $cl2_pid; kill $sl_pid; kill $se_pid;
fi
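The grep-in-a-loop check at the end of the script generalizes to any "wait for a marker in a log, with a timeout" test. A hedged Python equivalent of the same logic (illustrative only; the script itself uses `grep` on `flwr_output.log`):

```python
import time

def wait_for_marker(path, marker, timeout_s=120, poll_s=2):
    """Poll a log file until `marker` appears or the timeout elapses."""
    elapsed = 0
    while elapsed < timeout_s:
        try:
            with open(path) as log:
                if marker in log.read():
                    return True
        except FileNotFoundError:
            pass  # log not created yet; keep waiting
        time.sleep(poll_s)
        elapsed += poll_s
    return False
```

In the script, the marker is the `Run finished` line, and the SuperNode, SuperLink, and SuperExec processes are killed once it is seen (or once the 120-second budget is exhausted).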
27 changes: 27 additions & 0 deletions glossary/flower-datasets.mdx
@@ -0,0 +1,27 @@
---
title: "Flower Datasets"
description: "Flower Datasets is a library that enables the creation of datasets for federated learning by partitioning centralized datasets to exhibit heterogeneity or using naturally partitioned datasets."
date: "2024-05-24"
author:
name: "Adam Narożniak"
position: "ML Engineer at Flower Labs"
website: "https://discuss.flower.ai/u/adam.narozniak/summary"
related:
- text: "Flower Datasets documentation"
link: "https://flower.ai/docs/datasets/"
- text: "Flower Datasets GitHub page"
link: "https://github.com/adap/flower/tree/main/datasets"
---

Flower Datasets is a library that enables the creation of datasets for federated learning/analytics/evaluation by partitioning centralized datasets to exhibit heterogeneity or using naturally partitioned datasets. It was created by the Flower Labs team, which also created Flower - a Friendly Federated Learning Framework.

The key features include:
* downloading datasets (HuggingFace `datasets` are used under the hood),
* partitioning (simulate different levels of heterogeneity by using one of the implemented partitioning schemes or create your own),
* creating centralized datasets (easily utilize centralized versions of the datasets),
* reproducibility (repeat the experiments with the same results),
* visualization (display the created partitions),
* ML agnostic (easy integration with all popular ML frameworks).


It is a supplementary library to Flower, with which it integrates easily.
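Conceptually, the partitioning step turns one centralized dataset into per-client shards. A toy IID sketch of that idea (this is the concept only, not the flwr-datasets API):

```python
def iid_partition(samples, num_partitions):
    """Deal samples round-robin into `num_partitions` shards."""
    shards = [[] for _ in range(num_partitions)]
    for index, sample in enumerate(samples):
        shards[index % num_partitions].append(sample)
    return shards

print(iid_partition(list(range(10)), 3))
# → [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

The library's `Partitioner` subclasses implement non-IID variants of this split, simulating different levels of label and quantity heterogeneity across clients.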
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -73,7 +73,7 @@ protobuf = "^4.25.2"
cryptography = "^42.0.4"
pycryptodome = "^3.18.0"
iterators = "^0.0.2"
typer = { version = "^0.9.0", extras = ["all"] }
typer = "^0.12.5"
tomli = "^2.0.1"
tomli-w = "^1.0.0"
pathspec = "^0.12.1"