Commit

Merge branch 'main' into add_exp_tracking_for_DLI

nvkevlu authored Feb 10, 2025
2 parents 9d8e9ec + 7f03432 commit 57f4f27
Showing 11 changed files with 1,042 additions and 41 deletions.
@@ -1,19 +1,47 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "9d86124d-3f0d-4cbd-8855-d1fe6877ee6c",
"cell_type": "markdown",
"id": "ed32af6f",
"metadata": {},
"outputs": [],
"source": []
"source": [
"# Introduction: Develop Federated Learning Applications\n",
"\n",
"In this chapter, we will explore the process of developing federated learning applications. We will start by exploring federated statistics and getting different visualizations from the data. We will then convert PyTorch Lightning code for use with NVFlare. We will then examine machine learning algorithms and look at how to convert logistics regression, kmeans, and survival analysis for use with federated learning. Finally, we will look into the Client API and conclude with a recap of the covered topics.\n",
"\n",
"\n",
"2.1. **Federated Statistics**\n",
" * [Federated Statistics with image data](../02.1_federated_statistics/0federated_statistics_with_image_data/federated_statistics_with_image_data.ipynb)\n",
" * [Federated Statistics with tabular data](../02.1_federated_statistics/federated_statistics_with_tabular_data/federated_statistics_with_tabular_data.ipynb)\n",
"\n",
"2.2. **Convert PyTorch Lightning to Federated Learning**\n",
"\n",
" * [Convert Torch Lightning to FL](../02.2_convert_torch_lightning_to_federated_learning/convert_torch_lightning_to_fl.ipynb)\n",
"\n",
"\n",
"2.3. **How to Convert Machine Learning Algorithms to Federated Algorithms**\n",
"\n",
" * [Convert Logistics Regression to federated learning](../02.3_convert_machine_learning_to_federated_learning/02.3.1_convert_Logistics_regression_to_federated_learning/convert_lr_to_fl.ipynb)\n",
" * [Convert KMeans to federated learning](../02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/convert_kmeans_to_fl.ipynb)\n",
" * [Convert Survival Analysis to federated learning](../02.3_convert_machine_learning_to_federated_learning/02.3.3_convert_survival_analysis_to_federated_learning/convert_survival_analysis_to_fl.ipynb)\n",
"\n",
"2.4. **Client API** \n",
"\n",
" * [NVFlare Client API](../02.4_client_api/Client_api.ipynb)\n",
"\n",
"2.5. [Recap of the covered topics](../02.5_recap/recap.ipynb)\n",
"\n",
"\n",
"\n",
"Let's get started with [Federated Statistics](../02.1_federated_statistics/0federated_statistics_with_image_data/federated_statistics_with_image_data.ipynb)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "nvflare_example",
"display_name": "Python 3",
"language": "python",
"name": "nvflare_example"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -25,7 +25,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.2"
"version": "3.10.12"
}
},
"nbformat": 4,
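
The Client API mentioned in the chapter introduction above is what the later notebooks use to convert existing training scripts to federated learning. Those notebooks are not rendered in this commit view, so the following is only a minimal sketch of the general Client API pattern, based on the documented nvflare.client interface; train_one_round and evaluate are hypothetical placeholders for the user's own training and evaluation code.

# Minimal sketch of the NVFlare Client API loop (see assumptions above).
import nvflare.client as flare
from nvflare.app_common.abstract.fl_model import FLModel


def train_one_round(params):
    # Placeholder: the user's local training step; returns updated weights.
    return params


def evaluate(params):
    # Placeholder: the user's local evaluation; returns a metrics dict.
    return {"accuracy": 0.0}


def main():
    flare.init()  # register this process with the NVFlare client runtime

    while flare.is_running():
        input_model = flare.receive()  # global model sent by the server
        params = train_one_round(input_model.params)
        metrics = evaluate(params)
        # return the locally updated weights and metrics to the server
        flare.send(FLModel(params=params, metrics=metrics))


if __name__ == "__main__":
    main()

This loop only runs to completion when launched as part of an NVFlare job (for example via the simulator); run standalone it has no server to talk to.
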

This file was deleted.

@@ -0,0 +1,25 @@
#!/bin/bash

SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
DATASET_PATH="/tmp/nvflare/image_stats/data"

if [ ! -d "$DATASET_PATH" ]; then
    mkdir -p "$DATASET_PATH"
fi

source_url="$1"
echo "download url = ${source_url}"
if [ -n "${source_url}" ]; then
    if [ ! -f "${DATASET_PATH}/COVID-19_Radiography_Dataset.zip" ]; then
        wget -O "${DATASET_PATH}/COVID-19_Radiography_Dataset.zip" "${source_url}"
    else
        echo "zip file exists."
    fi
    if [ ! -d "${DATASET_PATH}/COVID-19_Radiography_Dataset" ]; then
        unzip -d "${DATASET_PATH}" "${DATASET_PATH}/COVID-19_Radiography_Dataset.zip"
    else
        echo "image files exist."
    fi
else
    echo "empty URL, nothing downloaded; provide a real URL to download the dataset"
fi
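
After running the download script, a short sanity-check sketch (not from this commit) can confirm that the archive unpacked where the later scripts expect it. The path and the "<class>/images/*.png" layout below are assumptions taken from the DATASET_PATH above and from the prepare_data script further down.

# Optional sanity check: count extracted images per class (assumed layout).
import glob
import os

DATASET_ROOT = "/tmp/nvflare/image_stats/data/COVID-19_Radiography_Dataset"

for class_name in ["COVID", "Lung_Opacity", "Normal", "Viral Pneumonia"]:
    pattern = os.path.join(DATASET_ROOT, "**", class_name, "images", "*.png")
    images = glob.glob(pattern, recursive=True)
    print(f"{class_name}: {len(images)} images")
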

@@ -0,0 +1,94 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import glob
import json
import os
import random

SEED = 0


def create_datasets(root, subdirs, extension, shuffle, seed):
    random.seed(seed)

    data_lists = []
    for subdir in subdirs:
        search_string = os.path.join(root, "**", subdir, "images", "*" + extension)
        data_list = glob.glob(search_string, recursive=True)

        assert (
            len(data_list) > 0
        ), f"No images found using {search_string} for subdir '{subdir}' and extension '{extension}'!"

        if shuffle:
            random.shuffle(data_list)

        data_lists.append(data_list)

    return data_lists


def save_data_list(data, data_list_file, data_root, key="data"):
    data_list = []
    for d in data:
        data_list.append({"image": d.replace(data_root + os.path.sep, "")})

    os.makedirs(os.path.dirname(data_list_file), exist_ok=True)
    with open(data_list_file, "w") as f:
        json.dump({key: data_list}, f, indent=4)

    print(f"Saved {len(data_list)} entries at {data_list_file}")


def prepare_data(
    input_dir: str,
    input_ext: str = ".png",
    output_dir: str = "/tmp/nvflare/image_stats/data",
    sub_dirs: str = "COVID,Lung_Opacity,Normal,Viral Pneumonia",
):
    sub_dir_list = [sd for sd in sub_dirs.split(",")]

    data_lists = create_datasets(root=input_dir, subdirs=sub_dir_list, extension=input_ext, shuffle=True, seed=SEED)
    print(f"Created {len(data_lists)} data lists for {sub_dir_list}.")

    site_id = 1
    for subdir, data_list in zip(sub_dir_list, data_lists):
        save_data_list(data_list, os.path.join(output_dir, f"site-{site_id}_{subdir}.json"), data_root=input_dir)
        site_id += 1


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_dir", type=str, required=True, help="Location of image files")
    parser.add_argument("--input_ext", type=str, default=".png", help="Search extension")
    parser.add_argument(
        "--output_dir", type=str, default="/tmp/nvflare/image_stats/data", help="Output location of data lists"
    )
    parser.add_argument(
        "--subdirs",
        type=str,
        default="COVID,Lung_Opacity,Normal,Viral Pneumonia",
        help="A list of sub-folders to include.",
    )
    args = parser.parse_args()

    assert "," in args.subdirs, "Expecting a comma separated list of subdirs names"

    prepare_data(args.input_dir, args.input_ext, args.output_dir, args.subdirs)


if __name__ == "__main__":
    main()
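
For reference, the data preparation above can also be driven directly from Python rather than through its command line. A minimal usage sketch, assuming the module is importable as prepare_data and the COVID-19 archive was extracted to the default location used by the download script:

# Usage sketch for the data-list preparation (module name is an assumption).
from prepare_data import prepare_data

prepare_data(
    input_dir="/tmp/nvflare/image_stats/data/COVID-19_Radiography_Dataset",
    input_ext=".png",
    output_dir="/tmp/nvflare/image_stats/data",
    sub_dirs="COVID,Lung_Opacity,Normal,Viral Pneumonia",
)
# Produces one JSON data list per class: site-1_COVID.json,
# site-2_Lung_Opacity.json, site-3_Normal.json and "site-4_Viral Pneumonia.json",
# i.e. each simulated site holds images of a single class.
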
@@ -0,0 +1,64 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse

from src.image_statistics import ImageStatistics

from nvflare.job_config.stats_job import StatsJob


def define_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-n", "--n_clients", type=int, default=3)
    parser.add_argument("-d", "--data_root_dir", type=str, nargs="?", default="/tmp/nvflare/image_stats/data")
    parser.add_argument("-o", "--stats_output_path", type=str, nargs="?", default="statistics/stats.json")
    parser.add_argument("-j", "--job_dir", type=str, nargs="?", default="/tmp/nvflare/jobs/image_stats")
    parser.add_argument("-w", "--work_dir", type=str, nargs="?", default="/tmp/nvflare/workspace/image_stats")
    parser.add_argument("-co", "--export_config", action="store_true", help="config only mode, export config")

    return parser.parse_args()


def main():
    args = define_parser()

    n_clients = args.n_clients
    data_root_dir = args.data_root_dir
    output_path = args.stats_output_path
    job_dir = args.job_dir
    work_dir = args.work_dir
    export_config = args.export_config

    statistic_configs = {"count": {}, "histogram": {"*": {"bins": 20, "range": [0, 256]}}}
    # define local stats generator
    stats_generator = ImageStatistics(data_root_dir)

    job = StatsJob(
        job_name="stats_image",
        statistic_configs=statistic_configs,
        stats_generator=stats_generator,
        output_path=output_path,
    )

    sites = [f"site-{i + 1}" for i in range(n_clients)]
    job.setup_clients(sites)

    if export_config:
        job.export_job(job_dir)
    else:
        job.simulator_run(work_dir, gpu="0")


if __name__ == "__main__":
    main()
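
Once the simulator run above finishes, the aggregated statistics are written as JSON on the server side of the simulator workspace. The path in the sketch below is an assumption based on the work_dir and stats_output_path defaults above (the simulator's internal layout may differ between NVFlare versions), so adjust it if the file is not found.

# Sketch: load and inspect the aggregated statistics produced by the job above.
import json

# Assumed location: <work_dir>/server/simulate_job/<stats_output_path>
stats_file = "/tmp/nvflare/workspace/image_stats/server/simulate_job/statistics/stats.json"

with open(stats_file, "r") as f:
    stats = json.load(f)

# Expect the configured statistics ("count", "histogram") among the top-level
# keys, holding per-site and global results that can then be plotted.
print(list(stats.keys()))
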
@@ -0,0 +1,8 @@
numpy
monai[itk]
pandas
kaleido
matplotlib
jupyter
notebook
tdigest
0 comments on commit 57f4f27