Commit

Merge branch 'main' into add_exp_tracking_for_DLI

nvkevlu authored Feb 10, 2025
2 parents 9d8e9ec + 7f03432 commit 57f4f27
Showing 11 changed files with 1,042 additions and 41 deletions.
@@ -1,19 +1,47 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "9d86124d-3f0d-4cbd-8855-d1fe6877ee6c",
"cell_type": "markdown",
"id": "ed32af6f",
"metadata": {},
"outputs": [],
"source": []
"source": [
"# Introduction: Develop Federated Learning Applications\n",
"\n",
"In this chapter, we will explore the process of developing federated learning applications. We will start by exploring federated statistics and getting different visualizations from the data. We will then convert PyTorch Lightning code for use with NVFlare. We will then examine machine learning algorithms and look at how to convert logistics regression, kmeans, and survival analysis for use with federated learning. Finally, we will look into the Client API and conclude with a recap of the covered topics.\n",
"\n",
"\n",
"2.1. **Federated Statistics**\n",
" * [Federated Statistics with image data](../02.1_federated_statistics/0federated_statistics_with_image_data/federated_statistics_with_image_data.ipynb)\n",
" * [Federated Statistics with tabular data](../02.1_federated_statistics/federated_statistics_with_tabular_data/federated_statistics_with_tabular_data.ipynb)\n",
"\n",
"2.2. **Convert PyTorch Lightning to Federated Learning**\n",
"\n",
" * [Convert Torch Lightning to FL](../02.2_convert_torch_lightning_to_federated_learning/convert_torch_lightning_to_fl.ipynb)\n",
"\n",
"\n",
"2.3. **How to Convert Machine Learning Algorithms to Federated Algorithms**\n",
"\n",
" * [Convert Logistics Regression to federated learning](../02.3_convert_machine_learning_to_federated_learning/02.3.1_convert_Logistics_regression_to_federated_learning/convert_lr_to_fl.ipynb)\n",
" * [Convert KMeans to federated learning](../02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/convert_kmeans_to_fl.ipynb)\n",
" * [Convert Survival Analysis to federated learning](../02.3_convert_machine_learning_to_federated_learning/02.3.3_convert_survival_analysis_to_federated_learning/convert_survival_analysis_to_fl.ipynb)\n",
"\n",
"2.4. **Client API** \n",
"\n",
" * [NVFlare Client API](../02.4_client_api/Client_api.ipynb)\n",
"\n",
"2.5. [Recap of the covered topics](../02.5_recap/recap.ipynb)\n",
"\n",
"\n",
"\n",
"Let's get started with [Federated Statistics](../02.1_federated_statistics/0federated_statistics_with_image_data/federated_statistics_with_image_data.ipynb)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "nvflare_example",
"display_name": "Python 3",
"language": "python",
"name": "nvflare_example"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -25,7 +25,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.2"
"version": "3.10.12"
}
},
"nbformat": 4,
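
The Client API mentioned in the chapter introduction above is what the later notebooks use to convert existing training scripts to federated learning. Those notebooks are not rendered in this commit view, so the following is only a minimal sketch of the general Client API pattern, based on the documented nvflare.client interface; train_one_round and evaluate are hypothetical placeholders for the user's own training and evaluation code.

# Minimal sketch of the NVFlare Client API loop (see assumptions above).
import nvflare.client as flare
from nvflare.app_common.abstract.fl_model import FLModel


def train_one_round(params):
    # Placeholder: the user's local training step; returns updated weights.
    return params


def evaluate(params):
    # Placeholder: the user's local evaluation; returns a metrics dict.
    return {"accuracy": 0.0}


def main():
    flare.init()  # register this process with the NVFlare client runtime

    while flare.is_running():
        input_model = flare.receive()  # global model sent by the server
        params = train_one_round(input_model.params)
        metrics = evaluate(params)
        # return the locally updated weights and metrics to the server
        flare.send(FLModel(params=params, metrics=metrics))


if __name__ == "__main__":
    main()

This loop only runs to completion when launched as part of an NVFlare job (for example via the simulator); run standalone it has no server to talk to.
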

This file was deleted.

@@ -0,0 +1,25 @@
#!/bin/bash

SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
DATASET_PATH="/tmp/nvflare/image_stats/data"

if [ ! -d "$DATASET_PATH" ]; then
    mkdir -p "$DATASET_PATH"
fi

source_url="$1"
echo "download url = ${source_url}"
if [ -n "${source_url}" ]; then
    if [ ! -f "${DATASET_PATH}/COVID-19_Radiography_Dataset.zip" ]; then
        wget -O "${DATASET_PATH}/COVID-19_Radiography_Dataset.zip" "${source_url}"
    else
        echo "zip file exists."
    fi
    if [ ! -d "${DATASET_PATH}/COVID-19_Radiography_Dataset" ]; then
        unzip -d "${DATASET_PATH}" "${DATASET_PATH}/COVID-19_Radiography_Dataset.zip"
    else
        echo "image files exist."
    fi
else
    echo "empty URL, nothing downloaded; provide a real URL to download the dataset"
fi
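
After running the download script, a short sanity-check sketch (not from this commit) can confirm that the archive unpacked where the later scripts expect it. The path and the "<class>/images/*.png" layout below are assumptions taken from the DATASET_PATH above and from the prepare_data script further down.

# Optional sanity check: count extracted images per class (assumed layout).
import glob
import os

DATASET_ROOT = "/tmp/nvflare/image_stats/data/COVID-19_Radiography_Dataset"

for class_name in ["COVID", "Lung_Opacity", "Normal", "Viral Pneumonia"]:
    pattern = os.path.join(DATASET_ROOT, "**", class_name, "images", "*.png")
    images = glob.glob(pattern, recursive=True)
    print(f"{class_name}: {len(images)} images")
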

@@ -0,0 +1,94 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import glob
import json
import os
import random

SEED = 0


def create_datasets(root, subdirs, extension, shuffle, seed):
    random.seed(seed)

    data_lists = []
    for subdir in subdirs:
        search_string = os.path.join(root, "**", subdir, "images", "*" + extension)
        data_list = glob.glob(search_string, recursive=True)

        assert (
            len(data_list) > 0
        ), f"No images found using {search_string} for subdir '{subdir}' and extension '{extension}'!"

        if shuffle:
            random.shuffle(data_list)

        data_lists.append(data_list)

    return data_lists


def save_data_list(data, data_list_file, data_root, key="data"):
    data_list = []
    for d in data:
        data_list.append({"image": d.replace(data_root + os.path.sep, "")})

    os.makedirs(os.path.dirname(data_list_file), exist_ok=True)
    with open(data_list_file, "w") as f:
        json.dump({key: data_list}, f, indent=4)

    print(f"Saved {len(data_list)} entries at {data_list_file}")


def prepare_data(
    input_dir: str,
    input_ext: str = ".png",
    output_dir: str = "/tmp/nvflare/image_stats/data",
    sub_dirs: str = "COVID,Lung_Opacity,Normal,Viral Pneumonia",
):
    sub_dir_list = [sd for sd in sub_dirs.split(",")]

    data_lists = create_datasets(root=input_dir, subdirs=sub_dir_list, extension=input_ext, shuffle=True, seed=SEED)
    print(f"Created {len(data_lists)} data lists for {sub_dir_list}.")

    site_id = 1
    for subdir, data_list in zip(sub_dir_list, data_lists):
        save_data_list(data_list, os.path.join(output_dir, f"site-{site_id}_{subdir}.json"), data_root=input_dir)
        site_id += 1


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_dir", type=str, required=True, help="Location of image files")
    parser.add_argument("--input_ext", type=str, default=".png", help="Search extension")
    parser.add_argument(
        "--output_dir", type=str, default="/tmp/nvflare/image_stats/data", help="Output location of data lists"
    )
    parser.add_argument(
        "--subdirs",
        type=str,
        default="COVID,Lung_Opacity,Normal,Viral Pneumonia",
        help="A list of sub-folders to include.",
    )
    args = parser.parse_args()

    assert "," in args.subdirs, "Expecting a comma separated list of subdirs names"

    prepare_data(args.input_dir, args.input_ext, args.output_dir, args.subdirs)


if __name__ == "__main__":
    main()
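
For reference, the data preparation above can also be driven directly from Python rather than through its command line. A minimal usage sketch, assuming the module is importable as prepare_data and the COVID-19 archive was extracted to the default location used by the download script:

# Usage sketch for the data-list preparation (module name is an assumption).
from prepare_data import prepare_data

prepare_data(
    input_dir="/tmp/nvflare/image_stats/data/COVID-19_Radiography_Dataset",
    input_ext=".png",
    output_dir="/tmp/nvflare/image_stats/data",
    sub_dirs="COVID,Lung_Opacity,Normal,Viral Pneumonia",
)
# Produces one JSON data list per class: site-1_COVID.json,
# site-2_Lung_Opacity.json, site-3_Normal.json and "site-4_Viral Pneumonia.json",
# i.e. each simulated site holds images of a single class.
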
@@ -0,0 +1,64 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse

from src.image_statistics import ImageStatistics

from nvflare.job_config.stats_job import StatsJob


def define_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-n", "--n_clients", type=int, default=3)
    parser.add_argument("-d", "--data_root_dir", type=str, nargs="?", default="/tmp/nvflare/image_stats/data")
    parser.add_argument("-o", "--stats_output_path", type=str, nargs="?", default="statistics/stats.json")
    parser.add_argument("-j", "--job_dir", type=str, nargs="?", default="/tmp/nvflare/jobs/image_stats")
    parser.add_argument("-w", "--work_dir", type=str, nargs="?", default="/tmp/nvflare/workspace/image_stats")
    parser.add_argument("-co", "--export_config", action="store_true", help="config only mode, export config")

    return parser.parse_args()


def main():
    args = define_parser()

    n_clients = args.n_clients
    data_root_dir = args.data_root_dir
    output_path = args.stats_output_path
    job_dir = args.job_dir
    work_dir = args.work_dir
    export_config = args.export_config

    statistic_configs = {"count": {}, "histogram": {"*": {"bins": 20, "range": [0, 256]}}}
    # define local stats generator
    stats_generator = ImageStatistics(data_root_dir)

    job = StatsJob(
        job_name="stats_image",
        statistic_configs=statistic_configs,
        stats_generator=stats_generator,
        output_path=output_path,
    )

    sites = [f"site-{i + 1}" for i in range(n_clients)]
    job.setup_clients(sites)

    if export_config:
        job.export_job(job_dir)
    else:
        job.simulator_run(work_dir, gpu="0")


if __name__ == "__main__":
    main()
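
Once the simulator run above finishes, the aggregated statistics are written as JSON on the server side of the simulator workspace. The path in the sketch below is an assumption based on the work_dir and stats_output_path defaults above (the simulator's internal layout may differ between NVFlare versions), so adjust it if the file is not found.

# Sketch: load and inspect the aggregated statistics produced by the job above.
import json

# Assumed location: <work_dir>/server/simulate_job/<stats_output_path>
stats_file = "/tmp/nvflare/workspace/image_stats/server/simulate_job/statistics/stats.json"

with open(stats_file, "r") as f:
    stats = json.load(f)

# Expect the configured statistics ("count", "histogram") among the top-level
# keys, holding per-site and global results that can then be plotted.
print(list(stats.keys()))
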
@@ -0,0 +1,8 @@
numpy
monai[itk]
pandas
kaleido
matplotlib
jupyter
notebook
tdigest
0 comments on commit 57f4f27