
Sync #903

Merged: 26 commits (Jul 24, 2024)
Commits (26)
d90d19e
fix train metrics errors
VukW May 8, 2024
ada9577
Output_metrics is filled only for the last weighted avg_type
VukW May 14, 2024
73174c7
Refactored logger
VukW May 14, 2024
7303736
general refactoring & typing
VukW May 14, 2024
36bbfa9
Fix after changing step output shape
VukW May 14, 2024
62ffb14
dynamic lists instead of fixed size for handling dynamic batch_size
VukW May 21, 2024
3987439
Fix for segmentation
VukW May 21, 2024
26b33a9
a crutch for deep_* and sdnet architectures (that return list)
VukW May 21, 2024
887def5
Merge remote-tracking branch 'origin/VukW-patch-1' into fix_train_met…
VukW May 22, 2024
71273ce
turning training dataset shuffle on
VukW May 22, 2024
d30cf20
Test fix for the case when both label and value_to_pred exist
VukW May 23, 2024
92c4387
bugfix when label is not present
VukW May 23, 2024
8409f7a
Merge branch 'master' into fix_train_metrics
VukW May 23, 2024
ac2a442
Merge branch 'master' into fix_train_metrics
sarthakpati Jun 4, 2024
d0d25fb
Do not assert metric shape; lets take a first evaluated instead
VukW Jun 6, 2024
ca8a904
Blacked
VukW Jun 6, 2024
53eb145
correct forward operations order
hongbozheng Jul 9, 2024
346e7c2
fixed pip version for CI tests
sarthakpati Jul 9, 2024
df86271
removed additional space
sarthakpati Jul 9, 2024
4f9dfd7
Merge pull request #898 from mlcommons/master_pip-fix
scap3yvt Jul 9, 2024
b604142
Merge branch 'master' into patch-1
sarthakpati Jul 9, 2024
2cb0ac0
Corrected forward operations order and norm layer parameter
hongbozheng Jul 10, 2024
00c65bd
Merge pull request #897 from hongbozheng/patch-1
Geeks-Sid Jul 13, 2024
6b22745
Error correction in validation and testing loops
VukW Jul 18, 2024
0829662
Merge branch 'master' into fix_train_metrics
VukW Jul 18, 2024
8b9fb47
Merge pull request #868 from VukW/fix_train_metrics
sarthakpati Jul 19, 2024
Files changed
2 changes: 1 addition & 1 deletion .devcontainer/onCreateCommand.sh
@@ -1,7 +1,7 @@
#!/usr/bin/env bash

python -m ensurepip # ensures pip is installed in the current environment
-pip install --upgrade pip
+pip install --upgrade pip==24.0
pip install wheel
pip install openvino-dev==2023.0.1 # [OPTIONAL] to generate optimized models for inference
pip install mlcube_docker # [OPTIONAL] to deploy GaNDLF models as MLCube-compliant Docker containers
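Note: the same pip==24.0 pin is applied to every CI workflow and Dockerfile below. The PR does not state the motivation; most likely it shields the builds from behavior changes in newer pip releases. A minimal sketch of an equivalent runtime guard, assuming the third-party packaging library is available (hypothetical, not part of this PR):

import importlib.metadata

from packaging.version import Version

# Fail fast if the environment's pip is newer than the version
# the workflows were tested against (assumed threshold).
pip_version = Version(importlib.metadata.version("pip"))
assert pip_version <= Version("24.0"), f"pip {pip_version} exceeds the pinned 24.0"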
2 changes: 1 addition & 1 deletion .github/workflows/black.yml
@@ -27,7 +27,7 @@ jobs:

- name: Install dependencies
run: |
-python -m pip install --upgrade pip
+python -m pip install --upgrade pip==24.0
python -m pip install black==${{ env.BLACK_VERSION }}

- name: Run tests
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -38,7 +38,7 @@ jobs:
${{ runner.os }}-pip-
- name: Install dependencies
run: |
-python -m pip install --upgrade pip
+python -m pip install --upgrade pip==24.0
pip install scikit-build
pip install -e .
pip install build
2 changes: 1 addition & 1 deletion .github/workflows/mlcube-test.yml
@@ -70,7 +70,7 @@ jobs:
run: |
sudo apt-get update
sudo apt-get install libvips libvips-tools -y
-python -m pip install --upgrade pip
+python -m pip install --upgrade pip==24.0
python -m pip install wheel
python -m pip install openvino-dev==2023.0.1 mlcube_docker
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cpu
2 changes: 1 addition & 1 deletion .github/workflows/openfl-test.yml
@@ -71,7 +71,7 @@ jobs:
run: |
sudo apt-get update
sudo apt-get install libvips libvips-tools -y
-python -m pip install --upgrade pip
+python -m pip install --upgrade pip==24.0
python -m pip install wheel
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cpu
pip install -e .
2 changes: 1 addition & 1 deletion .github/workflows/publish-nightly.yml
@@ -52,7 +52,7 @@ jobs:
- name: Install dependencies
if: env.publish_nightly
run: |
-python -m pip install --upgrade pip
+python -m pip install --upgrade pip==24.0
pip install scikit-build
pip install -e .
pip install build
2 changes: 1 addition & 1 deletion .github/workflows/python-test.yml
@@ -71,7 +71,7 @@ jobs:
run: |
sudo apt-get update
sudo apt-get install libvips libvips-tools -y
-python -m pip install --upgrade pip
+python -m pip install --upgrade pip==24.0
python -m pip install wheel
python -m pip install openvino-dev==2023.0.1 mlcube_docker
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cpu
2 changes: 1 addition & 1 deletion Dockerfile-CPU
@@ -7,7 +7,7 @@ LABEL version=1.0
RUN apt-get update && apt-get install -y software-properties-common
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt-get update && apt-get install -y python3.9 python3-pip libjpeg8-dev zlib1g-dev python3-dev libpython3.9-dev libffi-dev libgl1
-RUN python3.9 -m pip install --upgrade pip
+RUN python3.9 -m pip install --upgrade pip==24.0
# EXPLICITLY install cpu versions of torch/torchvision (not all versions have +cpu modes on PyPI...)
RUN python3.9 -m pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cpu
RUN python3.9 -m pip install openvino-dev==2023.0.1 opencv-python-headless mlcube_docker
2 changes: 1 addition & 1 deletion Dockerfile-CUDA11.8
@@ -11,7 +11,7 @@ ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y software-properties-common
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt-get update && apt-get install -y python3.9 python3-pip libjpeg8-dev zlib1g-dev python3-dev libpython3.9-dev libffi-dev libgl1
-RUN python3.9 -m pip install --upgrade pip
+RUN python3.9 -m pip install --upgrade pip==24.0
RUN python3.9 -m pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
RUN python3.9 -m pip install openvino-dev==2023.0.1 opencv-python-headless mlcube_docker

2 changes: 1 addition & 1 deletion Dockerfile-CUDA12.1
@@ -11,7 +11,7 @@ ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y software-properties-common
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt-get update && apt-get install -y python3.9 python3-pip libjpeg8-dev zlib1g-dev python3-dev libpython3.9-dev libffi-dev libgl1
-RUN python3.9 -m pip install --upgrade pip
+RUN python3.9 -m pip install --upgrade pip==24.0
RUN python3.9 -m pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
RUN python3.9 -m pip install openvino-dev==2023.0.1 opencv-python-headless mlcube_docker

4 changes: 2 additions & 2 deletions Dockerfile-ROCm
@@ -9,9 +9,9 @@ LABEL version=1.0
RUN apt-get update && apt-get install -y software-properties-common
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt-get update && apt-get install -y python3.9 python3-pip libjpeg8-dev zlib1g-dev python3-dev libpython3.9-dev libffi-dev libgl1
-RUN python3.9 -m pip install --upgrade pip
+RUN python3.9 -m pip install --upgrade pip==24.0
RUN python3.9 -m pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/rocm5.6
-RUN python3.9 -m pip install --upgrade pip && python3.9 -m pip install openvino-dev==2023.0.1 opencv-python-headless mlcube_docker
+RUN python3.9 -m pip install --upgrade pip==24.0 && python3.9 -m pip install openvino-dev==2023.0.1 opencv-python-headless mlcube_docker
RUN apt-get update && apt-get install -y libgl1

# Do some dependency installation separately here to make layer caching more efficient
69 changes: 26 additions & 43 deletions GANDLF/compute/forward_pass.py
@@ -1,6 +1,6 @@
import os
import pathlib
-from typing import Optional, Tuple
+from typing import Optional, Tuple, Union

import numpy as np
import pandas as pd
@@ -51,21 +51,20 @@ def validate_network(
print("*" * 20)
# Initialize a few things
total_epoch_valid_loss = 0
-total_epoch_valid_metric = {}
+total_epoch_valid_metric: dict[str, Union[float, np.array]] = {}
average_epoch_valid_metric = {}

for metric in params["metrics"]:
if "per_label" in metric:
-total_epoch_valid_metric[metric] = []
+total_epoch_valid_metric[metric] = np.zeros(1)
else:
total_epoch_valid_metric[metric] = 0

logits_list = []
subject_id_list = []
is_classification = params.get("problem_type") == "classification"
calculate_overall_metrics = (
-(params["problem_type"] == "classification")
-or (params["problem_type"] == "regression")
+params["problem_type"] in {"classification", "regression"}
) and mode == "validation"
is_inference = mode == "inference"
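Note: initializing per-label accumulators with np.zeros(1) instead of [] means every batch can be accumulated with a single a = a + b, whether the metric returns a scalar or a per-label array: NumPy broadcasts the shape-(1,) zero against the first per-label result, which is what the simplified accumulation loops later in this file rely on. A minimal standalone sketch (illustrative values only):

import numpy as np

total = np.zeros(1)                    # broadcastable zero accumulator
total = total + np.array([0.7, 0.9])   # first batch: broadcasts to shape (2,)
total = total + np.array([0.5, 0.8])   # later batches: element-wise addition
print(total)                           # [1.2 1.7]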

@@ -107,10 +106,8 @@ def validate_network(

# get ground truths for classification problem, validation set
if calculate_overall_metrics:
-(
-ground_truth_array,
-predictions_array,
-) = get_ground_truths_and_predictions_tensor(params, "validation_data")
+ground_truth_array = []
+predictions_array = []

for batch_idx, (subject) in enumerate(
tqdm(valid_dataloader, desc="Looping over " + mode + " data")
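Note: replacing the preallocated arrays from get_ground_truths_and_predictions_tensor with plain lists (the "dynamic lists instead of fixed size" commit) avoids index errors when the final batch is smaller than the rest; the lists are converted to tensors only once, when overall_stats is called at the end of the epoch. A minimal sketch of the pattern with stand-in data:

import torch

ground_truth_array, predictions_array = [], []
# stand-in for the validation loop: logits of shape (num_classes,), scalar label
for logits, label in [(torch.tensor([0.1, 0.9]), torch.tensor(1)),
                      (torch.tensor([0.8, 0.2]), torch.tensor(0))]:
    predictions_array.append(torch.argmax(logits, 0).item())
    ground_truth_array.append(label.item())
preds = torch.Tensor(predictions_array)  # what overall_stats receives later
gts = torch.Tensor(ground_truth_array)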
@@ -193,6 +190,7 @@

if params["save_output"] or is_inference:
# we divide by scaling factor here because we multiply by it during loss/metric calculation
+# TODO: regression-only, right?
outputToWrite += (
str(epoch)
+ ","
@@ -206,23 +204,15 @@
)

if calculate_overall_metrics:
-predictions_array[batch_idx] = (
-torch.argmax(pred_output[0], 0).cpu().item()
-)
+ground_truth_array.append(label_ground_truth.item())
+# TODO: that's for classification only. What about regression?
+predictions_array.append(torch.argmax(pred_output[0], 0).cpu().item())
# # Non network validation related
total_epoch_valid_loss += final_loss.detach().cpu().item()
-for metric in final_metric.keys():
-if isinstance(total_epoch_valid_metric[metric], list):
-if len(total_epoch_valid_metric[metric]) == 0:
-total_epoch_valid_metric[metric] = np.array(
-final_metric[metric]
-)
-else:
-total_epoch_valid_metric[metric] += np.array(
-final_metric[metric]
-)
-else:
-total_epoch_valid_metric[metric] += final_metric[metric]
+for metric, metric_val in final_metric.items():
+total_epoch_valid_metric[metric] = (
+total_epoch_valid_metric[metric] + metric_val
+)

else: # for segmentation problems OR regression/classification when no label is present
grid_sampler = torchio.inference.GridSampler(
@@ -315,8 +305,7 @@

# save outputs
if params["problem_type"] == "segmentation":
-output_prediction = aggregator.get_output_tensor()
-output_prediction = output_prediction.unsqueeze(0)
+output_prediction = aggregator.get_output_tensor().unsqueeze(0)
if params["save_output"]:
img_for_metadata = torchio.ScalarImage(
tensor=subject["1"]["data"].squeeze(0),
@@ -386,16 +375,18 @@
# final regression output
output_prediction = output_prediction / len(patch_loader)
if calculate_overall_metrics:
-predictions_array[batch_idx] = (
+# TOD: what? regression and argmax?
+predictions_array.append(
torch.argmax(output_prediction[0], 0).cpu().item()
)
+ground_truth_array.append(label_ground_truth.item())
if params["save_output"]:
outputToWrite += (
str(epoch)
+ ","
+ subject["subject_id"][0]
+ ","
-+ str(output_prediction)
++ str(output_prediction[0])
+ "\n"
)
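Note on the "TOD" comment above: for a regression output of shape (1, 1), torch.argmax(output_prediction[0], 0) always yields index 0 rather than the predicted value, which is presumably why the author flags it. A hedged illustration (values assumed):

import torch

output_prediction = torch.tensor([[3.7]])             # assumed averaged regression output
index = torch.argmax(output_prediction[0], 0).item()  # 0: an index, not a prediction
value = output_prediction[0].squeeze().item()         # 3.7: the regression readout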

@@ -407,7 +398,6 @@
n.squeeze(), raw_input=image[i].squeeze(-1)
)

-output_prediction = output_prediction.squeeze(-1)
if is_inference and is_classification:
logits_list.append(output_prediction)
subject_id_list.append(subject.get("subject_id")[0])
@@ -418,9 +408,8 @@
if label_ground_truth.shape[0] == 3:
label_ground_truth = label_ground_truth[0, ...].unsqueeze(0)
# we always want the ground truth to be in the same format as the prediction
+# add batch dim
label_ground_truth = label_ground_truth.unsqueeze(0)
-if label_ground_truth.shape[-1] == 1:
-label_ground_truth = label_ground_truth.squeeze(-1)
final_loss, final_metric = get_loss_and_metrics(
image,
label_ground_truth,
@@ -440,17 +429,9 @@
# loss.cpu().data.item()
total_epoch_valid_loss += final_loss.cpu().item()
for metric in final_metric.keys():
-if isinstance(total_epoch_valid_metric[metric], list):
-if len(total_epoch_valid_metric[metric]) == 0:
-total_epoch_valid_metric[metric] = np.array(
-final_metric[metric]
-)
-else:
-total_epoch_valid_metric[metric] += np.array(
-final_metric[metric]
-)
-else:
-total_epoch_valid_metric[metric] += final_metric[metric]
+total_epoch_valid_metric[metric] = (
+total_epoch_valid_metric[metric] + final_metric[metric]
+)

if label_ground_truth is not None:
if params["verbose"]:
@@ -486,7 +467,9 @@
# get overall stats for classification
if calculate_overall_metrics:
average_epoch_valid_metric = overall_stats(
-predictions_array, ground_truth_array, params
+torch.Tensor(predictions_array),
+torch.Tensor(ground_truth_array),
+params,
)
average_epoch_valid_metric = print_and_format_metrics(
average_epoch_valid_metric,
Expand Down
66 changes: 35 additions & 31 deletions GANDLF/compute/loss_and_metric.py
@@ -1,5 +1,6 @@
import sys
-from typing import Dict, Tuple
+import warnings
+from typing import Dict, Tuple, Union
from GANDLF.losses import global_losses_dict
from GANDLF.metrics import global_metrics_dict
import torch
@@ -13,7 +14,7 @@ def get_metric_output(
prediction: torch.Tensor,
target: torch.Tensor,
params: dict,
-) -> float:
+) -> Union[float, list]:
"""
This function computes the metric output for a given metric function, prediction and target.

@@ -36,6 +37,12 @@
if len(temp) > 1:
return temp
else:
+# TODO: this branch is extremely age case and is buggy.
+# Overall the case when metric returns a list but of length 1 is very rare. The only case is when
+# the metric returns Nx.. tensor (i.e. without aggregation by elements) and batch_size==N==1. This branch
+# would definitely fail for such a metrics like
+# MulticlassAccuracy(num_classes=3, multidim_average="samplewise")
+# Maybe the best solution is to raise an error here if metric is configured to return samplewise results?
return metric_output.item()
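Note: the TODO above can be reproduced with a samplewise torchmetrics metric and a batch of one: the length-1 result falls through to metric_output.item(), silently dropping the per-sample dimension. A sketch of that edge case (assuming torchmetrics is installed):

import torch
from torchmetrics.classification import MulticlassAccuracy

metric = MulticlassAccuracy(num_classes=3, multidim_average="samplewise")
preds = torch.tensor([[0, 2, 1, 1]])   # a single sample (N=1) with four elements
target = torch.tensor([[0, 2, 2, 1]])
out = metric(preds, target)            # tensor([0.7500]): one value per sample
print(out.tolist())                    # [0.75] has length 1, so .item() is taken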


@@ -115,41 +122,38 @@ def get_loss_and_metrics(
loss_kld = global_losses_dict["kld"](prediction[2], prediction[3])
loss_cycle = global_losses_dict["mse"](prediction[2], prediction[4], None)
loss = 0.01 * loss_kld + loss_reco + 10 * loss_seg + loss_cycle
+elif deep_supervision_model:
+# this is for models that have deep-supervision
+for i, _ in enumerate(prediction):
+# loss is calculated based on resampled "soft" labels using a pre-defined weights array
+loss += (
+loss_function(prediction[i], ground_truth_resampled[i], params)
+* loss_weights[i]
+)
else:
-if deep_supervision_model:
-# this is for models that have deep-supervision
-for i, _ in enumerate(prediction):
-# loss is calculated based on resampled "soft" labels using a pre-defined weights array
-loss += (
-loss_function(prediction[i], ground_truth_resampled[i], params)
-* loss_weights[i]
-)
-else:
-loss = loss_function(prediction, target, params)
+loss = loss_function(prediction, target, params)
metric_output = {}

# Metrics should be a list
for metric in params["metrics"]:
metric_lower = metric.lower()
metric_output[metric] = 0
-if metric_lower in global_metrics_dict:
-metric_function = global_metrics_dict[metric_lower]
-if sdnet_check:
-metric_output[metric] = get_metric_output(
-metric_function, prediction[0], target.squeeze(-1), params
+if metric_lower not in global_metrics_dict:
+warnings.warn("WARNING: Could not find the requested metric '" + metric)
+continue
+
+metric_function = global_metrics_dict[metric_lower]
+if sdnet_check:
+metric_output[metric] = get_metric_output(
+metric_function, prediction[0], target.squeeze(-1), params
)
+elif deep_supervision_model:
+for i, _ in enumerate(prediction):
+metric_output[metric] += get_metric_output(
+metric_function, prediction[i], ground_truth_resampled[i], params
+)
-else:
-if deep_supervision_model:
-for i, _ in enumerate(prediction):
-metric_output[metric] += get_metric_output(
-metric_function,
-prediction[i],
-ground_truth_resampled[i],
-params,
-)
-
-else:
-metric_output[metric] = get_metric_output(
-metric_function, prediction, target, params
-)
+else:
+metric_output[metric] = get_metric_output(
+metric_function, prediction, target, params
+)
return loss, metric_output
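Note: the refactor above flattens the nested else: if ... blocks into an if/elif/else chain without changing behavior; the deep-supervision branch is still a weighted sum of per-scale losses against resampled labels. A self-contained sketch of that computation (toy shapes and weights, not GaNDLF's actual configuration):

import torch
import torch.nn.functional as F

def deep_supervision_loss(predictions, resampled_targets, weights):
    # weighted sum of per-scale losses, mirroring the elif branch above
    return sum(
        F.mse_loss(p, t) * w
        for p, t, w in zip(predictions, resampled_targets, weights)
    )

preds = [torch.zeros(1, 1, 4, 4), torch.zeros(1, 1, 2, 2)]  # two decoder scales
tgts = [torch.ones(1, 1, 4, 4), torch.ones(1, 1, 2, 2)]     # labels resampled per scale
print(deep_supervision_loss(preds, tgts, [0.5, 0.25]))      # tensor(0.7500)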