Hi! I'm currently working on training a model using MONAI and PyTorch Lightning, and the model doesn't seem to generalize well. I am using a COVID-19 infection dataset with Coronacases and Radiopaedia volumes. You can find it here.

The HRCT transforms are applied to the "coronacases" volumes of the dataset, while the CBCT transforms are applied to the "radiopaedia" volumes. The val transforms are used for validation, while the others are used for training. The difference between them lies in the orientation of the volumes and in the intensity scaling.

```python
SPATIAL_SIZE = (64, 64, 64)
NUM_RAND_PATCHES = 16
LEVEL = -650
WIDTH = 1500
LOWER_BOUND_WINDOW_HRCT = LEVEL - (WIDTH // 2)
UPPER_BOUND_WINDOW_HRCT = LEVEL + (WIDTH // 2)
LOWER_BOUND_WINDOW_CBCT = 0
UPPER_BOUND_WINDOW_CBCT = 255
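# With LEVEL = -650 and WIDTH = 1500, the HRCT window spans [-1400, 100] HU (a lung window).
# The CBCT bounds are passed to ScaleIntensityd as minv/maxv below, so CBCT volumes are
# rescaled into the [0, 255] range rather than windowed in HU.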
def get_hrct_transforms():
    return monai.transforms.Compose(
        [
            monai.transforms.LoadImaged(keys=('img', 'mask'), image_only=True, ensure_channel_first=True),
            monai.transforms.Orientationd(keys=('img', 'mask'), axcodes="PLI"),
            monai.transforms.RandCropByPosNegLabeld(keys=('img', 'mask'), label_key="mask",
                                                    spatial_size=SPATIAL_SIZE, pos=1, neg=1,
                                                    num_samples=NUM_RAND_PATCHES, allow_smaller=True),
            # monai.transforms.SpatialPadd(keys=('img', 'mask'), spatial_size=SPATIAL_SIZE, method='symmetric'),
            monai.transforms.ScaleIntensityRanged(keys=('img',), a_min=LOWER_BOUND_WINDOW_HRCT,
                                                  a_max=UPPER_BOUND_WINDOW_HRCT, b_min=0.0, b_max=1.0, clip=True),
            monai.transforms.ToTensord(keys=("img", "mask")),
        ]
    )

def get_cbct_transforms():
    return monai.transforms.Compose(
        [
            monai.transforms.LoadImaged(keys=('img', 'mask'), image_only=True, ensure_channel_first=True),
            monai.transforms.Orientationd(keys=('img', 'mask'), axcodes="ALI"),
            monai.transforms.RandCropByPosNegLabeld(keys=('img', 'mask'), label_key="mask",
                                                    spatial_size=SPATIAL_SIZE, pos=1, neg=1,
                                                    num_samples=NUM_RAND_PATCHES, allow_smaller=True),
            # monai.transforms.SpatialPadd(keys=('img', 'mask'), spatial_size=SPATIAL_SIZE, method='symmetric'),
            monai.transforms.ScaleIntensityd(keys='img', minv=LOWER_BOUND_WINDOW_CBCT, maxv=UPPER_BOUND_WINDOW_CBCT),
            monai.transforms.ToTensord(keys=("img", "mask")),
        ]
    )

def get_val_hrct_transforms():
    return monai.transforms.Compose(
        [
            monai.transforms.LoadImaged(keys=('img', 'mask'), image_only=True, ensure_channel_first=True),
            monai.transforms.Orientationd(keys=('img', 'mask'), axcodes="PLI"),
            monai.transforms.ScaleIntensityRanged(keys=('img',), a_min=LOWER_BOUND_WINDOW_HRCT,
                                                  a_max=UPPER_BOUND_WINDOW_HRCT, b_min=0.0, b_max=1.0,
                                                  clip=True),
            monai.transforms.ToTensord(keys=("img", "mask")),
        ]
    )

def get_val_cbct_transforms():
    return monai.transforms.Compose(
        [
            monai.transforms.LoadImaged(keys=('img', 'mask'), image_only=True, ensure_channel_first=True),
            monai.transforms.Orientationd(keys=('img', 'mask'), axcodes="ALI"),
            monai.transforms.ScaleIntensityd(keys='img', minv=LOWER_BOUND_WINDOW_CBCT, maxv=UPPER_BOUND_WINDOW_CBCT),
            monai.transforms.ToTensord(keys=("img", "mask")),
        ]
    )
```

Additionally, I am using a UNet architecture, and I have tried training with the Dice and Generalized Dice loss functions. The optimizer is AdamW, and I am currently training for over 5000 epochs. The issue I'm facing is that, despite the training loss decreasing and the training Dice score increasing as expected, the validation loss doesn't decrease as much as I would like and the validation Dice score doesn't improve significantly.

```python
class Net(L.pytorch.LightningModule):
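    """LightningModule wrapping a 3D UNet for binary infection-mask segmentation.

    Training uses random positive/negative patches and a Generalized Dice loss;
    validation runs sliding-window inference, and Dice metrics exclude the background.
    """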
    def __init__(self):
        super(Net, self).__init__()
        self.save_hyperparameters()
        self.model = UNet(
            spatial_dims=3,
            in_channels=1,
            out_channels=1,
            channels=(16, 32, 64, 128, 256),
            strides=(2, 2, 2, 2),
            num_res_units=2
        )
        self.dice_metric = DiceMetric(include_background=False, reduction="mean")
        self.train_dice_metric = DiceMetric(include_background=False, reduction="mean")
        self.loss_function = monai.losses.GeneralizedDiceLoss(sigmoid=True, include_background=False)
        self.post_pred = monai.transforms.Compose(
            [monai.transforms.Activations(sigmoid=True), monai.transforms.AsDiscrete(threshold_values=0.5)])
        self.post_label = monai.transforms.Compose([monai.transforms.AsDiscrete(threshold_values=0.5)])
        self.best_val_dice = 0
        self.best_val_epoch = 0
        self.validation_step_outputs = []
        self.train_step_outputs = []
        self.training_ds = None
        self.validation_ds = None
        self.test_ds = None
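
    def forward(self, x):
        # Not shown in the original post; assuming a plain forward pass through the UNet,
        # since training_step and validation_step below call self.forward.
        return self.model(x)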

    def prepare_data(self) -> None:
        # Load images and masks
        logging.info(f"Loading images from {COVID_CASES_PATH}")
        images = load_images_from_path(COVID_CASES_PATH)
        labels = load_images_from_path(INFECTION_MASKS_PATH)

        # Convert images and masks to a list of dictionaries with keys "img" and "mask"
        data_dicts = np.array([{"img": img, "mask": mask} for img, mask in zip(images, labels)])
        logging.debug(data_dicts)

        shuffler = np.random.RandomState(SEED)
        shuffler.shuffle(data_dicts)
        data_dicts = list(data_dicts)

        # Split the data into training (70%), validation (10%), and test (20%) sets
        test_split = int(len(data_dicts) * 0.2)
        val_split = int(len(data_dicts) * 0.1)
        train_paths = data_dicts[test_split + val_split:]
        val_paths = data_dicts[test_split:test_split + val_split]
        test_paths = data_dicts[:test_split]

        # Define the CovidDataset instances for training, validation, and test
        self.training_ds = CovidDataset(volumes=train_paths, hrct_transform=get_hrct_transforms(),
                                        cbct_transform=get_cbct_transforms())
        self.validation_ds = CovidDataset(volumes=val_paths, hrct_transform=get_val_hrct_transforms(),
                                          cbct_transform=get_val_cbct_transforms())
        self.test_ds = CovidDataset(volumes=test_paths, hrct_transform=get_val_hrct_transforms(),
                                    cbct_transform=get_val_cbct_transforms())

    def train_dataloader(self):
        train_dataloader = DataLoader(self.training_ds, batch_size=1, num_workers=4)
        return train_dataloader

    def val_dataloader(self):
        val_dataloader = DataLoader(self.validation_ds, batch_size=1, num_workers=4)
        return val_dataloader

    def configure_optimizers(self):
        optimizer = torch.optim.AdamW(self.model.parameters(), lr=1e-3, weight_decay=1e-5)
        return optimizer

    def training_step(self, batch, batch_idx):
        inputs, labels = batch["img"], batch["mask"]
        outputs = self.forward(inputs)
        loss = self.loss_function(outputs, labels)
        outputs = [self.post_pred(i) for i in decollate_batch(outputs)]
        labels = [self.post_label(i) for i in decollate_batch(labels)]
        self.train_dice_metric(y_pred=outputs, y=labels)
        train_loss_dictionary = {"loss": loss}
        self.train_step_outputs.append(train_loss_dictionary)
        return train_loss_dictionary

    def on_train_epoch_end(self) -> None:
        train_loss = 0
        for output in self.train_step_outputs:
            train_loss += output["loss"].sum().item()
        mean_train_loss = torch.tensor(train_loss / len(self.train_step_outputs))  # Total loss of batches / number of batches
        mean_train_dice = self.train_dice_metric.aggregate().item()
        self.train_dice_metric.reset()
        self.log_dict({"train_dice": mean_train_dice, "train_loss": train_loss / len(self.train_step_outputs)}, prog_bar=True)
        tensorboard_logs = {
            "train_dice": mean_train_dice,
            "train_loss": mean_train_loss,
        }
        self.logger.experiment.add_scalars("losses", {"train": mean_train_loss}, self.current_epoch)
        self.logger.experiment.add_scalars("dice", {"train": mean_train_dice}, self.current_epoch)
        self.logger.log_metrics(tensorboard_logs, step=self.current_epoch)
        self.train_step_outputs.clear()

    def validation_step(self, batch, batch_idx):
        inputs, labels = batch["img"], batch["mask"]
        roi_size = VALIDATION_INFERENCE_ROI_SIZE
        sw_batch_size = 4
        outputs = sliding_window_inference(inputs, roi_size, sw_batch_size, self.forward)
        loss = self.loss_function(outputs, labels)
        outputs = [self.post_pred(i) for i in decollate_batch(outputs)]
        labels = [self.post_label(i) for i in decollate_batch(labels)]
        self.dice_metric(y_pred=outputs, y=labels)
        validation_loss_dictionary = {"loss": loss}
        self.validation_step_outputs.append(validation_loss_dictionary)
        return validation_loss_dictionary

    def on_validation_epoch_end(self) -> None:
        val_loss = 0
        for output in self.validation_step_outputs:
            val_loss += output["loss"].sum().item()
        mean_val_loss = torch.tensor(val_loss / len(self.validation_step_outputs))
        mean_val_dice = self.dice_metric.aggregate().item()
        self.dice_metric.reset()
        self.log_dict({"val_dice": mean_val_dice, "val_loss": val_loss / len(self.validation_step_outputs)}, prog_bar=True)
        tensorboard_logs = {
            "val_dice": mean_val_dice,
            "val_loss": mean_val_loss,
        }
        self.logger.experiment.add_scalars("losses", {"val_loss": mean_val_loss}, self.current_epoch)
        self.logger.experiment.add_scalars("dice", {"val_dice": mean_val_dice}, self.current_epoch)
        self.logger.log_metrics(tensorboard_logs, step=self.current_epoch)
        if mean_val_dice > self.best_val_dice:
            self.best_val_dice = mean_val_dice
            self.best_val_epoch = self.current_epoch
        self.validation_step_outputs.clear()
```

Here you can see some logs from the terminal so you can get a better picture of the training process:
Environment
I would appreciate any guidance on how to improve the generalization of my model. Are there specific strategies or best practices that I should consider? Any insights into potential issues with my current approach would also be very helpful. Thank you for your assistance! EDIT:
Hi @SrMateos, thanks for your interest here. Based on the situation described, where the training loss is decreasing and the training Dice score is improving, but the validation loss and Dice score are not showing similar improvement over 5000 epochs with a UNet architecture and Dice-based loss functions, here are a few suggestions:
Hope it helps, thanks.
Hi @KumoLiu,

It appears that the issue was with the `threshold_values` parameter in the `AsDiscrete` transform used in the `post_pred` transforms. That step was not discretizing the values correctly, which caused the Dice coefficient to malfunction and the model to produce inaccurate results. Thank you for your time and attention.
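
For reference, here is a minimal sketch of the corrected post-processing, assuming a recent MONAI release in which `AsDiscrete` takes a `threshold` argument (the older `threshold_values` / `logit_thresh` arguments were deprecated and later removed, so the exact spelling may vary with your MONAI version):

```python
import monai

# Post-processing for predictions: apply a sigmoid, then binarize at 0.5.
# Note: `threshold=0.5` replaces the deprecated `threshold_values` argument.
post_pred = monai.transforms.Compose([
    monai.transforms.Activations(sigmoid=True),
    monai.transforms.AsDiscrete(threshold=0.5),
])

# Post-processing for labels: binarize the mask so DiceMetric receives {0, 1} values.
post_label = monai.transforms.Compose([
    monai.transforms.AsDiscrete(threshold=0.5),
])
```

With this change the thresholding actually takes effect, so `DiceMetric` is fed binary tensors instead of raw sigmoid outputs.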