Unexpected behavior in Loss Functions' loss/epoch curves #7615
-
Hello! Could you share a little more information about the code you're using? What dataset do you use? Have you looked at the evaluation metrics in the two cases? Any extra detail would help us understand whether this behaviour is normal or not. The first thing to look at might be the …
-
Hello @Lucas-rbnt !
*(Training and evaluation code snippets were attached here.)*
With DiceLoss/DiceCELoss, the mean Dice score after 1000 epochs is 0.026 for class one and 0.008 for class two, while for the code with torch.nn.CrossEntropyLoss() as the loss function, in conjunction with Monai's UNet model, the mean Dice score after 1000 epochs is 0.26 for class one and 0.002 for class two. Your assistance would be greatly appreciated.
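For reference, the per-class mean Dice is computed along these lines. This is a minimal sketch using MONAI's `DiceMetric` with synthetic one-hot tensors; the shapes and the 3-channel layout (background plus two classes) are assumptions, not the exact evaluation code:

```python
import torch
from monai.metrics import DiceMetric

# Per-class mean Dice; include_background=False drops channel 0.
dice_metric = DiceMetric(include_background=False, reduction="mean_batch")

# Synthetic one-hot volume: (batch, 3 channels, D, H, W) - placeholder data.
def synthetic_onehot():
    idx = torch.randint(0, 3, (1, 64, 64, 64))
    return torch.nn.functional.one_hot(idx, num_classes=3).permute(0, 4, 1, 2, 3).float()

dice_metric(y_pred=synthetic_onehot(), y=synthetic_onehot())
per_class = dice_metric.aggregate()  # one mean Dice value per foreground class
dice_metric.reset()
print(per_class)
```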
-
Hello @Lucas-rbnt ! Warm regards.
-
Hello, it's a bit complicated to answer without knowing all the ins and outs.

You mention two classes in the results, but your OneHot mentions 3. Is it possible to make a OneHot with just two labels by removing the background? If you have two classes, you can simply produce an output with two channels, where the first channel is made up of 1s for label 1 and the second for label 2, and use DiceCELoss(to_onehot_y=False, sigmoid=True).

Otherwise, the problem may come from the way the metrics are calculated or assembled via patching, since in all cases the values are quite low, whether with CrossEntropy or DiceCELoss.

Finally, the dataset you're using isn't public, I imagine, but how many samples is it made up of, and are the augmentations/preprocessing appropriate?

In the hope that this provides a few more insights for your investigation!
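To illustrate, here is a minimal sketch of that two-channel setup with DiceCELoss(to_onehot_y=False, sigmoid=True). The UNet parameters, shapes, and random tensors are placeholders for illustration, not your actual configuration:

```python
import torch
from monai.losses import DiceCELoss
from monai.networks.nets import UNet

model = UNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=2,  # one channel per foreground class, background dropped
    channels=(16, 32, 64, 128),
    strides=(2, 2, 2),
)

images = torch.randn(1, 1, 64, 64, 64)
# Binary labels per channel: channel 0 marks label 1, channel 1 marks label 2.
labels = torch.randint(0, 2, (1, 2, 64, 64, 64)).float()

# Sigmoid per channel instead of softmax across channels (multi-label setup).
loss_fn = DiceCELoss(to_onehot_y=False, sigmoid=True)
loss = loss_fn(model(images), labels)
print(loss.item())
```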
-
Hello,
I'm working on a multi-class 3D segmentation project with Monai's UNet network and the loss functions DiceLoss and DiceCELoss. However, I'm encountering an issue with these loss functions: with DiceLoss and DiceCELoss, the loss per epoch shoots up to 5 or 6, while it stays between 0 and 1 with torch.nn.CrossEntropyLoss() in conjunction with the same Monai UNet model. I've attached the results for reference (the loss/epoch curves for DiceLoss and DiceCELoss are plotted over 1000 epochs, while the cross-entropy curve covers 20 epochs). Could you help me understand why this happens with Monai's DiceLoss and DiceCELoss?
Thanks in advance.
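For context, the two setups look roughly like this. This is a minimal sketch with synthetic tensors, assuming a 3-class problem (background plus two foreground classes); the shapes are placeholders and the real pipeline differs:

```python
import torch
from monai.losses import DiceCELoss, DiceLoss

logits = torch.randn(2, 3, 64, 64, 64)            # raw UNet output, 3 channels
target = torch.randint(0, 3, (2, 1, 64, 64, 64))  # class-index labels

# MONAI losses applied to logits; to_onehot_y converts the index labels.
dice_loss = DiceLoss(to_onehot_y=True, softmax=True)(logits, target)
# DiceCELoss sums a Dice term and a cross-entropy term, so unlike the
# Dice term alone its value is not confined to [0, 1].
dice_ce_loss = DiceCELoss(to_onehot_y=True, softmax=True)(logits, target)

# Plain PyTorch cross-entropy expects class indices without the channel dim.
ce_loss = torch.nn.CrossEntropyLoss()(logits, target.squeeze(1))

print(dice_loss.item(), dice_ce_loss.item(), ce_loss.item())
```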