Apply online data augmentation only to minority class #3152

tommydino93 · 2021-10-20T09:59:19Z


Hi :)
I modified the 3D classification tutorial for a 3-class problem (labels 0, 1, and 2) and it runs fine.

Now I'd like to apply online data augmentation with Compose, but only to volumes labeled 1 or 2 (the minority classes). What is the recommended way to do this? I came up with two possible solutions:

1. Create two separate ImageDatasets, apply different transformations to each, and then merge them. However, I did not find a way to merge the datasets.
2. Set a condition/argument in Compose that selectively applies the augmentations, though I did not find such an argument.
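For option 1, PyTorch's `torch.utils.data.ConcatDataset` can merge datasets, and MONAI's `Dataset` subclasses the PyTorch `Dataset`, so the same approach should carry over. The sketch below is a minimal illustration under assumptions: `ToyDataset` is a hypothetical stand-in for a MONAI `Dataset`, the file names are placeholders, and `augment` is a hypothetical placeholder for an augmentation `Compose`.

```python
from torch.utils.data import ConcatDataset, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for monai.data.Dataset: applies `transform` to each item."""

    def __init__(self, data, transform=None):
        self.data = data
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        item = dict(self.data[index])  # copy so the source list is not modified
        return self.transform(item) if self.transform else item


def augment(item):
    # Placeholder for a random augmentation Compose; here it only tags the item.
    item["augmented"] = True
    return item


files = [
    {"volume": "vol_a.nii.gz", "label": 0},
    {"volume": "vol_b.nii.gz", "label": 1},
    {"volume": "vol_c.nii.gz", "label": 2},
    {"volume": "vol_d.nii.gz", "label": 0},
]

# Split the file list by class and give each subset its own transform.
majority = [f for f in files if f["label"] == 0]
minority = [f for f in files if f["label"] in (1, 2)]

train_ds = ConcatDataset([
    ToyDataset(majority, transform=None),
    ToyDataset(minority, transform=augment),
])

for i in range(len(train_ds)):
    print(train_ds[i])
```

A `DataLoader` with `shuffle=True` over the concatenated dataset draws indices across both parts, so the two classes end up mixed within batches.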

Thanks a lot in advance,
Tommaso

Nic-Ma · 2021-10-20T11:58:55Z

Maintainer

Hi @tommydino93 ,

Thanks for your interest here.
I assume your data is in one-hot format with 3 channels.
You could use the SplitChanneld transform to split the image into 3 images, apply the transforms to the ones you need, and then use the ConcatItemsd transform to concatenate them again.
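A rough NumPy illustration of that idea (not the MONAI transforms themselves): split a channel-first, 3-channel array into single-channel pieces, transform only the selected channels, and concatenate them back. The array shape and the flip used as the "augmentation" are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical channel-first image with 3 channels, shape (C, H, W).
image = np.arange(3 * 2 * 3).reshape(3, 2, 3)

# Split into three single-channel arrays, similar in spirit to SplitChanneld.
channels = np.split(image, 3, axis=0)  # each piece has shape (1, 2, 3)

# Apply an example transform (a flip along the last axis) only to channels 1 and 2.
channels = [np.flip(c, axis=-1) if i in (1, 2) else c for i, c in enumerate(channels)]

# Concatenate back along the channel axis, similar in spirit to ConcatItemsd.
result = np.concatenate(channels, axis=0)
print(result.shape)  # (3, 2, 3)
```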

Thanks.


rijobro · 2021-10-20T12:06:08Z

Collaborator

Hi @tommydino93,

You could separate your data augmentations from the other transforms and then use a lambda function to conditionally apply the data augmentations. Something like this (untested):

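```python
from monai.transforms import Compose, Lambda, LoadImaged, RandFlipd

# data augmentations
random_augmentations = Compose([
    RandFlipd(keys="image"),
    # ... further random augmentations
])

# standard transformations
transformations = Compose([
    LoadImaged(keys="image"),
    # ... further preprocessing transforms
    Lambda(lambda x: random_augmentations(x) if x["label"] in (1, 2) else x),
])

# some_filename is a placeholder for the path to an image file
data1 = {"image": some_filename, "label": 0}
transformations(data1)  # label 0: should have identity applied (augmentations skipped)

data2 = {"image": some_filename, "label": 1}
transformations(data2)  # label 1: should have random augmentations applied
```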

Here, I've used dictionary transforms so that, when transforming the image, we also have access to its label.
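One possible variation, sketched under assumptions: wrap the condition in a small callable class instead of a lambda. Lambdas cannot be pickled, which can matter when a DataLoader uses worker processes started with the `spawn` method (the default on Windows and macOS). `ApplyIfLabel` is a hypothetical helper, not a MONAI class, and `flip_image` stands in for a random augmentation `Compose`.

```python
import numpy as np


class ApplyIfLabel:
    """Hypothetical helper: apply `transform` only when data[label_key] is in `labels`."""

    def __init__(self, transform, labels, label_key="label"):
        self.transform = transform
        self.labels = set(labels)
        self.label_key = label_key

    def __call__(self, data):
        if data[self.label_key] in self.labels:
            return self.transform(data)
        return data  # majority-class samples pass through unchanged


def flip_image(data):
    # Stand-in for a random augmentation Compose: flip the image along its last axis.
    out = dict(data)
    out["image"] = np.flip(data["image"], axis=-1)
    return out


conditional_aug = ApplyIfLabel(flip_image, labels=(1, 2))

image = np.array([[1, 2, 3]])
print(conditional_aug({"image": image, "label": 0})["image"])  # [[1 2 3]]
print(conditional_aug({"image": image, "label": 1})["image"])  # [[3 2 1]]
```

An instance such as `conditional_aug` could then take the place of the lambda, e.g. `Lambda(conditional_aug)`; since MONAI's `Compose` calls each item in its list, placing it directly in the list may also work.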


tommydino93 · 2021-10-20T12:43:04Z

Author

Thank you very much, @Nic-Ma and @rijobro!
I will have a look and try to implement it with the dictionary transforms.

However, with the approach you suggested (@rijobro), I would not duplicate the minority classes; I would simply apply the transformations to the volumes that belong to them. What would probably be more desirable in my case is to perform offline augmentation and actually create new augmented volumes.

Or am I missing something and the same thing can be done with online augmentation?

Thanks again
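One way to get both, sketched here under assumptions: keep on-the-fly augmentation but repeat the minority-class entries in the file list, so each copy can receive an independent random augmentation every time it is loaded. `oversample` and the repeat factor are hypothetical choices, not MONAI API; PyTorch's `torch.utils.data.WeightedRandomSampler` is an alternative that oversamples through sampling weights instead of duplicating entries.

```python
from collections import Counter


def oversample(files, minority_labels=(1, 2), factor=3):
    """Hypothetical helper: include each minority-class entry `factor` times in total."""
    out = []
    for f in files:
        copies = factor if f["label"] in minority_labels else 1
        out.extend(dict(f) for _ in range(copies))
    return out


train_files = [
    {"volume": "vol_a.nii.gz", "label": 0},
    {"volume": "vol_b.nii.gz", "label": 0},
    {"volume": "vol_c.nii.gz", "label": 0},
    {"volume": "vol_d.nii.gz", "label": 1},
    {"volume": "vol_e.nii.gz", "label": 2},
]

balanced = oversample(train_files, factor=3)
print(Counter(f["label"] for f in balanced))  # Counter({0: 3, 1: 3, 2: 3})
```

Because the augmentations are random and applied on the fly, the repeated entries should yield different augmented volumes within an epoch rather than identical copies.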


rijobro Oct 20, 2021
Collaborator

What do you mean by online/offline augmentations? By online, I assume you mean on-the-fly. By offline, do you mean to pre-compute the data augmentations, save them to disk and then create a dataset containing all of the data, augmented and not?

If so, I think the former is more common, as it allows the augmentation to be different each time the transform is run. For example, RandRotate samples continuous rotation angles, so there are infinitely many possible rotations. If you pre-compute the augmentations, you are limited to those fixed instances and risk over-fitting to them.
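A toy illustration of this point, using only the standard library: a random angle drawn on the fly keeps producing new values, while a pre-computed set is limited to a fixed number of instances. The angle range and the count of 5 pre-computed copies are arbitrary assumptions.

```python
import random

rng = random.Random(0)


def random_angle():
    # Continuous angle in degrees, a stand-in for what a random rotation samples.
    return rng.uniform(-15.0, 15.0)


# Offline: pre-compute 5 augmented versions once, then reuse them every epoch.
precomputed = [random_angle() for _ in range(5)]
offline_seen = {rng.choice(precomputed) for _ in range(100)}

# Online: draw a fresh angle every time the sample is loaded (here, 100 loads).
online_seen = {random_angle() for _ in range(100)}

print(len(offline_seen), len(online_seen))
```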

tommydino93 · 2021-10-21T08:57:41Z

Author

Thanks @rijobro

Yes, by online I meant on-the-fly, and by offline I meant pre-computing the augmentations. I hadn't thought about overfitting to the pre-computed augmentations; you are right.

So I decided to go with on-the-fly augmentation. Your solution works, but I had to add `collate_fn=lambda x: x` to the DataLoader (thread1 and thread2), otherwise I was getting:
`RuntimeError: each element in list of batch should be of equal size`
because RandRotate90d sometimes changes the volume shape.
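To see why the default collation fails, here is a small NumPy sketch with made-up, small stand-in shapes: a 90-degree rotation in the plane of two unequal axes swaps their sizes, and stacking arrays of different shapes into one batch raises an error. `np.rot90` is used here as a stand-in for `RandRotate90d`.

```python
import numpy as np

# Small stand-ins for two volumes already resized to the same shape (H, W, D).
vol_a = np.zeros((4, 5, 2))
vol_b = np.zeros((4, 5, 2))

# A 90-degree rotation over axes (0, 1) swaps the first two dimensions.
vol_b_rot = np.rot90(vol_b, k=1, axes=(0, 1))
print(vol_b_rot.shape)  # (5, 4, 2)

# Batching by stacking (conceptually what default collation does) now fails.
try:
    np.stack([vol_a, vol_b_rot])
except ValueError as err:
    print("cannot stack:", err)
```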

Here is the (pseudo) code that works for me:

```python
import torch
from monai.data import DataLoader, Dataset
from monai.transforms import (AddChanneld, Compose, EnsureTyped, Lambda, LoadImaged, RandFlipd,
                              RandGaussianNoised, RandRotate90d, RandZoomd, Resized, ScaleIntensityd)

# define MONAI data augmentation (only applied to minority-class volumes, see Lambda below)
augmentation_transforms = Compose([RandRotate90d(keys="volume", prob=0.25, max_k=3, spatial_axes=(0, 1)),
                                   RandFlipd(keys="volume", prob=0.25, spatial_axis=None),
                                   RandGaussianNoised(keys="volume", prob=0.25, mean=0.0, std=0.1),
                                   RandZoomd(keys="volume", prob=0.25, min_zoom=0.7, max_zoom=1.3)])

# define preprocessing and transforms for training volumes
# (median_shape, batch_size and nb_workers are defined elsewhere in the script)
train_transforms = Compose([LoadImaged(keys="volume"),
                            ScaleIntensityd(keys="volume", minv=0.0, maxv=1.0),  # rescale intensities between "minv" and "maxv"
                            AddChanneld(keys="volume"),  # add channel; MONAI expects channel-first tensors (newer releases deprecate this in favour of EnsureChannelFirstd)
                            Resized(keys="volume", spatial_size=tuple(median_shape)),  # resize all volumes to the median volume shape
                            Lambda(lambda x: augmentation_transforms(x) if x["label"] in (1, 2) else x),  # augment minority classes only
                            EnsureTyped(keys="volume")])  # ensure that the volume is either a PyTorch tensor or a NumPy array

x_train = ["filepath1", "filepath2", "filepath3", "..."]
y_train = [0, 1, 2, ...]

# create input dicts
train_files = [{"volume": volume_name, "label": label_name} for volume_name, label_name in zip(x_train, y_train)]

train_ds = Dataset(data=train_files, transform=train_transforms)
# collate_fn=lambda x: x returns each batch as a plain list of dicts, so volumes of different shapes are not stacked
train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=nb_workers,
                          pin_memory=torch.cuda.is_available(), collate_fn=lambda x: x)

# ... begin training loop
```
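As an alternative to `collate_fn=lambda x: x` (which hands the training loop a plain list of dictionaries), a collate function could zero-pad each volume to the largest shape in the batch and then stack. The sketch below is a hypothetical `pad_collate` written with NumPy and assumes the volumes are NumPy arrays; after `EnsureTyped` they would be tensors, and the torch equivalents would be needed. MONAI also provides a `pad_list_data_collate` function in `monai.data`, which may be worth checking for the MONAI version you use.

```python
import numpy as np


def pad_collate(batch, key="volume"):
    """Hypothetical collate_fn: zero-pad `key` arrays to the largest shape, then stack."""
    target = np.max([item[key].shape for item in batch], axis=0)
    volumes, labels = [], []
    for item in batch:
        arr = item[key]
        # Pad only at the end of each axis so every array reaches the target shape.
        pad_width = [(0, int(t - s)) for s, t in zip(arr.shape, target)]
        volumes.append(np.pad(arr, pad_width))
        labels.append(item["label"])
    return {key: np.stack(volumes), "label": np.array(labels)}


batch = [
    {"volume": np.ones((1, 4, 5, 2)), "label": 0},
    {"volume": np.ones((1, 5, 4, 2)), "label": 1},  # e.g. after a 90-degree rotation
]
out = pad_collate(batch)
print(out["volume"].shape, out["label"])  # (2, 1, 5, 5, 2) [0 1]
```

Zero padding keeps the voxel content unchanged but adds empty borders, so whether it is preferable to resizing depends on the model and data.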


rijobro Oct 21, 2021
Collaborator

Glad you got it working! You could consider padding or cropping your images to make them square/cubic. That would make it possible to collate them even after 90-degree rotations. Whether that is feasible depends on the original dimensions of your images: if they are not far off square it should be fine, but if they are very rectangular, perhaps not.

tommydino93 Oct 21, 2021
Author

Thanks for the suggestion! The median shape of the volumes is [432, 512, 46], so it's probably not worth transforming them into cubes.
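Since `spatial_axes=(0, 1)` only rotates within the plane of the first two spatial axes, only those two need to be equal for the shape to survive a 90-degree rotation; the third axis can stay at 46. A minimal NumPy sketch, assuming the median shape [432, 512, 46] and symmetric zero padding of the first axis from 432 to 512 (`pad_plane_to_square` is a hypothetical helper; in MONAI, a padding transform such as `SpatialPadd` might serve the same role).

```python
import numpy as np


def pad_plane_to_square(vol, axes=(0, 1)):
    """Zero-pad the two given axes symmetrically so they have equal length."""
    size = max(vol.shape[a] for a in axes)
    pad_width = [(0, 0)] * vol.ndim
    for a in axes:
        total = size - vol.shape[a]
        pad_width[a] = (total // 2, total - total // 2)
    return np.pad(vol, pad_width)


vol = np.ones((432, 512, 46), dtype=np.uint8)  # median shape reported above
padded = pad_plane_to_square(vol)
rotated = np.rot90(padded, k=1, axes=(0, 1))
print(padded.shape, rotated.shape)  # (512, 512, 46) (512, 512, 46)
```

This adds 80 rows of padding along the first axis (about 18.5% more voxels) rather than turning each volume into a cube.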
