
strategies to improve invariance with data augmentation #54

Open
mattersoflight opened this issue Sep 16, 2023 · 1 comment

mattersoflight (Member) commented Sep 16, 2023

We use intensity scaling and noise augmentations to make the virtual staining model invariant to these perturbations. We should leverage augmentations while keeping the training process stable and efficient.

This paper suggests a simple strategy and reports that it is effective: include many augmentations of the same sample to construct the batch, and average the losses (which happens naturally). @ziw-liu what is the current strategy in HCSDataModule? Can you test the strategy reported in Fig. 1B (top) of this paper?
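
For concreteness, here is a minimal sketch of that strategy (hypothetical names and a plain PyTorch training step; intensity/noise augmentations applied to the source only; not a drop-in for our pipeline):

```python
# Sketch of the "batch of augmentations" strategy: build a training batch from
# K independently augmented views of the same source patch; the usual mean
# reduction of the loss then averages over the augmentations automatically.
# Hypothetical names, not the actual HCSDataModule/training code.
import torch
import torch.nn.functional as F


def augmented_views(source: torch.Tensor, augment, k_train: int) -> torch.Tensor:
    """Stack K independent augmentations (intensity scaling, noise) of one patch."""
    return torch.stack([augment(source) for _ in range(k_train)])


def training_step(model, source, target, augment, k_train: int = 8) -> torch.Tensor:
    views = augmented_views(source, augment, k_train)  # (K, C, Z, Y, X)
    preds = model(views)
    # Mean reduction averages the per-view losses, i.e. the averaging over
    # augmentations "happens naturally" as described above.
    return F.mse_loss(preds, target.unsqueeze(0).expand_as(preds))
```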

PS: The paper also reports regularizing a classification model with a KL divergence across the augmentations. This doesn't translate naturally to virtual staining (a regression task).

ziw-liu (Collaborator) commented Sep 18, 2023

> This paper suggests a simple strategy and reports that it is effective: include many augmentations of the same sample to construct the batch, and average the losses (which happens naturally)

This seems to be what the current code does when `data.train_patches_per_stack` is set to a value larger than 1 ($K_{train}$ in the paper). However, this technique reduces the number of unique samples per batch, so its effect cannot be cleanly isolated when batch normalization is used or when batches are small (for example, they used a batch size of 528 and $K_{train} \in \{1, 2, 4, 8, 16\}$ for Fig. 9):

> Since the performance of models with batch normalisation depends strongly on the examples used to estimate the batch statistics (Hoffer et al., 2017), in the main paper we train on highly performant models that do not use batch normalisation, following Fort et al. (2021), to simplify our analysis (see App. D for more discussion).

> Note that for this study to be fully rigorous, one should also sweep over the ghost batch size; however, due to computational constraints and to be able to reproduce the results of Nabarro et al. (2021), we use the same experimental setup as Nabarro et al. (2021) and estimate the batch statistics over all examples in the minibatch.
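
To make the trade-off concrete, a tiny sketch (illustrative batch size, not our actual configuration) of how the number of unique stacks per batch shrinks as $K_{train}$ grows, which is what makes the batch statistics less representative:

```python
# Illustration only: with batch size B and K_train patches sampled per stack,
# a batch contains roughly B // K_train unique stacks, so batch statistics are
# estimated from fewer independent samples as K_train grows.
batch_size = 32  # illustrative value, not our actual setting

for k_train in (1, 2, 4, 8, 16):
    unique_stacks = max(batch_size // k_train, 1)
    print(f"K_train={k_train:>2}: {unique_stacks:>2} unique stacks per batch")
```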

This experiment involves sweeping hyperparameters (compute-intensive), so I think it should be done after this round of pipeline engineering (e.g. #42, #55).
