
Why "Construct different mini-batches for real and fake, i.e. each mini-batch needs to contain only all real images or all generated images"? #52

mrgloom opened this issue Apr 16, 2019 · 3 comments



mrgloom commented Apr 16, 2019

What is the motivation behind "Construct different mini-batches for real and fake, i.e. each mini-batch needs to contain only all real images or all generated images"? Can you elaborate on that?


smikhai1 commented Apr 28, 2019

@mrgloom
tl;dr: If we don't construct different mini-batches for real and fake samples, batch normalization will not work as it is supposed to and will not give any benefit.

The purpose of batch normalization is to reduce internal covariate shift in activation maps by normalizing all of the activations to the same distribution (zero mean and unit standard deviation). In that case, the network does not have to adapt to changes in the distributions of activations that occur as the weights change during training. As a result, such normalization simplifies learning significantly.
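
For reference, here is a rough sketch of what batch norm computes during training (PyTorch, illustrative only; the learnable scale/shift and running statistics are left out):

```python
import torch

def batch_norm_sketch(x, eps=1e-5):
    # x: activations of shape (N, C, H, W).
    # Normalize each channel over the batch and spatial dimensions so it
    # has zero mean and unit variance within the mini-batch.
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)
```

The key point is that the statistics are computed over whatever samples happen to be in the mini-batch.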

At the very beginning of GAN training, real and fake samples have very different distributions, so if we mix them in one mini-batch and normalize it, we won't end up with well-centered data. Moreover, the distribution of this normalized mixed data will keep changing significantly during training (because the generator produces better and better results), and the discriminator will have to adapt to these changes. A sketch of the separate-batch discriminator update is below.
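
To make it concrete, here is a minimal PyTorch sketch of a discriminator step with separate all-real and all-fake mini-batches. `D`, `G`, `opt_D` and `z_dim` are placeholders for illustration, not code from this repo:

```python
import torch
import torch.nn.functional as F

def discriminator_step(D, G, real_batch, opt_D, z_dim=100):
    # One discriminator update using two separate forward passes:
    # an all-real mini-batch and an all-fake mini-batch, so any BatchNorm
    # layers inside D compute statistics over real-only / fake-only data.
    opt_D.zero_grad()

    # All-real mini-batch.
    real_logits = D(real_batch)
    real_loss = F.binary_cross_entropy_with_logits(
        real_logits, torch.ones_like(real_logits))

    # All-fake mini-batch (generator output detached for the D update).
    z = torch.randn(real_batch.size(0), z_dim, device=real_batch.device)
    fake_logits = D(G(z).detach())
    fake_loss = F.binary_cross_entropy_with_logits(
        fake_logits, torch.zeros_like(fake_logits))

    (real_loss + fake_loss).backward()
    opt_D.step()
```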

@ECEMACHINE


Thanks for your explanation! That's impressive. But I have another question. In regular image classification we also use BN, yet there we shuffle the data first, so each mini-batch contains samples from different labels, which makes the model more robust. How can we explain this? Or is it different in a GAN?
Hope for your reply!
Thanks so much!

@ManoharSai2000

One way to view it: in classification, the different labels follow similar (and fixed) distributions, so their mixture is stable, and those distributions do not change rapidly over training the way the output of a GAN's generator does.
