-
Generative models
- AE (autoencoders) [1] (sec 3.3): reconstruct (part of) inputs from (corrupted) inputs
- AEs
- Denoising AEs
- Variational AEs
- Flow-based models [1] (sec 3.2): estimate high-dimensional densities p(x) from data
- AR (auto-regressive) models [1] (sec 3.1): model images pixel by pixel; examples:
- PixelRNN
- PixelCNN
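As a reminder of what "pixel by pixel" means for the auto-regressive models above, the joint density is factorized with the chain rule. The notation here is an assumption, not taken from the notes: the image is flattened into n pixels x_1, ..., x_n in a fixed (for example raster-scan) order, and θ denotes the model parameters.

```latex
p_\theta(x) = \prod_{i=1}^{n} p_\theta\left(x_i \mid x_1, \dots, x_{i-1}\right),
\qquad
\mathcal{L}_{\mathrm{MLE}} = -\sum_{i=1}^{n} \log p_\theta\left(x_i \mid x_1, \dots, x_{i-1}\right)
```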
-
Contrastive models (i.e. discriminative models)
- Context-instance contrast == global-local contrast [1] (sec 4.1): focuses on modeling the belonging relationship between a sample's local features and its global context representation
- PRP (Predict Relative Position): learn relative positions between components
- predict relative positions of 2 patches from a sample
- recover positions of shuffled segments of the image (solve jigsaw)
- predict rotation angle
- MI (Maximize Mutual Information): learn the explicit belonging relationships between local parts and global context
- Deep InfoMax - first to explicitly model mutual information through a contrastive learning task, maximizing the MI between a local patch and its global context
- Contrastive Predictive Coding
- AMDIM - similar to Deep InfoMax
- Context-context contrast [1] (sec 4.2)
- Cluster-based Discrimination
- Instance Discrimination
- InstDisc - prototype
- CMC
- MoCo
- uses momentum contrast to substantially increase the number of negative samples
- however, uses an overly simple positive sample strategy (a pair of positives comes from the same sample without transformation or augmentation)
- PIRL adds jigsaw augmentation
- SimCLR illustrates the importance of a hard positive strategy by introducing data augmentation in 10 forms
- BYOL discards negative sampling
- critics suggest BYOL actually doesn't work
- InfoMin
- investigates augmentation of positive samples
- suggests selecting augmented views with less mutual information for better performance
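Several of the methods listed above (for example CPC and MoCo) are trained with an InfoNCE-style loss: one positive key is scored against a set of negatives. The following is a minimal pure-Python sketch, not any paper's reference implementation. The function names, the temperature of 0.1, and the momentum of 0.999 are illustrative assumptions. The `momentum_update` helper sketches the MoCo idea of keeping the key encoder as a moving average of the query encoder.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE loss for one query: negative log-softmax score of the positive key."""
    q = normalize(query)
    keys = [normalize(positive)] + [normalize(n) for n in negatives]
    logits = [dot(q, k) / temperature for k in keys]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


def momentum_update(key_params, query_params, m=0.999):
    """Key parameters become an exponential moving average of the query parameters."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


# Toy 2-d embeddings (hypothetical values).
query = [1.0, 0.0]
aligned_key = [0.9, 0.1]
other_keys = [[0.0, 1.0], [-1.0, 0.0]]

# The loss is small when the positive is the key closest to the query ...
loss_good = info_nce(query, aligned_key, other_keys)
# ... and large when an unrelated key is treated as the positive.
loss_bad = info_nce(query, [0.0, 1.0], [aligned_key, [-1.0, 0.0]])
```

More negatives enlarge the denominator, which is why MoCo's larger pool of negative samples matters for this kind of objective.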
-
Generative-contrastive (= adversarial) models [1] (sec 5)
- generate from complete input
- AAE (adversarial autoencoder)
- BiGAN
- ALI (same architecture as BiGAN)
- reconstruct from partial input
- colorization
- inpainting
- super-resolution
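For the "reconstruct from partial input" tasks above, the partial input is derived from the original image and the model is trained to recover the original. A minimal sketch of how such inputs could be built on a toy image stored as nested lists of RGB tuples follows. All helper names and parameters are hypothetical, and real pipelines would use an image library and proper color conversion or resampling.

```python
def to_grayscale(image):
    """Colorization input: drop color by averaging the RGB channels of each pixel."""
    return [[sum(pixel) / 3.0 for pixel in row] for row in image]


def mask_region(image, top, left, height, width, fill=(0, 0, 0)):
    """Inpainting input: overwrite a rectangular region with a fill value."""
    return [
        [
            fill if top <= r < top + height and left <= c < left + width else pixel
            for c, pixel in enumerate(row)
        ]
        for r, row in enumerate(image)
    ]


def downsample(image, factor=2):
    """Super-resolution input: keep every `factor`-th pixel along each axis."""
    return [row[::factor] for row in image[::factor]]


# 4x4 toy RGB image; each pixel is an (r, g, b) tuple.
image = [[(r * 10, c * 10, 30) for c in range(4)] for r in range(4)]

gray = to_grayscale(image)                                      # target: original colors
masked = mask_region(image, top=1, left=1, height=2, width=2)   # target: hidden region
small = downsample(image)                                       # target: full resolution
```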
Generative models:
- +
- can fit data distribution -> strong expressiveness
- -
- sensitive to rare samples (since p(x|c) -> 0 => L_{MLE} = -\sum_x \log p(x|c) -> +\infty)
- low-level abstraction objective (learns to model pixels, not necessarily higher-level representations)
These problems can be addressed by contrastive (discriminative) objectives, hence generative-contrastive models: the best of both worlds.
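To make the rare-sample sensitivity concrete, the sketch below evaluates a negative log-likelihood on a toy set of model probabilities. The probabilities are made up for illustration. As the probability assigned to a single observed sample shrinks toward zero, that one term grows without bound and dominates the total loss.

```python
import math


def nll(probabilities):
    """Negative log-likelihood: -sum(log p) over the probabilities of observed samples."""
    return -sum(math.log(p) for p in probabilities)


# 99 well-modeled samples plus one sample the model considers increasingly unlikely.
common = [0.5] * 99
losses = {rare_p: nll(common + [rare_p]) for rare_p in (1e-1, 1e-5, 1e-20)}

for rare_p, loss in losses.items():
    print(f"p(rare) = {rare_p:g}: NLL = {loss:.2f}")
```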
GANs vs contrastive: reconstructive representation (expressive) vs just distinguishable representation
Comparison of AE, GAN, contrastive
Structure: generator (encoder + decoder) + discriminator

model | latent space Z | discriminator | objectives |
---|---|---|---|
AE | explicit | - | generative |
GAN | implicit | + (heavier) | discriminative |
contrastive | explicit | + (lighter) | discriminative |

[1] Liu et al. 2020. Self-supervised Learning: Generative or Contrastive. arXiv.