Deep Learning Roadmap

My own deep learning mastery roadmap, inspired by Deep Learning Papers Reading Roadmap.

There are some customized differences:

  • not only academic papers but also blog posts, online courses, and other references are included
  • customized for my own plans - may not include RL, NLP, etc.
  • updated for 2019 SOTA

Introductory Courses

Basic CNN Architectures

  • AlexNet (2012) [paper]
    • Alex Krizhevsky et al. "ImageNet Classification with Deep Convolutional Neural Networks"
  • ZFNet (2013) [paper]
    • Zeiler et al. "Visualizing and Understanding Convolutional Networks"
  • VGG (2014)
    • Simonyan et al. "Very Deep Convolutional Networks for Large-Scale Image Recognition" (2014) [Google DeepMind & Oxford's Visual Geometry Group (VGG)] [paper]
    • VGG-16: Zhang et al. "Accelerating Very Deep Convolutional Networks for Classification and Detection" [paper]
  • GoogLeNet, a.k.a Inception v.1 (2014) [paper]
    • Szegedy et al. "Going Deeper with Convolutions" [Google]
    • Original LeNet page from Yann LeCun's homepage.
    • Inception v.2 and v.3 (2015) Szegedy et al. "Rethinking the Inception Architecture for Computer Vision" [paper]
    • Inception v.4 and InceptionResNet (2016) Szegedy et al. "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning" [paper]
    • "A Simple Guide to the Versions of the Inception Network" [blogpost]
  • ResNet (2015) [paper]
    • He et al. "Deep Residual Learning for Image Recognition"
  • Xception (2016) [paper]
    • Chollet, Francois - "Xception: Deep Learning with Depthwise Separable Convolutions"
  • MobileNet (2016) [paper]
    • Howard et al. "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications"
    • A nice paper about reducing CNN parameter sizes while maintaining performance (see the depthwise separable convolution sketch after this list).
  • DenseNet (2016) [paper]
    • Huang et al. "Densely Connected Convolutional Networks"
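
Both Xception and MobileNet above build on depthwise separable convolutions, which factor a standard convolution into a per-channel (depthwise) convolution followed by a 1x1 (pointwise) convolution to cut parameters and compute. A minimal PyTorch sketch (the channel sizes and the omission of batch norm / nonlinearities are my own simplifications):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise (per-channel) convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        # groups=in_ch gives each input channel its own spatial filter
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        # the 1x1 convolution then mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```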

Generative adversarial networks

  • GAN (2014.6) [paper]
    • Goodfellow et al. "Generative Adversarial Networks"
  • DCGAN (2015.11) [paper]
    • Radford et al. "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks"
  • InfoGAN (2016.6) [paper]
    • Chen et al. "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets"
  • Improved Techniques for Training GANs (2016.6) [paper]
    • Salimans et al. "Improved Techniques for Training GANs"
    • This paper suggests multiple GAN training techniques such as feature matching, minibatch discrimination, one-sided label smoothing, and virtual batch normalization.
    • It also proposes a renowned generator performance metric called the inception score.
  • f-GAN (2016.6) [paper]
    • Nowozin et al. "f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization"
  • Unrolled GAN (2016.7) [paper]
    • Metz et al. "Unrolled Generative Adversarial Networks"
  • ACGAN (2016.10) [paper]
    • Odena et al. "Conditional Image Synthesis With Auxiliary Classifier GANs"
  • LSGAN (2016.11) [paper]
    • Mao et al. "Least Squares Generative Adversarial Networks"
  • Pix2Pix (2016.11) [paper]
    • Isola et al. "Image-to-Image Translation with Conditional Adversarial Networks"
  • EBGAN (2016.11) [paper]
    • Zhao et al. "Energy-based Generative Adversarial Network"
  • WGAN (2017.4) [paper]
    • Arjovsky et al., "Wasserstein GAN"
  • WGAN-GP (2017.5) [paper]
    • Gulrajani et al., "Improved Training of Wasserstein GANs"
    • Improves training stability by adding a "gradient penalty (GP)" term to the loss function (a sketch follows this list).
  • BEGAN (2017.5) [paper]
    • Berthelot et al. "BEGAN: Boundary Equilibrium Generative Adversarial Networks"
    • Introduces a diversity ratio, an equilibrium constant that controls the variety-quality tradeoff, and also proposes a convergence measure based on it.
  • CycleGAN (2017.5) [paper]
    • Zhu et al. "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks"
  • DiscoGAN (2017.5) [paper]
    • Kim et al. "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks"
    • DiscoGAN and CycleGAN propose the exact same learning technique for the style transfer task using GANs, developed independently at the same time.
  • Frechet Inception Distance (FID) (2017.6) [paper]
    • Heusel et al. "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium"
    • The paper's main contribution is a training technique called the Two Time-Scale Update Rule (TTUR), but it is mostly known for the distance metric called Frechet Inception Distance, which measures the distance between two distributions of activation values (a sketch follows this list).
  • ProGAN (2017.10) [paper]
    • Karras et al. "Progressive Growing of GANs for Improved Quality, Stability, and Variation"
  • PacGAN (2017.12) [paper]
    • Lin et al. "PacGAN: The power of two samples in generative adversarial networks"
  • BigGAN (2018) [paper]
    • Brock et al. "Large Scale GAN Training for High Fidelity Natural Image Synthesis"
  • GauGAN (2019.3) [paper]
    • Park et al. "Semantic Image Synthesis with Spatially-Adaptive Normalization"
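
As a reference for the WGAN-GP entry above, here is a minimal PyTorch-style sketch of the gradient penalty term; `critic`, `real`, `fake`, and `lambda_gp` are placeholder names, and image-shaped (NCHW) inputs are assumed:

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP sketch: push the critic's gradient norm (taken at points
    interpolated between real and fake batches) toward 1."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)  # assumes NCHW images
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads, = torch.autograd.grad(scores.sum(), interp, create_graph=True)
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```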
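
And for the FID entry, a minimal numpy/scipy sketch of the distance between two Gaussians fitted to activation statistics; `act1` and `act2` are placeholder activation matrices with one sample per row:

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(act1, act2):
    """FID sketch: Frechet distance between Gaussians fitted to two sets of
    Inception activations, each of shape (n_samples, n_features)."""
    mu1, mu2 = act1.mean(axis=0), act2.mean(axis=0)
    sigma1 = np.cov(act1, rowvar=False)
    sigma2 = np.cov(act2, rowvar=False)
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # sqrtm can return tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```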

Advanced GANs

  • DRAGAN (2017.5) [paper]
    • Kodali et al. "On Convergence and Stability of GANs"
  • Are GANs Created Equal? (2017.11) [paper]
    • Lucic et al. "Are GANs Created Equal? A Large-Scale Study"
  • SGAN (2017.12) [paper]
    • Chavdarova et al. "SGAN: An Alternative Training of Generative Adversarial Networks"
  • MaskGAN (2018.1) [paper]
    • Fedus et al. "MaskGAN: Better Text Generation via Filling in the _____"
  • Spectral Normalization (2018.2) [paper]
    • Miyato et al. "Spectral Normalization for Generative Adversarial Networks"
  • SAGAN (2018.5) [paper] [tensorflow]
    • Zhang et al. "Self-Attention Generative Adversarial Networks"
  • Unusual Effectiveness of Averaging in GAN Training (2018) [paper]
    • Yazici et al. "The Unusual Effectiveness of Averaging in GAN Training"
    • "Benefiting from training on past snapshots."
    • Uses an exponential moving average (EMA) of the generator's weights (a sketch follows this list).
  • Disconnected Manifold Learning (2018.6) [paper]
    • Khayatkhoei, et al. "Disconnected Manifold Learning for Generative Adversarial Networks"
  • A Note on the Inception Score (2018.6) [paper]
    • Barratt et al., "A Note on the Inception Score"
  • Which Training Methods for GANs Do Actually Converge? (2018.7) [paper]
    • Mescheder et al., "Which Training Methods for GANs do actually Converge?"
  • GAN Dissection (2018.11) [paper]
    • Bau et al. "GAN Dissection: Visualizing and Understanding Generative Adversarial Networks"
  • Improving Generalization and Stability for GANs (2019.2) [paper]
    • Thanh-Tung et al., "Improving Generalization and Stability of Generative Adversarial Networks"
  • Augustus Odena - "Open Questions about GANs" (2019.4) [distill.pub]
    • A very nice article about the current state of GAN research that discusses problems yet to be solved.
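
For the EMA-averaging entry above, a minimal PyTorch sketch of keeping an exponential moving average of generator weights; `ema_generator` and `generator` are assumed to be two modules with identical architectures:

```python
import torch

@torch.no_grad()
def update_ema(ema_generator, generator, decay=0.999):
    """EMA sketch: after each training step, move the shadow copy's parameters
    a small step toward the current generator; sample/evaluate with the shadow copy."""
    for ema_p, p in zip(ema_generator.parameters(), generator.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)
```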

Autoencoders

  • Original autoencoder (1986) [paper]
    • Rumelhart, Hinton, and Williams, "Learning Internal Representations by Error Propagation"
  • AutoEncoder (2006) [science]
    • Hinton et al., "Reducing the Dimensionality of Data with Neural Networks"
  • Denoising Autoencoders (2008) [paper]
    • Vincent et al. "Extracting and Composing Robust Features with Denoising Autoencoders"
  • Wasserstein Autoencoder (2017) [paper]
    • Tolstikhin et al. "Wasserstein Auto-Encoders"

Autoregressive models

  • PixelCNN (2016) [paper]
    • van den Oord et al. "Conditional image generation with PixelCNN decoders."
  • WaveNet (2016) [paper]
    • van den Oord et al. "WaveNet: A Generative Model for Raw Audio"
  • tacotron?

Layer Normalizations

  • Batch Normalization (2015.2) [paper]
    • Ioffe et al. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"
  • Group Normalization (2018.3)
    • Wu et al. "Group Normalization"
  • Instance Normalization (2016.7) [paper]
    • Ulyanov et al. "Instance Normalization: The Missing Ingredient for Fast Stylization"
  • Santurkar et al. "How does Batch Normalization help Optimization?" (2018.5) [paper]
  • Switchable Normalization (2019) [paper]
    • Luo et al. "Differentiable Learning-to-Normalize via Switchable Normalization"
  • Weight Standardization (2019.3) [paper]
    • Qiao et al. "Weight Standardization"

Initializations

  • Xavier Initialization (2010) [paper]
    • Glorot et al., "Understanding the difficulty of training deep feedforward neural networks"
  • Kaiming (He) Initialization (2015.2) [paper]
    • He et al., "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification"
  • All you need is a good init (2015.11) [paper]
    • Mishkin et al., "All you need is a good init"
  • All you need is beyond a good init (2017.4) [paper]
    • Xie et al. "All You Need is Beyond a Good Init: Exploring Better Solution for Training Extremely Deep Convolutional Neural Networks with Orthonormality and Modulation"

Dropouts

  • Dropout (2014) [paper]
    • Srivastava et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting"
  • Inverted Dropouts [notes on CS231n]
    • Multiplies activations by the inverse of keep_prob during training so that activation scales at inference (test) time stay consistent (see the sketch after this list).
  • Li et al., "Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift" (2018.1) [paper]
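
A minimal numpy sketch of the inverted dropout described above; `keep_prob` is the probability of keeping a unit:

```python
import numpy as np

def inverted_dropout(x, keep_prob=0.8, training=True):
    """Inverted-dropout sketch: scale by 1/keep_prob at training time so the
    expected activation is unchanged and inference needs no extra scaling."""
    if not training:
        return x  # test time: plain identity
    mask = (np.random.rand(*x.shape) < keep_prob) / keep_prob
    return x * mask
```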

Meta-Learning / Representation Learning (Zero-Shot learning, Few-Shot learning)

  • Zero-Data Learning (2008) [paper]
    • Larochelle et al., "Zero-data Learning of New Tasks"
  • Palatucci et al., "Zero-shot Learning with Semantic Output Codes" (NIPS 2009) [paper]
  • Socher et al., "Zero-Shot Learning Through Cross-Modal Transfer" (2013.1) [paper]
  • Lampert et al., "Attribute-Based Classification for Zero-Shot Visual Object Categorization" (2013.7) [paper]
  • Dinu et al., "Improving zero-shot learning by mitigating the hubness problem" (2014.12) [paper]
  • Romera-Paredes et al. - "An embarrassingly simple approach to zero-shot learning" (2015) [paper]
  • Prototypical Networks (2017.3) [paper]
    • Snell et al., "Prototypical Networks for Few-shot Learning"
  • Zero-Shot Learning - the Good, the Bad and the Ugly (2017.3) [paper]
    • Xian et al., "Zero-Shot Learning - The Good, the Bad and the Ugly"
  • In Defense of the Triplet Loss (2017.3) [paper]
    • Hermans et al., "In Defense of the Triplet Loss for Person Re-Identification"
  • MAML (2017.3) [paper]
    • Finn et al., "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks"
  • Triplet Loss and Online Triplet Mining in Tensorflow (2018.3) [Oliver Moindrot Blog]
  • Few-Shot learning Survey (2019.4) [paper]
    • Wang et al. "Few-shot Learning: A Survey"

Transfer learning

  • Survey 2018 (2018) [paper]
    • Tan et al. "A Survey on Deep Transfer Learning"

Geometric learning

  • Geometric Deep Learning (2016) [paper]
    • Bronstein et al. "Geometric deep learning: going beyond Euclidean data"

Variational Autoencoders (VAE)

  • VQ-VAE (2017.11) [paper]
    • van den Oord et al., "Neural Discrete Representation Learning"
  • Semi-Amortized Variational Autoencoders (2018.2) [paper]
    • Kim et al. "Semi-Amortized Variational Autoencoders"

Object detection

Semantic Segmentation

Sequential Model

  • Seq2Seq (2014) [paper]
    • Sutskever et al. "Sequence to sequence learning with neural networks."

Neural Turing Machine

  • Neural Turing Machines (2014) [paper]
    • Graves et al., "Neural turing machines."
  • Pointer Networks (2015) [paper]
    • Vinyals et al., "Pointer networks."

Attention / Question-Answering

  • NMT (Neural Machine Translation) (2014) [paper]
    • Bahdanau et al., "Neural Machine Translation by Jointly Learning to Align and Translate"
  • Stanford Attentive Reader (2016.6) [paper]
    • Chen et al. "A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task"
  • BiDAF (2016.11) [paper]
    • Seo et al. "Bidirectional Attention Flow for Machine Comprehension"
  • DrQA or Stanford Attentive Reader++ (2017.3) [paper]
    • Chen et al. "Reading Wikipedia to Answer Open-Domain Questions"
  • Transformer (2017.8) [paper] [google ai blog]
    • Vaswani et al. "Attention is all you need"
  • [read] Lilian Weng - "Attention? Attention!" (2018) [blog_post]
    • A nice explanation of attention mechanism and its concepts.
  • BERT (2018.10) [paper]
    • Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"
  • GPT-2 (2019) [paper (pdf)]
    • Radford et al. "Language Models are Unsupervised Multitask Learners"

Advanced RNNs

Model Compression

  • MobileNet (2016) (see above: Basic CNN Architectures)
  • ShuffleNet (2017)
    • Zhang et al. "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices"

Neural Processes

  • Neural Processes (2018) [paper]
    • Garnelo et al. "Neural Processes"
  • Attentive Neural Processes (2019) [paper]
    • Kim et al. "Attentive Neural Processes"
  • A Visual Exploration of Gaussian Processes (2019) [Distill.pub]
    • Not about neural processes, but gives very nice intuition about Gaussian processes. A good read.

Self-supervised learning

Data Augmentation

  • Shake Shake Regularization (2017.5) [paper]
    • Gastaldi, Xavier - "Shake-Shake Regularization"

Interpretation and Theory on Generalization, Overfitting, and Learning Capacity

  • MDL (Minimum Description Length)
    • Peter Grunwald - "A tutorial introduction to the minimum description length principle" (2004) [paper]
  • Grunwald et al., - "Shannon Information and Kolmogorov Complexity" (2010) [paper]
  • Dauphin et al. "Identifying and attacking the saddle point problem in high-dimensional non-convex optimization" (2014.6) [paper]
  • Choromanska et al. "The Loss Surfaces of Multilayer Networks" (2014.11) [paper]
    • argues that non-convexity in NNs is not a huge problem
  • Knowledge Distillation (2015.3) [paper]
    • Hinton et al., "Distilling the Knowledge in a Neural Network"
  • 3-Part Learning Theory by Mostafa Samir
  • Deconvolution and Checkerboard Artifacts - Odena (2016) [distill.pub article]
  • Keskar et al. "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima" (2016.9) [paper]
  • Rethinking Generalization (2016.11) [paper]
    • Zhang et al. "Understanding deep learning requires rethinking generalization"
  • Information Bottleneck (2017) [paper] [original paper on information bottleneck (2000)] [youtube-talk] [article in quantamagazine]
    • Shwartz-Ziv and Tishby, "Opening the Black Box of Deep Neural Networks via Information"
  • Neyshabur et al, "Exploring Generalization in Deep Learning" (2017.7) [paper]
  • Sun et al., "Revisiting Unreasonable Effectiveness of Data in Deep Learning Era" (2017.7) [paper]
  • Super-Convergence (2017.8) [paper]
    • Smith et al. - "Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates"
  • Don't Decay the Learning Rate, Increase the Batch Size (2017.11) [paper]
    • Smith et al. "Don't Decay the Learning Rate, Increase the Batch Size"
  • Hestness et al. "Deep Learning Scaling is Predictable, Empirically" (2017.12) [paper]
  • Visualizing loss landscape of neural nets (2018) [paper]
  • Olson et al., "Modern Neural Networks Generalize on Small Data Sets" (NeurIPS 2018) [paper]
  • Lottery Ticket Hypothesis (2018.3) [paper]
    • Frankle et al., "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks"
    • Empirically shows that zeroing out small-magnitude weights after training, rewinding the remaining weights to their initial values, and then re-training the "pruned" network can yield even better results (see the sketch after this list).
  • Intrinsic Dimension (2018.4) [paper]
    • Li et al., "Measuring the Intrinsic Dimension of Objective Landscapes"
  • Geirhos et al. "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness" (2018.11) [paper]
  • Belkin et al. "Reconciling modern machine learning and the bias-variance trade-off" (2018.12) [paper]
  • Graetz - "How to visualize convolution features in 40 lines of code" (2019) [medium]
  • Geiger et al. "Scaling description of generalization with number of parameters in deep learning" (2019.1) [paper]
  • Are all layers created equal? (2019.2) [paper]
    • Zhang et al. "Are all layers created equal?"
  • Lilian Weng - "Are Deep Neural Networks Dramatically Overfitted?" (2019.4) [lil'log]
    • Excellent article about generalization and overfitting of deep neural networks
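
For the Lottery Ticket entry above, a minimal numpy sketch of one prune-and-rewind round; the weight arrays and pruning fraction are illustrative placeholders:

```python
import numpy as np

def lottery_ticket_round(w_init, w_trained, prune_frac=0.2):
    """One pruning round: zero out the smallest-magnitude trained weights,
    then rewind the surviving weights to their original initialization."""
    threshold = np.quantile(np.abs(w_trained), prune_frac)
    mask = (np.abs(w_trained) > threshold).astype(w_trained.dtype)
    w_rewound = w_init * mask  # the "winning ticket" restarts from its original init
    return w_rewound, mask

# Usage sketch: retrain with the mask applied after every update,
# e.g. w = (w - lr * grad) * mask, and repeat the round iteratively.
```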

Adversarial Attacks and Defense against attacks (RobustML)

  • RobustML site
  • Adversarial Examples: Szegedy et al. - "Intriguing Properties of Neural Networks" (2013.12) [paper]
    • induces misclassification by applying small perturbations
    • this paper was the first to coin the term "Adversarial Example"
  • Fast Gradient Sign Method (FGSM) (2014.12)
    • Goodfellow et al., "Explaining and Harnessing Adversarial Examples" (ICLR 2015) [paper]
    • This paper presented the famous "panda example" (as also seen in the PyTorch tutorial); a sketch of FGSM follows this list.
  • Kurakin et al., "Adversarial Machine Learning at Scale" (2016.11) [paper]
  • Madry et al., "Towards Deep Learning Models Resistant to Adversarial Attacks" (2017.6) [paper]
  • Carlini et al., "Audio Adversarial Examples: Targeted Attacks on Speech-to-Text" (2018.1) [paper]
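
For the FGSM entry above, a minimal PyTorch sketch of the attack; `model`, `loss_fn`, `x`, `y`, and `epsilon` are placeholders:

```python
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon=0.03):
    """FGSM sketch: take one step of size epsilon along the sign of the
    input gradient, which increases the loss for the true label y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in a valid range
```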

Neural architecture search (NAS) and AutoML

  • GREAT AutoML Website [site]
    • They maintain a blog, a list of NAS literature, an analysis page, and a web book.
  • AdaNet (2016.7) [paper] [GoogleAI blog]
    • Cortes et al. "AdaNet: Adaptive Structural Learning of Artificial Neural Networks"
  • NAS (2016.12) [paper]
    • Zoph et al. "Neural Architecture Search with Reinforcement Learning"
  • PNAS (2017.12) [paper]
    • Liu et al. "Progressive Neural Architecture Search"
  • ENAS (2018.2) [paper]
    • Pham et al. "Efficient Neural Architecture Search via Parameter Sharing"
  • DARTS (2018.6) [paper]
    • Liu et al. "DARTS: Differentiable Architecture Search"
    • Uses a continuous relaxation over the discrete neural architecture space (see the sketch after this list).
  • RandWire (2019) [paper]
    • Xie et al. "Exploring Randomly Wired Neural Networks for Image Recognition" [Facebook AI Research]
  • A Survey on Neural Architecture Search (2019) [paper]
    • Wistuba et al., "A Survey on Neural Architecture Search"
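
For the DARTS entry above, a minimal PyTorch sketch of the continuous relaxation: each edge computes a softmax-weighted mixture of candidate operations (the candidate set here is illustrative, not the paper's exact search space):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """DARTS-style continuous relaxation: the discrete choice of operation on an
    edge is replaced by a softmax-weighted sum over all candidate ops, so the
    architecture parameters (alpha) become learnable by gradient descent."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),  # candidate: 3x3 conv
            nn.MaxPool2d(3, stride=1, padding=1),         # candidate: max pooling
            nn.Identity(),                                # candidate: skip connection
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture weights

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```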

Practical Techniques

DL roadmap reference

Theory

Resources

  • A Selective Overview of Deep Learning (2019) [paper]
    • Fan et al. "A Selective Overview of Deep Learning"
    • A nice overview paper on deep learning up to early 2019 (about 30 pages)