My own deep learning mastery roadmap, inspired by Deep Learning Papers Reading Roadmap.
It differs from the original in a few ways:
- not only academic papers but also blog posts, online courses, and other references are included
- customized for my own plans - may not include RL, NLP, etc.
- updated for 2019 SOTA
- AlexNet (2012) [paper]
- Alex Krizhevsky et al. "ImageNet Classification with Deep Convolutional Neural Networks"
- ZFNet (2013) [paper]
- Zeiler et al. "Visualizing and Understanding Convolutional Networks"
- VGG (2014)
- GoogLeNet, a.k.a Inception v.1 (2014) [paper]
- Szegedy et al. "Going Deeper with Convolutions" [Google]
- Original LeNet page from Yann LeCun's homepage.
- Inception v.2 and v.3 (2015) Szegedy et al. "Rethinking the Inception Architecture for Computer Vision" [paper]
- Inception v.4 and InceptionResNet (2016) Szegedy et al. "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning" [paper]
- "A Simple Guide to the Versions of the Inception Network" [blogpost]
- ResNet (2015) [paper]
- He et al. "Deep Residual Learning for Image Recognition"
- Xception (2016) [paper]
- Chollet, Francois - "Xception: Deep Learning with Depthwise Separable Convolutions"
- MobileNet (2016) [paper]
- Howard et al. "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications"
- A nice paper about reducing CNN parameter sizes while maintaining performance.
- DenseNet (2016) [paper]
- Huang et al. "Densely Connected Convolutional Networks"
- GAN (2014.6) [paper]
- Goodfellow et al. "Generative Adversarial Networks"
- DCGAN (2015.11) [paper]
- Radford et al. "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks"
- InfoGAN (2016.6) [paper]
- Chen et al. "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets"
- Improved Techniques for Training GANs (2016.6) [paper]
- Salimans et al. "Improved Techniques for Training GANs"
- This paper suggests multiple GAN training techniques such as feature matching, minibatch discrimination, one-sided label smoothing, and virtual batch normalization.
- It also introduces the well-known Inception Score (IS) for evaluating generators (see the sketch below).
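- A minimal NumPy sketch of the Inception Score (my own, not the paper's code), assuming `probs` is an (N, num_classes) array of softmax outputs from a pretrained Inception network (loading that network is omitted):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS = exp( E_x[ KL( p(y|x) || p(y) ) ] ), where p(y) is the marginal
    over generated samples. Higher is better (confident and diverse)."""
    p_y = probs.mean(axis=0, keepdims=True)                        # marginal p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# toy usage with random "predictions"; replace with real Inception softmax outputs
fake_probs = np.random.dirichlet(np.ones(1000), size=512)
print(inception_score(fake_probs))
```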
- f-GAN (2016.6) [paper]
- Nowozin et al. "f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization"
- Unrolled GAN (2016.7) [paper]
- Metz et al. "Unrolled Generative Adversarial Networks"
- ACGAN (2016.10) [paper]
- Odena et al. "Conditional Image Synthesis With Auxiliary Classifier GANs"
- LSGAN (2016.11) [paper]
- Mao et al. "Least Squares Generative Adversarial Networks"
- Pix2Pix (2016.11) [paper]
- Isola et al. "Image-to-Image Translation with Conditional Adversarial Networks"
- EBGAN (2016.11) [paper]
- Zhao et al. "Energy-based Generative Adversarial Network"
- WGAN (2017.4) [paper]
- Arjovsky et al., "Wasserstein GAN"
- WGAN_GP (2017.5) [paper]
- Gulrajani et al., "Improved Training of Wasserstein GANs"
- Improves training stability by adding a gradient penalty (GP) term to the critic loss (see the sketch below)
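- A minimal PyTorch sketch of the gradient penalty term (illustrative names, not the authors' code), assuming `critic` maps a 4D image batch to scalar scores:

```python
import torch

def gradient_penalty(critic, real, fake, gp_weight=10.0):
    """Push the critic's gradient norm toward 1 on random interpolations
    between real and fake samples (the WGAN-GP penalty term)."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(interp)
    grads, = torch.autograd.grad(outputs=scores, inputs=interp,
                                 grad_outputs=torch.ones_like(scores),
                                 create_graph=True)
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return gp_weight * ((grad_norm - 1) ** 2).mean()   # add this to the critic loss
```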
- BEGAN (2017.5) [paper]
- Berthelot et al. "BEGAN: Boundary Equilibrium Generative Adversarial Networks"
- Introduces a diversity ratio (an equilibrium hyperparameter) that controls the diversity-quality tradeoff, and derives a convergence measure from it (see the sketch below).
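- A rough sketch of the equilibrium control described above, assuming `loss_real` and `loss_fake` are the autoencoder reconstruction losses on real and generated batches (variable names are mine):

```python
def began_update_k(k, loss_real, loss_fake, gamma=0.5, lambda_k=1e-3):
    """Proportional control that keeps L(G(z)) close to gamma * L(x).
    gamma is the diversity ratio: lower gamma trades diversity for quality.
    Returns the updated k and the paper's convergence measure."""
    k = k + lambda_k * (gamma * loss_real - loss_fake)
    k = min(max(k, 0.0), 1.0)                      # clamp k to [0, 1]
    convergence = loss_real + abs(gamma * loss_real - loss_fake)
    return k, convergence

# the discriminator loss then uses:  L_D = loss_real - k * loss_fake
```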
- CycleGAN (2017.5) [paper]
- DiscoGAN (2017.5) [paper]
- DiscoGAN and CycleGAN propose essentially the same cycle-consistency technique for unpaired image-to-image translation, developed independently at around the same time (see the sketch below).
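- A minimal sketch of the shared cycle-consistency idea, assuming `G_ab` and `G_ba` are generators mapping between domains A and B (illustrative names; the adversarial losses are omitted):

```python
import torch.nn.functional as F

def cycle_consistency_loss(G_ab, G_ba, real_a, real_b, weight=10.0):
    """Translate A->B->A and B->A->B; the round trip should reconstruct the
    input. CycleGAN adds this L1 term to the GAN losses; DiscoGAN uses a
    similar reconstruction term."""
    rec_a = G_ba(G_ab(real_a))
    rec_b = G_ab(G_ba(real_b))
    return weight * (F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b))
```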
- Frechet Inception Distance (FID) (2017.6) [paper]
- Heusel et al. "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium"
- The paper's main contribution is the Two Time-Scale Update Rule (TTUR), but it is mostly known for the Fréchet Inception Distance (FID), which measures the distance between two distributions of Inception activations (see the sketch below).
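- A minimal NumPy/SciPy sketch of FID, assuming `act1` and `act2` are (N, D) arrays of Inception activations for real and generated images (extracting the activations is omitted):

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(act1, act2):
    """FID between Gaussians fitted to two sets of activations:
    ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 * sqrt(C1 @ C2))."""
    mu1, mu2 = act1.mean(axis=0), act2.mean(axis=0)
    c1 = np.cov(act1, rowvar=False)
    c2 = np.cov(act2, rowvar=False)
    covmean = linalg.sqrtm(c1 @ c2).real        # drop tiny imaginary parts
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(c1 + c2 - 2 * covmean))
```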
- ProGAN (2017.10) [paper]
- Karras et al. "Progressive Growing of GANs for Improved Quality, Stability, and Variation"
- PacGAN (2017.12) [paper]
- Lin et al. "PacGAN: The power of two samples in generative adversarial networks"
- BigGAN (2018) [paper]
- GauGAN (2019.3) [paper]
- Park et al. "Semantic Image Synthesis with Spatially-Adaptive Normalization"
- DRAGAN (2017.5) [paper]
- Kodali et al. "On Convergence and Stability of GANs"
- Are GANs Created Equal? (2017.11) [paper]
- Lucic et al. "Are GANs Created Equal? A Large-Scale Study"
- SGAN (2017.12) [paper]
- Chavdarova et al. "SGAN: An Alternative Training of Generative Adversarial Networks"
- MaskGAN (2018.1) [paper]
- Fedus et al. "MaskGAN: Better Text Generation via Filling in the _____"
- Spectral Normalization (2018.2) [paper]
- Miyato et al. "Spectral Normalization for Generative Adversarial Networks"
- SAGAN (2018.5) [paper] [tensorflow]
- Zhang et al. "Self-Attention Generative Adversarial Networks"
- Unusual Effectiveness of Averaging in GAN Training (2018) [paper]
- "Benefitting from training on past snapshots."
- Uses exponential moving averaging (EMA)
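- A small PyTorch sketch of keeping an EMA copy of the generator weights for evaluation (illustrative, not the paper's code):

```python
import copy
import torch

@torch.no_grad()
def update_ema(ema_model, model, decay=0.999):
    """ema_param <- decay * ema_param + (1 - decay) * param, applied after
    every training step; sample from ema_model at evaluation time."""
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1 - decay)

# usage: ema_G = copy.deepcopy(G); call update_ema(ema_G, G) after each step
```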
- Disconnected Manifold Learning (2018.6) [paper]
- Khayatkhoei, et al. "Disconnected Manifold Learning for Generative Adversarial Networks"
- A Note on the Inception Score (2018.6) [paper]
- Barratt et al., "A Note on the Inception Score"
- Which Training Methods for GAN do actually converge? (2018.7) [paper]
- Mescheder et al., "Which Training Methods for GANs do actually Converge?"
- GAN Dissection (2018.11) [paper]
- Bau et al. "GAN Dissection: Visualizing and Understanding Generative Adversarial Networks"
- Improving Generalization and Stability for GANs (2019.2) [paper]
- Thanh-Tung et al., "Improving Generalization and Stability of Generative Adversarial Networks"
- Augustus Odena - "Open Questions about GANs" (2019.4) [distill.pub]
- Very nice article about the current state of GAN research and the problems yet to be solved.
- Original autoencoder (1986) [paper]
- Rumelhart, Hinton, and Williams, "Learning Internal Representations by Error Propagation"
- Autoencoder (2006) [Science]
- Hinton et al., "Reducing the Dimensionality of Data with Neural Networks"
- Denoising Autoencoders (2008) [paper]
- Vincent et al. "Extracting and Composing Robust Features with Denoising Autoencoders"
- Wasserstein Autoencoder (2017) [paper]
- Tolstikhin et al. "Wasserstein Auto-Encoders"
- PixelCNN (2016) [paper]
- van den Oord et al. "Conditional image generation with PixelCNN decoders."
- WaveNet (2016) [paper]
- van den Oord et al. "WaveNet: A Generative Model for Raw Audio"
- tacotron?
- Batch Normalization (2015.2) [paper]
- Ioffe et al. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"
- Group Normalization (2018.3) [paper]
- Wu et al. "Group Normalization"
- Instance Normalization (2016.7) [paper]
- Ulyanov et al. "Instance Normalization: The Missing Ingredient for Fast Stylization"
- Santurkar et al. "How does Batch Normalization help Optimization?" (2018.5) [paper]
- Switchable Normalization (2019) [paper]
- Luo et al. "Differentiable Learning-to-Normalize via Switchable Normalization"
- Weight Standardization (2019.3) [paper]
- Qiao et al. "Weight Standardization"
- Xavier Initialization (2010) [paper]
- Glorot et al., "Understanding the difficulty of training deep feedforward neural networks"
- Kaiming (He) Initialization (2015.2) [paper]
- He et al., "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification"
- All you need is a good init (2015.11) [paper]
- Mishkin et al., "All you need is a good init"
- All you need is beyond a good init (2017.4) [paper]
- Xie et al. "All You Need is Beyond a Good Init: Exploring Better Solution for Training Extremely Deep Convolutional Neural Networks with Orthonormality and Modulation"
- Dropout (2014) [paper]
- Srivastava et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting"
- Inverted Dropouts [notes on CS231n]
- Multiplies activations by 1/keep_prob during training so that expected activations at inference (test) time stay consistent without any extra scaling (see the sketch below).
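- A small NumPy sketch of inverted dropout as described in the CS231n notes:

```python
import numpy as np

def inverted_dropout(x, keep_prob=0.8, train=True):
    """Scale by 1/keep_prob at training time so that the expected activation
    matches test time, where dropout is simply turned off (no rescaling)."""
    if not train:
        return x
    mask = (np.random.rand(*x.shape) < keep_prob) / keep_prob
    return x * mask
```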
- Li et al., "Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift" (2018.1) [paper]
- Zero-Data Learning (2008) [paper]
- Larochelle et al., "Zero-data Learning of New Tasks"
- Palatucci et al., "Zero-shot Learning with Semantic Output Codes" (NIPS 2009) [paper]
- Socher et al., "Zero-Shot Learning Through Cross-Modal Transfer" (2013.1) [paper]
- Lampert et al., "Attribute-Based Classification for Zero-Shot Visual Object Categorization" (2013.7) [paper]
- Dinu et al., "Improving zero-shot learning by mitigating the hubness problem" (2014.12) [paper]
- Romera-Paredes et al. - "An embarrassingly simple approach to zero-shot learning" (2015) [paper]
- Prototypical Networks (2017.3) [paper]
- Snell et al., "Prototypical Networks for Few-shot Learning"
- Zero-Shot Learning - the Good, the Bad and the Ugly (2017.3) [paper]
- Xian et al., "Zero-Shot Learning - The Good, the Bad and the Ugly"
- In defence of the Triplet Loss (2017.3) [paper]
- Hermans et al., "In Defense of the Triplet Loss for Person Re-Identification"
- MAML (2017.3) [paper]
- Finn et al, "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks"
- Triplet Loss and Online Triplet Mining in TensorFlow (2018.3) [Olivier Moindrot blog] (see the loss sketch below)
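- A minimal PyTorch sketch of the plain triplet loss on embedding batches (the online/batch-hard mining from the blog post is omitted):

```python
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on the distance difference: pull the anchor toward the positive
    and push it at least `margin` farther from the negative."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# torch.nn.TripletMarginLoss computes the same thing out of the box
```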
- Few-Shot Learning Survey (2019.4) [paper]
- Wang et al. "Few-shot Learning: A Survey"
- Survey 2018 (2018) [paper]
- Tan et al. "A Survey on Deep Transfer Learning"
- Geometric Deep Learning (2016) [paper]
- Bronstein et al. "Geometric deep learning: going beyond Euclidean data"
- VQ-VAE (2017.11) [paper]
- van den Oord et al., "Neural Discrete Representation Learning"
- Semi-Amortized Variational Autoencoders (2018.2) [paper]
- Kim et al. "Semi-Amortized Variational Autoencoders"
- RCNN: https://arxiv.org/abs/1311.2524
- Fast-RCNN: https://arxiv.org/abs/1504.08083
- Faster-RCNN: https://arxiv.org/abs/1506.01497
- SSD: https://arxiv.org/abs/1512.02325
- YOLO: https://arxiv.org/abs/1506.02640
- YOLO9000: https://arxiv.org/abs/1612.08242
- FCN: https://arxiv.org/abs/1411.4038
- SegNet: https://arxiv.org/abs/1511.00561
- UNet: https://arxiv.org/abs/1505.04597
- PSPNet: https://arxiv.org/abs/1612.01105
- DeepLab: https://arxiv.org/abs/1606.00915
- ICNet: https://arxiv.org/abs/1704.08545
- ENet: https://arxiv.org/abs/1606.02147
- Nice survey
- Seq2Seq (2014) [paper]
- Sutskever et al. "Sequence to sequence learning with neural networks."
- Neural Turing Machines (2014) [paper]
- Graves et al., "Neural turing machines."
- Pointer Networks (2015) [paper]
- Vinyals et al., "Pointer networks."
- NMT (Neural Machine Translation) (2014) [paper]
- Bahdanau et al, "Neural Machine Translation by Jointly Learning to Align and Translate"
- Stanford Attentive Reader (2016.6) [paper]
- Chen et al. "A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task"
- BiDAF (2016.11) [paper]
- Seo et al. "Bidirectional Attention Flow for Machine Comprehension"
- DrQA or Stanford Attentive Reader++ (2017.3) [paper]
- Chen et al. "Reading Wikipedia to Answer Open-Domain Questions"
- Transformer (2017.8) [paper] [google ai blog]
- Vaswani et al. "Attention is all you need"
- [read] Lilian Weng - "Attention? Attention!" (2018) [blog_post]
- A nice explanation of the attention mechanism and related concepts.
- BERT (2018.10) [paper]
- Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"
- GPT-2 (2019) [paper (pdf)]
- Radford et al. "Language Models are Unsupervised Multitask Learners"
- Unitary evolution RNNs: https://arxiv.org/abs/1511.06464
- Recurrent Batch Norm: https://arxiv.org/abs/1603.09025
- Zoneout: https://arxiv.org/abs/1606.01305
- IndRNN: https://arxiv.org/abs/1803.04831
- DilatedRNNs: https://arxiv.org/abs/1710.02224
- MobileNet (2016) (see above: Basic CNN Architectures)
- ShuffleNet (2017)
- Zhang et al. "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices"
- Neural Processes (2018) [paper]
- Garnelo et al. "Neural Processes"
- Attentive Neural Processes (2019) [paper]
- Kim et al. "Attentive Neural Processes"
- A Visual Exploration of Gaussian Processes (2019) [Distill.pub]
- Not about neural processes, but gives very nice intuition about Gaussian processes. Good read.
- Denoising AE https://www.iro.umontreal.ca/~vincentp/Publications/denoising_autoencoders_tr1316.pdf
- Exemplar Nets https://arxiv.org/abs/1406.6909
- Co-occ https://arxiv.org/abs/1511.06811
- Egomotion https://arxiv.org/abs/1505.01596
- Jigsaw https://arxiv.org/abs/1603.09246
- Context Encoders https://arxiv.org/abs/1604.07379
- Split-brain autoencoders https://arxiv.org/abs/1611.09842
- multi-task self-supervised learning https://arxiv.org/abs/1708.07860
- Audio-visual scene analysis https://arxiv.org/abs/1804.03641
- a survey https://slideplayer.com/slide/13195863/
- Supervising unsupervised learning https://arxiv.org/abs/1709.05262
- Unsupervised Representation Learning by Predicting Image Rotations https://arxiv.org/abs/1803.07728
- Mahjourian et al., "Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints" (2018.2) [paper]
- Gordon et al., "Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras" (2019.4) [paper]
- Shake Shake Regularization (2017.5) [paper]
- Gastaldi, Xavier - "Shake-Shake Regularization"
- MDL (Minimum Description Length)
- Peter Grunwald - "A tutorial introduction to the minimum description length principle" (2004) [paper]
- Grunwald et al. - "Shannon Information and Kolmogorov Complexity" (2010) [paper]
- Dauphin et al. "Identifying and attacking the saddle point problem in high-dimensional non-convex optimization" (2014.6) [paper]
- Choromanska et al. "The Loss Surfaces of Multilayer Networks" (2014.11) [paper]
- Argues that non-convexity of neural network loss surfaces is not a huge problem in practice.
- Knowledge Distillation (2015.3) [paper]
- Hinton et al., "Distilling the Knowledge in a Neural Network"
- 3-Part Learning Theory by Mostafa Samir
- Deconvolution and Checkerboard Artifacts - Odena (2016) [distill.pub article]
- Keskar et al. "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima" (2016.9) [paper]
- Rethinking Generalization (2016.11) [paper]
- Zhang et al. "Understanding deep learning requires rethinking generalization"
- Information Bottleneck (2017) [paper] [original paper on information bottleneck (2000)] [youtube-talk] [article in quantamagazine]
- Shwartz-Ziv and Tishby, "Opening the Black Box of Deep Neural Networks via Information"
- Neyshabur et al, "Exploring Generalization in Deep Learning" (2017.7) [paper]
- Sun et al., "Revisiting Unreasonable Effectiveness of Data in Deep Learning Era" (2017.7) [paper]
- Super-Convergence (2017.8) [paper]
- Smith et al. - "Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates"
- Don't Decay the Learning Rate, Increase the Batch Size (2017.11) [paper]
- Smith et al. "Don't Decay the Learning Rate, Increase the Batch Size"
- Hestness et al. "Deep Learning Scaling is Predictable, Empirically" (2017.12) [paper]
- Visualizing loss landscape of neural nets (2018) [paper]
- Olson et al., "Modern Neural Networks Generalize on Small Data Sets" (NeurIPS 2018) [paper]
- Lottery Ticket Hypothesis (2018.3) [paper]
- Frankle et al., "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks"
- Empirically showed that pruning the smallest-magnitude weights after training, rewinding the remaining weights to their initial values, and re-training the pruned network can match or even beat the original (see the sketch below).
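- A rough sketch of one pruning round of the lottery-ticket procedure, assuming `init_state` is a saved copy of the model's initial parameters; training loops and the iterative pruning schedule are omitted, and the names are mine:

```python
import torch

def lottery_ticket_round(model, init_state, prune_frac=0.2):
    """Zero out the smallest-magnitude weights, then rewind the surviving
    weights to their original initialization before re-training."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:                          # skip biases / norm params
            continue
        k = max(1, int(prune_frac * p.numel()))
        threshold = p.abs().flatten().kthvalue(k).values
        masks[name] = (p.abs() > threshold).float()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.copy_(init_state[name] * masks[name])   # rewind + mask
    return masks   # keep the pruned weights at zero during re-training
```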
- Intrinsic Dimension (2018.4) [paper]
- Li et al., "Measuring the Intrinsic Dimension of Objective Landscapes"
- Geirhos et al. "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness" (2018.11) [paper]
- Belkin et al. "Reconciling modern machine learning and the bias-variance trade-off" (2018.12) [paper]
- Graetz - "How to visualize convolution features in 40 lines of code" (2019) [medium]
- Geiger et al. "Scaling description of generalization with number of parameters in deep learning" (2019.1) [paper]
- Are all layers created equal? (2019.2) [paper]
- Zhang et al. "Are all layers created equal?"
- Lilian Weng - "Are Deep Neural Networks Dramatically Overfitted?" (2019.4) [lil'log]
- Excellent article about generalization and overfitting of deep neural networks
- RobustML site
- Adversarial Examples: Szegedy et al. - "Intriguing Properties of Neural Networks" (2013.12) [paper]
- Induces misclassification by applying small, carefully crafted perturbations to the input
- This paper was the first to coin the term "adversarial example"
- Fast Gradient Sign Method (FGSM) (2014.12)
- Goodfellow et al., "Explaining and Harnessing Adversarial Examples" (ICLR 2015) [paper]
- This paper presented the famous "panda example" (also shown in the PyTorch tutorial); see the sketch below.
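- A minimal PyTorch sketch of FGSM (in the spirit of the PyTorch tutorial), assuming `model` returns class logits and inputs are scaled to [0, 1]:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.03):
    """x_adv = x + eps * sign(grad_x L(model(x), y)): a single gradient-sign
    step that often flips the predicted class."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()            # keep pixels in valid range
```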
- Kurakin et al., "Adversarial Machine Learning at Scale" (2016.11) [paper]
- Madry et al., "Towards Deep Learning Models Resistant to Adversarial Attacks" (2017.6) [paper]
- Carlini et al., "Audio Adversarial Examples: Targeted Attacks on Speech-to-Text" (2018.1) [paper]
- GREAT AutoML Website [site]
- They maintain a blog, a list of NAS literature, an analysis page, and a web book.
- AdaNet (2016.7) [paper] [GoogleAI blog]
- Cortes et al. "AdaNet: Adaptive Structural Learning of Artificial Neural Networks"
- NAS (2016.12) [paper]
- Zoph et al. "Neural Architecture Search with Reinforcement Learning"
- PNAS (2017.12) [paper]
- Liu et al. "Progressive Neural Architecture Search"
- ENAS (2018.2) [paper]
- Pham et al. "Efficient Neural Architecture Search via Parameter Sharing"
- DARTS (2018.6) [paper]
- Liu et al. "DARTS: Differentiable Architecture Search"
- Uses a continuous relaxation over the discrete neural architecture space (see the sketch below).
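- A small PyTorch sketch of the continuous relaxation on a single edge; the candidate operation set here is illustrative, not the paper's full search space:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One edge of a cell: output = sum_i softmax(alpha)_i * op_i(x).
    The architecture parameters alpha are learned by gradient descent
    alongside the ordinary weights (bilevel optimization in the paper)."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                  # skip-connect
            nn.Conv2d(channels, channels, 3, padding=1),    # 3x3 conv
            nn.MaxPool2d(3, stride=1, padding=1),           # 3x3 max pool
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```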
- RandWire (2019) [paper]
- Xie et al. "Exploring Randomly Wired Neural Networks for Image Recognition" [Facebook AI Research]
- A Survey on Neural Architecture Search (2019) [paper]
- Wistuba et al., "A Survey on Neural Architecture Search"
- Andrej Karpathy - "A recipe for training neural networks" (2019) [Andrej Karpathy Blog Post]
- https://github.com/songrotek/Deep-Learning-Papers-Reading-Roadmap
- https://github.com/terryum/awesome-deep-learning-papers
- which DL algorithms should I implement to learn? https://www.reddit.com/r/MachineLearning/comments/8vmuet/d_what_deep_learning_papers_should_i_implement_to/
- The MML (Mathematics for Machine Learning) book
- Andrej Karpathy - Yes You Should Understand Backprop
- Theoretical principles for Deep Learning
- Stanford STATS 385 - Theories of Deep Learning
- CSC321 notes: http://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/
- A Selective Overview of Deep Learning (2019) [paper]
- Fan et al. "A Selective Overview of Deep Learning"
- A nice overview paper on deep learning up to early 2019 (about 30 pages)