- CVPR17 Deep Learning Math Tutorial
- JHU 2018 Deep Learning Math Tutorial
- NYU MathsDL-spring19
- Understanding the Neural Tangent Kernel (the NTK itself is sketched after this list)
- Neural Tangent Kernel (NTK) Made Practical (Hu, 2020)
- Theory of Deep Learning Seminars @ Northwestern
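
For quick reference (my own paraphrase, not taken verbatim from any link above): the NTK of a network f(x; θ) is the Gram matrix of its parameter gradients, and the Jacot (2018) result is that in the infinite-width limit this kernel stays fixed during training, so gradient-flow training on squared loss reduces to kernel regression.

```latex
% Neural Tangent Kernel of a network f(x; \theta):
\Theta(x, x') = \nabla_\theta f(x;\theta)^{\top} \, \nabla_\theta f(x';\theta)
% In the infinite-width limit (Jacot, 2018), \Theta stays constant during
% training, and gradient flow on squared loss evolves the function like
% kernel regression on the training points (x_i, y_i):
\frac{\partial f(x;\theta_t)}{\partial t}
  = -\sum_{i=1}^{n} \Theta(x, x_i)\,\bigl(f(x_i;\theta_t) - y_i\bigr)
```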
Representation Learning
Generalization
- The Deep Bootstrap: Good Online Learners are Good Offline Generalizers
- Understanding the Failure Modes of Out-of-Distribution Generalization
- Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers (Allen-Zhu, 2019)
Optimization
- Adversarial Examples Are Not Bugs, They Are Features
- Neural Tangent Kernel: Convergence and Generalization in Neural Networks (Jacot, 2018)
- (Li, 2018)
- Investigating Learning in Deep Neural Networks using Layer-Wise Weight Change (Agrawal, 2020)
- Deeper layers change faster than shallower layers
- Does this have any ramifications for transfer-learning practice? (Freeze the initial layers and retrain the classifier; see the sketch after this list.)
- Greg Yang says here around 26:00 that later layers have larger gradients than earlier layers, which would explain this.
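
A minimal sketch (mine, not from the Agrawal paper; the toy model and layer split are made up) of how to check the layer-wise claim: run one backward pass and print per-layer gradient norms, then the freeze-and-retrain recipe from the question above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),   # "shallow" layers (close to the input)
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),              # "deep" layer (close to the output)
)

x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

# Per-layer gradient norms: if the claim holds, later layers show larger norms.
for name, p in model.named_parameters():
    print(f"{name:12s} grad norm = {p.grad.norm():.4f}")

# The transfer-learning recipe from the question above: freeze everything but
# the final classifier layer, then optimize only that layer.
for p in model[:-1].parameters():
    p.requires_grad = False
optimizer = torch.optim.SGD(model[-1].parameters(), lr=1e-2)
```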
Gradient Descent
- Gradient Descent Provably Optimizes Over-parameterized Neural Networks (Du, 2019)
- Gradient Descent Finds Global Minima of Deep Neural Networks (Du, 2019)
- Gradient Starvation: A Learning Proclivity in Neural Networks (Pezeshki, 2020)
Network Design
HyperNetworks
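
A minimal sketch of the hypernetwork idea (Ha, 2016): one small network generates the weights of another. The shapes, class name, and embedding size here are illustrative, not the paper's setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperLinear(nn.Module):
    """A linear layer whose weight matrix is emitted by a small hypernetwork."""
    def __init__(self, in_features, out_features, z_dim=8):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.z = nn.Parameter(torch.randn(z_dim))                   # learned layer embedding
        self.hyper = nn.Linear(z_dim, in_features * out_features)   # weight generator

    def forward(self, x):
        # Generate the target layer's weights from the embedding, then apply them.
        w = self.hyper(self.z).view(self.out_features, self.in_features)
        return F.linear(x, w)

layer = HyperLinear(16, 4)
out = layer(torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 4])
```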