Applied Deep Learning (YouTube Playlist)

Course Objectives & Prerequisites:

This is a two-semester course designed primarily for graduate students. Undergraduate students with demonstrably strong backgrounds in probability, statistics (e.g., linear and logistic regression), numerical linear algebra, and optimization are also welcome to register. The objective is to familiarize students with the state-of-the-art deep learning techniques employed in industry. Deep learning is a field that witnesses a mini-revolution every few months, so it is very important that students registering for this course are eager to learn new concepts. Much of deep learning is software engineering; consequently, students should be able to write clean code in their assignments. Python is the programming language used in this course. Familiarity with TensorFlow and PyTorch is a plus but not a requirement. However, it is very important that students are willing to do the hard work of learning and using these two frameworks as the course progresses.

Part I Topics (Fall Semester)

Training Deep Neural Networks (Lecture Notes) (YouTube Playlist)
Computer Vision
- Image Classification
  - Large Networks (Lecture Notes) (YouTube Playlist)
  - Small Networks (Lecture Notes) (YouTube Playlist)
  - AutoML (Lecture Notes) (YouTube Playlist)
  - Robustness (Lecture Notes) (YouTube Playlist)
  - Visualizing & Understanding (Lecture Notes) (YouTube Playlist)
  - Transfer Learning (Lecture Notes) (YouTube Playlist)
- Image Transformation
  - Semantic Segmentation (Lecture Notes) (YouTube Playlist)
  - Super-Resolution, Denoising, and Colorization (Lecture Notes) (YouTube Playlist)
  - Pose Estimation (Lecture Notes) (YouTube Playlist)
  - Optical Flow and Depth Estimation (Lecture Notes) (YouTube Playlist)
- Object Detection
  - Two Stage Detectors (Lecture Notes) (YouTube Playlist)
  - One Stage Detectors (Lecture Notes) (YouTube Playlist)
- Face Recognition and Detection (Lecture Notes) (YouTube Playlist)
- Video (Lecture Notes) (YouTube Playlist)
- 3D (Lecture Notes) (YouTube Playlist)

Part II Topics (Spring Semester)

Natural Language Processing
- Word Representations (Lecture Notes) (YouTube Playlist)
- Text Classification (Lecture Notes) (YouTube Playlist)
- Neural Machine Translation (Lecture Notes) (YouTube Playlist)
- Language Modeling (Lecture Notes) (YouTube Playlist)
Multimodal Learning (Lecture Notes) (YouTube Playlist)
Generative Networks (YouTube Playlist)
- Variational Auto-Encoders (Lecture Notes) (YouTube Playlist)
- Unconditional GANs (Lecture Notes) (YouTube Playlist)
- Conditional GANs (Lecture Notes) (YouTube Playlist)
- Diffusion Models (Lecture Notes)
Advanced Topics
- Domain Adaptation (Lecture Notes) (YouTube Playlist)
- Few Shot Learning (Lecture Notes) (YouTube Playlist)
- Federated Learning (Lecture Notes) (YouTube Playlist)
- Semi-Supervised Learning (Lecture Notes) (YouTube Playlist)
- Self-Supervised Learning (Lecture Notes) (YouTube Playlist)
Speech & Music (YouTube Playlist)
- Recognition (Lecture Notes) (YouTube Playlist)
- Synthesis (Lecture Notes) (YouTube Playlist)
- Modeling (Lecture Notes) (YouTube Playlist)
Reinforcement Learning (YouTube Playlist)
- Games (Lecture Notes) (YouTube Playlist)
- Simulated Environments (Lecture Notes) (YouTube Playlist)
- Real Environments (Lecture Notes) (YouTube Playlist)
- Uncertainty Quantification & Multitask Learning (Lecture Notes) (YouTube Playlist)
Graph Neural Networks (Lecture Notes) (YouTube Playlist)
Recommender Systems (Lecture Notes) (YouTube Playlist)
Computational Biology (Lecture Notes)

References

Training Deep Neural Networks

An overview of gradient descent optimization algorithms

Computer Vision; Image Classification; Large Networks

Multi-column Deep Neural Networks for Image Classification
ImageNet Classification with Deep Convolutional Neural Networks (code)
Dropout: A Simple Way to Prevent Neural Networks from Overfitting (code)
Network In Network
Very Deep Convolutional Networks for Large-Scale Image Recognition (code)
Going Deeper with Convolutions
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Rethinking the Inception Architecture for Computer Vision
Training Very Deep Networks
Deep Residual Learning for Image Recognition (code)
Identity Mappings in Deep Residual Networks (code)
Deep Networks with Stochastic Depth (code)
Wide Residual Networks (code)
Aggregated Residual Transformations for Deep Neural Networks (code)
Densely Connected Convolutional Networks (code)
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
mixup: Beyond Empirical Risk Minimization (code)
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (code)
SGDR: Stochastic Gradient Descent with Warm Restarts (code)
Decoupled Weight Decay Regularization (code)
Residual Attention Network for Image Classification
Squeeze-and-Excitation Networks (code)
CBAM: Convolutional Block Attention Module (code)
ResNeSt: Split-Attention Networks (code)
Random Erasing Data Augmentation (code)
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features (code)
Neural Ordinary Differential Equations (code)
Spatial Transformer Networks
Dynamic Routing Between Capsules
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (code)
MLP-Mixer: An all-MLP Architecture for Vision (code)
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
High-Performance Large-Scale Image Recognition Without Normalization (code)
A ConvNet for the 2020s (code)

Computer Vision; Image Classification; Small Networks

Distilling the Knowledge in a Neural Network
Learning both Weights and Connections for Efficient Neural Networks
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (code)
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (code)
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks (code)
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (code)
Xception: Deep Learning with Depthwise Separable Convolutions (code)
MobileNetV2: Inverted Residuals and Linear Bottlenecks (code)
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (code)
ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
CSPNet: A New Backbone that can Enhance Learning Capability of CNN (code) (code)

Computer Vision; Image Classification; AutoML

Neural Architecture Search With Reinforcement Learning (code)
Learning Transferable Architectures for Scalable Image Recognition
Regularized Evolution for Image Classifier Architecture Search (code)
Evolving Deep Neural Networks
Efficient Neural Architecture Search via Parameter Sharing (code)
DARTS: Differentiable Architecture Search (code)
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (code)
MnasNet: Platform-Aware Neural Architecture Search for Mobile (code)
Searching for MobileNetV3
Designing Network Design Spaces (code)
AutoAugment: Learning Augmentation Strategies from Data
RandAugment: Practical Automated Data Augmentation with a Reduced Search Space

Computer Vision; Image Classification; Robustness

Intriguing properties of neural networks
Explaining and harnessing adversarial examples
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images
DeepFool: a simple and accurate method to fool deep neural networks (code)
Adversarial Examples in the Physical World
The Limitations of Deep Learning in Adversarial Settings
Practical Black-Box Attacks against Machine Learning
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Towards Evaluating the Robustness of Neural Networks (code)
Towards Deep Learning Models Resistant to Adversarial Attacks (code)
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples (code)
Ensemble Adversarial Training: Attacks and Defenses (code)
One Pixel Attack for Fooling Deep Neural Networks

Computer Vision; Image Classification; Visualizing & Understanding

Visualizing and Understanding Convolutional Networks
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Striving for Simplicity: The All Convolutional Net
Methods for interpreting and understanding deep neural networks (code)
“Why Should I Trust You?” Explaining the Predictions of Any Classifier (code)
Learning Deep Features for Discriminative Localization (code)
Understanding Deep Learning Requires Rethinking Generalization
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (code)
A Unified Approach to Interpreting Model Predictions (code)
Learning Important Features Through Propagating Activation Differences (code)
Axiomatic Attribution for Deep Networks (code)
On Calibration of Modern Neural Networks (code)
Understanding the role of individual units in a deep neural network (code)
Do Vision Transformers See Like Convolutional Neural Networks?

Computer Vision; Image Classification; Transfer Learning

How transferable are features in deep neural networks? (code)
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition (code)
CNN Features off-the-shelf: an Astounding Baseline for Recognition
Return of the Devil in the Details: Delving Deep into Convolutional Nets (code)
Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks (code)

Computer Vision; Image Transformation; Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation (code)
Learning Deconvolution Network for Semantic Segmentation (code)
U-Net: Convolutional Networks for Biomedical Image Segmentation (code)
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs (code)
Conditional Random Fields as Recurrent Neural Networks (code)
Multi-scale Context Aggregation by Dilated Convolutions (code)
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Pyramid Scene Parsing Network (code)
Rethinking Atrous Convolution for Semantic Image Segmentation
What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation (code)
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (code)
Dual Attention Network for Scene Segmentation (code)
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers (code) (code)

Computer Vision; Image Transformation; Super-Resolution, Denoising, and Colorization

Learning a Deep Convolutional Network for Image Super-Resolution (code)
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
Image Style Transfer Using Convolutional Neural Networks (code)
Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization (code)
Accurate Image Super-Resolution Using Very Deep Convolutional Networks (code)
Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network
Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising (code)
Enhanced Deep Residual Networks for Single Image Super-Resolution (code)
Deep Image Prior (code)
Residual Dense Network for Image Super-Resolution (code)
Image Super-Resolution Using Very Deep Residual Channel Attention Networks (code)
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric (code)
Colorful Image Colorization (code)

Computer Vision; Image Transformation; Pose Estimation

DeepPose: Human Pose Estimation via Deep Neural Networks
Convolutional Pose Machines (code)
Stacked Hourglass Networks for Human Pose Estimation (code)
Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields (code)
Deep High-Resolution Representation Learning for Human Pose Estimation (code)

Computer Vision; Image Transformation; Optical Flow and Depth Estimation

FlowNet: Learning Optical Flow with Convolutional Networks
FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks (code)
PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume (code)
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network
Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture (code)
Unsupervised Monocular Depth Estimation with Left-Right Consistency (code)
Unsupervised Learning of Depth and Ego-Motion from Video (code)
Robust Consistent Video Depth Estimation (code)

Computer Vision; Object Detection; Two Stage Detectors

A Survey on Performance Metrics for Object-Detection Algorithms (code)
Rich feature hierarchies for accurate object detection and semantic segmentation (code)
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
Fast R-CNN (code)
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (code)
R-FCN: Object Detection via Region-based Fully Convolutional Networks (code)
Feature Pyramid Networks for Object Detection
Deformable Convolutional Networks (code)
Mask R-CNN (code)
Cascade R-CNN: Delving into High Quality Object Detection (code)

Computer Vision; Object Detection; One Stage Detectors

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks (code)
You Only Look Once: Unified, Real-Time Object Detection (code)
SSD: Single Shot MultiBox Detector (code)
YOLO9000: Better, Faster, Stronger (code)
Focal Loss for Dense Object Detection (code)
Speed/Accuracy Trade-Offs For Modern Convolutional Object Detectors
YOLOv3: An Incremental Improvement (code)
CornerNet: Detecting Objects as Paired Keypoints (code)
FCOS: Fully Convolutional One-Stage Object Detection (code)
Objects as Points (code)
EfficientDet: Scalable and Efficient Object Detection (code)
YOLOv4: Optimal Speed and Accuracy of Object Detection (code)
End-to-End Object Detection with Transformers (code)
Deformable DETR: Deformable Transformers for End-to-End Object Detection (code)

Computer Vision; Face Recognition and Detection

DeepFace: Closing the Gap to Human-Level Performance in Face Verification
FaceNet: A Unified Embedding for Face Recognition and Clustering
Deep Face Recognition
Deep Learning Face Attributes in the Wild
Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks (code)
A Discriminative Feature Learning Approach for Deep Face Recognition
In Defense of the Triplet Loss for Person Re-Identification (code)
SphereFace: Deep Hypersphere Embedding for Face Recognition (code)
ArcFace: Additive Angular Margin Loss for Deep Face Recognition (code)

Computer Vision; Video

3D Convolutional Neural Networks for Human Action Recognition
Large-scale Video Classification with Convolutional Neural Networks (code)
Two-Stream Convolutional Networks for Action Recognition in Videos
Learning Spatiotemporal Features with 3D Convolutional Networks (code)
Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors (code)
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition (code)
Convolutional Two-Stream Network Fusion for Video Action Recognition (code)
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset (code)
A Closer Look at Spatiotemporal Convolutions for Action Recognition (code)
Non-local Neural Networks (code)
Group Normalization (code)
SlowFast Networks for Video Recognition (code)
Learning Multi-Domain Convolutional Neural Networks for Visual Tracking (code)
Fully-Convolutional Siamese Networks for Object Tracking (code)

Computer Vision; 3D

V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation (code)
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (code)
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space (code)
Dynamic Graph CNN for Learning on Point Clouds (code)
Point Transformer
VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (code)

Natural Language Processing; Word Representations

Linguistic Regularities in Continuous Space Word Representations
Distributed Representations of Words and Phrases and their Compositionality
Efficient Estimation of Word Representations in Vector Space (code)
GloVe: Global Vectors for Word Representation (code)
Enriching Word Vectors with Subword Information (code)

Natural Language Processing; Text Classification

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank (code)
Convolutional Neural Networks for Sentence Classification (code)
Distributed Representations of Sentences and Documents
Effective Use of Word Order for Text Categorization with Convolutional Neural Networks (code)
A Convolutional Neural Network for Modelling Sentences
A Sensitivity Analysis Of (And Practitioners' Guide To) Convolutional Neural Networks For Sentence Classification
Character-level Convolutional Networks for Text Classification (code)
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks (code)
Bag Of Tricks For Efficient Text Classification (code)
Hierarchical Attention Networks for Document Classification
Bidirectional LSTM-CRF Models for Sequence Tagging
Neural Architectures For Named Entity Recognition (code) (code)
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
Universal Language Model Fine-tuning for Text Classification (code)

Natural Language Processing; Neural Machine Translation

Neural Machine Translation by Jointly Learning to Align and Translate
Sequence to Sequence Learning with Neural Networks
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
On the Properties of Neural Machine Translation: Encoder–Decoder Approaches
Effective Approaches to Attention-based Neural Machine Translation (code)
Neural Machine Translation Of Rare Words With Subword Units (code)
Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Convolutional Sequence to Sequence Learning (code)
Attention Is All You Need (code)
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing (code)
Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates
Reformer: The Efficient Transformer (code)
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (code)
Rethinking Attention with Performers (code)

Natural Language Processing; Language Modeling

Deep contextualized word representations (code)
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling (code)
Improving Language Understanding by Generative Pre-Training (code)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (code)
Language Models are Unsupervised Multitask Learners (code)
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (code)
RoBERTa: A Robustly Optimized BERT Pretraining Approach (code)
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (code) (code)
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (code)
XLNet: Generalized Autoregressive Pretraining for Language Understanding (code)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (code)
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks (code)
Cross-lingual Language Model Pretraining (code)
Unsupervised Cross-lingual Representation Learning at Scale (code)
SpanBERT: Improving Pre-training by Representing and Predicting Spans (code)
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Longformer: The Long-Document Transformer (code)
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (code)
Language Models are Few-Shot Learners (code)
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (code)
SimCSE: Simple Contrastive Learning of Sentence Embeddings (code)
Pay Attention to MLPs
Evaluating Large Language Models Trained on Code (code)
The Curious Case of Neural Text Degeneration (code)

Multimodal Learning

Long-term Recurrent Convolutional Networks for Visual Recognition and Description
Show and Tell: A Neural Image Caption Generator
Deep Visual-Semantic Alignments for Generating Image Descriptions
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (code)
Layer Normalization
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering (code)
Generative Adversarial Text to Image Synthesis (code)
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks (code)
Learning Transferable Visual Models From Natural Language Supervision (code)
Zero-Shot Text-to-Image Generation (code)
Perceiver IO: A General Architecture for Structured Inputs & Outputs (code)

Generative Networks; Variational Auto-Encoders

Auto-Encoding Variational Bayes
Stochastic Backpropagation and Approximate Inference in Deep Generative Models
beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
Categorical Reparameterization with Gumbel-Softmax

Generative Networks; Unconditional GANs

Generative Adversarial Nets (code)
Unsupervised representation learning with deep convolutional generative adversarial networks (code)
Improved Techniques for Training GANs (code)
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets (code)
Least Squares Generative Adversarial Networks (code)
Wasserstein GAN (code)
Improved Training of Wasserstein GANs (code)
Progressive growing of GANs for improved quality, stability, and variation (code)
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (code)
Spectral Normalization for Generative Adversarial Networks (code)
Large Scale GAN Training for High Fidelity Natural Image Synthesis (code)
A Style-Based Generator Architecture for Generative Adversarial Networks (code)
Self-Attention Generative Adversarial Networks (code)
Analyzing and Improving the Image Quality of StyleGAN (code)

Generative Networks; Conditional GANs

Conditional Generative Adversarial Nets
Context Encoders: Feature Learning by Inpainting (code)
Conditional Image Synthesis with Auxiliary Classifier GANs
Image-to-Image Translation with Conditional Adversarial Networks (code)
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (code)
Unsupervised Image-to-Image Translation Networks (code)
Multimodal Unsupervised Image-to-Image Translation (code)
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks (code)
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs (code)
StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation (code)
Semantic Image Synthesis with Spatially-Adaptive Normalization (code)

Generative Networks; Diffusion Models

Denoising Diffusion Probabilistic Models (code)
Diffusion Models Beat GANs on Image Synthesis (code)
Score-Based Generative Modeling through Stochastic Differential Equations (code)

Advanced Topics; Domain Adaptation

Learning Transferable Features with Deep Adaptation Networks (code)
Domain-Adversarial Training of Neural Networks (code)
Adversarial Discriminative Domain Adaptation
Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks
CyCADA: Cycle-Consistent Adversarial Domain Adaptation (code)

Advanced Topics; Few-shot Learning

Matching Networks for One Shot Learning
Prototypical Networks for Few-shot Learning (code)
Learning to Compare: Relation Network for Few-Shot Learning

Advanced Topics; Federated Learning

Communication-Efficient Learning of Deep Networks from Decentralized Data
Federated Learning: Strategies for Improving Communication Efficiency
How To Backdoor Federated Learning (code)
Deep Learning with Differential Privacy

Advanced Topics; Semi-Supervised Learning

Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning (code) (code)
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results (code)
MixMatch: A Holistic Approach to Semi-Supervised Learning (code)
Self-training with Noisy Student improves ImageNet classification (code)
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence (code)
Training data-efficient image transformers & distillation through attention (code) (code)

Advanced Topics; Self-Supervised Learning

Deep Clustering for Unsupervised Learning of Visual Features (code)
Data-Efficient Image Recognition with Contrastive Predictive Coding
Contrastive Multiview Coding (code)
Momentum Contrast for Unsupervised Visual Representation Learning (code)
Self-Supervised Learning of Pretext-Invariant Representations (code)
A Simple Framework for Contrastive Learning of Visual Representations (code)
Supervised Contrastive Learning (code) (code)
Big Self-Supervised Models are Strong Semi-Supervised Learners (code) (data)
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (code)
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments (code)
Exploring Simple Siamese Representation Learning (code)
Emerging Properties in Self-Supervised Vision Transformers (code)
BEiT: BERT Pre-Training of Image Transformers (code)
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning (code)
DiT: Self-supervised Pre-training for Document Image Transformer (code) (code)
Unsupervised Semantic Segmentation By Distilling Feature Correspondences (code)

Speech & Music; Recognition

Mel-Spectrogram and Mel-Frequency Cepstral Coefficients (MFCCs)
Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks
Speech Recognition with Deep Recurrent Neural Networks
Towards End-to-End Speech Recognition with Recurrent Neural Networks
Deep Speech: Scaling up end-to-end speech recognition
LSTM: A Search Space Odyssey
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
X-vectors: Robust DNN Embeddings for Speaker Recognition (code)
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
Jasper: An End-to-End Convolutional Neural Acoustic Model (code)

Speech & Music; Synthesis

Generating Sequences With Recurrent Neural Networks
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling (code)
WaveNet: A Generative Model for Raw Audio
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis (code) (code) (code) (code) (code) (code)

Speech & Music; Modeling

Representation Learning with Contrastive Predictive Coding
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (code)
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units (code) (code)
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language (code)
Generative Spoken Dialogue Language Modeling (code) (code) (code)

Reinforcement Learning; Games

Playing Atari with Deep Reinforcement Learning
Human-level Control through Deep Reinforcement Learning
Deep Reinforcement Learning with Double Q-Learning
Prioritized Experience Replay
Dueling Network Architectures for Deep Reinforcement Learning
Rainbow: Combining Improvements in Deep Reinforcement Learning
Mastering the game of Go with deep neural networks and tree search
Mastering the game of Go without human knowledge
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
Grandmaster level in StarCraft II using multi-agent reinforcement learning (code)

Reinforcement Learning; Simulated Environments

Continuous Control with Deep Reinforcement Learning
Trust Region Policy Optimization (code)
Conjugate Gradient Method
Asynchronous Methods for Deep Reinforcement Learning
High-Dimensional Continuous Control Using Generalized Advantage Estimation
Proximal Policy Optimization Algorithms
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (code)

Reinforcement Learning; Real Environments

End to End Learning for Self-Driving Cars
End-To-End Training Of Deep Visuomotor Policies
Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection
Learning Dexterous In-Hand Manipulation

Reinforcement Learning; Uncertainty Quantification & Multitask Learning

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (code)
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (code) (code)
Overcoming catastrophic forgetting in neural networks

Graph Neural Networks

Translating Embeddings for Modeling Multi-relational Data (code)
DeepWalk: Online Learning of Social Representations (code)
LINE: Large-scale Information Network Embedding (code)
node2vec: Scalable Feature Learning for Networks (code)
Semi-Supervised Classification with Graph Convolutional Networks (code)
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (code)
Inductive Representation Learning on Large Graphs (code)
Graph Attention Networks (code)
How Powerful Are Graph Neural Networks? (code)
Modeling Relational Data with Graph Convolutional Networks (code)

Recommender Systems

Session-based Recommendations with Recurrent Neural Networks (code)
AutoRec: Autoencoders Meet Collaborative Filtering
Wide & Deep Learning for Recommender Systems
Neural Collaborative Filtering (code)
Neural Factorization Machines for Sparse Predictive Analytics (code)
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks (code)
Variational Autoencoders for Collaborative Filtering (code)
Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding (code)
Deep Learning Recommendation Model for Personalization and Recommendation Systems (code)

Computational Biology

Improved Protein Structure Prediction using Potentials from Deep Learning (code)
Highly Accurate Protein Structure Prediction with AlphaFold (code)

aifylabs/Applied-Deep-Learning
