Applied Deep Learning (YouTube Playlist)
This is a two-semester course designed primarily for graduate students. Undergraduate students with a demonstrably strong background in probability, statistics (e.g., linear and logistic regression), numerical linear algebra, and optimization are also welcome to register. The objective is to familiarize students with the state-of-the-art deep learning techniques used in industry. Deep learning is a field that witnesses a mini-revolution every few months, so it is very important that students registering for this course are eager to learn new concepts. Much of deep learning is just software engineering; consequently, students should be able to write clean code in their assignments. Python is the programming language used in this course. Familiarity with TensorFlow and PyTorch is a plus but not a requirement. However, students must be willing to do the hard work of learning and using these two frameworks as the course progresses.
- Training Deep Neural Networks (Lecture Notes) (YouTube Playlist)
- Computer Vision
- Image Classification
- Large Networks (Lecture Notes) (YouTube Playlist)
- Small Networks (Lecture Notes) (YouTube Playlist)
- AutoML (Lecture Notes) (YouTube Playlist)
- Robustness (Lecture Notes) (YouTube Playlist)
- Visualizing & Understanding (Lecture Notes) (YouTube Playlist)
- Transfer Learning (Lecture Notes) (YouTube Playlist)
- Image Transformation
- Semantic Segmentation (Lecture Notes) (YouTube Playlist)
- Super-Resolution, Denoising, and Colorization (Lecture Notes) (YouTube Playlist)
- Pose Estimation (Lecture Notes) (YouTube Playlist)
- Optical Flow and Depth Estimation (Lecture Notes) (YouTube Playlist)
- Object Detection
- Two Stage Detectors (Lecture Notes) (YouTube Playlist)
- One Stage Detectors (Lecture Notes) (YouTube Playlist)
- Face Recognition and Detection (Lecture Notes) (YouTube Playlist)
- Video (Lecture Notes) (YouTube Playlist)
- 3D (Lecture Notes) (YouTube Playlist)
- Natural Language Processing
- Word Representations (Lecture Notes) (YouTube Playlist)
- Text Classification (Lecture Notes) (YouTube Playlist)
- Neural Machine Translation (Lecture Notes) (YouTube Playlist)
- Language Modeling (Lecture Notes) (YouTube Playlist)
- Multimodal Learning (Lecture Notes) (YouTube Playlist)
- Generative Networks (YouTube Playlist)
- Variational Auto-Encoders (Lecture Notes) (YouTube Playlist)
- Unconditional GANs (Lecture Notes) (YouTube Playlist)
- Conditional GANs (Lecture Notes) (YouTube Playlist)
- Diffusion Models (Lecture Notes)
- Advanced Topics
- Domain Adaptation (Lecture Notes) (YouTube Playlist)
- Few Shot Learning (Lecture Notes) (YouTube Playlist)
- Federated Learning (Lecture Notes) (YouTube Playlist)
- Semi-Supervised Learning (Lecture Notes) (YouTube Playlist)
- Self-Supervised Learning (Lecture Notes) (YouTube Playlist)
- Speech & Music (YouTube Playlist)
- Recognition (Lecture Notes) (YouTube Playlist)
- Synthesis (Lecture Notes) (YouTube Playlist)
- Modeling (Lecture Notes) (YouTube Playlist)
- Reinforcement Learning (YouTube Playlist)
- Games (Lecture Notes) (YouTube Playlist)
- Simulated Environments (Lecture Notes) (YouTube Playlist)
- Real Environments (Lecture Notes) (YouTube Playlist)
- Uncertainty Quantification & Multitask Learning (Lecture Notes) (YouTube Playlist)
- Graph Neural Networks (Lecture Notes) (YouTube Playlist)
- Recommender Systems (Lecture Notes) (YouTube Playlist)
- Computational Biology (Lecture Notes)
- An overview of gradient descent optimization algorithms
- Multi-column Deep Neural Networks for Image Classification
- ImageNet Classification with Deep Convolutional Neural Networks (code)
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting (code)
- Network In Network
- Very Deep Convolutional Networks for Large-Scale Image Recognition (code)
- Going Deeper with Convolutions
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
- Rethinking the Inception Architecture for Computer Vision
- Training Very Deep Networks
- Deep Residual Learning for Image Recognition (code)
- Identity Mappings in Deep Residual Networks (code)
- Deep Networks with Stochastic Depth (code)
- Wide Residual Networks (code)
- Aggregated Residual Transformations for Deep Neural Networks (code)
- Densely Connected Convolutional Networks (code)
- Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
- mixup: Beyond Empirical Risk Minimization (code)
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (code)
- SGDR: Stochastic Gradient Descent with Warm Restarts (code)
- Decoupled Weight Decay Regularization (code)
- Residual Attention Network for Image Classification
- Squeeze-and-Excitation Networks (code)
- CBAM: Convolutional Block Attention Module (code)
- ResNeSt: Split-Attention Networks (code)
- Random Erasing Data Augmentation (code)
- CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features (code)
- Neural Ordinary Differential Equations (code)
- Spatial Transformer Networks
- Dynamic Routing Between Capsules
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (code)
- MLP-Mixer: An all-MLP Architecture for Vision (code)
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
- High-Performance Large-Scale Image Recognition Without Normalization (code)
- A ConvNet for the 2020s (code)
- Distilling the Knowledge in a Neural Network
- Learning both Weights and Connections for Efficient Neural Networks
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (code)
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (code)
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks (code)
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (code)
- Xception: Deep Learning with Depthwise Separable Convolutions (code)
- MobileNetV2: Inverted Residuals and Linear Bottlenecks (code)
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (code)
- ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
- CSPNet: A New Backbone that can Enhance Learning Capability of CNN (code) (code)
- Neural Architecture Search With Reinforcement Learning (code)
- Learning Transferable Architectures for Scalable Image Recognition
- Regularized Evolution for Image Classifier Architecture Search (code)
- Evolving Deep Neural Networks
- Efficient Neural Architecture Search via Parameter Sharing (code)
- DARTS: Differentiable Architecture Search (code)
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (code)
- MnasNet: Platform-Aware Neural Architecture Search for Mobile (code)
- Searching for MobileNetV3
- Designing Network Design Spaces (code)
- AutoAugment: Learning Augmentation Strategies from Data
- RandAugment: Practical Automated Data Augmentation with a Reduced Search Space
- Intriguing properties of neural networks
- Explaining and harnessing adversarial examples
- Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images
- DeepFool: a simple and accurate method to fool deep neural networks (code)
- Adversarial Examples in the Physical World
- The Limitations of Deep Learning in Adversarial Settings
- Practical Black-Box Attacks against Machine Learning
- Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
- Towards Evaluating the Robustness of Neural Networks (code)
- Towards Deep Learning Models Resistant to Adversarial Attacks (code)
- Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples (code)
- Ensemble Adversarial Training: Attacks and Defenses (code)
- One Pixel Attack for Fooling Deep Neural Networks
- Visualizing and Understanding Convolutional Networks
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
- Striving for Simplicity: The All Convolutional Net
- Methods for interpreting and understanding deep neural networks (code)
- “Why Should I Trust You?”: Explaining the Predictions of Any Classifier (code)
- Learning Deep Features for Discriminative Localization (code)
- Understanding Deep Learning Requires Rethinking Generalization
- Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (code)
- A Unified Approach to Interpreting Model Predictions (code)
- Learning Important Features Through Propagating Activation Differences (code)
- Axiomatic Attribution for Deep Networks (code)
- On Calibration of Modern Neural Networks (code)
- Understanding the role of individual units in a deep neural network (code)
- Do Vision Transformers See Like Convolutional Neural Networks?
- How transferable are features in deep neural networks? (code)
- DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition (code)
- CNN Features off-the-shelf: an Astounding Baseline for Recognition
- Return of the Devil in the Details: Delving Deep into Convolutional Nets (code)
- Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks (code)
- Fully Convolutional Networks for Semantic Segmentation (code)
- Learning Deconvolution Network for Semantic Segmentation (code)
- U-Net: Convolutional Networks for Biomedical Image Segmentation (code)
- DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs (code)
- Conditional Random Fields as Recurrent Neural Networks (code)
- Multi-scale Context Aggregation by Dilated Convolutions (code)
- SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
- Pyramid Scene Parsing Network (code)
- Rethinking Atrous Convolution for Semantic Image Segmentation
- What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
- RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation (code)
- Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (code)
- Dual Attention Network for Scene Segmentation (code)
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers (code) (code)
- Learning a Deep Convolutional Network for Image Super-Resolution (code)
- Perceptual Losses for Real-Time Style Transfer and Super-Resolution
- Image Style Transfer Using Convolutional Neural Networks (code)
- Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization (code)
- Accurate Image Super-Resolution Using Very Deep Convolutional Networks (code)
- Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network
- Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising (code)
- Enhanced Deep Residual Networks for Single Image Super-Resolution (code)
- Deep Image Prior (code)
- Residual Dense Network for Image Super-Resolution (code)
- Image Super-Resolution Using Very Deep Residual Channel Attention Networks (code)
- The Unreasonable Effectiveness of Deep Features as a Perceptual Metric (code)
- Colorful Image Colorization (code)
- DeepPose: Human Pose Estimation via Deep Neural Networks
- Convolutional Pose Machines (code)
- Stacked Hourglass Networks for Human Pose Estimation (code)
- Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields (code)
- Deep High-Resolution Representation Learning for Human Pose Estimation (code)
- FlowNet: Learning Optical Flow with Convolutional Networks
- FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks (code)
- PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume (code)
- Depth Map Prediction from a Single Image using a Multi-Scale Deep Network
- Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture (code)
- Unsupervised Monocular Depth Estimation with Left-Right Consistency (code)
- Unsupervised Learning of Depth and Ego-Motion from Video (code)
- Robust Consistent Video Depth Estimation (code)
- A Survey on Performance Metrics for Object-Detection Algorithms (code)
- Rich feature hierarchies for accurate object detection and semantic segmentation (code)
- Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
- Fast R-CNN (code)
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (code)
- R-FCN: Object Detection via Region-based Fully Convolutional Networks (code)
- Feature Pyramid Networks for Object Detection
- Deformable Convolutional Networks (code)
- Mask R-CNN (code)
- Cascade R-CNN: Delving into High Quality Object Detection (code)
- OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks (code)
- You Only Look Once: Unified, Real-Time Object Detection (code)
- SSD: Single Shot MultiBox Detector (code)
- YOLO9000: Better, Faster, Stronger (code)
- Focal Loss for Dense Object Detection (code)
- Speed/Accuracy Trade-Offs For Modern Convolutional Object Detectors
- YOLOv3: An Incremental Improvement (code)
- CornerNet: Detecting Objects as Paired Keypoints (code)
- FCOS: Fully Convolutional One-Stage Object Detection (code)
- Objects as Points (code)
- EfficientDet: Scalable and Efficient Object Detection (code)
- YOLOv4: Optimal Speed and Accuracy of Object Detection (code)
- End-to-End Object Detection with Transformers (code)
- Deformable DETR: Deformable Transformers for End-to-End Object Detection (code)
- DeepFace: Closing the Gap to Human-Level Performance in Face Verification
- FaceNet: A Unified Embedding for Face Recognition and Clustering
- Deep Face Recognition
- Deep Learning Face Attributes in the Wild
- Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks (code)
- A Discriminative Feature Learning Approach for Deep Face Recognition
- In Defense of the Triplet Loss for Person Re-Identification (code)
- SphereFace: Deep Hypersphere Embedding for Face Recognition (code)
- ArcFace: Additive Angular Margin Loss for Deep Face Recognition (code)
- 3D Convolutional Neural Networks for Human Action Recognition
- Large-scale Video Classification with Convolutional Neural Networks (code)
- Two-Stream Convolutional Networks for Action Recognition in Videos
- Learning Spatiotemporal Features with 3D Convolutional Networks (code)
- Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors (code)
- Temporal Segment Networks: Towards Good Practices for Deep Action Recognition (code)
- Convolutional Two-Stream Network Fusion for Video Action Recognition (code)
- Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset (code)
- A Closer Look at Spatiotemporal Convolutions for Action Recognition (code)
- Non-local Neural Networks (code)
- Group Normalization (code)
- SlowFast Networks for Video Recognition (code)
- Learning Multi-Domain Convolutional Neural Networks for Visual Tracking (code)
- Fully-Convolutional Siamese Networks for Object Tracking (code)
- V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation (code)
- PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (code)
- PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space (code)
- Dynamic Graph CNN for Learning on Point Clouds (code)
- Point Transformer
- VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (code)
- Linguistic Regularities in Continuous Space Word Representations
- Distributed Representations of Words and Phrases and their Compositionality
- Efficient Estimation of Word Representations in Vector Space (code)
- GloVe: Global Vectors for Word Representation (code)
- Enriching Word Vectors with Subword Information (code)
- Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank (code)
- Convolutional Neural Networks for Sentence Classification (code)
- Distributed Representations of Sentences and Documents
- Effective Use of Word Order for Text Categorization with Convolutional Neural Networks (code)
- A Convolutional Neural Network for Modelling Sentences
- A Sensitivity Analysis Of (And Practitioners' Guide To) Convolutional Neural Networks For Sentence Classification
- Character-level Convolutional Networks for Text Classification (code)
- Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks (code)
- Bag Of Tricks For Efficient Text Classification (code)
- Hierarchical Attention Networks for Document Classification
- Bidirectional LSTM-CRF Models for Sequence Tagging
- Neural Architectures For Named Entity Recognition (code) (code)
- End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
- Universal Language Model Fine-tuning for Text Classification (code)
- Neural Machine Translation by Jointly Learning to Align and Translate
- Sequence to Sequence Learning with Neural Networks
- Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
- On the Properties of Neural Machine Translation: Encoder–Decoder Approaches
- Effective Approaches to Attention-based Neural Machine Translation (code)
- Neural Machine Translation Of Rare Words With Subword Units (code)
- Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
- Convolutional Sequence to Sequence Learning (code)
- Attention Is All You Need (code)
- SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing (code)
- Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates
- Reformer: The Efficient Transformer (code)
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (code)
- Rethinking Attention with Performers (code)
- Deep contextualized word representations (code)
- An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling (code)
- Improving Language Understanding by Generative Pre-Training (code)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (code)
- Language Models are Unsupervised Multitask Learners (code)
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (code)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (code)
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (code) (code)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (code)
- XLNet: Generalized Autoregressive Pretraining for Language Understanding (code)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (code)
- Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks (code)
- Cross-lingual Language Model Pretraining (code)
- Unsupervised Cross-lingual Representation Learning at Scale (code)
- SpanBERT: Improving Pre-training by Representing and Predicting Spans (code)
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- Longformer: The Long-Document Transformer (code)
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (code)
- Language Models are Few-Shot Learners (code)
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (code)
- SimCSE: Simple Contrastive Learning of Sentence Embeddings (code)
- Pay Attention to MLPs
- Evaluating Large Language Models Trained on Code (code)
- The Curious Case of Neural Text Degeneration (code)
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description
- Show and Tell: A Neural Image Caption Generator
- Deep Visual-Semantic Alignments for Generating Image Descriptions
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (code)
- Layer Normalization
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering (code)
- Generative Adversarial Text to Image Synthesis (code)
- StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks (code)
- Learning Transferable Visual Models From Natural Language Supervision (code)
- Zero-Shot Text-to-Image Generation (code)
- Perceiver IO: A General Architecture for Structured Inputs & Outputs (code)
- Auto-Encoding Variational Bayes
- Stochastic Backpropagation and Approximate Inference in Deep Generative Models
- beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
- Categorical Reparameterization with Gumbel-Softmax
- Generative Adversarial Nets (code)
- Unsupervised representation learning with deep convolutional generative adversarial networks (code)
- Improved Techniques for Training GANs (code)
- InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets (code)
- Least Squares Generative Adversarial Networks (code)
- Wasserstein GAN (code)
- Improved Training of Wasserstein GANs (code)
- Progressive growing of GANs for improved quality, stability, and variation (code)
- GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (code)
- Spectral Normalization for Generative Adversarial Networks (code)
- Large Scale GAN Training for High Fidelity Natural Image Synthesis (code)
- A Style-Based Generator Architecture for Generative Adversarial Networks (code)
- Self-Attention Generative Adversarial Networks (code)
- Analyzing and Improving the Image Quality of StyleGAN (code)
- Conditional Generative Adversarial Nets
- Context Encoders: Feature Learning by Inpainting (code)
- Conditional Image Synthesis with Auxiliary Classifier GANs
- Image-to-Image Translation with Conditional Adversarial Networks (code)
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (code)
- Unsupervised Image-to-Image Translation Networks (code)
- Multimodal Unsupervised Image-to-Image Translation (code)
- Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
- ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks (code)
- High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs (code)
- StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation (code)
- Semantic Image Synthesis with Spatially-Adaptive Normalization (code)
- Denoising Diffusion Probabilistic Models (code)
- Diffusion Models Beat GANs on Image Synthesis (code)
- Score-Based Generative Modeling through Stochastic Differential Equations (code)
- Learning Transferable Features with Deep Adaptation Networks (code)
- Domain-Adversarial Training of Neural Networks (code)
- Adversarial Discriminative Domain Adaptation
- Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks
- CyCADA: Cycle-Consistent Adversarial Domain Adaptation (code)
- Matching Networks for One Shot Learning
- Prototypical Networks for Few-shot Learning (code)
- Learning to Compare: Relation Network for Few-Shot Learning
- Communication-Efficient Learning of Deep Networks from Decentralized Data
- Federated Learning: Strategies for Improving Communication Efficiency
- How To Backdoor Federated Learning (code)
- Deep Learning with Differential Privacy
- Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning (code) (code)
- Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results (code)
- MixMatch: A Holistic Approach to Semi-Supervised Learning (code)
- Self-training with Noisy Student improves ImageNet classification (code)
- FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence (code)
- Training data-efficient image transformers & distillation through attention (code) (code)
- Deep Clustering for Unsupervised Learning of Visual Features (code)
- Data-Efficient Image Recognition with Contrastive Predictive Coding
- Contrastive Multiview Coding (code)
- Momentum Contrast for Unsupervised Visual Representation Learning (code)
- Self-Supervised Learning of Pretext-Invariant Representations (code)
- A Simple Framework for Contrastive Learning of Visual Representations (code)
- Supervised Contrastive Learning (code) (code)
- Big Self-Supervised Models are Strong Semi-Supervised Learners (code) (data)
- Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (code)
- Unsupervised Learning of Visual Features by Contrasting Cluster Assignments (code)
- Exploring Simple Siamese Representation Learning (code)
- Emerging Properties in Self-Supervised Vision Transformers (code)
- BEiT: BERT Pre-Training of Image Transformers (code)
- VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning (code)
- DiT: Self-supervised Pre-training for Document Image Transformer (code) (code)
- Unsupervised Semantic Segmentation By Distilling Feature Correspondences (code)
- Mel-Spectrogram and Mel-Frequency Cepstral Coefficients (MFCCs)
- Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks
- Speech Recognition with Deep Recurrent Neural Networks
- Towards End-to-End Speech Recognition with Recurrent Neural Networks
- Deep Speech: Scaling up end-to-end speech recognition
- LSTM: A Search Space Odyssey
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
- X-vectors: Robust DNN Embeddings for Speaker Recognition (code)
- SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
- Jasper: An End-to-End Convolutional Neural Acoustic Model (code)
- Generating Sequences With Recurrent Neural Networks
- Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling (code)
- WaveNet: A Generative Model for Raw Audio
- HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis (code) (code) (code) (code) (code) (code)
- Representation Learning with Contrastive Predictive Coding
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (code)
- HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units (code) (code)
- data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language (code)
- Generative Spoken Dialogue Language Modeling (code) (code) (code)
- Playing Atari with Deep Reinforcement Learning
- Human-level Control through Deep Reinforcement Learning
- Deep Reinforcement Learning with Double Q-Learning
- Prioritized Experience Replay
- Dueling Network Architectures for Deep Reinforcement Learning
- Rainbow: Combining Improvements in Deep Reinforcement Learning
- Mastering the game of Go with deep neural networks and tree search
- Mastering the game of Go without human knowledge
- A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
- Grandmaster level in StarCraft II using multi-agent reinforcement learning (code)
- Continuous Control with Deep Reinforcement Learning
- Trust Region Policy Optimization (code)
- Conjugate Gradient Method
- Asynchronous Methods for Deep Reinforcement Learning
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Proximal Policy Optimization Algorithms
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (code)
- End to End Learning for Self-Driving Cars
- End-To-End Training Of Deep Visuomotor Policies
- Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection
- Learning Dexterous In-Hand Manipulation
- Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (code)
- Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
- Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (code) (code)
- Overcoming catastrophic forgetting in neural networks
- Translating Embeddings for Modeling Multi-relational Data (code)
- DeepWalk: Online Learning of Social Representations (code)
- LINE: Large-scale Information Network Embedding (code)
- node2vec: Scalable Feature Learning for Networks (code)
- Semi-Supervised Classification with Graph Convolutional Networks (code)
- Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (code)
- Inductive Representation Learning on Large Graphs (code)
- Graph Attention Networks (code)
- How Powerful Are Graph Neural Networks? (code)
- Modeling Relational Data with Graph Convolutional Networks (code)
- Session-based Recommendations with Recurrent Neural Networks (code)
- AutoRec: Autoencoders Meet Collaborative Filtering
- Wide & Deep Learning for Recommender Systems
- Neural Collaborative Filtering (code)
- Neural Factorization Machines for Sparse Predictive Analytics (code)
- DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
- Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks (code)
- Variational Autoencoders for Collaborative Filtering (code)
- Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding (code)
- Deep Learning Recommendation Model for Personalization and Recommendation Systems (code)