Papers-2021.md

December 2021

  • Optimal Representations for Covariate Shift - [Arxiv] [QA]
  • An overview of the quantitative causality analysis and causal graph reconstruction based on a rigorous formalism of information flow - [Arxiv] [QA]
  • TransLog: A Unified Transformer-based Framework for Log Anomaly Detection - [Arxiv] [QA]
  • ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation - [Arxiv] [QA]
  • On the Role of Neural Collapse in Transfer Learning - [Arxiv] [QA]
  • Self Reward Design with Fine-grained Interpretability - [Arxiv] [QA]
  • Generative Kernel Continual learning - [Arxiv] [QA]
  • 3D Skeleton-based Few-shot Action Recognition with JEANIE is not so Naïve - [Arxiv] [QA]
  • Revisiting Transformation Invariant Geometric Deep Learning: Are Initial Representations All You Need? - [Arxiv] [QA]
  • ML4CO: Is GCNN All You Need? Graph Convolutional Neural Networks Produce Strong Baselines For Combinatorial Optimization Problems, If Tuned and Trained Properly, on Appropriate Data - [Arxiv] [QA]
  • Cost Aggregation Is All You Need for Few-Shot Segmentation - [Arxiv] [QA]
  • Generalized Few-Shot Semantic Segmentation: All You Need is Fine-Tuning - [Arxiv] [QA]
  • High-Resolution Image Synthesis with Latent Diffusion Models - [Arxiv] [QA]
  • Are Large-scale Datasets Necessary for Self-Supervised Pre-training? - [Arxiv] [QA]
  • Transformers Can Do Bayesian Inference - [Arxiv] [QA]
  • Soundify: Matching Sound Effects to Video - [Arxiv] [QA]
  • Align and Prompt: Video-and-Language Pre-training with Entity Prompts - [Arxiv] [QA]
  • WebGPT: Browser-assisted question-answering with human feedback - [Arxiv] [QA]
  • Automated Deep Learning: Neural Architecture Search Is Not the End - [Arxiv] [QA]
  • All You Need is RAW: Defending Against Adversarial Attacks with Camera Image Pipelines - [Arxiv] [QA]
  • Masked Feature Prediction for Self-Supervised Visual Pre-Training - [Arxiv] [QA]
  • Unsupervised Dense Information Retrieval with Contrastive Learning - [Arxiv] [QA]
  • NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics - [Arxiv] [QA]
  • Reframing Human-AI Collaboration for Generating Free-Text Explanations - [Arxiv] [QA]
  • Learning to Prompt for Continual Learning - [Arxiv] [QA]
  • QAHOI: Query-Based Anchors for Human-Object Interaction Detection - [Arxiv] [QA]
  • Learning To Retrieve Prompts for In-Context Learning - [Arxiv] [QA]
  • Call for Customized Conversation: Customized Conversation Grounding Persona and Knowledge - [Arxiv] [QA]
  • Rethinking Nearest Neighbors for Visual Classification - [Arxiv] [QA]
  • Improving Conversational Recommendation Systems' Quality with Context-Aware Item Meta Information - [Arxiv] [QA]
  • Structure-Aware Image Segmentation with Homotopy Warping - [Arxiv] [QA]
  • Massive-scale Decoding for Text Generation using Lattices - [Arxiv] [QA]
  • Improving Human-Object Interaction Detection via Phrase Learning and Label Composition - [Arxiv] [QA]
  • MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation - [Arxiv] [QA]
  • Real-Time Neural Voice Camouflage - [Arxiv] [QA]
  • GLaM: Efficient Scaling of Language Models with Mixture-of-Experts - [Arxiv] [QA]
  • VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks - [Arxiv] [QA]
  • Step-unrolled Denoising Autoencoders for Text Generation - [Arxiv] [QA]
  • CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability - [Arxiv] [QA]
  • The Overlooked Classifier in Human-Object Interaction Recognition - [Arxiv] [QA]
  • Critical configurations for three projective views - [Arxiv] [QA]
  • Self-Supervised Bot Play for Conversational Recommendation with Justifications - [Arxiv] [QA]
  • On Convergence of Federated Averaging Langevin Dynamics - [Arxiv] [QA]
  • Critical configurations for two projective views, a new approach - [Arxiv] [QA]
  • Scaling Language Models: Methods, Analysis & Insights from Training Gopher - [Arxiv] [QA]
  • Prompting Visual-Language Models for Efficient Video Understanding - [Arxiv] [QA]
  • Improving language models by retrieving from trillions of tokens - [Arxiv] [QA]
  • Pareto Domain Adaptation - [Arxiv] [QA]
  • DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover's Distance Improves Out-Of-Distribution Face Identification - [Arxiv] [QA]
  • Universalizing Weak Supervision - [Arxiv] [QA]
  • Unsupervised Learning of Compositional Scene Representations from Multiple Unspecified Viewpoints - [Arxiv] [QA]
  • Genetic Algorithm for Constrained Molecular Inverse Design - [Arxiv] [QA]
  • Variational Wasserstein gradient flow - [Arxiv] [QA]
  • YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone - [Arxiv] [QA]
  • Linear algebra with transformers - [Arxiv] [QA]
  • Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer - [Arxiv] [QA]
  • DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting - [Arxiv] [QA]
  • Neural Stochastic Dual Dynamic Programming - [Arxiv] [QA]
  • A General Language Assistant as a Laboratory for Alignment - [Arxiv] [QA]
  • Routing with Self-Attention for Multimodal Capsule Networks - [Arxiv] [QA]
  • Human-Object Interaction Detection via Weak Supervision - [Arxiv] [QA]
  • MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions - [Arxiv] [QA]

November 2021

  • Show Your Work: Scratchpads for Intermediate Computation with Language Models - [Arxiv] [QA]
  • MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning - [Arxiv] [QA]
  • Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective - [Arxiv] [QA]
  • Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling - [Arxiv] [QA]
  • GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection - [Arxiv] [QA]
  • A category theory framework for Bayesian learning - [Arxiv] [QA]
  • Pre-training Methods in Information Retrieval - [Arxiv] [QA]
  • SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection - [Arxiv] [QA]
  • SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning - [Arxiv] [QA]
  • Group equivariant neural posterior estimation - [Arxiv] [QA]
  • CDNet is all you need: Cascade DCN based underwater object detection RCNN - [Arxiv] [QA]
  • PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers - [Arxiv] [QA]
  • VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling - [Arxiv] [QA]
  • Hierarchical Modular Network for Video Captioning - [Arxiv] [QA]
  • NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion - [Arxiv] [QA]
  • Node-Level Differentially Private Graph Neural Networks - [Arxiv] [QA]
  • Subgraph Permutation Equivariant Networks - [Arxiv] [QA]
  • Variance Reduction in Deep Learning: More Momentum is All You Need - [Arxiv] [QA]
  • Deep Point Cloud Reconstruction - [Arxiv] [QA]
  • Lossless Compression with Probabilistic Circuits - [Arxiv] [QA]
  • Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction - [Arxiv] [QA]
  • Plant 'n' Seek: Can You Find the Winning Ticket? - [Arxiv] [QA]
  • Deep Probability Estimation - [Arxiv] [QA]
  • Self-Supervised Point Cloud Completion via Inpainting - [Arxiv] [QA]
  • Are Vision Transformers Robust to Patch Perturbations? - [Arxiv] [QA]
  • Deep Safe Multi-Task Learning - [Arxiv] [QA]
  • FBNetV5: Neural Architecture Search for Multiple Tasks in One Run - [Arxiv] [QA]
  • SimMIM: A Simple Framework for Masked Image Modeling - [Arxiv] [QA]
  • One-Shot Generative Domain Adaptation - [Arxiv] [QA]
  • Perceiving and Modeling Density is All You Need for Image Dehazing - [Arxiv] [QA]
  • Selective Ensembles for Consistent Predictions - [Arxiv] [QA]
  • iBOT: Image BERT Pre-Training with Online Tokenizer - [Arxiv] [QA]
  • Bolstering Stochastic Gradient Descent with Model Building - [Arxiv] [QA]
  • Masked Autoencoders Are Scalable Vision Learners - [Arxiv] [QA]
  • Gradients are Not All You Need - [Arxiv] [QA]
  • Sliced Recursive Transformer - [Arxiv] [QA]
  • Realizable Learning is All You Need - [Arxiv] [QA]
  • MT3: Multi-Task Multitrack Music Transcription - [Arxiv] [QA]
  • Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies - [Arxiv] [QA]
  • LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs - [Arxiv] [QA]
  • Can Vision Transformers Perform Convolution? - [Arxiv] [QA]
  • Deep neural networks as nested dynamical systems - [Arxiv] [QA]
  • Towards the Generalization of Contrastive Self-Supervised Learning - [Arxiv] [QA]

October 2021

  • Template Filling for Controllable Commonsense Reasoning - [Arxiv] [QA]
  • Hyperparameter Tuning is All You Need for LISTA - [Arxiv] [QA]
  • Improving Fairness via Federated Learning - [Arxiv] [QA]
  • The magnitude vector of images - [Arxiv] [QA]
  • Semi-Siamese Bi-encoder Neural Ranking Model Using Lightweight Fine-Tuning - [Arxiv] [QA]
  • Diversity Enhanced Active Learning with Strictly Proper Scoring Rules - [Arxiv] [QA]
  • Training Verifiers to Solve Math Word Problems - [Arxiv] [QA]
  • s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning - [Arxiv] [QA]
  • The Efficiency Misnomer - [Arxiv] [QA]
  • Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation - [Arxiv] [QA]
  • DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021 - [Arxiv] [QA]
  • Double Trouble: How to not explain a text classifier's decisions using counterfactuals synthesized by masked language models? - [Arxiv] [QA]
  • Center Loss Regularization for Continual Learning - [Arxiv] [QA]
  • Fast Model Editing at Scale - [Arxiv] [QA]
  • SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark - [Arxiv] [QA]
  • BERMo: What can BERT learn from ELMo? - [Arxiv] [QA]
  • TLDR: Twin Learning for Dimensionality Reduction - [Arxiv] [QA]
  • Natural Attribute-based Shift Detection - [Arxiv] [QA]
  • Value alignment: a formal approach - [Arxiv] [QA]
  • Illiterate DALL-E Learns to Compose - [Arxiv] [QA]
  • Multimodal Dialogue Response Generation - [Arxiv] [QA]
  • Comparing Human and Machine Bias in Face Recognition - [Arxiv] [QA]
  • Generated Knowledge Prompting for Commonsense Reasoning - [Arxiv] [QA]
  • Trigger Hunting with a Topological Prior for Trojan Detection - [Arxiv] [QA]
  • On Learning the Transformer Kernel - [Arxiv] [QA]
  • Multitask Prompted Training Enables Zero-Shot Task Generalization - [Arxiv] [QA]
  • Guided Point Contrastive Learning for Semi-supervised Point Cloud Semantic Segmentation - [Arxiv] [QA]
  • Few-Shot Bot: Prompt-Based Learning for Dialogue Systems - [Arxiv] [QA]
  • Jurassic is (almost) All You Need: Few-Shot Meaning-to-Text Generation for Open-Domain Dialogue - [Arxiv] [QA]
  • On-Policy Model Errors in Reinforcement Learning - [Arxiv] [QA]
  • ContraQA: Question Answering under Contradicting Contexts - [Arxiv] [QA]
  • Attacking Open-domain Question Answering by Injecting Misinformation - [Arxiv] [QA]
  • RecInDial: A Unified Framework for Conversational Recommendation with Pretrained Language Models - [Arxiv] [QA]
  • RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking - [Arxiv] [QA]
  • CLIP4Caption: CLIP for Video Caption - [Arxiv] [QA]
  • Parallel Deep Neural Networks Have Zero Duality Gap - [Arxiv] [QA]
  • Causal discovery from conditionally stationary time-series - [Arxiv] [QA]
  • Molecular Graph Generation via Geometric Scattering - [Arxiv] [QA]
  • Open-Set Recognition: a Good Closed-Set Classifier is All You Need? - [Arxiv] [QA]
  • Efficient Neural Ranking using Forward Indexes - [Arxiv] [QA]
  • DiscoDVT: Generating Long Text with Discourse-Aware Discrete Variational Transformer - [Arxiv] [QA]
  • Relative Molecule Self-Attention Transformer - [Arxiv] [QA]
  • Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval - [Arxiv] [QA]
  • Certified Patch Robustness via Smoothed Vision Transformers - [Arxiv] [QA]
  • Global Vision Transformer Pruning with Hessian-Aware Saliency - [Arxiv] [QA]
  • Long Expressive Memory for Sequence Modeling - [Arxiv] [QA]
  • Vector-quantized Image Modeling with Improved VQGAN - [Arxiv] [QA]
  • Multi-Agent MDP Homomorphic Networks - [Arxiv] [QA]
  • Neural Link Prediction with Walk Pooling - [Arxiv] [QA]
  • FRL: Federated Rank Learning - [Arxiv] [QA]
  • On the Limitations of Multimodal VAEs - [Arxiv] [QA]
  • Token Pooling in Vision Transformers - [Arxiv] [QA]
  • FOCUS: Familiar Objects in Common and Uncommon Settings - [Arxiv] [QA]
  • Hyperparameter Tuning with Renyi Differential Privacy - [Arxiv] [QA]
  • Adversarial Retriever-Ranker for dense text retrieval - [Arxiv] [QA]
  • RAR: Region-Aware Point Cloud Registration - [Arxiv] [QA]
  • Cartoon Explanations of Image Classifiers - [Arxiv] [QA]
  • Situated Dialogue Learning through Procedural Environment Generation - [Arxiv] [QA]
  • On the Optimal Memorization Power of ReLU Neural Networks - [Arxiv] [QA]
  • Attention is All You Need? Good Embeddings with Statistics are enough: Large Scale Audio Understanding without Transformers/ Convolutions/ BERTs/ Mixers/ Attention/ RNNs or .... - [Arxiv] [QA]
  • Generative Modeling with Optimal Transport Maps - [Arxiv] [QA]
  • Federated Learning via Plurality Vote - [Arxiv] [QA]
  • Nested Policy Reinforcement Learning - [Arxiv] [QA]
  • How BPE Affects Memorization in Transformers - [Arxiv] [QA]
  • On The Transferability of Deep-Q Networks - [Arxiv] [QA]
  • Test-time Batch Statistics Calibration for Covariate Shift - [Arxiv] [QA]
  • Geometric Algebra Attention Networks for Small Point Clouds - [Arxiv] [QA]
  • EntQA: Entity Linking as Question Answering - [Arxiv] [QA]
  • Autoregressive Diffusion Models - [Arxiv] [QA]
  • AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts - [Arxiv] [QA]
  • Generalized Kernel Thinning - [Arxiv] [QA]
  • Solving even-parity problems using traceless genetic programming - [Arxiv] [QA]
  • One Timestep is All You Need: Training Spiking Neural Networks with Ultra Low Latency - [Arxiv] [QA]
  • Batch size-invariance for policy optimization - [Arxiv] [QA]
  • Vision-Only Robot Navigation in a Neural Radiance World - [Arxiv] [QA]

September 2021

  • Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System - [Arxiv] [QA]
  • Stochastic Training is Not Necessary for Generalization - [Arxiv] [QA]
  • IGLU: Efficient GCN Training via Lazy Updates - [Arxiv] [QA]
  • Unsolved Problems in ML Safety - [Arxiv] [QA]
  • OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset with Visual Contexts - [Arxiv] [QA]
  • Learning Neural Templates for Recommender Dialogue System - [Arxiv] [QA]
  • CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models - [Arxiv] [QA]
  • A Survey on Cost Types, Interaction Schemes, and Annotator Performance Models in Selection Algorithms for Active Learning in Classification - [Arxiv] [QA]
  • Recursively Summarizing Books with Human Feedback - [Arxiv] [QA]
  • Scalable and Efficient MoE Training for Multitask Multilingual Models - [Arxiv] [QA]
  • SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval - [Arxiv] [QA]
  • Neural networks with trainable matrix activation functions - [Arxiv] [QA]
  • PLATO-XL: Exploring the Large-scale Pre-training of Dialogue Generation - [Arxiv] [QA]
  • DuRecDial 2.0: A Bilingual Parallel Corpus for Conversational Recommendation - [Arxiv] [QA]
  • Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes - [Arxiv] [QA]
  • Primer: Searching for Efficient Transformers for Language Modeling - [Arxiv] [QA]
  • Is Curiosity All You Need? On the Utility of Emergent Behaviours from Curious Exploration - [Arxiv] [QA]
  • Torch.manual_seed(3407) is all you need: On the influence of random seeds in deep learning architectures for computer vision - [Arxiv] [QA]
  • Scaling Laws for Neural Machine Translation - [Arxiv] [QA]
  • Transferable Persona-Grounded Dialogues via Grounded Minimal Edits - [Arxiv] [QA]
  • Attention Is Indeed All You Need: Semantically Attention-Guided Decoding for Data-to-Text NLG - [Arxiv] [QA]
  • Benchmarking the Spectrum of Agent Capabilities - [Arxiv] [QA]
  • Continuous Homeostatic Reinforcement Learning for Self-Regulated Autonomous Agents - [Arxiv] [QA]
  • Exploring Prompt-based Few-shot Learning for Grounded Dialog Generation - [Arxiv] [QA]
  • Space Time Recurrent Memory Network - [Arxiv] [QA]
  • Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation - [Arxiv] [QA]
  • WeakSTIL: Weak whole-slide image level stromal tumor infiltrating lymphocyte scores are all you need - [Arxiv] [QA]
  • CEM: Commonsense-aware Empathetic Response Generation - [Arxiv] [QA]
  • Bootstrapped Meta-Learning - [Arxiv] [QA]
  • A Three-Stage Learning Framework for Low-Resource Knowledge-Grounded Dialogue Generation - [Arxiv] [QA]
  • Thinking Clearly, Talking Fast: Concept-Guided Non-Autoregressive Generation for Open-Domain Dialogue Systems - [Arxiv] [QA]
  • ACP++: Action Co-occurrence Priors for Human-Object Interaction Detection - [Arxiv] [QA]
  • Local Augmentation for Graph Neural Networks - [Arxiv] [QA]
  • Sqrt(d) Dimension Dependence of Langevin Monte Carlo - [Arxiv] [QA]
  • Rethinking Data Augmentation for Low-Resource Neural Machine Translation: A Multi-Task Learning Approach - [Arxiv] [QA]
  • Mask is All You Need: Rethinking Mask R-CNN for Dense and Arbitrary-Shaped Scene Text Detection - [Arxiv] [QA]
  • Learning Neural Causal Models with Active Interventions - [Arxiv] [QA]
  • Learning to Prompt for Vision-Language Models - [Arxiv] [QA]
  • Searching for Efficient Multi-Stage Vision Transformers - [Arxiv] [QA]
  • Boosting Search Engines with Interactive Agents - [Arxiv] [QA]

August 2021

  • Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback - [Arxiv] [QA]
  • Neural HMMs are all you need (for high-quality attention-free TTS) - [Arxiv] [QA]
  • Knowledge Base Completion Meets Transfer Learning - [Arxiv] [QA]
  • Subjective Learning for Open-Ended Data - [Arxiv] [QA]
  • Photos Are All You Need for Reciprocal Recommendation in Online Dating - [Arxiv] [QA]
  • SimVLM: Simple Visual Language Model Pretraining with Weak Supervision - [Arxiv] [QA]
  • Contrastive Learning of User Behavior Sequence for Context-Aware Document Ranking - [Arxiv] [QA]
  • One TTS Alignment To Rule Them All - [Arxiv] [QA]
  • Anarchic Federated Learning - [Arxiv] [QA]
  • Pre-training for Ad-hoc Retrieval: Hyperlink is Also You Need - [Arxiv] [QA]
  • Fastformer: Additive Attention Can Be All You Need - [Arxiv] [QA]
  • Spatio-Temporal Interaction Graph Parsing Networks for Human-Object Interaction Recognition - [Arxiv] [QA]
  • Exploiting Scene Graphs for Human-Object Interaction Detection - [Arxiv] [QA]
  • D3D-HOI: Dynamic 3D Human-Object Interactions from Videos - [Arxiv] [QA]
  • A good body is all you need: avoiding catastrophic interference via agent architecture search - [Arxiv] [QA]
  • On the Opportunities and Risks of Foundation Models - [Arxiv] [QA]
  • MMChat: Multi-Modal Chat Dataset on Social Media - [Arxiv] [QA]
  • FedPara: Low-Rank Hadamard Product for Communication-Efficient Federated Learning - [Arxiv] [QA]
  • The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models - [Arxiv] [QA]
  • PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval - [Arxiv] [QA]
  • Logit Attenuating Weight Normalization - [Arxiv] [QA]
  • Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval - [Arxiv] [QA]
  • Mining the Benefits of Two-stage and One-stage HOI Detection - [Arxiv] [QA]
  • Are Neural Ranking Models Robust? - [Arxiv] [QA]
  • Rethinking Architecture Selection in Differentiable NAS - [Arxiv] [QA]
  • Pose is all you need: The pose only group activity recognition system (POGARS) - [Arxiv] [QA]
  • BIGRoC: Boosting Image Generation via a Robust Classifier - [Arxiv] [QA]
  • Source-Free Domain Adaptation for Image Segmentation - [Arxiv] [QA]
  • Improving Contrastive Learning by Visualizing Feature Transformation - [Arxiv] [QA]
  • Internal Video Inpainting by Implicit Long-range Propagation - [Arxiv] [QA]
  • Model-Based Opponent Modeling - [Arxiv] [QA]
  • Offline Decentralized Multi-Agent Reinforcement Learning - [Arxiv] [QA]
  • SphereFace2: Binary Classification is All You Need for Deep Face Recognition - [Arxiv] [QA]
  • How to Evaluate Your Dialogue Models: A Review of Approaches - [Arxiv] [QA]
  • SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations - [Arxiv] [QA]
  • Evaluating Deep Graph Neural Networks - [Arxiv] [QA]
  • Jointly Optimizing Query Encoder and Product Quantization to Improve Retrieval Performance - [Arxiv] [QA]
  • GTNet: Guided Transformer Network for Detecting Human-Object Interactions - [Arxiv] [QA]

July 2021

  • Imbalanced Adversarial Training with Reweighting - [Arxiv] [QA]
  • Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing - [Arxiv] [QA]
  • Functorial String Diagrams for Reverse-Mode Automatic Differentiation - [Arxiv] [QA]
  • Unsupervised Learning of Neurosymbolic Encoders - [Arxiv] [QA]
  • Is Object Detection Necessary for Human-Object Interaction Recognition? - [Arxiv] [QA]
  • Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation - [Arxiv] [QA]
  • Joint Shapley values: a measure of joint feature importance - [Arxiv] [QA]
  • Few Shots Are All You Need: A Progressive Few Shot Learning Approach for Low Resource Handwritten Text Recognition - [Arxiv] [QA]
  • Conditional GANs with Auxiliary Discriminative Classifier - [Arxiv] [QA]
  • Guided Generation of Cause and Effect - [Arxiv] [QA]
  • QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries - [Arxiv] [QA]
  • Structured Stochastic Gradient MCMC - [Arxiv] [QA]
  • Is attention to bounding boxes all you need for pedestrian action prediction? - [Arxiv] [QA]
  • DNN is not all you need: Parallelizing Non-Neural ML Algorithms on Ultra-Low-Power IoT Processors - [Arxiv] [QA]
  • Align before Fuse: Vision and Language Representation Learning with Momentum Distillation - [Arxiv] [QA]
  • FastSHAP: Real-Time Shapley Value Estimation - [Arxiv] [QA]
  • How Much Can CLIP Benefit Vision-and-Language Tasks? - [Arxiv] [QA]
  • Per-Pixel Classification is Not All You Need for Semantic Segmentation - [Arxiv] [QA]
  • A Configurable Multilingual Model is All You Need to Recognize All Languages - [Arxiv] [QA]
  • SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking - [Arxiv] [QA]
  • Explore and Control with Adversarial Surprise - [Arxiv] [QA]
  • ViTGAN: Training GANs with Vision Transformers - [Arxiv] [QA]
  • Hoechst Is All You Need: Lymphocyte Classification with Deep Learning - [Arxiv] [QA]
  • Towards Robust Active Feature Acquisition - [Arxiv] [QA]
  • Evaluating Large Language Models Trained on Code - [Arxiv] [QA]
  • Understanding Intrinsic Robustness Using Label Uncertainty - [Arxiv] [QA]
  • Neural Contextual Bandits without Regret - [Arxiv] [QA]
  • Structured Denoising Diffusion Models in Discrete State-Spaces - [Arxiv] [QA]
  • Depth-supervised NeRF: Fewer Views and Faster Training for Free - [Arxiv] [QA]
  • VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer - [Arxiv] [QA]
  • Attention-based Adversarial Appearance Learning of Augmented Pedestrians - [Arxiv] [QA]
  • Rethinking Positional Encoding - [Arxiv] [QA]
  • When and How to Fool Explainable Models (and Humans) with Adversarial Examples - [Arxiv] [QA]
  • Mutation is all you need - [Arxiv] [QA]
  • Scale Mixtures of Neural Network Gaussian Processes - [Arxiv] [QA]
  • On the Practicality of Deterministic Epistemic Uncertainty - [Arxiv] [QA]

June 2021

  • Automatically Select Emotion for Response via Personality-affected Emotion Transition - [Arxiv] [QA]
  • Local Reweighting for Adversarial Training - [Arxiv] [QA]
  • Open-Set Representation Learning through Combinatorial Embedding - [Arxiv] [QA]
  • Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation - [Arxiv] [QA]
  • Multimodal Few-Shot Learning with Frozen Language Models - [Arxiv] [QA]
  • Animatable Neural Radiance Fields from Monocular RGB Videos - [Arxiv] [QA]
  • DCoM: A Deep Column Mapper for Semantic Data Type Detection - [Arxiv] [QA]
  • All You Need is a Second Look: Towards Arbitrary-Shaped Text Detection - [Arxiv] [QA]
  • IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers - [Arxiv] [QA]
  • Learning Multimodal VAEs through Mutual Supervision - [Arxiv] [QA]
  • Sampling with Mirrored Stein Operators - [Arxiv] [QA]
  • Adapting Off-the-Shelf Source Segmenter for Target Medical Image Segmentation - [Arxiv] [QA]
  • CharacterChat: Supporting the Creation of Fictional Characters through Conversation and Progressive Manifestation with a Chatbot - [Arxiv] [QA]
  • Secure Domain Adaptation with Multiple Sources - [Arxiv] [QA]
  • Volume Rendering of Neural Implicit Surfaces - [Arxiv] [QA]
  • Policy Smoothing for Provably Robust Reinforcement Learning - [Arxiv] [QA]
  • Towards Long-Form Video Understanding - [Arxiv] [QA]
  • Boundary Graph Neural Networks for 3D Simulations - [Arxiv] [QA]
  • Pseudo-Relevance Feedback for Multiple Representation Dense Retrieval - [Arxiv] [QA]
  • CLIP2Video: Mastering Video-Text Retrieval via Image CLIP - [Arxiv] [QA]
  • Analytically Tractable Bayesian Deep Q-Learning - [Arxiv] [QA]
  • Multiplying Matrices Without Multiplying - [Arxiv] [QA]
  • NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction - [Arxiv] [QA]
  • Shuffle Private Stochastic Convex Optimization - [Arxiv] [QA]
  • On Invariance Penalties for Risk Minimization - [Arxiv] [QA]
  • Visual Correspondence Hallucination - [Arxiv] [QA]
  • Poisoning and Backdooring Contrastive Learning - [Arxiv] [QA]
  • Robust Model-based Face Reconstruction through Weakly-Supervised Outlier Segmentation - [Arxiv] [QA]
  • Revisiting the Weaknesses of Reinforcement Learning for Neural Machine Translation - [Arxiv] [QA]
  • Transductive Few-Shot Learning: Clustering is All You Need? - [Arxiv] [QA]
  • Unsupervised Enrichment of Persona-grounded Dialog with Background Stories - [Arxiv] [QA]
  • BEiT: BERT Pre-Training of Image Transformers - [Arxiv] [QA]
  • Query Embedding on Hyper-relational Knowledge Graphs - [Arxiv] [QA]
  • UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation - [Arxiv] [QA]
  • HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units - [Arxiv] [QA]
  • Constraining Linear-chain CRFs to Regular Languages - [Arxiv] [QA]
  • Pre-Trained Models: Past, Present and Future - [Arxiv] [QA]
  • Category Theory in Machine Learning - [Arxiv] [QA]
  • Inverting Adversarially Robust Networks for Image Synthesis - [Arxiv] [QA]
  • Prompting Contrastive Explanations for Commonsense Reasoning Tasks - [Arxiv] [QA]
  • Learning to Pool in Graph Neural Networks for Extrapolation - [Arxiv] [QA]
  • Is Homophily a Necessity for Graph Neural Networks? - [Arxiv] [QA]
  • Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation - [Arxiv] [QA]
  • Fair Normalizing Flows - [Arxiv] [QA]
  • MST: Masked Self-Supervised Transformer for Visual Representation - [Arxiv] [QA]
  • A Neural Tangent Kernel Perspective of GANs - [Arxiv] [QA]
  • Knowledge distillation: A good teacher is patient and consistent - [Arxiv] [QA]
  • Do Transformers Really Perform Bad for Graph Representation? - [Arxiv] [QA]
  • DIGRAC: Digraph Clustering Based on Flow Imbalance - [Arxiv] [QA]
  • Pretrained Encoders are All You Need - [Arxiv] [QA]
  • It Takes Two to Tango: Mixup for Deep Metric Learning - [Arxiv] [QA]
  • Taxonomy of Machine Learning Safety: A Survey and Primer - [Arxiv] [QA]
  • Mean-Shifted Contrastive Loss for Anomaly Detection - [Arxiv] [QA]
  • RegMix: Data Mixing Augmentation for Regression - [Arxiv] [QA]
  • Tabular Data: Deep Learning is Not All You Need - [Arxiv] [QA]
  • Self-Supervision is All You Need for Solving Rubik's Cube - [Arxiv] [QA]
  • Model Zoo: A Growing "Brain" That Learns Continually - [Arxiv] [QA]
  • Context-Aware Sparse Deep Coordination Graphs - [Arxiv] [QA]
  • Learning Curves for SGD on Structured Features - [Arxiv] [QA]
  • Meta-Learning with Fewer Tasks through Task Interpolation - [Arxiv] [QA]
  • Churn Reduction via Distillation - [Arxiv] [QA]
  • MERLOT: Multimodal Neural Script Knowledge Models - [Arxiv] [QA]
  • Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances - [Arxiv] [QA]
  • Three Sentences Are All You Need: Local Path Enhanced Document Relation Extraction - [Arxiv] [QA]
  • Convergent Graph Solvers - [Arxiv] [QA]
  • Self-Guided Contrastive Learning for BERT Sentence Representations - [Arxiv] [QA]
  • Steerable 3D Spherical Neurons - [Arxiv] [QA]
  • Evidential Turing Processes - [Arxiv] [QA]
  • Towards Emotional Support Dialog Systems - [Arxiv] [QA]
  • Multiresolution Equivariant Graph Variational Autoencoder - [Arxiv] [QA]
  • RevCore: Review-augmented Conversational Recommendation - [Arxiv] [QA]
  • DialoGraph: Incorporating Interpretable Strategy-Graph Networks into Negotiation Dialogues - [Arxiv] [QA]
  • Efficient Passage Retrieval with Hashing for Open-domain Question Answering - [Arxiv] [QA]
  • Weighting vectors for machine learning: numerical harmonic analysis applied to boundary detection - [Arxiv] [QA]
  • DYPLOC: Dynamic Planning of Content Using Mixed Language Models for Text Generation - [Arxiv] [QA]
  • Towards Quantifiable Dialogue Coherence Evaluation - [Arxiv] [QA]
  • Concurrent Adversarial Learning for Large-Batch Training - [Arxiv] [QA]

May 2021

  • Efficient and Modular Implicit Differentiation - [Arxiv] [QA]
  • Memory-Efficient Differentiable Transformer Architecture Search - [Arxiv] [QA]
  • How Attentive are Graph Attention Networks? - [Arxiv] [QA]
  • An Attention Free Transformer - [Arxiv] [QA]
  • Gotta Go Fast When Generating Data with Score-Based Models - [Arxiv] [QA]
  • Simple steps are all you need: Frank-Wolfe and generalized self-concordant functions - [Arxiv] [QA]
  • OTTers: One-turn Topic Transitions for Open-Domain Dialogue - [Arxiv] [QA]
  • Data Augmentation for Text Generation Without Any Augmented Data - [Arxiv] [QA]
  • ST-HOI: A Spatial-Temporal Baseline for Human-Object Interaction Detection in Videos - [Arxiv] [QA]
  • Pre-trained Language Model based Ranking in Baidu Search - [Arxiv] [QA]
  • Unsupervised Speech Recognition - [Arxiv] [QA]
  • Revisiting the Negative Data of Distantly Supervised Relation Extraction - [Arxiv] [QA]
  • DEHB: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization - [Arxiv] [QA]
  • Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking - [Arxiv] [QA]
  • Unified Conversational Recommendation Policy Learning via Graph-based Reinforcement Learning - [Arxiv] [QA]
  • Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms - [Arxiv] [QA]
  • KECRS: Towards Knowledge-Enriched Conversational Recommendation System - [Arxiv] [QA]
  • Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting - [Arxiv] [QA]
  • ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation - [Arxiv] [QA]
  • An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming - [Arxiv] [QA]
  • RetGen: A Joint framework for Retrieval and Grounded Text Generation Modeling - [Arxiv] [QA]
  • HyKnow: End-to-End Task-Oriented Dialog Modeling with Hybrid Knowledge Management - [Arxiv] [QA]
  • Looking at CTR Prediction Again: Is Attention All You Need? - [Arxiv] [QA]
  • The DEVIL is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting - [Arxiv] [QA]
  • Diffusion Models Beat GANs on Image Synthesis - [Arxiv] [QA]
  • VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning - [Arxiv] [QA]
  • EL-Attention: Memory Efficient Lossless Attention for Generation - [Arxiv] [QA]
  • Not All Relevance Scores are Equal: Efficient Uncertainty and Calibration Modeling for Deep Retrieval Models - [Arxiv] [QA]
  • Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey - [Arxiv] [QA]
  • Joint Learning of Deep Retrieval Model and Product Quantization based Embedding Index - [Arxiv] [QA]
  • Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study - [Arxiv] [QA]
  • Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems - [Arxiv] [QA]
  • Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval - [Arxiv] [QA]
  • Human Object Interaction Detection using Two-Direction Spatial Enhancement and Exclusive Object Prior - [Arxiv] [QA]
  • A Survey of Data Augmentation Approaches for NLP - [Arxiv] [QA]
  • Rethinking Search: Making Domain Experts out of Dilettantes - [Arxiv] [QA]
  • PD-GAN: Probabilistic Diverse GAN for Image Inpainting - [Arxiv] [QA]
  • Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries - [Arxiv] [QA]
  • Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images - [Arxiv] [QA]
  • Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation - [Arxiv] [QA]

April 2021

  • RR-Net: Injecting Interactive Semantics in Human-Object Interaction Detection - [Arxiv] [QA]
  • Emerging Properties in Self-Supervised Vision Transformers - [Arxiv] [QA]
  • Open-vocabulary Object Detection via Vision and Language Knowledge Distillation - [Arxiv] [QA]
  • HOTR: End-to-End Human-Object Interaction Detection with Transformers - [Arxiv] [QA]
  • If your data distribution shifts, use self-learning - [Arxiv] [QA]
  • Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos - [Arxiv] [QA]
  • Easy and Efficient Transformer: Scalable Inference Solution For large NLP model - [Arxiv] [QA]
  • GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization - [Arxiv] [QA]
  • PanGu-$α$: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation - [Arxiv] [QA]
  • Learning Passage Impacts for Inverted Indexes - [Arxiv] [QA]
  • VideoGPT: Video Generation using VQ-VAE and Transformers - [Arxiv] [QA]
  • UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction - [Arxiv] [QA]
  • Gradient Matching for Domain Generalization - [Arxiv] [QA]
  • B-PROP: Bootstrapped Pre-training with Representative Words Prediction for Ad-hoc Retrieval - [Arxiv] [QA]
  • Image Inpainting with External-internal Learning and Monochromic Bottleneck - [Arxiv] [QA]
  • Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation - [Arxiv] [QA]
  • The Power of Scale for Parameter-Efficient Prompt Tuning - [Arxiv] [QA]
  • Explaining Answers with Entailment Trees - [Arxiv] [QA]
  • Condenser: a Pre-training Architecture for Dense Retrieval - [Arxiv] [QA]
  • $Q^{2}$: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering - [Arxiv] [QA]
  • Optimizing Dense Retrieval Model Training with Hard Negatives - [Arxiv] [QA]
  • Matching-oriented Product Quantization For Ad-hoc Retrieval - [Arxiv] [QA]
  • Ultra-High Dimensional Sparse Representations with Binarization for Efficient Text Retrieval - [Arxiv] [QA]
  • COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List - [Arxiv] [QA]
  • Sparse Attention with Linear Units - [Arxiv] [QA]
  • Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling - [Arxiv] [QA]
  • Is Disentanglement all you need? Comparing Concept-based & Disentanglement Approaches - [Arxiv] [QA]
  • Learning How to Ask: Querying LMs with Mixtures of Soft Prompts - [Arxiv] [QA]
  • All you need are a few pixels: semantic segmentation with PixelPick - [Arxiv] [QA]
  • Spatiotemporal Entropy Model is All You Need for Learned Video Compression - [Arxiv] [QA]
  • Glance and Gaze: Inferring Action-aware Points for One-Stage Human-Object Interaction Detection - [Arxiv] [QA]
  • Not All Attention Is All You Need - [Arxiv] [QA]
  • Progressive Temporal Feature Alignment Network for Video Inpainting - [Arxiv] [QA]
  • Simple Imputation Rules for Prediction with Missing Data: Contrasting Theoretical Guarantees with Empirical Performance - [Arxiv] [QA]
  • Affordance Transfer Learning for Human-Object Interaction Detection - [Arxiv] [QA]
  • Learning to Estimate Hidden Motions with Global Motion Aggregation - [Arxiv] [QA]
  • SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model - [Arxiv] [QA]
  • Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training - [Arxiv] [QA]
  • Visual Semantic Role Labeling for Video Understanding - [Arxiv] [QA]
  • Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval - [Arxiv] [QA]
  • NeRF-VAE: A Geometry Aware 3D Scene Generative Model - [Arxiv] [QA]
  • Improved Image Generation via Sparse Modeling - [Arxiv] [QA]
  • Exploiting Relationship for Complex-scene Image Generation - [Arxiv] [QA]
  • Jigsaw Clustering for Unsupervised Visual Representation Learning - [Arxiv] [QA]
  • Domain Invariant Adversarial Learning - [Arxiv] [QA]

March 2021

  • CAMPARI: Camera-Aware Decomposed Generative Neural Radiance Fields - [Arxiv] [QA]
  • Contrastive Embedding for Generalized Zero-Shot Learning - [Arxiv] [QA]
  • AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning - [Arxiv] [QA]
  • TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations - [Arxiv] [QA]
  • Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers - [Arxiv] [QA]
  • GNeRF: GAN-based Neural Radiance Field without Posed Camera - [Arxiv] [QA]
  • Adaptive Surface Normal Constraint for Depth Estimation - [Arxiv] [QA]
  • Efficient Explanations from Empirical Explainers - [Arxiv] [QA]
  • Categorical Representation Learning: Morphism is All You Need - [Arxiv] [QA]
  • More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval - [Arxiv] [QA]
  • KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs - [Arxiv] [QA]
  • DNN Quantization with Attention - [Arxiv] [QA]
  • FastMoE: A Fast Mixture-of-Expert Training System - [Arxiv] [QA]
  • Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges - [Arxiv] [QA]
  • API2Com: On the Improvement of Automatically Generated Code Comments Using API Documentations - [Arxiv] [QA]
  • Concentric Spherical GNN for 3D Representation Learning - [Arxiv] [QA]
  • FastNeRF: High-Fidelity Neural Rendering at 200FPS - [Arxiv] [QA]
  • GLM: General Language Model Pretraining with Autoregressive Blank Infilling - [Arxiv] [QA]
  • Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE - [Arxiv] [QA]
  • Topology-Aware Segmentation Using Discrete Morse Theory - [Arxiv] [QA]
  • A Robust Tube-Based Smooth-MPC for Robot Manipulator Planning - [Arxiv] [QA]
  • ENCONTER: Entity Constrained Progressive Sequence Generation via Insertion-based Transformer - [Arxiv] [QA]
  • Detecting Human-Object Interaction via Fabricated Compositional Learning - [Arxiv] [QA]
  • BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation - [Arxiv] [QA]
  • Reformulating HOI Detection as Adaptive Set Prediction - [Arxiv] [QA]
  • Partial Differential Equations is All You Need for Generating Neural Architectures -- A Theory for Physical Artificial Intelligence Systems - [Arxiv] [QA]
  • QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information - [Arxiv] [QA]
  • GAN Vocoder: Multi-Resolution Discriminator Is All You Need - [Arxiv] [QA]
  • Semantic Models for the First-stage Retrieval: A Comprehensive Review - [Arxiv] [QA]
  • End-to-End Human Object Interaction Detection with HOI Transformer - [Arxiv] [QA]
  • Can Pretext-Based Self-Supervised Learning Be Boosted by Downstream Data? A Theoretical Analysis - [Arxiv] [QA]
  • Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth - [Arxiv] [QA]
  • Barlow Twins: Self-Supervised Learning via Redundancy Reduction - [Arxiv] [QA]
  • Online Adversarial Attacks - [Arxiv] [QA]
  • Mixture of Volumetric Primitives for Efficient Neural Rendering - [Arxiv] [QA]
  • Categorical Foundations of Gradient-Based Learning - [Arxiv] [QA]
  • Learners' Languages - [Arxiv] [QA]
  • Automated Machine Learning on Graphs: A Survey - [Arxiv] [QA]

February 2021

  • Learning Transferable Visual Models From Natural Language Supervision - [Arxiv] [QA]
  • Node Proximity Is All You Need: Unified Structural and Positional Node and Graph Embedding - [Arxiv] [QA]
  • Do Input Gradients Highlight Discriminative Features? - [Arxiv] [QA]
  • Teach Me to Explain: A Review of Datasets for Explainable Natural Language Processing - [Arxiv] [QA]
  • On Relative Pose Recovery for Multi-Camera Systems - [Arxiv] [QA]
  • Meta-Learned Attribute Self-Gating for Continual Generalized Zero-Shot Learning - [Arxiv] [QA]
  • Deep ReLU Networks Preserve Expected Length - [Arxiv] [QA]
  • Meta-Learning Dynamics Forecasting Using Task Inference - [Arxiv] [QA]
  • Essentials for Class Incremental Learning - [Arxiv] [QA]
  • Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder - [Arxiv] [QA]
  • ShaRF: Shape-conditioned Radiance Fields from a Single View - [Arxiv] [QA]
  • DEUP: Direct Epistemic Uncertainty Prediction - [Arxiv] [QA]
  • All You Need is DAG - [Arxiv] [QA]
  • Topological Graph Neural Networks - [Arxiv] [QA]
  • Is Space-Time Attention All You Need for Video Understanding? - [Arxiv] [QA]
  • Contrastive Embeddings for Neural Architectures - [Arxiv] [QA]
  • Hyperspherical embedding for novel class classification - [Arxiv] [QA]
  • Unifying Vision-and-Language Tasks via Text Generation - [Arxiv] [QA]
  • Causal Sufficiency and Actual Causation - [Arxiv] [QA]
  • Learning Graph Embeddings for Compositional Zero-shot Learning - [Arxiv] [QA]

January 2021

  • VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs - [Arxiv] [QA]
  • Compositional Semantics for Probabilistic Programs with Exact Conditioning - [Arxiv] [QA]
  • RESPER: Computationally Modelling Resisting Strategies in Persuasive Conversations - [Arxiv] [QA]
  • Reverse Derivative Ascent: A Categorical Approach to Learning Boolean Circuits - [Arxiv] [QA]
  • Transferable Interactiveness Knowledge for Human-Object Interaction Detection - [Arxiv] [QA]
  • Advances and Challenges in Conversational Recommender Systems: A Survey - [Arxiv] [QA]
  • A Comprehensive Survey on Hardware-Aware Neural Architecture Search - [Arxiv] [QA]
  • Higher Order Automatic Differentiation of Higher Order Functions - [Arxiv] [QA]
  • The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models - [Arxiv] [QA]
  • Evaluating Disentanglement of Structured Representations - [Arxiv] [QA]
  • Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity - [Arxiv] [QA]
  • Evolving Reinforcement Learning Algorithms - [Arxiv] [QA]
  • Max-Affine Spline Insights Into Deep Network Pruning - [Arxiv] [QA]
  • VinVL: Revisiting Visual Representations in Vision-Language Models - [Arxiv] [QA]
  • Prefix-Tuning: Optimizing Continuous Prompts for Generation - [Arxiv] [QA]
  • Multi-task Retrieval for Knowledge-Intensive Tasks - [Arxiv] [QA]