- Optimal Representations for Covariate Shift - [Arxiv] [QA]
- An overview of the quantitative causality analysis and causal graph reconstruction based on a rigorous formalism of information flow - [Arxiv] [QA]
- TransLog: A Unified Transformer-based Framework for Log Anomaly Detection - [Arxiv] [QA]
- ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation - [Arxiv] [QA]
- On the Role of Neural Collapse in Transfer Learning - [Arxiv] [QA]
- Self Reward Design with Fine-grained Interpretability - [Arxiv] [QA]
- Generative Kernel Continual learning - [Arxiv] [QA]
- 3D Skeleton-based Few-shot Action Recognition with JEANIE is not so Naïve - [Arxiv] [QA]
- Revisiting Transformation Invariant Geometric Deep Learning: Are Initial Representations All You Need? - [Arxiv] [QA]
- ML4CO: Is GCNN All You Need? Graph Convolutional Neural Networks Produce Strong Baselines For Combinatorial Optimization Problems, If Tuned and Trained Properly, on Appropriate Data - [Arxiv] [QA]
- Cost Aggregation Is All You Need for Few-Shot Segmentation - [Arxiv] [QA]
- Generalized Few-Shot Semantic Segmentation: All You Need is Fine-Tuning - [Arxiv] [QA]
- High-Resolution Image Synthesis with Latent Diffusion Models - [Arxiv] [QA]
- Are Large-scale Datasets Necessary for Self-Supervised Pre-training? - [Arxiv] [QA]
- Transformers Can Do Bayesian Inference - [Arxiv] [QA]
- Soundify: Matching Sound Effects to Video - [Arxiv] [QA]
- Align and Prompt: Video-and-Language Pre-training with Entity Prompts - [Arxiv] [QA]
- WebGPT: Browser-assisted question-answering with human feedback - [Arxiv] [QA]
- Automated Deep Learning: Neural Architecture Search Is Not the End - [Arxiv] [QA]
- All You Need is RAW: Defending Against Adversarial Attacks with Camera Image Pipelines - [Arxiv] [QA]
- Masked Feature Prediction for Self-Supervised Visual Pre-Training - [Arxiv] [QA]
- Unsupervised Dense Information Retrieval with Contrastive Learning - [Arxiv] [QA]
- NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics - [Arxiv] [QA]
- Reframing Human-AI Collaboration for Generating Free-Text Explanations - [Arxiv] [QA]
- Learning to Prompt for Continual Learning - [Arxiv] [QA]
- QAHOI: Query-Based Anchors for Human-Object Interaction Detection - [Arxiv] [QA]
- Learning To Retrieve Prompts for In-Context Learning - [Arxiv] [QA]
- Call for Customized Conversation: Customized Conversation Grounding Persona and Knowledge - [Arxiv] [QA]
- Rethinking Nearest Neighbors for Visual Classification - [Arxiv] [QA]
- Improving Conversational Recommendation Systems' Quality with Context-Aware Item Meta Information - [Arxiv] [QA]
- Structure-Aware Image Segmentation with Homotopy Warping - [Arxiv] [QA]
- Massive-scale Decoding for Text Generation using Lattices - [Arxiv] [QA]
- Improving Human-Object Interaction Detection via Phrase Learning and Label Composition - [Arxiv] [QA]
- MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation - [Arxiv] [QA]
- Real-Time Neural Voice Camouflage - [Arxiv] [QA]
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts - [Arxiv] [QA]
- VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks - [Arxiv] [QA]
- Step-unrolled Denoising Autoencoders for Text Generation - [Arxiv] [QA]
- CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability - [Arxiv] [QA]
- The Overlooked Classifier in Human-Object Interaction Recognition - [Arxiv] [QA]
- Critical configurations for three projective views - [Arxiv] [QA]
- Self-Supervised Bot Play for Conversational Recommendation with Justifications - [Arxiv] [QA]
- On Convergence of Federated Averaging Langevin Dynamics - [Arxiv] [QA]
- Critical configurations for two projective views, a new approach - [Arxiv] [QA]
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher - [Arxiv] [QA]
- Prompting Visual-Language Models for Efficient Video Understanding - [Arxiv] [QA]
- Improving language models by retrieving from trillions of tokens - [Arxiv] [QA]
- Pareto Domain Adaptation - [Arxiv] [QA]
- DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover's Distance Improves Out-Of-Distribution Face Identification - [Arxiv] [QA]
- Universalizing Weak Supervision - [Arxiv] [QA]
- Unsupervised Learning of Compositional Scene Representations from Multiple Unspecified Viewpoints - [Arxiv] [QA]
- Genetic Algorithm for Constrained Molecular Inverse Design - [Arxiv] [QA]
- Variational Wasserstein gradient flow - [Arxiv] [QA]
- YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone - [Arxiv] [QA]
- Linear algebra with transformers - [Arxiv] [QA]
- Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer - [Arxiv] [QA]
- DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting - [Arxiv] [QA]
- Neural Stochastic Dual Dynamic Programming - [Arxiv] [QA]
- A General Language Assistant as a Laboratory for Alignment - [Arxiv] [QA]
- Routing with Self-Attention for Multimodal Capsule Networks - [Arxiv] [QA]
- Human-Object Interaction Detection via Weak Supervision - [Arxiv] [QA]
- MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions - [Arxiv] [QA]
- Show Your Work: Scratchpads for Intermediate Computation with Language Models - [Arxiv] [QA]
- MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning - [Arxiv] [QA]
- Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective - [Arxiv] [QA]
- Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling - [Arxiv] [QA]
- GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection - [Arxiv] [QA]
- A category theory framework for Bayesian learning - [Arxiv] [QA]
- Pre-training Methods in Information Retrieval - [Arxiv] [QA]
- SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection - [Arxiv] [QA]
- SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning - [Arxiv] [QA]
- Group equivariant neural posterior estimation - [Arxiv] [QA]
- CDNet is all you need: Cascade DCN based underwater object detection RCNN - [Arxiv] [QA]
- PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers - [Arxiv] [QA]
- VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling - [Arxiv] [QA]
- Hierarchical Modular Network for Video Captioning - [Arxiv] [QA]
- NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion - [Arxiv] [QA]
- Node-Level Differentially Private Graph Neural Networks - [Arxiv] [QA]
- Subgraph Permutation Equivariant Networks - [Arxiv] [QA]
- Variance Reduction in Deep Learning: More Momentum is All You Need - [Arxiv] [QA]
- Deep Point Cloud Reconstruction - [Arxiv] [QA]
- Lossless Compression with Probabilistic Circuits - [Arxiv] [QA]
- Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction - [Arxiv] [QA]
- Plant 'n' Seek: Can You Find the Winning Ticket? - [Arxiv] [QA]
- Deep Probability Estimation - [Arxiv] [QA]
- Self-Supervised Point Cloud Completion via Inpainting - [Arxiv] [QA]
- Are Vision Transformers Robust to Patch Perturbations? - [Arxiv] [QA]
- Deep Safe Multi-Task Learning - [Arxiv] [QA]
- FBNetV5: Neural Architecture Search for Multiple Tasks in One Run - [Arxiv] [QA]
- SimMIM: A Simple Framework for Masked Image Modeling - [Arxiv] [QA]
- One-Shot Generative Domain Adaptation - [Arxiv] [QA]
- Perceiving and Modeling Density is All You Need for Image Dehazing - [Arxiv] [QA]
- Selective Ensembles for Consistent Predictions - [Arxiv] [QA]
- iBOT: Image BERT Pre-Training with Online Tokenizer - [Arxiv] [QA]
- Bolstering Stochastic Gradient Descent with Model Building - [Arxiv] [QA]
- Masked Autoencoders Are Scalable Vision Learners - [Arxiv] [QA]
- Gradients are Not All You Need - [Arxiv] [QA]
- Sliced Recursive Transformer - [Arxiv] [QA]
- Realizable Learning is All You Need - [Arxiv] [QA]
- MT3: Multi-Task Multitrack Music Transcription - [Arxiv] [QA]
- Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies - [Arxiv] [QA]
- LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs - [Arxiv] [QA]
- Can Vision Transformers Perform Convolution? - [Arxiv] [QA]
- Deep neural networks as nested dynamical systems - [Arxiv] [QA]
- Towards the Generalization of Contrastive Self-Supervised Learning - [Arxiv] [QA]
- Template Filling for Controllable Commonsense Reasoning - [Arxiv] [QA]
- Hyperparameter Tuning is All You Need for LISTA - [Arxiv] [QA]
- Improving Fairness via Federated Learning - [Arxiv] [QA]
- The magnitude vector of images - [Arxiv] [QA]
- Semi-Siamese Bi-encoder Neural Ranking Model Using Lightweight Fine-Tuning - [Arxiv] [QA]
- Diversity Enhanced Active Learning with Strictly Proper Scoring Rules - [Arxiv] [QA]
- Training Verifiers to Solve Math Word Problems - [Arxiv] [QA]
- s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning - [Arxiv] [QA]
- The Efficiency Misnomer - [Arxiv] [QA]
- Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation - [Arxiv] [QA]
- DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021 - [Arxiv] [QA]
- Double Trouble: How to not explain a text classifier's decisions using counterfactuals synthesized by masked language models? - [Arxiv] [QA]
- Center Loss Regularization for Continual Learning - [Arxiv] [QA]
- Fast Model Editing at Scale - [Arxiv] [QA]
- SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark - [Arxiv] [QA]
- BERMo: What can BERT learn from ELMo? - [Arxiv] [QA]
- TLDR: Twin Learning for Dimensionality Reduction - [Arxiv] [QA]
- Natural Attribute-based Shift Detection - [Arxiv] [QA]
- Value alignment: a formal approach - [Arxiv] [QA]
- Illiterate DALL-E Learns to Compose - [Arxiv] [QA]
- Multimodal Dialogue Response Generation - [Arxiv] [QA]
- Comparing Human and Machine Bias in Face Recognition - [Arxiv] [QA]
- Generated Knowledge Prompting for Commonsense Reasoning - [Arxiv] [QA]
- Trigger Hunting with a Topological Prior for Trojan Detection - [Arxiv] [QA]
- On Learning the Transformer Kernel - [Arxiv] [QA]
- Multitask Prompted Training Enables Zero-Shot Task Generalization - [Arxiv] [QA]
- Guided Point Contrastive Learning for Semi-supervised Point Cloud Semantic Segmentation - [Arxiv] [QA]
- Few-Shot Bot: Prompt-Based Learning for Dialogue Systems - [Arxiv] [QA]
- Jurassic is (almost) All You Need: Few-Shot Meaning-to-Text Generation for Open-Domain Dialogue - [Arxiv] [QA]
- On-Policy Model Errors in Reinforcement Learning - [Arxiv] [QA]
- ContraQA: Question Answering under Contradicting Contexts - [Arxiv] [QA]
- Attacking Open-domain Question Answering by Injecting Misinformation - [Arxiv] [QA]
- RecInDial: A Unified Framework for Conversational Recommendation with Pretrained Language Models - [Arxiv] [QA]
- RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking - [Arxiv] [QA]
- CLIP4Caption: CLIP for Video Caption - [Arxiv] [QA]
- Parallel Deep Neural Networks Have Zero Duality Gap - [Arxiv] [QA]
- Causal discovery from conditionally stationary time-series - [Arxiv] [QA]
- Molecular Graph Generation via Geometric Scattering - [Arxiv] [QA]
- Open-Set Recognition: a Good Closed-Set Classifier is All You Need? - [Arxiv] [QA]
- Efficient Neural Ranking using Forward Indexes - [Arxiv] [QA]
- DiscoDVT: Generating Long Text with Discourse-Aware Discrete Variational Transformer - [Arxiv] [QA]
- Relative Molecule Self-Attention Transformer - [Arxiv] [QA]
- Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval - [Arxiv] [QA]
- Certified Patch Robustness via Smoothed Vision Transformers - [Arxiv] [QA]
- Global Vision Transformer Pruning with Hessian-Aware Saliency - [Arxiv] [QA]
- Long Expressive Memory for Sequence Modeling - [Arxiv] [QA]
- Vector-quantized Image Modeling with Improved VQGAN - [Arxiv] [QA]
- Multi-Agent MDP Homomorphic Networks - [Arxiv] [QA]
- Neural Link Prediction with Walk Pooling - [Arxiv] [QA]
- FRL: Federated Rank Learning - [Arxiv] [QA]
- On the Limitations of Multimodal VAEs - [Arxiv] [QA]
- Token Pooling in Vision Transformers - [Arxiv] [QA]
- FOCUS: Familiar Objects in Common and Uncommon Settings - [Arxiv] [QA]
- Hyperparameter Tuning with Renyi Differential Privacy - [Arxiv] [QA]
- Adversarial Retriever-Ranker for dense text retrieval - [Arxiv] [QA]
- RAR: Region-Aware Point Cloud Registration - [Arxiv] [QA]
- Cartoon Explanations of Image Classifiers - [Arxiv] [QA]
- Situated Dialogue Learning through Procedural Environment Generation - [Arxiv] [QA]
- On the Optimal Memorization Power of ReLU Neural Networks - [Arxiv] [QA]
- Attention is All You Need? Good Embeddings with Statistics are enough: Large Scale Audio Understanding without Transformers/ Convolutions/ BERTs/ Mixers/ Attention/ RNNs or .... - [Arxiv] [QA]
- Generative Modeling with Optimal Transport Maps - [Arxiv] [QA]
- Federated Learning via Plurality Vote - [Arxiv] [QA]
- Nested Policy Reinforcement Learning - [Arxiv] [QA]
- How BPE Affects Memorization in Transformers - [Arxiv] [QA]
- On The Transferability of Deep-Q Networks - [Arxiv] [QA]
- Test-time Batch Statistics Calibration for Covariate Shift - [Arxiv] [QA]
- Geometric Algebra Attention Networks for Small Point Clouds - [Arxiv] [QA]
- EntQA: Entity Linking as Question Answering - [Arxiv] [QA]
- Autoregressive Diffusion Models - [Arxiv] [QA]
- AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts - [Arxiv] [QA]
- Generalized Kernel Thinning - [Arxiv] [QA]
- Solving even-parity problems using traceless genetic programming - [Arxiv] [QA]
- One Timestep is All You Need: Training Spiking Neural Networks with Ultra Low Latency - [Arxiv] [QA]
- Batch size-invariance for policy optimization - [Arxiv] [QA]
- Vision-Only Robot Navigation in a Neural Radiance World - [Arxiv] [QA]
- Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System - [Arxiv] [QA]
- Stochastic Training is Not Necessary for Generalization - [Arxiv] [QA]
- IGLU: Efficient GCN Training via Lazy Updates - [Arxiv] [QA]
- Unsolved Problems in ML Safety - [Arxiv] [QA]
- OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset with Visual Contexts - [Arxiv] [QA]
- Learning Neural Templates for Recommender Dialogue System - [Arxiv] [QA]
- CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models - [Arxiv] [QA]
- A Survey on Cost Types, Interaction Schemes, and Annotator Performance Models in Selection Algorithms for Active Learning in Classification - [Arxiv] [QA]
- Recursively Summarizing Books with Human Feedback - [Arxiv] [QA]
- Scalable and Efficient MoE Training for Multitask Multilingual Models - [Arxiv] [QA]
- SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval - [Arxiv] [QA]
- Neural networks with trainable matrix activation functions - [Arxiv] [QA]
- PLATO-XL: Exploring the Large-scale Pre-training of Dialogue Generation - [Arxiv] [QA]
- DuRecDial 2.0: A Bilingual Parallel Corpus for Conversational Recommendation - [Arxiv] [QA]
- Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes - [Arxiv] [QA]
- Primer: Searching for Efficient Transformers for Language Modeling - [Arxiv] [QA]
- Is Curiosity All You Need? On the Utility of Emergent Behaviours from Curious Exploration - [Arxiv] [QA]
- Torch.manual_seed(3407) is all you need: On the influence of random seeds in deep learning architectures for computer vision - [Arxiv] [QA]
- Scaling Laws for Neural Machine Translation - [Arxiv] [QA]
- Transferable Persona-Grounded Dialogues via Grounded Minimal Edits - [Arxiv] [QA]
- Attention Is Indeed All You Need: Semantically Attention-Guided Decoding for Data-to-Text NLG - [Arxiv] [QA]
- Benchmarking the Spectrum of Agent Capabilities - [Arxiv] [QA]
- Continuous Homeostatic Reinforcement Learning for Self-Regulated Autonomous Agents - [Arxiv] [QA]
- Exploring Prompt-based Few-shot Learning for Grounded Dialog Generation - [Arxiv] [QA]
- Space Time Recurrent Memory Network - [Arxiv] [QA]
- Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation - [Arxiv] [QA]
- WeakSTIL: Weak whole-slide image level stromal tumor infiltrating lymphocyte scores are all you need - [Arxiv] [QA]
- CEM: Commonsense-aware Empathetic Response Generation - [Arxiv] [QA]
- Bootstrapped Meta-Learning - [Arxiv] [QA]
- A Three-Stage Learning Framework for Low-Resource Knowledge-Grounded Dialogue Generation - [Arxiv] [QA]
- Thinking Clearly, Talking Fast: Concept-Guided Non-Autoregressive Generation for Open-Domain Dialogue Systems - [Arxiv] [QA]
- ACP++: Action Co-occurrence Priors for Human-Object Interaction Detection - [Arxiv] [QA]
- Local Augmentation for Graph Neural Networks - [Arxiv] [QA]
- Sqrt(d) Dimension Dependence of Langevin Monte Carlo - [Arxiv] [QA]
- Rethinking Data Augmentation for Low-Resource Neural Machine Translation: A Multi-Task Learning Approach - [Arxiv] [QA]
- Mask is All You Need: Rethinking Mask R-CNN for Dense and Arbitrary-Shaped Scene Text Detection - [Arxiv] [QA]
- Learning Neural Causal Models with Active Interventions - [Arxiv] [QA]
- Learning to Prompt for Vision-Language Models - [Arxiv] [QA]
- Searching for Efficient Multi-Stage Vision Transformers - [Arxiv] [QA]
- Boosting Search Engines with Interactive Agents - [Arxiv] [QA]
- Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback - [Arxiv] [QA]
- Neural HMMs are all you need (for high-quality attention-free TTS) - [Arxiv] [QA]
- Knowledge Base Completion Meets Transfer Learning - [Arxiv] [QA]
- Subjective Learning for Open-Ended Data - [Arxiv] [QA]
- Photos Are All You Need for Reciprocal Recommendation in Online Dating - [Arxiv] [QA]
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision - [Arxiv] [QA]
- Contrastive Learning of User Behavior Sequence for Context-Aware Document Ranking - [Arxiv] [QA]
- One TTS Alignment To Rule Them All - [Arxiv] [QA]
- Anarchic Federated Learning - [Arxiv] [QA]
- Pre-training for Ad-hoc Retrieval: Hyperlink is Also You Need - [Arxiv] [QA]
- Fastformer: Additive Attention Can Be All You Need - [Arxiv] [QA]
- Spatio-Temporal Interaction Graph Parsing Networks for Human-Object Interaction Recognition - [Arxiv] [QA]
- Exploiting Scene Graphs for Human-Object Interaction Detection - [Arxiv] [QA]
- D3D-HOI: Dynamic 3D Human-Object Interactions from Videos - [Arxiv] [QA]
- A good body is all you need: avoiding catastrophic interference via agent architecture search - [Arxiv] [QA]
- On the Opportunities and Risks of Foundation Models - [Arxiv] [QA]
- MMChat: Multi-Modal Chat Dataset on Social Media - [Arxiv] [QA]
- FedPara: Low-Rank Hadamard Product for Communication-Efficient Federated Learning - [Arxiv] [QA]
- The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models - [Arxiv] [QA]
- PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval - [Arxiv] [QA]
- Logit Attenuating Weight Normalization - [Arxiv] [QA]
- Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval - [Arxiv] [QA]
- Mining the Benefits of Two-stage and One-stage HOI Detection - [Arxiv] [QA]
- Are Neural Ranking Models Robust? - [Arxiv] [QA]
- Rethinking Architecture Selection in Differentiable NAS - [Arxiv] [QA]
- Pose is all you need: The pose only group activity recognition system (POGARS) - [Arxiv] [QA]
- BIGRoC: Boosting Image Generation via a Robust Classifier - [Arxiv] [QA]
- Source-Free Domain Adaptation for Image Segmentation - [Arxiv] [QA]
- Improving Contrastive Learning by Visualizing Feature Transformation - [Arxiv] [QA]
- Internal Video Inpainting by Implicit Long-range Propagation - [Arxiv] [QA]
- Model-Based Opponent Modeling - [Arxiv] [QA]
- Offline Decentralized Multi-Agent Reinforcement Learning - [Arxiv] [QA]
- SphereFace2: Binary Classification is All You Need for Deep Face Recognition - [Arxiv] [QA]
- How to Evaluate Your Dialogue Models: A Review of Approaches - [Arxiv] [QA]
- SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations - [Arxiv] [QA]
- Evaluating Deep Graph Neural Networks - [Arxiv] [QA]
- Jointly Optimizing Query Encoder and Product Quantization to Improve Retrieval Performance - [Arxiv] [QA]
- GTNet:Guided Transformer Network for Detecting Human-Object Interactions - [Arxiv] [QA]
- Imbalanced Adversarial Training with Reweighting - [Arxiv] [QA]
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing - [Arxiv] [QA]
- Functorial String Diagrams for Reverse-Mode Automatic Differentiation - [Arxiv] [QA]
- Unsupervised Learning of Neurosymbolic Encoders - [Arxiv] [QA]
- Is Object Detection Necessary for Human-Object Interaction Recognition? - [Arxiv] [QA]
- Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation - [Arxiv] [QA]
- Joint Shapley values: a measure of joint feature importance - [Arxiv] [QA]
- Few Shots Are All You Need: A Progressive Few Shot Learning Approach for Low Resource Handwritten Text Recognition - [Arxiv] [QA]
- Conditional GANs with Auxiliary Discriminative Classifier - [Arxiv] [QA]
- Guided Generation of Cause and Effect - [Arxiv] [QA]
- QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries - [Arxiv] [QA]
- Structured Stochastic Gradient MCMC - [Arxiv] [QA]
- Is attention to bounding boxes all you need for pedestrian action prediction? - [Arxiv] [QA]
- DNN is not all you need: Parallelizing Non-Neural ML Algorithms on Ultra-Low-Power IoT Processors - [Arxiv] [QA]
- Align before Fuse: Vision and Language Representation Learning with Momentum Distillation - [Arxiv] [QA]
- FastSHAP: Real-Time Shapley Value Estimation - [Arxiv] [QA]
- How Much Can CLIP Benefit Vision-and-Language Tasks? - [Arxiv] [QA]
- Per-Pixel Classification is Not All You Need for Semantic Segmentation - [Arxiv] [QA]
- A Configurable Multilingual Model is All You Need to Recognize All Languages - [Arxiv] [QA]
- SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking - [Arxiv] [QA]
- Explore and Control with Adversarial Surprise - [Arxiv] [QA]
- ViTGAN: Training GANs with Vision Transformers - [Arxiv] [QA]
- Hoechst Is All You Need: Lymphocyte Classification with Deep Learning - [Arxiv] [QA]
- Towards Robust Active Feature Acquisition - [Arxiv] [QA]
- Evaluating Large Language Models Trained on Code - [Arxiv] [QA]
- Understanding Intrinsic Robustness Using Label Uncertainty - [Arxiv] [QA]
- Neural Contextual Bandits without Regret - [Arxiv] [QA]
- Structured Denoising Diffusion Models in Discrete State-Spaces - [Arxiv] [QA]
- Depth-supervised NeRF: Fewer Views and Faster Training for Free - [Arxiv] [QA]
- VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer - [Arxiv] [QA]
- Attention-based Adversarial Appearance Learning of Augmented Pedestrians - [Arxiv] [QA]
- Rethinking Positional Encoding - [Arxiv] [QA]
- When and How to Fool Explainable Models (and Humans) with Adversarial Examples - [Arxiv] [QA]
- Mutation is all you need - [Arxiv] [QA]
- Scale Mixtures of Neural Network Gaussian Processes - [Arxiv] [QA]
- On the Practicality of Deterministic Epistemic Uncertainty - [Arxiv] [QA]
- Automatically Select Emotion for Response via Personality-affected Emotion Transition - [Arxiv] [QA]
- Local Reweighting for Adversarial Training - [Arxiv] [QA]
- Open-Set Representation Learning through Combinatorial Embedding - [Arxiv] [QA]
- Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation - [Arxiv] [QA]
- Multimodal Few-Shot Learning with Frozen Language Models - [Arxiv] [QA]
- Animatable Neural Radiance Fields from Monocular RGB Videos - [Arxiv] [QA]
- DCoM: A Deep Column Mapper for Semantic Data Type Detection - [Arxiv] [QA]
- All You Need is a Second Look: Towards Arbitrary-Shaped Text Detection - [Arxiv] [QA]
- IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers - [Arxiv] [QA]
- Learning Multimodal VAEs through Mutual Supervision - [Arxiv] [QA]
- Sampling with Mirrored Stein Operators - [Arxiv] [QA]
- Adapting Off-the-Shelf Source Segmenter for Target Medical Image Segmentation - [Arxiv] [QA]
- CharacterChat: Supporting the Creation of Fictional Characters through Conversation and Progressive Manifestation with a Chatbot - [Arxiv] [QA]
- Secure Domain Adaptation with Multiple Sources - [Arxiv] [QA]
- Volume Rendering of Neural Implicit Surfaces - [Arxiv] [QA]
- Policy Smoothing for Provably Robust Reinforcement Learning - [Arxiv] [QA]
- Towards Long-Form Video Understanding - [Arxiv] [QA]
- Boundary Graph Neural Networks for 3D Simulations - [Arxiv] [QA]
- Pseudo-Relevance Feedback for Multiple Representation Dense Retrieval - [Arxiv] [QA]
- CLIP2Video: Mastering Video-Text Retrieval via Image CLIP - [Arxiv] [QA]
- Analytically Tractable Bayesian Deep Q-Learning - [Arxiv] [QA]
- Multiplying Matrices Without Multiplying - [Arxiv] [QA]
- NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction - [Arxiv] [QA]
- Shuffle Private Stochastic Convex Optimization - [Arxiv] [QA]
- On Invariance Penalties for Risk Minimization - [Arxiv] [QA]
- Visual Correspondence Hallucination - [Arxiv] [QA]
- Poisoning and Backdooring Contrastive Learning - [Arxiv] [QA]
- Robust Model-based Face Reconstruction through Weakly-Supervised Outlier Segmentation - [Arxiv] [QA]
- Revisiting the Weaknesses of Reinforcement Learning for Neural Machine Translation - [Arxiv] [QA]
- Transductive Few-Shot Learning: Clustering is All You Need? - [Arxiv] [QA]
- Unsupervised Enrichment of Persona-grounded Dialog with Background Stories - [Arxiv] [QA]
- BEiT: BERT Pre-Training of Image Transformers - [Arxiv] [QA]
- Query Embedding on Hyper-relational Knowledge Graphs - [Arxiv] [QA]
- UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation - [Arxiv] [QA]
- HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units - [Arxiv] [QA]
- Constraining Linear-chain CRFs to Regular Languages - [Arxiv] [QA]
- Pre-Trained Models: Past, Present and Future - [Arxiv] [QA]
- Category Theory in Machine Learning - [Arxiv] [QA]
- Inverting Adversarially Robust Networks for Image Synthesis - [Arxiv] [QA]
- Prompting Contrastive Explanations for Commonsense Reasoning Tasks - [Arxiv] [QA]
- Learning to Pool in Graph Neural Networks for Extrapolation - [Arxiv] [QA]
- Is Homophily a Necessity for Graph Neural Networks? - [Arxiv] [QA]
- Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation - [Arxiv] [QA]
- Fair Normalizing Flows - [Arxiv] [QA]
- MST: Masked Self-Supervised Transformer for Visual Representation - [Arxiv] [QA]
- A Neural Tangent Kernel Perspective of GANs - [Arxiv] [QA]
- Knowledge distillation: A good teacher is patient and consistent - [Arxiv] [QA]
- Do Transformers Really Perform Bad for Graph Representation? - [Arxiv] [QA]
- DIGRAC: Digraph Clustering Based on Flow Imbalance - [Arxiv] [QA]
- Pretrained Encoders are All You Need - [Arxiv] [QA]
- It Takes Two to Tango: Mixup for Deep Metric Learning - [Arxiv] [QA]
- Taxonomy of Machine Learning Safety: A Survey and Primer - [Arxiv] [QA]
- Mean-Shifted Contrastive Loss for Anomaly Detection - [Arxiv] [QA]
- RegMix: Data Mixing Augmentation for Regression - [Arxiv] [QA]
- Tabular Data: Deep Learning is Not All You Need - [Arxiv] [QA]
- Self-Supervision is All You Need for Solving Rubik's Cube - [Arxiv] [QA]
- Model Zoo: A Growing "Brain" That Learns Continually - [Arxiv] [QA]
- Context-Aware Sparse Deep Coordination Graphs - [Arxiv] [QA]
- Learning Curves for SGD on Structured Features - [Arxiv] [QA]
- Meta-Learning with Fewer Tasks through Task Interpolation - [Arxiv] [QA]
- Churn Reduction via Distillation - [Arxiv] [QA]
- MERLOT: Multimodal Neural Script Knowledge Models - [Arxiv] [QA]
- Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances - [Arxiv] [QA]
- Three Sentences Are All You Need: Local Path Enhanced Document Relation Extraction - [Arxiv] [QA]
- Convergent Graph Solvers - [Arxiv] [QA]
- Self-Guided Contrastive Learning for BERT Sentence Representations - [Arxiv] [QA]
- Steerable 3D Spherical Neurons - [Arxiv] [QA]
- Evidential Turing Processes - [Arxiv] [QA]
- Towards Emotional Support Dialog Systems - [Arxiv] [QA]
- Multiresolution Equivariant Graph Variational Autoencoder - [Arxiv] [QA]
- RevCore: Review-augmented Conversational Recommendation - [Arxiv] [QA]
- DialoGraph: Incorporating Interpretable Strategy-Graph Networks into Negotiation Dialogues - [Arxiv] [QA]
- Efficient Passage Retrieval with Hashing for Open-domain Question Answering - [Arxiv] [QA]
- Weighting vectors for machine learning: numerical harmonic analysis applied to boundary detection - [Arxiv] [QA]
- DYPLOC: Dynamic Planning of Content Using Mixed Language Models for Text Generation - [Arxiv] [QA]
- Towards Quantifiable Dialogue Coherence Evaluation - [Arxiv] [QA]
- Concurrent Adversarial Learning for Large-Batch Training - [Arxiv] [QA]
- Efficient and Modular Implicit Differentiation - [Arxiv] [QA]
- Memory-Efficient Differentiable Transformer Architecture Search - [Arxiv] [QA]
- How Attentive are Graph Attention Networks? - [Arxiv] [QA]
- An Attention Free Transformer - [Arxiv] [QA]
- Gotta Go Fast When Generating Data with Score-Based Models - [Arxiv] [QA]
- Simple steps are all you need: Frank-Wolfe and generalized self-concordant functions - [Arxiv] [QA]
- OTTers: One-turn Topic Transitions for Open-Domain Dialogue - [Arxiv] [QA]
- Data Augmentation for Text Generation Without Any Augmented Data - [Arxiv] [QA]
- ST-HOI: A Spatial-Temporal Baseline for Human-Object Interaction Detection in Videos - [Arxiv] [QA]
- Pre-trained Language Model based Ranking in Baidu Search - [Arxiv] [QA]
- Unsupervised Speech Recognition - [Arxiv] [QA]
- Revisiting the Negative Data of Distantly Supervised Relation Extraction - [Arxiv] [QA]
- DEHB: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization - [Arxiv] [QA]
- Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking - [Arxiv] [QA]
- Unified Conversational Recommendation Policy Learning via Graph-based Reinforcement Learning - [Arxiv] [QA]
- Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms - [Arxiv] [QA]
- KECRS: Towards Knowledge-Enriched Conversational Recommendation System - [Arxiv] [QA]
- Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting - [Arxiv] [QA]
- ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation - [Arxiv] [QA]
- An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming - [Arxiv] [QA]
- RetGen: A Joint framework for Retrieval and Grounded Text Generation Modeling - [Arxiv] [QA]
- HyKnow: End-to-End Task-Oriented Dialog Modeling with Hybrid Knowledge Management - [Arxiv] [QA]
- Looking at CTR Prediction Again: Is Attention All You Need? - [Arxiv] [QA]
- The DEVIL is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting - [Arxiv] [QA]
- Diffusion Models Beat GANs on Image Synthesis - [Arxiv] [QA]
- VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning - [Arxiv] [QA]
- EL-Attention: Memory Efficient Lossless Attention for Generation - [Arxiv] [QA]
- Not All Relevance Scores are Equal: Efficient Uncertainty and Calibration Modeling for Deep Retrieval Models - [Arxiv] [QA]
- Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey - [Arxiv] [QA]
- Joint Learning of Deep Retrieval Model and Product Quantization based Embedding Index - [Arxiv] [QA]
- Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study - [Arxiv] [QA]
- Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems - [Arxiv] [QA]
- Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval - [Arxiv] [QA]
- Human Object Interaction Detection using Two-Direction Spatial Enhancement and Exclusive Object Prior - [Arxiv] [QA]
- A Survey of Data Augmentation Approaches for NLP - [Arxiv] [QA]
- Rethinking Search: Making Domain Experts out of Dilettantes - [Arxiv] [QA]
- PD-GAN: Probabilistic Diverse GAN for Image Inpainting - [Arxiv] [QA]
- Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries - [Arxiv] [QA]
- Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images - [Arxiv] [QA]
- Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation - [Arxiv] [QA]
- RR-Net: Injecting Interactive Semantics in Human-Object Interaction Detection - [Arxiv] [QA]
- Emerging Properties in Self-Supervised Vision Transformers - [Arxiv] [QA]
- Open-vocabulary Object Detection via Vision and Language Knowledge Distillation - [Arxiv] [QA]
- HOTR: End-to-End Human-Object Interaction Detection with Transformers - [Arxiv] [QA]
- If your data distribution shifts, use self-learning - [Arxiv] [QA]
- Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos - [Arxiv] [QA]
- Easy and Efficient Transformer : Scalable Inference Solution For large NLP model - [Arxiv] [QA]
- GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization - [Arxiv] [QA]
- PanGu-$\alpha$: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation - [Arxiv] [QA]
- Learning Passage Impacts for Inverted Indexes - [Arxiv] [QA]
- VideoGPT: Video Generation using VQ-VAE and Transformers - [Arxiv] [QA]
- UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction - [Arxiv] [QA]
- Gradient Matching for Domain Generalization - [Arxiv] [QA]
- B-PROP: Bootstrapped Pre-training with Representative Words Prediction for Ad-hoc Retrieval - [Arxiv] [QA]
- Image Inpainting with External-internal Learning and Monochromic Bottleneck - [Arxiv] [QA]
- Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation - [Arxiv] [QA]
- The Power of Scale for Parameter-Efficient Prompt Tuning - [Arxiv] [QA]
- Explaining Answers with Entailment Trees - [Arxiv] [QA]
- Condenser: a Pre-training Architecture for Dense Retrieval - [Arxiv] [QA]
- $Q^{2}$: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering - [Arxiv] [QA]
- Optimizing Dense Retrieval Model Training with Hard Negatives - [Arxiv] [QA]
- Matching-oriented Product Quantization For Ad-hoc Retrieval - [Arxiv] [QA]
- Ultra-High Dimensional Sparse Representations with Binarization for Efficient Text Retrieval - [Arxiv] [QA]
- COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List - [Arxiv] [QA]
- Sparse Attention with Linear Units - [Arxiv] [QA]
- Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling - [Arxiv] [QA]
- Is Disentanglement all you need? Comparing Concept-based & Disentanglement Approaches - [Arxiv] [QA]
- Learning How to Ask: Querying LMs with Mixtures of Soft Prompts - [Arxiv] [QA]
- All you need are a few pixels: semantic segmentation with PixelPick - [Arxiv] [QA]
- Spatiotemporal Entropy Model is All You Need for Learned Video Compression - [Arxiv] [QA]
- Glance and Gaze: Inferring Action-aware Points for One-Stage Human-Object Interaction Detection - [Arxiv] [QA]
- Not All Attention Is All You Need - [Arxiv] [QA]
- Progressive Temporal Feature Alignment Network for Video Inpainting - [Arxiv] [QA]
- Simple Imputation Rules for Prediction with Missing Data: Contrasting Theoretical Guarantees with Empirical Performance - [Arxiv] [QA]
- Affordance Transfer Learning for Human-Object Interaction Detection - [Arxiv] [QA]
- Learning to Estimate Hidden Motions with Global Motion Aggregation - [Arxiv] [QA]
- SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model - [Arxiv] [QA]
- Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training - [Arxiv] [QA]
- Visual Semantic Role Labeling for Video Understanding - [Arxiv] [QA]
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval - [Arxiv] [QA]
- NeRF-VAE: A Geometry Aware 3D Scene Generative Model - [Arxiv] [QA]
- Improved Image Generation via Sparse Modeling - [Arxiv] [QA]
- Exploiting Relationship for Complex-scene Image Generation - [Arxiv] [QA]
- Jigsaw Clustering for Unsupervised Visual Representation Learning - [Arxiv] [QA]
- Domain Invariant Adversarial Learning - [Arxiv] [QA]
- CAMPARI: Camera-Aware Decomposed Generative Neural Radiance Fields - [Arxiv] [QA]
- Contrastive Embedding for Generalized Zero-Shot Learning - [Arxiv] [QA]
- AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning - [Arxiv] [QA]
- TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations - [Arxiv] [QA]
- Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers - [Arxiv] [QA]
- GNeRF: GAN-based Neural Radiance Field without Posed Camera - [Arxiv] [QA]
- Adaptive Surface Normal Constraint for Depth Estimation - [Arxiv] [QA]
- Efficient Explanations from Empirical Explainers - [Arxiv] [QA]
- Categorical Representation Learning: Morphism is All You Need - [Arxiv] [QA]
- More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval - [Arxiv] [QA]
- KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs - [Arxiv] [QA]
- DNN Quantization with Attention - [Arxiv] [QA]
- FastMoE: A Fast Mixture-of-Expert Training System - [Arxiv] [QA]
- Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges - [Arxiv] [QA]
- API2Com: On the Improvement of Automatically Generated Code Comments Using API Documentations - [Arxiv] [QA]
- Concentric Spherical GNN for 3D Representation Learning - [Arxiv] [QA]
- FastNeRF: High-Fidelity Neural Rendering at 200FPS - [Arxiv] [QA]
- GLM: General Language Model Pretraining with Autoregressive Blank Infilling - [Arxiv] [QA]
- Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE - [Arxiv] [QA]
- Topology-Aware Segmentation Using Discrete Morse Theory - [Arxiv] [QA]
- A Robust Tube-Based Smooth-MPC for Robot Manipulator Planning - [Arxiv] [QA]
- ENCONTER: Entity Constrained Progressive Sequence Generation via Insertion-based Transformer - [Arxiv] [QA]
- Detecting Human-Object Interaction via Fabricated Compositional Learning - [Arxiv] [QA]
- BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation - [Arxiv] [QA]
- Reformulating HOI Detection as Adaptive Set Prediction - [Arxiv] [QA]
- Partial Differential Equations is All You Need for Generating Neural Architectures -- A Theory for Physical Artificial Intelligence Systems - [Arxiv] [QA]
- QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information - [Arxiv] [QA]
- GAN Vocoder: Multi-Resolution Discriminator Is All You Need - [Arxiv] [QA]
- Semantic Models for the First-stage Retrieval: A Comprehensive Review - [Arxiv] [QA]
- End-to-End Human Object Interaction Detection with HOI Transformer - [Arxiv] [QA]
- Can Pretext-Based Self-Supervised Learning Be Boosted by Downstream Data? A Theoretical Analysis - [Arxiv] [QA]
- Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth - [Arxiv] [QA]
- Barlow Twins: Self-Supervised Learning via Redundancy Reduction - [Arxiv] [QA]
- Online Adversarial Attacks - [Arxiv] [QA]
- Mixture of Volumetric Primitives for Efficient Neural Rendering - [Arxiv] [QA]
- Categorical Foundations of Gradient-Based Learning - [Arxiv] [QA]
- Learners' Languages - [Arxiv] [QA]
- Automated Machine Learning on Graphs: A Survey - [Arxiv] [QA]
- Learning Transferable Visual Models From Natural Language Supervision - [Arxiv] [QA]
- Node Proximity Is All You Need: Unified Structural and Positional Node and Graph Embedding - [Arxiv] [QA]
- Do Input Gradients Highlight Discriminative Features? - [Arxiv] [QA]
- Teach Me to Explain: A Review of Datasets for Explainable Natural Language Processing - [Arxiv] [QA]
- On Relative Pose Recovery for Multi-Camera Systems - [Arxiv] [QA]
- Meta-Learned Attribute Self-Gating for Continual Generalized Zero-Shot Learning - [Arxiv] [QA]
- Deep ReLU Networks Preserve Expected Length - [Arxiv] [QA]
- Meta-Learning Dynamics Forecasting Using Task Inference - [Arxiv] [QA]
- Essentials for Class Incremental Learning - [Arxiv] [QA]
- Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder - [Arxiv] [QA]
- ShaRF: Shape-conditioned Radiance Fields from a Single View - [Arxiv] [QA]
- DEUP: Direct Epistemic Uncertainty Prediction - [Arxiv] [QA]
- All You Need is DAG - [Arxiv] [QA]
- Topological Graph Neural Networks - [Arxiv] [QA]
- Is Space-Time Attention All You Need for Video Understanding? - [Arxiv] [QA]
- Contrastive Embeddings for Neural Architectures - [Arxiv] [QA]
- Hyperspherical embedding for novel class classification - [Arxiv] [QA]
- Unifying Vision-and-Language Tasks via Text Generation - [Arxiv] [QA]
- Causal Sufficiency and Actual Causation - [Arxiv] [QA]
- Learning Graph Embeddings for Compositional Zero-shot Learning - [Arxiv] [QA]
- VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs - [Arxiv] [QA]
- Compositional Semantics for Probabilistic Programs with Exact Conditioning - [Arxiv] [QA]
- RESPER: Computationally Modelling Resisting Strategies in Persuasive Conversations - [Arxiv] [QA]
- Reverse Derivative Ascent: A Categorical Approach to Learning Boolean Circuits - [Arxiv] [QA]
- Transferable Interactiveness Knowledge for Human-Object Interaction Detection - [Arxiv] [QA]
- Advances and Challenges in Conversational Recommender Systems: A Survey - [Arxiv] [QA]
- A Comprehensive Survey on Hardware-Aware Neural Architecture Search - [Arxiv] [QA]
- Higher Order Automatic Differentiation of Higher Order Functions - [Arxiv] [QA]
- The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models - [Arxiv] [QA]
- Evaluating Disentanglement of Structured Representations - [Arxiv] [QA]
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity - [Arxiv] [QA]
- Evolving Reinforcement Learning Algorithms - [Arxiv] [QA]
- Max-Affine Spline Insights Into Deep Network Pruning - [Arxiv] [QA]
- VinVL: Revisiting Visual Representations in Vision-Language Models - [Arxiv] [QA]
- Prefix-Tuning: Optimizing Continuous Prompts for Generation - [Arxiv] [QA]
- Multi-task Retrieval for Knowledge-Intensive Tasks - [Arxiv] [QA]