This is a list of BERT-related papers. Any feedback is welcome.
- Survey paper
- Downstream task
- Generation
- Quality evaluator
- Modification (multi-task, masking strategy, etc.)
- Transformer variants
- Probe
- Inside BERT
- Multi-lingual
- Other than English models
- Domain specific
- Multi-modal
- Model compression
- Misc.
- Evolution of transfer learning in natural language processing
- Pre-trained Models for Natural Language Processing: A Survey
- A Survey on Contextual Embeddings
- A Survey on Transfer Learning in Natural Language Processing
- Which *BERT? A Survey Organizing Contextualized Encoders (EMNLP2020)
- Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond
- A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics, and Benchmark Datasets
- A BERT Baseline for the Natural Questions
- MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension (ACL2019)
- BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions (NAACL2019) [github]
- Natural Perturbation for Robust Question Answering
- Unsupervised Domain Adaptation on Reading Comprehension
- BERTQA -- Attention on Steroids
- Exploring BERT Parameter Efficiency on the Stanford Question Answering Dataset v2.0
- Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension
- Logic-Guided Data Augmentation and Regularization for Consistent Question Answering (ACL2020)
- UnifiedQA: Crossing Format Boundaries With a Single QA System
- A Multi-Type Multi-Span Network for Reading Comprehension that Requires Discrete Reasoning (EMNLP2019)
- A Simple and Effective Model for Answering Multi-span Questions [github]
- Injecting Numerical Reasoning Skills into Language Models (ACL2020)
- Towards Question Format Independent Numerical Reasoning: A Set of Prerequisite Tasks
- SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering
- Multi-hop Question Answering via Reasoning Chains
- Select, Answer and Explain: Interpretable Multi-hop Reading Comprehension over Multiple Documents
- Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering (EMNLP2019 WS)
- Fine-tuning Multi-hop Question Answering with Hierarchical Graph Network
- Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering (ACL2020)
- HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data
- Unsupervised Multi-hop Question Answering by Question Generation
- End-to-End Open-Domain Question Answering with BERTserini (NAACL2019)
- Latent Retrieval for Weakly Supervised Open Domain Question Answering (ACL2019)
- Dense Passage Retrieval for Open-Domain Question Answering (EMNLP2020)
- Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval
- RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering
- Pre-training Tasks for Embedding-based Large-scale Retrieval (ICLR2020)
- Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering (EMNLP2019)
- QED: A Framework and Dataset for Explanations in Question Answering [github]
- Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering (ICLR2020)
- Relevance-guided Supervision for OpenQA with ColBERT
- RECONSIDER: Re-Ranking using Span-Focused Cross-Attention for Open Domain Question Answering
- SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval
- Don't Read Too Much into It: Adaptive Computation for Open-Domain Question Answering (EMNLP2020 WS)
- Is Retriever Merely an Approximator of Reader?
- Neural Retrieval for Question Answering with Cross-Attention Supervised Data Augmentation
- RikiNet: Reading Wikipedia Pages for Natural Question Answering (ACL2020)
- BERT-kNN: Adding a kNN Search Component to Pretrained Language Models for Better QA
- DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding (SIGIR2020)
- Learning to Ask Unanswerable Questions for Machine Reading Comprehension (ACL2019)
- Unsupervised Question Answering by Cloze Translation (ACL2019)
- Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation (ICLR2020)
- A Recurrent BERT-based Model for Question Generation (EMNLP2019 WS)
- Unsupervised Question Decomposition for Question Answering [github]
- Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering (ACL2020)
- What Are People Asking About COVID-19? A Question Classification Dataset
- Learning to Answer by Learning to Ask: Getting the Best of GPT-2 and BERT Worlds
- Enhancing Pre-Trained Language Representations with Rich Knowledge for Machine Reading Comprehension (ACL2019)
- Incorporating Relation Knowledge into Commonsense Reading Comprehension with Multi-task Learning (CIKM2019)
- SG-Net: Syntax-Guided Machine Reading Comprehension
- MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension
- Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning (EMNLP2019)
- ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning (ICLR2020)
- Robust Reading Comprehension with Linguistic Constraints via Posterior Regularization
- BAS: An Answer Selection Method Using BERT Language Model
- TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection (AAAI2020)
- The Cascade Transformer: an Application for Efficient Answer Sentence Selection (ACL2020)
- Support-BERT: Predicting Quality of Question-Answer Pairs in MSDN using Deep Bidirectional Transformer
- Beat the AI: Investigating Adversarial Human Annotations for Reading Comprehension
- Benchmarking Robustness of Machine Reading Comprehension Models
- Evaluating NLP Models via Contrast Sets
- Undersensitivity in Neural Reading Comprehension
- Developing a How-to Tip Machine Comprehension Dataset and its Evaluation in Machine Comprehension by BERT (ACL2020 WS)
- A Simple but Effective Method to Incorporate Multi-turn Context with BERT for Conversational Machine Comprehension (ACL2019 WS)
- FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension (ACL2019 WS)
- BERT with History Answer Embedding for Conversational Question Answering (SIGIR2019)
- GraphFlow: Exploiting Conversation Flow with Graph Neural Networks for Conversational Machine Comprehension (ICML2019 WS)
- TAPAS: Weakly Supervised Table Parsing via Pre-training (ACL2020)
- TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data (ACL2020)
- Understanding tables with intermediate pre-training (EMNLP2020 Findings)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing
- Table Search Using a Deep Contextualized Language Model (SIGIR2020)
- TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions (EMNLP2020)
- Beyond English-only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian (RANLP2019)
- XQA: A Cross-lingual Open-domain Question Answering Dataset (ACL2019)
- XOR QA: Cross-lingual Open-Retrieval Question Answering [website]
- Cross-Lingual Machine Reading Comprehension (EMNLP2019)
- Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model
- Multilingual Question Answering from Formatted Text applied to Conversational Agents
- BiPaR: A Bilingual Parallel Dataset for Multilingual and Cross-lingual Reading Comprehension on Novels (EMNLP2019)
- MLQA: Evaluating Cross-lingual Extractive Question Answering
- Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension
- Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation (COLING2020)
- MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering [github]
- Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension (TACL)
- SberQuAD - Russian Reading Comprehension Dataset: Description and Analysis
- DuReader_robust: A Chinese Dataset Towards Evaluating the Robustness of Machine Reading Comprehension Models
- Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension (EMNLP2019)
- DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue [website]
- BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer (Interspeech2019)
- Dialog State Tracking: A Neural Reading Comprehension Approach
- A Simple but Effective BERT Model for Dialog State Tracking on Resource-Limited Systems (ICASSP2020)
- Fine-Tuning BERT for Schema-Guided Zero-Shot Dialogue State Tracking
- Goal-Oriented Multi-Task BERT-Based Dialogue State Tracker
- Dialogue State Tracking with Pretrained Encoder for Multi-domain Trask-oriented Dialogue Systems
- Zero-Shot Transfer Learning with Synthesized Data for Multi-Domain Dialogue State Tracking (ACL2020)
- A Fast and Robust BERT-based Dialogue State Tracker for Schema-Guided Dialogue Dataset (KDD2020 WS)
- ToD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogues (EMNLP2020)
- Domain Adaptive Training BERT for Response Selection
- Speaker-Aware BERT for Multi-Turn Response Selection in Retrieval-Based Chatbots
- Curriculum Learning Strategies for IR: An Empirical Study on Conversation Response Ranking (ECIR2020)
- MuTual: A Dataset for Multi-Turn Dialogue Reasoning (ACL2020)
- DialBERT: A Hierarchical Pre-Trained Model for Conversation Disentanglement
- Generalized Conditioned Dialogue Generation Based on Pre-trained Language Model
- BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding
- BERT for Joint Intent Classification and Slot Filling
- A Co-Interactive Transformer for Joint Slot Filling and Intent Detection
- Multi-lingual Intent Detection and Slot Filling in a Joint BERT-based Model
- A Comparison of Deep Learning Methods for Language Understanding (Interspeech2019)
- Data Augmentation for Spoken Language Understanding via Pretrained Models
- Fine-grained Information Status Classification Using Discourse Context-Aware Self-Attention
- Neural Aspect and Opinion Term Extraction with Mined Rules as Weak Supervision (ACL2019)
- BERT-based Lexical Substitution (ACL2019)
- Assessing BERT’s Syntactic Abilities
- Investigating Novel Verb Learning in BERT: Selectional Preference Classes and Alternation-Based Syntactic Generalization (EMNLP2020 WS)
- Does BERT agree? Evaluating knowledge of structure dependence through agreement relations
- Simple BERT Models for Relation Extraction and Semantic Role Labeling
- LIMIT-BERT : Linguistic Informed Multi-Task BERT (EMNLP2020 Findings)
- Joint Semantic Analysis with Document-Level Cross-Task Coherence Rewards
- A Simple BERT-Based Approach for Lexical Simplification
- BERT-Based Arabic Social Media Author Profiling
- Sentence-Level BERT and Multi-Task Learning of Age and Gender in Social Media
- Evaluating the Factual Consistency of Abstractive Text Summarization
- Generating Fact Checking Explanations (ACL2020)
- NegBERT: A Transfer Learning Approach for Negation Detection and Scope Resolution
- xSLUE: A Benchmark and Analysis Platform for Cross-Style Language Understanding and Evaluation
- TabFact: A Large-scale Dataset for Table-based Fact Verification (ICLR2020)
- Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents
- A Focused Study to Compare Arabic Pre-training Models on Newswire IE Tasks
- LAMBERT: Layout-Aware language Modeling using BERT for information extraction
- Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings (ECIR2020) [github]
- Keyphrase Extraction with Span-based Feature Representations
- Keyphrase Prediction With Pre-trained Language Model
- Joint Keyphrase Chunking and Salience Ranking with BERT
- Generalizing Natural Language Analysis through Span-relation Representations (ACL2020) [github]
- What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
- tBERT: Topic Models and BERT Joining Forces for Semantic Similarity Detection (ACL2020)
- Domain Adaptation with BERT-based Domain Classification and Data Selection (EMNLP2019 WS)
- Knowledge Distillation for BERT Unsupervised Domain Adaptation
- Sensitive Data Detection and Classification in Spanish Clinical Text: Experiments with BERT (LREC2020)
- On the Importance of Word and Sentence Representation Learning in Implicit Discourse Relation Classification (IJCAI2020)
- Adapting BERT to Implicit Discourse Relation Classification with a Focus on Discourse Connectives (LREC2020)
- Labeling Explicit Discourse Relations using Pre-trained Language Models (TSD2020)
- Cross-lingual Zero- and Few-shot Hate Speech Detection Utilising Frozen Transformer Language Models and AXEL
- Same Side Stance Classification Task: Facilitating Argument Stance Classification by Fine-tuning a BERT Model
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection
- KEIS@JUST at SemEval-2020 Task 12: Identifying Multilingual Offensive Tweets Using Weighted Ensemble and Fine-Tuned BERT
- ALBERT-BiLSTM for Sequential Metaphor Detection (ACL2020 WS)
- A BERT-based Dual Embedding Model for Chinese Idiom Prediction (COLING2020)
- Should You Fine-Tune BERT for Automated Essay Scoring? (ACL2020 WS)
- KILT: a Benchmark for Knowledge Intensive Language Tasks [github]
- IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding (AACL-IJCNLP2020)
- MedFilter: Improving Extraction of Task-relevant Utterances through Integration of Discourse Structure and Ontological Knowledge (EMNLP2020)
- BERT Meets Chinese Word Segmentation
- Unified Multi-Criteria Chinese Word Segmentation with BERT
- RethinkCWS: Is Chinese Word Segmentation a Solved Task? (EMNLP2020) [github]
- Joint Persian Word Segmentation Correction and Zero-Width Non-Joiner Recognition Using BERT
- Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning
- Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT (FLAIRS-33)
- Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing
- fastHan: A BERT-based Joint Many-Task Toolkit for Chinese NLP
- Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing -- A Tale of Two Parsers Revisited (EMNLP2019)
- Is POS Tagging Necessary or Even Helpful for Neural Dependency Parsing?
- Parsing as Pretraining (AAAI2020)
- Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing
- Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement
- pyBART: Evidence-based Syntactic Transformations for IE [github]
- Named Entity Recognition -- Is there a glass ceiling? (CoNLL2019)
- A Unified MRC Framework for Named Entity Recognition
- Biomedical named entity recognition using BERT in the machine reading comprehension framework
- Training Compact Models for Low Resource Entity Tagging using Pre-trained Language Models
- Robust Named Entity Recognition with Truecasing Pretraining (AAAI2020)
- LTP: A New Active Learning Strategy for Bert-CRF Based Named Entity Recognition
- Named Entity Recognition as Dependency Parsing (ACL2020)
- Exploring Cross-sentence Contexts for Named Entity Recognition with BERT
- Embeddings of Label Components for Sequence Labeling: A Case Study of Fine-grained Named Entity Recognition (ACL2020 SRW)
- BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision (KDD2020) [github]
- Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve
- Single-/Multi-Source Cross-Lingual NER via Teacher-Student Learning on Unlabeled Data in Target Language (ACL2020)
- To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging (EMNLP2020)
- Example-Based Named Entity Recognition
- FLERT: Document-Level Features for Named Entity Recognition
- What's in a Name? Are BERT Named Entity Representations just as Good for any other Name? (ACL2020 WS)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition (EMNLP2020) [github]
- BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition
- MT-BioNER: Multi-task Learning for Biomedical Named Entity Recognition using Deep Bidirectional Transformers
- Knowledge Guided Named Entity Recognition for BioMedical Text
- Portuguese Named Entity Recognition using BERT-CRF
- Towards Lingua Franca Named Entity Recognition with BERT
- A Brief Survey and Comparative Study of Recent Development of Pronoun Coreference Resolution
- Resolving Gendered Ambiguous Pronouns with BERT (ACL2019 WS)
- Anonymized BERT: An Augmentation Approach to the Gendered Pronoun Resolution Challenge (ACL2019 WS)
- Gendered Pronoun Resolution using BERT and an extractive question answering formulation (ACL2019 WS)
- MSnet: A BERT-based Network for Gendered Pronoun Resolution (ACL2019 WS)
- Scalable Cross Lingual Pivots to Model Pronoun Gender for Translation
- Fill the GAP: Exploiting BERT for Pronoun Resolution (ACL2019 WS)
- On GAP Coreference Resolution Shared Task: Insights from the 3rd Place Solution (ACL2019 WS)
- Look Again at the Syntax: Relational Graph Convolutional Network for Gendered Ambiguous Pronoun Resolution (ACL2019 WS)
- BERT Masked Language Modeling for Co-reference Resolution (ACL2019 WS)
- Coreference Resolution with Entity Equalization (ACL2019)
- BERT for Coreference Resolution: Baselines and Analysis (EMNLP2019) [github]
- WikiCREM: A Large Unsupervised Corpus for Coreference Resolution (EMNLP2019)
- Ellipsis and Coreference Resolution as Question Answering
- Coreference Resolution as Query-based Span Prediction
- Coreferential Reasoning Learning for Language Representation (EMNLP2020)
- Revisiting Memory-Efficient Incremental Coreference Resolution
- Revealing the Myth of Higher-Order Inference in Coreference Resolution (EMNLP2020)
- Neural Mention Detection (LREC2020)
- ZPR2: Joint Zero Pronoun Recovery and Resolution using Multi-Task Learning and BERT (ACL2020)
- Multi-task Learning Based Neural Bridging Reference Resolution
- Bridging Anaphora Resolution as Question Answering (ACL2020)
- Fine-grained Information Status Classification Using Discourse Context-Aware BERT (COLING2020)
- Language Models and Word Sense Disambiguation: An Overview and Analysis
- GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge (EMNLP2019)
- Adapting BERT for Word Sense Disambiguation with Gloss Selection Objective and Example Sentences (EMNLP2020 Findings)
- Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations (EMNLP2019)
- Using BERT for Word Sense Disambiguation
- Language Modelling Makes Sense: Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation (ACL2019)
- Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings (KONVENS2019)
- An Accurate Model for Predicting the (Graded) Effect of Context in Word Similarity Based on Bert
- CluBERT: A Cluster-Based Approach for Learning Sense Distributions in Multiple Languages (ACL2020)
- VCDM: Leveraging Variational Bi-encoding and Deep Contextualized Word Representations for Improved Definition Modeling (EMNLP2020)
- Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence (NAACL2019)
- BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis (NAACL2019)
- Exploiting BERT for End-to-End Aspect-based Sentiment Analysis (EMNLP2019 WS)
- Improving BERT Performance for Aspect-Based Sentiment Analysis
- Context-Guided BERT for Targeted Aspect-Based Sentiment Analysis
- Understanding Pre-trained BERT for Aspect-based Sentiment Analysis (COLING2020)
- Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification (LREC2020)
- An Investigation of Transfer Learning-Based Sentiment Analysis in Japanese (ACL2019)
- "Mask and Infill" : Applying Masked Language Model to Sentiment Transfer
- Adversarial Training for Aspect-Based Sentiment Analysis with BERT
- Adversarial and Domain-Aware BERT for Cross-Domain Sentiment Analysis (ACL2020)
- Utilizing BERT Intermediate Layers for Aspect Based Sentiment Analysis and Natural Language Inference
- DomBERT: Domain-oriented Language Model for Aspect-based Sentiment Analysis
- SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics (ACL2020)
- Matching the Blanks: Distributional Similarity for Relation Learning (ACL2019)
- BERT-Based Multi-Head Selection for Joint Entity-Relation Extraction (NLPCC2019)
- Enriching Pre-trained Language Model with Entity Information for Relation Classification
- Span-based Joint Entity and Relation Extraction with Transformer Pre-training
- Fine-tune Bert for DocRED with Two-step Process
- Relation Extraction as Two-way Span-Prediction
- Entity, Relation, and Event Extraction with Contextualized Span Representations (EMNLP2019)
- Fine-tuning BERT for Joint Entity and Relation Extraction in Chinese Medical Text
- Downstream Model Design of Pre-trained Language Model for Relation Extraction Task
- Efficient long-distance relation extraction with DG-SpanBERT
- Global-to-Local Neural Networks for Document-Level Relation Extraction (EMNLP2020)
- DARE: Data Augmented Relation Extraction with GPT-2
- Distantly-Supervised Neural Relation Extraction with Side Information using BERT (IJCNN2020)
- Dialogue-Based Relation Extraction (ACL2020)
- A Novel Cascade Binary Tagging Framework for Relational Triple Extraction (ACL2020) [github]
- ExpBERT: Representation Engineering with Natural Language Explanations (ACL2020) [github]
- AutoRC: Improving BERT Based Relation Classification Models via Architecture Search
- Investigation of BERT Model on Biomedical Relation Extraction Based on Revised Fine-tuning Mechanism
- Experiments on transfer learning architectures for biomedical relation extraction
- Cross-Lingual Relation Extraction with Transformers
- Improving Scholarly Knowledge Representation: Evaluating BERT-based Models for Scientific Relation Classification
- Robustly Pre-trained Neural Model for Direct Temporal Relation Extraction
- A BERT-based One-Pass Multi-Task Model for Clinical Temporal Relation Extraction (ACL2020 WS)
- Exploring Contextualized Neural Language Models for Temporal Dependency Parsing
- Temporal Reasoning on Implicit Events from Distant Supervision
- IMoJIE: Iterative Memory-Based Joint Open Information Extraction (ACL2020)
- OpenIE6: Iterative Grid Labeling and Coordination Analysis for Open Information Extraction (EMNLP2020) [github]
- KG-BERT: BERT for Knowledge Graph Completion
- How Context Affects Language Models' Factual Predictions (AKBC2020)
- Inducing Relational Knowledge from BERT (AAAI2020)
- Latent Relation Language Models (AAAI2020)
- Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model (ICLR2020)
- Scalable Zero-shot Entity Linking with Dense Entity Retrieval (EMNLP2020) [github]
- Zero-shot Entity Linking with Efficient Long Range Sequence Modeling (EMNLP2020 Findings)
- Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking (CoNLL2019)
- Improving Entity Linking by Modeling Latent Entity Type Information (AAAI2020)
- Global Entity Disambiguation with Pretrained Contextualized Embeddings of Words and Entities
- YELM: End-to-End Contextualized Entity Linking
- Empirical Evaluation of Pretraining Strategies for Supervised Entity Linking (AKBC2020)
- LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention (EMNLP2020) [github]
- PEL-BERT: A Joint Model for Protocol Entity Linking
- Efficient One-Pass End-to-End Entity Linking for Questions (EMNLP2020) [github]
- Cross-Lingual Transfer in Zero-Shot Cross-Language Entity Linking
- Entity Linking in 100 Languages (EMNLP2020) [github]
- COMETA: A Corpus for Medical Entity Linking in the Social Media (EMNLP2020) [github]
- How Can We Know What Language Models Know?
- Deep Entity Matching with Pre-Trained Language Models
- Inducing Taxonomic Knowledge from Pretrained Transformers
- Language Models are Open Knowledge Graphs
- DualTKB: A Dual Learning Bridge between Text and Knowledge Base (EMNLP2020) [github]
- How to Avoid Being Eaten by a Grue: Structured Exploration Strategies for Textual Worlds [github]
- Deep Learning Based Text Classification: A Comprehensive Review
- A Text Classification Survey: From Shallow to Deep Learning
- How to Fine-Tune BERT for Text Classification?
- X-BERT: eXtreme Multi-label Text Classification with BERT
- DocBERT: BERT for Document Classification
- Enriching BERT with Knowledge Graph Embeddings for Document Classification
- Classification and Clustering of Arguments with Contextualized Word Embeddings (ACL2019)
- BERT for Evidence Retrieval and Claim Verification
- Stacked DeBERT: All Attention in Incomplete Data for Text Classification
- Cost-Sensitive BERT for Generalisable Sentence Classification with Imbalanced Data
- BAE: BERT-based Adversarial Examples for Text Classification (EMNLP2020)
- FireBERT: Hardening BERT-based classifiers against adversarial attack [github]
- GAN-BERT: Generative Adversarial Learning for Robust Text Classification with a Bunch of Labeled Examples (ACL2020)
- Description Based Text Classification with Reinforcement Learning
- VGCN-BERT: Augmenting BERT with Graph Embedding for Text Classification
- Zero-shot Text Classification via Reinforced Self-training (ACL2020)
- On Data Augmentation for Extreme Multi-label Classification
- Towards Evaluating the Robustness of Chinese BERT Classifiers
- COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter [github]
- Large Scale Legal Text Classification Using Transformer Models
- A Comparison of LSTM and BERT for Small Corpus
- Exploring Unsupervised Pretraining and Sentence Structure Modelling for Winograd Schema Challenge
- A Surprisingly Robust Trick for the Winograd Schema Challenge
- WinoGrande: An Adversarial Winograd Schema Challenge at Scale (AAAI2020)
- TTTTTackling WinoGrande Schemas
- WinoWhy: A Deep Diagnosis of Essential Commonsense Knowledge for Answering Winograd Schema Challenge (ACL2020)
- The Sensitivity of Language Models and Humans to Winograd Schema Perturbations (ACL2020)
- Precise Task Formalization Matters in Winograd Schema Evaluations (EMNLP2020)
- Tackling Domain-Specific Winograd Schemas with Knowledge-Based Reasoning and Machine Learning
- A Review of Winograd Schema Challenge Datasets and Approaches
- Improving Natural Language Inference with a Pretrained Parser
- Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition
- Adversarial NLI: A New Benchmark for Natural Language Understanding
- Adversarial Analysis of Natural Language Inference Systems (ICSC2020)
- ANLIzing the Adversarial Natural Language Inference Dataset
- Syntactic Data Augmentation Increases Robustness to Inference Heuristics (ACL2020)
- HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference (LREC2020)
- Use of Machine Translation to Obtain Labeled Datasets for Resource-Constrained Languages (EMNLP2020) [github]
- FarsTail: A Persian Natural Language Inference Dataset
- Evaluating BERT for natural language inference: A case study on the CommitmentBank (EMNLP2019)
- Do Neural Models Learn Systematicity of Monotonicity Inference in Natural Language? (ACL2020)
- Abductive Commonsense Reasoning (ICLR2020)
- Collecting Entailment Data for Pretraining: New Protocols and Negative Results
- Mining Knowledge for Natural Language Inference from Wikipedia Categories (EMNLP2020 Findings)
- CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge (NAACL2019)
- HellaSwag: Can a Machine Really Finish Your Sentence? (ACL2019) [website]
- A Method for Building a Commonsense Inference Dataset Based on Basic Events (EMNLP2020) [website]
- Story Ending Prediction by Transferable BERT (IJCAI2019)
- Explain Yourself! Leveraging Language Models for Commonsense Reasoning (ACL2019)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning (ACL2020)
- Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models
- Informing Unsupervised Pretraining with External Linguistic Knowledge
- Commonsense Knowledge + BERT for Level 2 Reading Comprehension Ability Test
- BIG MOOD: Relating Transformers to Explicit Commonsense Knowledge
- Commonsense Knowledge Mining from Pretrained Models (EMNLP2019)
- KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning (EMNLP2019)
- Cracking the Contextual Commonsense Code: Understanding Commonsense Reasoning Aptitude of Deep Contextual Representations (EMNLP2019 WS)
- Do Massively Pretrained Language Models Make Better Storytellers? (CoNLL2019)
- PIQA: Reasoning about Physical Commonsense in Natural Language (AAAI2020)
- Evaluating Commonsense in Pre-trained Language Models (AAAI2020)
- Why Do Masked Neural Language Models Still Need Common Sense Knowledge?
- Does BERT Solve Commonsense Task via Commonsense Knowledge?
- Unsupervised Commonsense Question Answering with Self-Talk (EMNLP2020)
- G-DAUG: Generative Data Augmentation for Commonsense Reasoning
- Contrastive Self-Supervised Learning for Commonsense Reasoning (ACL2020)
- Adversarial Training for Commonsense Inference (ACL2020 WS)
- Do Fine-tuned Commonsense Language Models Really Generalize?
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [github]
- Do Neural Language Representations Learn Physical Commonsense? (CogSci2019)
- HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization (ACL2019)
- Deleter: Leveraging BERT to Perform Unsupervised Successive Text Compression
- Discourse-Aware Neural Extractive Model for Text Summarization
- AREDSUM: Adaptive Redundancy-Aware Iterative Sentence Ranking for Extractive Document Summarization
- Fact-level Extractive Summarization with Hierarchical Graph Mask on BERT (COLING2020)
- Multi-Document Summarization with Determinantal Point Processes and Contextualized Representations (EMNLP2019 WS)
- Continual BERT: Continual Learning for Adaptive Extractive Summarization of COVID-19 Literature
- Multi-headed Architecture Based on BERT for Grammatical Errors Correction (ACL2019 WS)
- Towards Minimal Supervision BERT-based Grammar Error Correction
- Learning to combine Grammatical Error Corrections (EMNLP2019 WS)
- Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction (ACL2020)
- Chinese Grammatical Correction Using BERT-based Pre-trained Model (AACL-IJCNLP2020)
- Spelling Error Correction with Soft-Masked BERT (ACL2020)
- Pretrained Transformers for Text Ranking: BERT and Beyond
- Passage Re-ranking with BERT
- Investigating the Successes and Failures of BERT for Passage Re-Ranking
- Understanding the Behaviors of BERT in Ranking
- Document Expansion by Query Prediction
- CEDR: Contextualized Embeddings for Document Ranking (SIGIR2019)
- Deeper Text Understanding for IR with Contextual Neural Language Modeling (SIGIR2019)
- FAQ Retrieval using Query-Question Similarity and BERT-Based Query-Answer Relevance (SIGIR2019)
- An Analysis of BERT FAQ Retrieval Models for COVID-19 Infobot
- COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval
- Unsupervised FAQ Retrieval with Question Generation and BERT (ACL2020)
- Multi-Stage Document Ranking with BERT
- Learning-to-Rank with BERT in TF-Ranking
- Transformer-Based Language Models for Similar Text Retrieval and Ranking
- DeText: A Deep Text Ranking Framework with BERT
- ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT (SIGIR2020)
- RepBERT: Contextualized Text Embeddings for First-Stage Retrieval [github]
- Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
- Multi-Perspective Semantic Information Retrieval
- Expansion via Prediction of Importance with Contextualization (SIGIR2020)
- BERT-QE: Contextualized Query Expansion for Document Re-ranking (EMNLP2020 Findings)
- Beyond [CLS] through Ranking by Generation (EMNLP2020)
- Efficient Document Re-Ranking for Transformers by Precomputing Term Representations (SIGIR2020)
- Training Curricula for Open Domain Answer Re-Ranking (SIGIR2020)
- Guided Transformer: Leveraging Multiple External Sources for Representation Learning in Conversational Search (SIGIR2020)
- Fine-tune BERT for E-commerce Non-Default Search Ranking
- IR-BERT: Leveraging BERT for Semantic Search in Background Linking for News Articles
- ProphetNet-Ads: A Looking Ahead Strategy for Generative Retrieval Models in Sponsored Search Engine (NLPCC2020)
- Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned (ACL2020 WS)
- SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search (EMNLP2020)
- Effective Transfer Learning for Identifying Similar Questions: Matching User Questions to COVID-19 FAQs
- Cross-lingual Information Retrieval with BERT
- Cross-lingual Retrieval for Iterative Self-Supervised Training (NeurIPS2020)
- Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning (ECIR2020)
- BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model (NAACL2019 WS)
- Pretraining-Based Natural Language Generation for Text Summarization
- Text Summarization with Pretrained Encoders (EMNLP2019) [github (original)] [github (huggingface)]
- Multi-stage Pretraining for Abstractive Summarization
- PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
- Abstractive Summarization with Combination of Pre-trained Sequence-to-Sequence and Saliency Models
- GSum: A General Framework for Guided Neural Abstractive Summarization [github]
- STEP: Sequence-to-Sequence Transformer Pre-training for Document Summarization
- TLDR: Extreme Summarization of Scientific Documents [github]
- Product Title Generation for Conversational Systems using BERT
- WSL-DS: Weakly Supervised Learning with Distant Supervision for Query Focused Multi-Document Abstractive Summarization (COLING2020)
- Constrained Abstractive Summarization: Preserving Factual Consistency with Constrained Generation
- Abstractive Summarization of Spoken and Written Instructions with BERT
- BERT Fine-tuning For Arabic Text Summarization (ICLR2020 WS)
- Automatic Text Summarization of COVID-19 Medical Research Articles using BERT and GPT-2
- Mixed-Lingual Pre-training for Cross-lingual Summarization (AACL-IJCNLP2020)
- PoinT-5: Pointer Network and T-5 based Financial Narrative Summarisation (COLING2020 WS)
- MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML2019) [github] [github]
- JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation (LREC2020)
- Unified Language Model Pre-training for Natural Language Understanding and Generation (NeurIPS2019) [github]
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [github]
- Dual Inference for Improving Language Understanding and Generation (EMNLP2020 Findings)
- ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training (EMNLP2020 Findings) [github]
- Towards Making the Most of BERT in Neural Machine Translation
- Improving Neural Machine Translation with Pre-trained Representation
- On the use of BERT for Neural Machine Translation (EMNLP2019 WS)
- Incorporating BERT into Neural Machine Translation (ICLR2020)
- Recycling a Pre-trained BERT Encoder for Neural Machine Translation
- Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT (EMNLP2020)
- Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
- Mask-Predict: Parallel Decoding of Conditional Masked Language Models (EMNLP2019)
- PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation (EMNLP2020)
- ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
- Cross-Lingual Natural Language Generation via Pre-Training (AAAI2020) [github]
- PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable (ACL2020)
- A Tailored Pre-Training Model for Task-Oriented Dialog Generation
- Pretrained Language Models for Dialogue Generation with Multiple Input Sources (EMNLP2020 Findings)
- Knowledge-Grounded Dialogue Generation with Pre-trained Language Models (EMNLP2020)
- Are Pre-trained Language Models Knowledgeable to Ground Open Domain Dialogues?
- Open-Domain Dialogue Generation Based on Pre-trained Language Models
- CG-BERT: Conditional Text Generation with BERT for Generalized Few-shot Intent Detection
- QURIOUS: Question Generation Pretraining for Text Generation
- Few-Shot NLG with Pre-Trained Language Model (ACL2020)
- Text-to-Text Pre-Training for Data-to-Text Tasks
- KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation (EMNLP2020)
- Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference (INLG2020)
- Large Scale Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training
- Structure-Grounded Pretraining for Text-to-SQL
- Data Agnostic RoBERTa-based Natural Language to SQL Query Generation
- ToTTo: A Controlled Table-To-Text Generation Dataset (EMNLP2020) [github]
- A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation (TACL2020) [github]
- MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models (EMNLP2020)
- CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning (EMNLP2020 Findings) [github] [website]
- Pre-training Text-to-Text Transformers for Concept-centric Common Sense
- Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph (EMNLP2020)
- KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning
- EIGEN: Event Influence GENeration using Pre-trained Language Models
- GeDi: Generative Discriminator Guided Sequence Generation
- Generating similes effortlessly like a Pro: A Style Transfer Approach for Simile Generation (EMNLP2020)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (JMLR2020) [github]
- mT5: A massively multilingual pre-trained text-to-text transformer [github]
- WT5?! Training Text-to-Text Models to Explain their Predictions
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (ACL2020)
- Multilingual Denoising Pre-training for Neural Machine Translation
- Best Practices for Data-Efficient Modeling in NLG: How to Train Production-Ready Neural Models with Less Data (COLING2020)
- Unsupervised Pre-training for Natural Language Generation: A Literature Review
- BERTScore: Evaluating Text Generation with BERT (ICLR2020)
- Machine Translation Evaluation with BERT Regressor
- TransQuest: Translation Quality Estimation with Cross-lingual Transformers (COLING2020)
- SumQE: a BERT-based Summary Quality Estimation Model (EMNLP2019)
- MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance (EMNLP2019) [github]
- BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward
- BLEURT: Learning Robust Metrics for Text Generation (ACL2020)
- Masked Language Model Scoring (ACL2020)
- Multi-Task Deep Neural Networks for Natural Language Understanding (ACL2019)
- The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
- BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning (ICML2019)
- Pre-training Text Representations as Meta Learning
- Unifying Question Answering and Text Classification via Span Extraction
- MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization (ACL2020)
- ERNIE: Enhanced Language Representation with Informative Entities (ACL2019)
- ERNIE: Enhanced Representation through Knowledge Integration
- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (AAAI2020)
- ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding
- XLNet: Generalized Autoregressive Pretraining for Language Understanding (NeurIPS2019) [github]
- MPNet: Masked and Permuted Pre-training for Language Understanding
- Pre-Training with Whole Word Masking for Chinese BERT
- SpanBERT: Improving Pre-training by Representing and Predicting Spans (TACL2020) [github]
- ConvBERT: Improving BERT with Span-based Dynamic Convolution
- AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization
- CharBERT: Character-aware Pre-trained Language Model (COLING2020) [github]
- MVP-BERT: Redesigning Vocabularies for Chinese BERT and Multi-Vocab Pretraining
- Adversarial Training for Large Neural Language Models
- Train No Evil: Selective Masking for Task-guided Pre-training
- Position Masking for Language Models
- Masking as an Efficient Alternative to Finetuning for Pretrained Language Models (EMNLP2020)
- Variance-reduced Language Pretraining via a Mask Proposal Network
- Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation (EMNLP2020)
- Improving Self-supervised Pre-training via a Fully-Explored Masked Language Model
- It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks (ACL2020)
- An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training [github]
- To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks (ACL2020)
- Revisiting Few-sample BERT Fine-tuning
- Blank Language Models
- Enabling Language Models to Fill in the Blanks (ACL2020)
- Efficient Training of BERT by Progressively Stacking (ICML2019) [github]
- RoBERTa: A Robustly Optimized BERT Pretraining Approach [github]
- On Losses for Modern Language Models (EMNLP2020) [github]
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (ICLR2020)
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (ICLR2020) [github] [blog]
- Pre-Training Transformers as Energy-Based Cloze Models (EMNLP2020) [github]
- FreeLB: Enhanced Adversarial Training for Language Understanding (ICLR2020)
- KERMIT: Generative Insertion-Based Modeling for Sequences
- CALM: Continuous Adaptive Learning for Language Modeling
- SegaBERT: Pre-training of Segment-aware BERT for Language Understanding
- DisSent: Sentence Representation Learning from Explicit Discourse Relations (ACL2019)
- Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models (ACL2020)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling (EMNLP2020)
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (ICLR2020)
- Retrofitting Structure-aware Transformer Language Model for End Tasks (EMNLP2020)
- Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding
- Do Syntax Trees Help Pre-trained Transformers Extract Information?
- SenseBERT: Driving Some Sense into BERT
- Semantics-aware BERT for Language Understanding (AAAI2020)
- GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method
- K-BERT: Enabling Language Representation with Knowledge Graph
- Knowledge Enhanced Contextual Word Representations (EMNLP2019)
- Knowledge-Aware Language Model Pretraining
- JAKET: Joint Pre-training of Knowledge Graph and Language Understanding
- E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT (EMNLP2020)
- KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation
- Entities as Experts: Sparse Memory Access with Entity Supervision (EMNLP2020)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning (EMNLP2020)
- Contextualized Representations Using Textual Encyclopedic Knowledge
- CoLAKE: Contextualized Language and Knowledge Embedding (COLING2020)
- Coarse-to-Fine Pre-training for Named Entity Recognition (EMNLP2020)
- E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks (COLING2020 WS)
- REALM: Retrieval-Augmented Language Model Pre-Training (ICML2020) [github]
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (NeurIPS2020)
- On-The-Fly Information Retrieval Augmentation for Language Models
- Current Limitations of Language Models: What You Need is Retrieval
- Taking Notes on the Fly Helps BERT Pre-training
- Pre-training via Paraphrasing
- SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis (ACL2020)
- Improving Event Duration Prediction via Time-aware Pre-training (EMNLP2020 Findings)
- Knowledge-Aware Procedural Text Understanding with Multi-Stage Training
- Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring (ICLR2020)
- Rethinking Positional Encoding in Language Pre-training
- Improve Transformer Models with Better Relative Position Embeddings (EMNLP2020 Findings)
- BoostingBERT: Integrating Multi-Class Boosting into BERT for NLP Tasks
- Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (EMNLP2019)
- Parameter-free Sentence Embedding via Orthogonal Basis (EMNLP2019)
- SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models
- On the Sentence Embeddings from Pre-trained Language Models (EMNLP2020)
- Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks
- BURT: BERT-inspired Universal Representation from Twin Structure
- Universal Text Representation from BERT: An Empirical Study
- Symmetric Regularization based BERT for Pair-wise Semantic Reasoning (SIGIR2020)
- Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Document Matching
- Transfer Fine-Tuning: A BERT Case Study (EMNLP2019)
- Improving Pre-Trained Multilingual Models with Vocabulary Expansion (CoNLL2019)
- Byte Pair Encoding is Suboptimal for Language Model Pretraining (EMNLP2020 Findings)
- An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks (AACL-IJCNLP2020)
- BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance (ACL2020)
- A Mixture of h−1 Heads is Better than h Heads (ACL2020)
- SesameBERT: Attention for Anywhere
- Multi-Head Attention: Collaborate Instead of Concatenate
- Deepening Hidden Representations from Pre-trained Language Models
- On the Transformer Growth for Progressive BERT Training
- Improving BERT with Self-Supervised Attention
- Guiding Attention for Self-Supervised Learning with Transformers (EMNLP2020 Findings)
- Improving Disfluency Detection by Self-Training a Self-Attentive Model
- Self-training Improves Pre-training for Natural Language Understanding [github]
- CERT: Contrastive Self-supervised Learning for Language Understanding
- Large Product Key Memory for Pretrained Language Models (EMNLP2020 Findings)
- Contextual BERT: Conditioning the Language Model Using a Global State (COLING2020 WS)
- SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization (ACL2020)
- Efficient Transformers: A Survey
- Adaptive Attention Span in Transformers (ACL2019)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (ACL2019) [github]
- Generating Long Sequences with Sparse Transformers
- Adaptively Sparse Transformers (EMNLP2019)
- Compressive Transformers for Long-Range Sequence Modelling
- The Evolved Transformer (ICML2019)
- Reformer: The Efficient Transformer (ICLR2020) [github]
- GRET: Global Representation Enhanced Transformer (AAAI2020)
- GMAT: Global Memory Augmentation for Transformers
- Memory Transformer
- Transformer on a Diet [github]
- A Tensorized Transformer for Language Modeling (NeurIPS2019)
- DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling (ICLR2020) [github]
- DeLighT: Very Deep and Light-weight Transformer [github]
- Lite Transformer with Long-Short Range Attention (ICLR2020) [github]
- Efficient Content-Based Sparse Attention with Routing Transformers
- BP-Transformer: Modelling Long-Range Context via Binary Partitioning
- Longformer: The Long-Document Transformer [github]
- Big Bird: Transformers for Longer Sequences
- Improving Transformer Models by Reordering their Sublayers (ACL2020)
- Highway Transformer: Self-Gating Enhanced Self-Attentive Networks
- Synthesizer: Rethinking Self-Attention in Transformer Models
- Query-Key Normalization for Transformers (EMNLP2020 Findings)
- Rethinking Attention with Performers
- Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
- HAT: Hardware-Aware Transformers for Efficient Natural Language Processing (ACL2020) [github]
- Linformer: Self-Attention with Linear Complexity
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
- Understanding the Difficulty of Training Transformers (EMNLP2020)
- Towards Fully 8-bit Integer Inference for the Transformer Model (IJCAI2020)
- Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation
- A Structural Probe for Finding Syntax in Word Representations (NAACL2019)
- When Bert Forgets How To POS: Amnesic Probing of Linguistic Properties and MLM Predictions
- Finding Universal Grammatical Relations in Multilingual BERT (ACL2020)
- Probing Multilingual BERT for Genetic and Typological Signals (COLING2020)
- Linguistic Knowledge and Transferability of Contextual Representations (NAACL2019) [github]
- Probing What Different NLP Tasks Teach Machines about Function Word Comprehension (*SEM2019)
- BERT Rediscovers the Classical NLP Pipeline (ACL2019)
- Probing Neural Network Comprehension of Natural Language Arguments (ACL2019)
- Cracking the Contextual Commonsense Code: Understanding Commonsense Reasoning Aptitude of Deep Contextual Representations (EMNLP2019 WS)
- What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
- Quantity doesn't buy quality syntax with neural language models (EMNLP2019)
- Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction (ICLR2020)
- oLMpics -- On what Language Model Pre-training Captures
- Do Neural Language Models Show Preferences for Syntactic Formalisms? (ACL2020)
- Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT (ACL2020)
- Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work? (ACL2020)
- Probing Linguistic Systematicity (ACL2020)
- A Matter of Framing: The Impact of Linguistic Formalism on Probing Results
- A Cross-Task Analysis of Text Span Representations (ACL2020 WS)
- When Do You Need Billions of Words of Pretraining Data? [github]
- Language Models as Knowledge Bases? (EMNLP2019) [github]
- BERT is Not a Knowledge Base (Yet): Factual Knowledge vs. Name-Based Reasoning in Unsupervised QA
- How Much Knowledge Can You Pack Into the Parameters of a Language Model? (EMNLP2020)
- Language Models as Knowledge Bases: On Entity Representations, Storage Capacity, and Paraphrased Queries
- X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models (EMNLP2020)
- Do NLP Models Know Numbers? Probing Numeracy in Embeddings (EMNLP2019)
- Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models [github] [website]
- Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly (ACL2020)
- What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge
- A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension
- Can BERT Reason? Logically Equivalent Probes for Evaluating the Inference Capabilities of Language Models
- Probing Task-Oriented Dialogue Representation from Language Models (EMNLP2020)
- BERTering RAMS: What and How Much does BERT Already Know About Event Arguments? -- A Study on the RAMS Dataset (EMNLP2020 WS)
- What does BERT learn about the structure of language? (ACL2019)
- Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned (ACL2019) [github]
- Open Sesame: Getting Inside BERT's Linguistic Knowledge (ACL2019 WS)
- Analyzing the Structure of Attention in a Transformer Language Model (ACL2019 WS)
- What Does BERT Look At? An Analysis of BERT's Attention (ACL2019 WS)
- Do Attention Heads in BERT Track Syntactic Dependencies?
- Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains (ACL2019 WS)
- Inducing Syntactic Trees from BERT Representations (ACL2019 WS)
- A Multiscale Visualization of Attention in the Transformer Model (ACL2019 Demo)
- Visualizing and Measuring the Geometry of BERT
- How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings (EMNLP2019)
- Are Sixteen Heads Really Better than One? (NeurIPS2019)
- On the Validity of Self-Attention as Explanation in Transformer Models
- Visualizing and Understanding the Effectiveness of BERT (EMNLP2019)
- Attention Interpretability Across NLP Tasks
- Revealing the Dark Secrets of BERT (EMNLP2019)
- Analyzing Redundancy in Pretrained Transformer Models (EMNLP2020)
- What's so special about BERT's layers? A closer look at the NLP pipeline in monolingual and multilingual models
- Attention Module is Not Only a Weight: Analyzing Transformers with Vector Norms (ACL2020 SRW)
- Quantifying Attention Flow in Transformers
- Telling BERT's full story: from Local Attention to Global Aggregation
- How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention
- What Do Position Embeddings Learn? An Empirical Study of Pre-Trained Language Model Positional Encoding (EMNLP2020)
- Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs (EMNLP2019)
- Rethinking the Value of Transformer Components (COLING2020)
- Investigating Transferability in Pretrained Language Models
- What Happens To BERT Embeddings During Fine-tuning?
- Analyzing Individual Neurons in Pre-trained Language Models (EMNLP2020)
- How fine can fine-tuning be? Learning efficient language models (AISTATS2020)
- The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives (EMNLP2019)
- A Primer in BERTology: What we know about how BERT works (TACL2020)
- Pretrained Language Model Embryology: The Birth of ALBERT (EMNLP2020) [github]
- Investigating Gender Bias in BERT
- Measuring and Reducing Gendered Correlations in Pre-trained Models [website]
- Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias (COLING2020 WS)
- CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models (EMNLP2020)
- BERT Knows Punta Cana is not just beautiful, it's gorgeous: Ranking Scalar Adjectives with Contextualised Representations (EMNLP2020)
- Does Chinese BERT Encode Word Structure? (COLING2020) [github]
- How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations (CIKM2019)
- Whatcha lookin' at? DeepLIFTing BERT's Attention in Question Answering
- What does BERT Learn from Multiple-Choice Reading Comprehension Datasets?
- What do Models Learn from Question Answering Datasets?
- Towards Interpreting BERT for Reading Comprehension Based QA (EMNLP2020)
- Compositional and Lexical Semantics in RoBERTa, BERT and DistilBERT: A Case Study on CoQA (EMNLP2020)
- How does BERT’s attention change when you fine-tune? An analysis methodology and a case study in negation scope (ACL2020)
- Calibration of Pre-trained Transformers
- When BERT Plays the Lottery, All Tickets Are Winning (EMNLP2020)
- The Lottery Ticket Hypothesis for Pre-trained BERT Networks
- exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformers Models [github]
- The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models [github]
- What Does BERT with Vision Look At? (ACL2020)
- Multilingual Constituency Parsing with Self-Attention and Pre-Training (ACL2019)
- Cross-lingual Language Model Pretraining (NeurIPS2019) [github]
- 75 Languages, 1 Model: Parsing Universal Dependencies Universally (EMNLP2019) [github]
- Zero-shot Dependency Parsing with Pre-trained Multilingual Sentence Representations (EMNLP2019 WS)
- Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank (EMNLP2020 Findings)
- Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT (EMNLP2019)
- How multilingual is Multilingual BERT? (ACL2019)
- How Language-Neutral is Multilingual BERT?
- Load What You Need: Smaller Versions of Multilingual BERT (EMNLP2020) [github]
- Is Multilingual BERT Fluent in Language Generation?
- Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks (EMNLP2019)
- BERT is Not an Interlingua and the Bias of Tokenization (EMNLP2019 WS)
- Cross-Lingual Ability of Multilingual BERT: An Empirical Study (ICLR2020)
- Multilingual Alignment of Contextual Word Representations (ICLR2020)
- Emerging Cross-lingual Structure in Pretrained Language Models (ACL2020)
- On the Cross-lingual Transferability of Monolingual Representations
- Unsupervised Cross-lingual Representation Learning at Scale (ACL2020)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding
- Cross-lingual Alignment Methods for Multilingual BERT: A Comparative Study (EMNLP2020 Findings)
- Can Monolingual Pretrained Models Help Cross-Lingual Classification?
- A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT
- Fully Unsupervised Crosslingual Semantic Textual Similarity Metric Based on BERT for Identifying Parallel Data (CoNLL2019)
- What the [MASK]? Making Sense of Language-Specific BERT Models
- XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization (ICML2020)
- XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation
- A Systematic Analysis of Morphological Content in BERT Models for Multiple Languages
- Extending Multilingual BERT to Low-Resource Languages
- Learning Better Universal Representations from Pre-trained Contextualized Language Models
- Universal Dependencies according to BERT: both more specific and more general
- A Call for More Rigor in Unsupervised Cross-lingual Learning (ACL2020)
- Identifying Necessary Elements for BERT's Multilinguality (EMNLP2020)
- MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer
- From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers
- Language Representation in Multilingual BERT and its applications to improve Cross-lingual Generalization
- VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation
- On the Language Neutrality of Pre-trained Multilingual Representations
- Are All Languages Created Equal in Multilingual BERT? (ACL2020 WS)
- When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models
- Language-agnostic BERT Sentence Embedding
- WikiBERT models: deep transfer learning for many languages
- Inducing Language-Agnostic Multilingual Representations
- To What Degree Can Language Borders Be Blurred In BERT-based Multilingual Spoken Language Understanding? (COLING2020)
- It's not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT (EMNLP2020 WS)
- A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios
- Translation Artifacts in Cross-lingual Transfer Learning (EMNLP2020)
- Identifying Cultural Differences through Multi-Lingual Wikipedia
- A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT (EMNLP2020)
- BERT for Monolingual and Cross-Lingual Reverse Dictionary (EMNLP2020 Findings)
- Bilingual Text Extraction as Reading Comprehension
- Evaluating Multilingual BERT for Estonian
- CamemBERT: a Tasty French Language Model (ACL2020)
- On the importance of pre-training data volume for compact language models (EMNLP2020)
- FlauBERT: Unsupervised Language Model Pre-training for French
- Multilingual is not enough: BERT for Finnish
- BERTje: A Dutch BERT Model
- RobBERT: a Dutch RoBERTa-based Language Model
- Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language
- AraBERT: Transformer-based Model for Arabic Language Understanding
- PhoBERT: Pre-trained language models for Vietnamese
- Give your Text Representation Models some Love: the Case for Basque (LREC2020)
- ParsBERT: Transformer-based Model for Persian Language Understanding
- Pre-training Polish Transformer-based Language Models at Scale
- Playing with Words at the National Library of Sweden -- Making a Swedish BERT
- KR-BERT: A Small-Scale Korean-Specific Language Model
- FinEst BERT and CroSloEngual BERT: less is more in multilingual models (TSD2020)
- GREEK-BERT: The Greeks visiting Sesame Street (SETN2020)
- The birth of Romanian BERT (EMNLP2020 Findings)
- German's Next Language Model (COLING2020 Industry Track)
- EstBERT: A Pretrained Language-Specific BERT for Estonian
- Pretraining and Fine-Tuning Strategies for Sentiment Analysis of Latvian Tweets
- PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data
- Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages (NeurIPS2020 WS)
- BARThez: a Skilled Pretrained French Sequence-to-Sequence Model
- NEZHA: Neural Contextualized Representation for Chinese Language Understanding
- Revisiting Pre-Trained Models for Chinese Natural Language Processing (EMNLP2020 Findings)
- Intrinsic Knowledge Evaluation on Chinese Language Models
- CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model
- CLUE: A Chinese Language Understanding Evaluation Benchmark
- AnchiBERT: A Pre-Trained Model for Ancient Chinese Language Understanding and Generation
- UER: An Open-Source Toolkit for Pre-training Models (EMNLP2019 Demo) [github]
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining
- Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets (ACL2019 WS)
- BERT-based Ranking for Biomedical Entity Normalization
- PubMedQA: A Dataset for Biomedical Research Question Answering (EMNLP2019)
- Pre-trained Language Model for Biomedical Question Answering
- How to Pre-Train Your Model? Comparison of Different Pre-Training Models for Biomedical Question Answering
- On Adversarial Examples for Biomedical NLP Tasks
- An Empirical Study of Multi-Task Learning on BERT for Biomedical Text Mining (ACL2020 WS)
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [github]
- BioMegatron: Larger Biomedical Domain Language Model (EMNLP2020) [website]
- Pretrained Language Models for Biomedical and Clinical Tasks: Understanding and Extending the State-of-the-Art (EMNLP2020 WS)
- A pre-training technique to localize medical BERT and enhance BioBERT [github]
- BERTology Meets Biology: Interpreting Attention in Protein Language Models
- ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks (AIME2020)
- Publicly Available Clinical BERT Embeddings (NAACL2019 WS)
- UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus
- MT-Clinical BERT: Scaling Clinical Information Extraction with Multitask Learning
- A clinical specific BERT developed with huge size of Japanese clinical narrative
- Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset (ACL2020) [github]
- Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources
- Detecting Adverse Drug Reactions from Twitter through Domain-Specific Preprocessing and BERT Ensembling
- Progress Notes Classification and Keyword Extraction using Attention-based Deep Learning Models with BERT
- BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining
- Prediction of ICD Codes with Clinical BERT Embeddings and Text Augmentation with Label Balancing using MIMIC-III
- Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition (EMNLP2020)
- CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT (EMNLP2020)
- Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage (MLHC2020)
- SciBERT: Pretrained Contextualized Embeddings for Scientific Text [github]
- PatentBERT: Patent Classification with Fine-Tuning a pre-trained BERT Model
- FinBERT: A Pretrained Language Model for Financial Communications
- LEGAL-BERT: The Muppets straight out of Law School (EMNLP2020 Findings)
- E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce
- Code and Named Entity Recognition in StackOverflow (ACL2020) [github]
- BERTweet: A pre-trained language model for English Tweets (EMNLP2020 Demo)
- TweetBERT: A Pretrained Language Representation Model for Twitter Text Analysis
- Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media (EMNLP2020 Findings)
- VideoBERT: A Joint Model for Video and Language Representation Learning (ICCV2019)
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (NeurIPS2019)
- VisualBERT: A Simple and Performant Baseline for Vision and Language
- Selfie: Self-supervised Pretraining for Image Embedding
- ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
- Contrastive Bidirectional Transformer for Temporal Representation Learning
- M-BERT: Injecting Multimodal Information in the BERT Structure
- LXMERT: Learning Cross-Modality Encoder Representations from Transformers (EMNLP2019)
- Weakly-supervised VisualBERT: Pre-training without Parallel Images and Captions
- X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers (EMNLP2020)
- Adaptive Transformers for Learning Multimodal Representations (ACL2020 SRW) [github]
- Fusion of Detected Objects in Text for Visual Question Answering (EMNLP2019)
- LambdaNetworks: Modeling long-range Interactions without Attention [github]
- BERT representations for Video Question Answering (WACV2020)
- What is More Likely to Happen Next? Video-and-Language Future Event Prediction (EMNLP2020)
- Unified Vision-Language Pre-Training for Image Captioning and VQA (AAAI2020) [github]
- VisualCOMET: Reasoning about the Dynamic Context of a Still Image (ECCV2020) [website]
- Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
- VD-BERT: A Unified Vision and Dialog Transformer with BERT (EMNLP2020)
- VL-BERT: Pre-training of Generic Visual-Linguistic Representations (ICLR2020)
- Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
- UNITER: Learning UNiversal Image-TExt Representations
- Supervised Multimodal Bitransformers for Classifying Images and Text
- InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining
- Multimodal Pretraining Unmasked: Unifying the Vision and Language BERTs
- ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
- Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision (EMNLP2020)
- Cycle Text-To-Image GAN with BERT
- Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks
- Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
- DeVLBert: Learning Deconfounded Visio-Linguistic Representations (ACMMM2020)
- A Recurrent Vision-and-Language BERT for Navigation
- BERT Can See Out of the Box: On the Cross-modal Transferability of Text Representations
- Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
- Understanding Advertisements with BERT (ACL2020)
- BERTERS: Multimodal Representation Learning for Expert Recommendation System with Transformer
- FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval (SIGIR2020)
- LayoutLM: Pre-training of Text and Layout for Document Image Understanding (KDD2020) [github]
- BERT for Large-scale Video Segment Classification with Test-time Augmentation (ICCV2019 WS)
- lamBERT: Language and Action Learning Using Multimodal BERT
- Generative Pretraining from Pixels [github] [website]
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer [website]
- Multimodal Pretraining for Dense Video Captioning (AACL-IJCNLP2020)
- SpeechBERT: Cross-Modal Pre-trained Language Model for End-to-end Spoken Question Answering
- An Audio-enriched BERT-based Framework for Spoken Multiple-choice Question Answering
- vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
- Effectiveness of self-supervised pre-training for speech recognition
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
- Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
- Understanding Semantics from Speech Through Pre-training
- Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks
- Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision (ICML2020 WS)
- Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining
- ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding
- End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features
- Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding (Interspeech2020)
- Unsupervised Cross-lingual Representation Learning for Speech Recognition
- Curriculum Pre-training for End-to-End Speech Translation (ACL2020)
- MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation
- Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models
- To BERT or Not To BERT: Comparing Speech and Language-based Approaches for Alzheimer's Disease Detection (Interspeech2020)
- BERT for Joint Multichannel Speech Dereverberation with Spatial-aware Tasks
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
- Patient Knowledge Distillation for BERT Model Compression (EMNLP2019)
- Small and Practical BERT Models for Sequence Labeling (EMNLP2019)
- TinyBERT: Distilling BERT for Natural Language Understanding [github]
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (NeurIPS2019 WS) [github]
- Knowledge Distillation from Internal Representations (AAAI2020)
- PoWER-BERT: Accelerating BERT inference for Classification Tasks
- WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
- Extreme Language Model Compression with Optimal Subwords and Shared Projections
- BERT-of-Theseus: Compressing BERT by Progressive Module Replacing (EMNLP2020)
- Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning (ACL2020 SRW)
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
- Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
- Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
- Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
- MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices (ACL2020)
- Distilling Knowledge from Pre-trained Language Models via Text Smoothing
- DynaBERT: Dynamic BERT with Adaptive Width and Depth
- Reducing Transformer Depth on Demand with Structured Dropout
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference (ACL2020)
- BERT Loses Patience: Fast and Robust Inference with Early Exit [github] [github]
- FastBERT: a Self-distilling BERT with Adaptive Inference Time (ACL2020)
- Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation
- LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression (COLING2020)
- Poor Man's BERT: Smaller and Faster Transformer Models
- schuBERT: Optimizing Elements of BERT (ACL2020)
- BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance (EMNLP2020) [github]
- TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER (ACL2020)
- XtremeDistil: Multi-stage Distillation for Massive Multilingual Models (ACL2020)
- Structured Pruning of Large Language Models
- Movement Pruning: Adaptive Sparsity by Fine-Tuning [github]
- Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning (EMNLP2020 Findings)
- FastFormers: Highly Efficient Transformer Models for Natural Language Understanding (EMNLP2020 WS)
- Distilling BERT into Simple Neural Networks with Unlabeled Transfer Data
- AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
- SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
- Optimizing Transformers with Approximate Computing for Faster, Smaller and more Accurate NLP Models
- An Approximation Algorithm for Optimal Subarchitecture Extraction [github]
- Structured Pruning of a BERT-based Question Answering Model
- DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering (ACL2020)
- Distilling Knowledge Learned in BERT for Text Generation (ACL2020)
- Distilling the Knowledge of BERT for Sequence-to-Sequence ASR (Interspeech2020)
- Pre-trained Summarization Distillation
- Understanding BERT Rankers Under Distillation (ICTIR2020)
- Simplified TinyBERT: Knowledge Distillation for Document Retrieval
- Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT (ACL2020 WS)
- TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing (ACL2020 Demo)
- TopicBERT for Energy Efficient Document Classification (EMNLP2020 Findings)
- Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
- Q8BERT: Quantized 8Bit BERT (NeurIPS2019 WS)
- Training with Quantization Noise for Extreme Model Compression
- TernaryBERT: Distillation-aware Ultra-low Bit BERT (EMNLP2020)
- Language Models are Unsupervised Multitask Learners [github]
- Language Models are Few-Shot Learners [github]
- Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems
- Generative Language Modeling for Automated Theorem Proving
- Do you have the right scissors? Tailoring Pre-trained Language Models via Monte-Carlo Methods (ACL2020)
- jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models [github]
- Cloze-driven Pretraining of Self-attention Networks
- Learning and Evaluating General Linguistic Intelligence
- To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (ACL2019 WS)
- Learning to Speak and Act in a Fantasy Text Adventure Game (EMNLP2019)
- A Two-Stage Masked LM Method for Term Set Expansion (ACL2020)
- Cold-start Active Learning through Self-supervised Language Modeling (EMNLP2020)
- Conditional BERT Contextual Augmentation
- Data Augmentation using Pre-trained Transformer Models
- Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks (COLING2020)
- Unsupervised Text Style Transfer with Padded Masked Language Models (EMNLP2020)
- Assessing Discourse Relations in Language Generation from Pre-trained Language Models
- CxGBERT: BERT meets Construction Grammar (COLING2020)
- Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (ICLR2020)
- Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes
- IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization
- Multi-node Bert-pretraining: Cost-efficient Approach
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models (ICLR2020)
- A Mutual Information Maximization Perspective of Language Representation Learning (ICLR2020)
- Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment (AAAI2020)
- Weight Poisoning Attacks on Pre-trained Models (ACL2020)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT (EMNLP2020)
- Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT
- Robust Encodings: A Framework for Combating Adversarial Typos (ACL2020)
- On the Robustness of Language Encoders against Grammatical Errors (ACL2020)
- Pretrained Transformers Improve Out-of-Distribution Robustness (ACL2020) [github]
- "You are grounded!": Latent Name Artifacts in Pre-trained Language Models (EMNLP2020)
- The Right Tool for the Job: Matching Model and Instance Complexities (ACL2020) [github]
- Unsupervised Domain Clusters in Pretrained Language Models (ACL2020)
- Thieves on Sesame Street! Model Extraction of BERT-based APIs (ICLR2020)
- Graph-Bert: Only Attention is Needed for Learning Graph Representations
- Graph-Aware Transformer: Is Attention All Graphs Need?
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages (EMNLP2020 Findings)
- Unsupervised Translation of Programming Languages
- Item-based Collaborative Filtering with BERT (ACL2020 WS)
- RecoBERT: A Catalog Language Model for Text-Based Recommendations
- Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
- Extending Machine Language Models toward Human-Level Language Understanding
- Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data (ACL2020)
- Glyce: Glyph-vectors for Chinese Character Representations
- Back to the Future -- Sequential Alignment of Text Representations
- Improving Cuneiform Language Identification with BERT (NAACL2019 WS)
- Generating Derivational Morphology with BERT
- BERT has a Moral Compass: Improvements of ethical and moral values of machines
- SMILES-BERT: Large Scale Unsupervised Pre-Training for Molecular Property Prediction (ACM-BCB2019)
- BERT Learns (and Teaches) Chemistry
- ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction
- Sketch-BERT: Learning Sketch Bidirectional Encoder Representation from Transformers by Self-supervised Learning of Sketch Gestalt (CVPR2020)
- The Chess Transformer: Mastering Play using Generative Language Models
- The Go Transformer: Natural Language Modeling for Game Play
- On the comparability of Pre-trained Language Models
- Transformers: State-of-the-art Natural Language Processing
- The Cost of Training NLP Models: A Concise Overview