This repository contains the code for the experiments performed in the paper "On Measuring Context Utilization in Document-Level MT Systems".
The script run.sh
includes commands to:
- train a context-aware bilingual transformer model using the concatenation or multi-encoder setup
- translate test data with options for the amount and type (correct/random) of context to use
- compute BLEU and COMET scores for the translations
- compute contrastive accuracy on the ContraPro data
The script attribute.sh
includes commands to compute attribution scores for antecedent, current, and context tokens on the ContraPro and SCAT data. The demo.yaml
file can be used to configure the options (multi-encoder model, checkpoint directory, data directory).
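As a rough illustration, a configuration along these lines could cover the three options mentioned above. The key names and values here are assumptions for the sketch, not the actual contents of demo.yaml; consult the file itself for the real keys.

```yaml
# Hypothetical layout of demo.yaml -- actual key names may differ.
multi_encoder: true            # whether to use the multi-encoder model (vs. concatenation)
checkpoint_dir: checkpoints/   # directory containing the trained model checkpoint
data_dir: data/                # directory containing the evaluation data
```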
This code is adapted from the following repositories:
- stopes
- fairseq