Measuring context utilization of document-level machine translation systems

Wafaa014/context-utilization

This repository contains the code for the experiments performed in the paper "On Measuring Context Utilization in Document-Level MT Systems".

Perturbation Analysis

The script run.sh includes commands to:

  • train a context-aware bilingual transformer model using the concatenation or multi-encoder setup
  • translate test data with options for the amount and type (correct or random) of context to use
  • compute BLEU and COMET scores for the translations
  • compute contrastive accuracy on ContraPro data
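
The two ideas behind these steps, swapping correct context for random context and scoring contrastive pairs, can be illustrated with a minimal Python sketch. It is not taken from run.sh: the function names, the `<brk>` separator token, and the sampling strategy for random context are all assumptions made for illustration, and the scores passed to `contrastive_accuracy` stand in for model scores (for example log-probabilities) that would be obtained separately.

```python
import random

BREAK = "<brk>"  # assumed separator token; the repository may use a different one


def build_source(doc, idx, n_context, mode="correct", rng=None):
    """Concatenate up to n_context context sentences in front of doc[idx].

    mode="correct": use the sentences that actually precede doc[idx].
    mode="random":  use sentences sampled from the rest of the document
                    (an assumed sampling strategy, for illustration only).
    """
    if mode == "correct":
        context = doc[max(0, idx - n_context):idx]
    elif mode == "random":
        rng = rng or random.Random(0)
        pool = [s for i, s in enumerate(doc) if i != idx]
        context = rng.sample(pool, min(n_context, len(pool)))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return f" {BREAK} ".join(context + [doc[idx]])


def contrastive_accuracy(examples):
    """Fraction of examples where the correct translation outscores every contrastive one.

    examples: list of (score_correct, [scores_of_contrastive_variants]).
    """
    if not examples:
        return 0.0
    hits = sum(1 for good, bad in examples if all(good > b for b in bad))
    return hits / len(examples)


if __name__ == "__main__":
    doc = ["A.", "B.", "C.", "D."]
    print(build_source(doc, 3, 2))                 # two preceding sentences as context
    print(build_source(doc, 3, 2, mode="random"))  # two sampled sentences as context
    print(contrastive_accuracy([(-1.0, [-2.0]), (-3.0, [-2.5, -4.0])]))
```

Comparing translation quality and contrastive accuracy between the "correct" and "random" settings gives a rough indication of whether a model relies on the context it is given.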

Attribution Analysis

The script attribute.sh includes commands to compute attribution scores for antecedent, current, and context tokens on the ContraPro and SCAT data. The demo.yaml file can be used to configure the options (multi-encoder model, checkpoint directory, data directory).
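
As a rough illustration of what per-group attribution scores look like once token-level attributions are available, the sketch below sums absolute attribution values per token group and normalizes them to shares. It does not reproduce the attribution method used by attribute.sh; the function name, the group labels, and the normalization are assumptions chosen for this example.

```python
def attribution_shares(scores, groups):
    """Share of total absolute attribution assigned to each token group.

    scores: per-token attribution values (hypothetical input).
    groups: same-length list of labels such as "antecedent", "current", "context".
    """
    if len(scores) != len(groups):
        raise ValueError("scores and groups must have the same length")
    totals = {}
    for score, group in zip(scores, groups):
        totals[group] = totals.get(group, 0.0) + abs(score)
    total = sum(totals.values())
    if total == 0:
        return {group: 0.0 for group in totals}
    return {group: value / total for group, value in totals.items()}


if __name__ == "__main__":
    scores = [0.25, 0.5, 0.25, 1.0]
    groups = ["context", "antecedent", "context", "current"]
    print(attribution_shares(scores, groups))
```

A large share on the antecedent token, relative to the rest of the context, would suggest that the model attends to the information needed to resolve the pronoun.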

References

This code is adapted from the following repositories:

1- contextual-mt

2- stopes

3- fairseq
