This is the accompanying repository for our paper Average is not Enough: Caveats of Multilingual Evaluation.
You can run this code in Google Colab. A demo is available here
First, build the Docker image:
docker build . -t multilingual_evaluation
Then run the image; the notebooks will work inside the container:
docker run -p 8888:8888 -v ${PWD}:/labs -it multilingual_evaluation
Examples of visualizations are in the n_figures.ipynb notebook.