Data and code for the experiments in
- Dominik Schlechtweg, Stefanie Eckmann, Enrico Santus, Sabine Schulte im Walde and Daniel Hole. 2017. German in Flux: Detecting Metaphoric Change via Word Entropy. In Proceedings of CoNLL 2017. Vancouver, Canada. slides
Find the test set, the annotation data and the results here.
In `./dataset/` we provide the reduced version of the test set (as described in the paper), transformed to a suitable input format for the measure scripts.
The measure code is based on the scripts in
- Vered Shwartz, Enrico Santus, and Dominik Schlechtweg. 2017. Hypernyms under siege: Linguistically-motivated artillery for hypernymy detection. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Valencia, Spain. repository
Wherever part of the code is copyrighted, this is indicated in the respective file.
The scripts should be run directly from the main directory. If you wish to run them from elsewhere, you may have to adjust the path passed to `sys.path.append('./modules/')` in the scripts.
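For example, a script run from the main directory finds the modules under `./modules/`; a minimal illustration of the adjustment (the alternative path below is hypothetical and depends on your working directory):

```python
import sys

# Default, assuming the script is run from the main directory:
sys.path.append('./modules/')

# If you instead run a script from a subdirectory one level down,
# the relative path has to point back up (hypothetical example):
# sys.path.append('../modules/')
```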
We recommend running the scripts with the Anaconda Python distribution (Python 2.7). You will have to install some additional packages, such as docopt and gensim. Packages that are not available from the Anaconda installer can be installed via EasyInstall.
In order to reproduce the results described in the paper, proceed as follows:
- Download the DTA corpus (DTA-Kernkorpus und Ergänzungstexte, TCF-Version vom 11. Mai 2016)
- Obtain standard cooccurrence matrices for relevant time periods from corpus files with
./dsm/create_diachronic_cooc_files.py
, and an exemplar cooccurrence matrice for the whole corpus period with./dsm/create_exemplar_cooc.py
and the test set./dataset/testset_metaphoric_change_reduced_transformed.txt
- Transform the matrices to pickle format with `./dsm/apply_ppmi_plmi.py` (see the PPMI sketch after this list)
- Get word frequency ranks for relevant time periods from corpus files with `./dsm/get_freqs.py`
- Calculate unscored results for the non-normalized measures H and H_2 from the standard cooccurrence matrices (and the test set) with `./measures/H.py` and `./measures/H_2.py` (see the entropy sketch after this list)
- Normalize word frequency ranks with `./normalization/Freq_n.py`, and make unscored results from the normalized frequency ranks with `./measures/Freq.py`
- Get word entropy ranks for relevant time periods from corpus files with `./measures/H_rank.py`
- Calculate unscored results for word entropy normalized by OLS from the word frequency and entropy ranks with `./normalization/H_OLS.py` (see the OLS sketch after this list)
- Calculate unscored results for word entropy normalized by MON from the exemplar cooccurrence matrix with `./normalization/H_MON.py` (right now, the script only accepts a single global number of contexts for all target words in the test set; this makes it tedious to compute the measure with the maximum possible number of contexts n for each target word, bounded by the target word's smaller frequency in either matrix, since you then have to specify n individually for each target word)
- Calculate predicted ranks from the unscored results for each measure and relevant pair of time periods with `./evaluation/score_results.py`
- Calculate Spearman's rank correlation coefficient between the gold rank and the predicted ranks with `./evaluation/rank_correlation.py` (see the evaluation sketch after this list)
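The sketches below illustrate the core computations behind these steps. They are simplified stand-ins written for this README, not the repository scripts themselves: all function names, variable names, and toy inputs are our own, and the actual scripts read the DTA corpus files and take their parameters via docopt. First, counting cooccurrences in a symmetric window (the window size of 2 here is arbitrary):

```python
from collections import defaultdict

def count_cooccurrences(sentences, window=2):
    """Count how often each context word appears within a symmetric
    window around each target word."""
    cooc = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    cooc[target][tokens[j]] += 1
    return cooc

# Toy input standing in for sentences extracted from the DTA corpus:
sentences = [['das', 'haus', 'brennt'], ['das', 'feuer', 'brennt']]
cooc = count_cooccurrences(sentences)
print(dict(cooc['brennt']))  # {'das': 2, 'haus': 1, 'feuer': 1}
```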
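`./dsm/apply_ppmi_plmi.py` weights the raw counts before pickling; a minimal PPMI computation over such a nested count dictionary could look like this (our own sketch of the standard PPMI formula, not the repository's implementation, which also supports PLMI):

```python
import math
from collections import defaultdict

def ppmi(cooc):
    """Replace raw counts with positive pointwise mutual information:
    max(0, log( p(w,c) / (p(w) * p(c)) ))."""
    total = float(sum(sum(ctx.values()) for ctx in cooc.values()))
    target_sums = dict((w, sum(ctx.values())) for w, ctx in cooc.items())
    context_sums = defaultdict(int)
    for ctx in cooc.values():
        for c, n in ctx.items():
            context_sums[c] += n
    weighted = {}
    for w, ctx in cooc.items():
        weighted[w] = {}
        for c, n in ctx.items():
            pmi = math.log(n * total / (target_sums[w] * context_sums[c]))
            weighted[w][c] = max(0.0, pmi)
    return weighted
```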
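The central measure is word entropy: the Shannon entropy of a target word's context distribution, computed from a (raw or weighted) cooccurrence row. A self-contained sketch of H; the paper's second measure H_2 is a variant we do not reproduce here:

```python
import math

def word_entropy(context_counts):
    """H(w) = -sum_c p(c|w) * log2 p(c|w), where p(c|w) is the
    relative weight of context c among all contexts of w."""
    total = float(sum(context_counts.values()))
    h = 0.0
    for count in context_counts.values():
        if count > 0:
            p = count / total
            h -= p * math.log(p, 2)
    return h

print(word_entropy({'feuer': 2, 'wasser': 1, 'haus': 1}))  # 1.5 bits
```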
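For the OLS normalization, the idea is to remove the part of word entropy that is explained by word frequency: fit entropy against frequency by ordinary least squares and keep the residuals. A hedged sketch on rank inputs (the exact regression setup of `./normalization/H_OLS.py` may differ):

```python
import numpy as np

def entropy_residuals(freq_ranks, entropy_ranks):
    """Fit entropy ranks as a linear function of frequency ranks by
    ordinary least squares and return the residuals, i.e. the part
    of the entropy ranking not explained by frequency."""
    x = np.asarray(freq_ranks, dtype=float)
    y = np.asarray(entropy_ranks, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 OLS fit
    return y - (slope * x + intercept)

# Hypothetical ranks for five target words:
print(entropy_residuals([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```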
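Finally, evaluation correlates the gold rank from the annotation data with each measure's predicted rank; `scipy.stats.spearmanr` computes exactly this coefficient (the ranks below are made up for illustration):

```python
from scipy.stats import spearmanr

# Hypothetical gold and predicted ranks over the same five targets:
gold_rank = [1, 2, 3, 4, 5]
predicted_rank = [2, 1, 3, 5, 4]

rho, p_value = spearmanr(gold_rank, predicted_rank)
print('rho = %.2f, p = %.3f' % (rho, p_value))
```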
Please do not hesitate to send us an email in case of any questions.