This script takes as input the Epitran and WikiPron midpoint F1 and F2 files, the per-utt-mcd scores files, and reading_info.csv (inventory). It returns the counts of tokens, families, and languages presented in the paper, the correlation tables, and the correlation scatterplot. Optionally, it saves text files of the midpoint F1 and F2 values after outlier exclusion, which serve as input to the Python dispersion analysis.
This script takes as input the Epitran and WikiPron sibilant info.csv and sibilant.csv files, the per-utt-mcd scores files, and reading_info.csv (inventory). It returns the counts of tokens, families, and languages presented in the paper, the correlation between the mean mid-frequency peaks of /s/ and /z/, and the correlation scatterplot.
The Python scripts in this folder have a number of dependencies, which are detailed in the environment.yml file. To install the dependencies using conda, run:
$ conda env create -f environment.yml
To activate the environment, run:
$ conda activate wild
The extract_info.py script takes as input the folder containing the epiwiki dispersion files (with file names ending in formants_mid_fin.csv) and outputs a preprocessed TSV file with the dispersion entropy of each language-vowel pair. Run it with the command:
$ python extract_info.py --src-path <src-path> --tgt-file <tgt-file>
In this command, <src-path> is the data source path, while <tgt-file> is the extracted info output filename. For example:
$ python extract_info.py --src-path data/vowels/midpoint_f1_f2/dispersion_input_epiwiki/ --tgt-file data/vowels/midpoint_f1_f2/preprocessed.tsv
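The command-line interface described above can be sketched as follows. This is a minimal illustration, not the actual contents of extract_info.py: the function names parse_args and find_input_files are hypothetical, and the only details taken from this README are the two flags and the file-name ending. The entropy computation itself is not shown, since its details are not described here.

```python
import argparse
from pathlib import Path

# File-name ending of the dispersion files, as stated in this README.
SUFFIX = "formants_mid_fin.csv"


def parse_args(argv=None):
    """Parse the two flags documented above (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Collect dispersion files and name the output TSV."
    )
    parser.add_argument("--src-path", type=Path, required=True,
                        help="folder containing the dispersion files")
    parser.add_argument("--tgt-file", type=Path, required=True,
                        help="output filename for the extracted info")
    return parser.parse_args(argv)


def find_input_files(src_path):
    """Return, in sorted order, the files in src_path whose names end in SUFFIX."""
    return sorted(
        p for p in Path(src_path).iterdir()
        if p.is_file() and p.name.endswith(SUFFIX)
    )
```

Matching on the file-name ending rather than on the extension alone keeps unrelated CSV files in the source folder out of the preprocessing step.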
analyse_dispersion.py: analysis script for dispersion entropy vs number of vowel categories correlations
This script takes as input the preprocessed file generated by extract_info.py and prints the Pearson and Spearman correlations (with their respective p-values) to the terminal. Run it with the command:
$ python analyse_dispersion.py --info-file <info-file> --formants-type <formants-type>
where <formants-type> is either erb or hz, and <info-file> is the output of extract_info.py. For example:
$ python analyse_dispersion.py --info-file data/vowels/midpoint_f1_f2/preprocessed.tsv --formants-type erb
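For readers unfamiliar with the two statistics, the sketch below shows how Pearson and Spearman correlation coefficients are defined, using only the standard library. It is an illustration under stated assumptions, not the code in analyse_dispersion.py: the function names are hypothetical and p-values are omitted. In practice, scipy.stats.pearsonr and scipy.stats.spearmanr return both the coefficient and its p-value; whether the script uses them is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

Here x would hold the number of vowel categories per language and y the corresponding dispersion entropy. Spearman depends only on rank order, so it is less sensitive than Pearson to a few languages with extreme entropy values.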