Compare two CoNLL-X files or directories to obtain the tokenization F-score and POS tag accuracy, as well as the LAS, UAS, and label scores.
Since comparison usually occurs between gold and parsed files, the two files/directories are differentiated using the gold and parsed keywords. That said, you do not need a gold and a parsed file to compare; any two will do.
The tree alignment part of the code uses ced_word_alignment.
Note: the evaluator is also CoNLL-U compatible.
- Two files or directories are passed to the evaluator. If two directories are passed, they must contain matching file names.
- The files are read, and the trees in every pair of files are compared.
- The trees are aligned using ced_word_alignment.
  - This involves inserting null alignment tokens where the tokenizations differ.
- The evaluation scores are then calculated (a sketch follows this list).
  - The tokenization F-score is calculated on all aligned tokens, while the remaining metrics are calculated after removing insertions (null alignment tokens added to the gold tree).
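As a rough illustration of how the counts become the reported scores, here is a minimal sketch. The `Counts` fields and the `scores` function are assumptions made for this sketch, not the actual API of conllx_counts.py or conllx_scores.py:

```python
# Illustrative only: turning alignment counts into the reported scores.
from dataclasses import dataclass


@dataclass
class Counts:
    matched: int        # aligned tokens whose forms match in both trees
    gold_tokens: int    # number of tokens in the gold tree
    parsed_tokens: int  # number of tokens in the parsed tree
    correct_head: int   # matched tokens with the correct head (UAS)
    correct_label: int  # matched tokens with the correct dependency label
    correct_both: int   # matched tokens with correct head AND label (LAS)


def scores(c: Counts) -> dict:
    """Percent scores from counts; assumes non-empty gold and parsed trees."""
    precision = c.matched / c.parsed_tokens
    recall = c.matched / c.gold_tokens
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "tokenization_precision": 100 * precision,
        "tokenization_recall": 100 * recall,
        "tokenization_f_score": 100 * f_score,
        "uas": 100 * c.correct_head / c.gold_tokens,
        "label": 100 * c.correct_label / c.gold_tokens,
        "las": 100 * c.correct_both / c.gold_tokens,
    }
```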
Since ced_word_alignment is used for the alignment, the evaluator shares the library's assumptions:
- No words are added to either the parsed or gold files.
- The word order is unchanged.
- The text in both files is in the same script and encoding.
The repository contains the following files:
- align_trees.py: aligns trees using the ced_word_alignment algorithm
- class_conllx.py: used to read CoNLL-X files (see the sketch after this list)
- classes.py: dataclasses used throughout the code
- conllx_counts.py: computes the different statistics (counts) after comparing two CoNLL-X files
- conllx_scores.py: calculates the scores given the counts
- evaluate_conllx_driver.py: the main script
- handle_args.py: simplifies use of the argparse library
- requirements.txt: the dependencies needed to run the scripts
- ced_word_alignment/: the ced alignment library
- README.md: this document
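For reference, CoNLL-X files are simple to read: each token is one line of 10 tab-separated fields (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and sentences are separated by blank lines. The following reader is a minimal sketch, not class_conllx.py's actual interface:

```python
# A minimal CoNLL-X reader sketch; class_conllx.py's real interface may differ.
from typing import Dict, Iterator, List

# The 10 CoNLL-X columns, in order.
FIELDS = ["id", "form", "lemma", "cpostag", "postag",
          "feats", "head", "deprel", "phead", "pdeprel"]


def read_conllx(path: str) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of {field: value} tokens."""
    sentence: List[Dict[str, str]] = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:                # a blank line ends the sentence
                if sentence:
                    yield sentence
                    sentence = []
            elif line.startswith("#"):  # skip CoNLL-U comment lines
                continue
            else:
                sentence.append(dict(zip(FIELDS, line.split("\t"))))
    if sentence:                        # handle a missing trailing blank line
        yield sentence
```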
The scripts require Python 3.8 or above. To use the evaluator, first install the necessary dependencies:
pip install -r requirements.txt
usage: evaluate_conllx_driver.py [-h] [-g] [-p] [-gd] [-pd] [-x] [-n] [-a]
This script takes two CoNLL-X files or two directories of CoNLL-X files and computes the evaluation scores.
required arguments:
-g , --gold the gold CoNLL-X file
-p , --parsed the parsed CoNLL-X file
or:
-gd , --gold_dir the gold directory containing CoNLL-X files
-pd , --parsed_dir the parsed directory containing CoNLL-X files
The sentences used are taken from CamelTB_1001_introduction_1.conllx and CamelTB_1001_night_1_1.conllx (the data can be obtained from the Camel Treebank).
In the first sample, the tokenization is the same, so the F-score is 100% and the insertion/deletion counts are both 0.
python src/main.py -g data/samples_gold/sample_1.conllx -p data/samples_parsed/sample_1.conllx
tokenization_f_score | 100.0 |
tokenization_precision | 100.0 |
tokenization_recall | 100.0 |
word_accuracy | 100.0 |
pos | 81.579 |
uas | 55.263 |
label | 65.789 |
las | 44.737 |
pp_uas_score | 0 |
pp_label_score | 0 |
pp_las_score | 0 |
python src/main.py -g data/samples_gold/sample_2.conllx -p data/samples_parsed/sample_2.conllx
tokenization_f_score | 90.385 |
tokenization_precision | 90.385 |
tokenization_recall | 90.385 |
word_accuracy | 97.222 |
pos | 86.538 |
uas | 65.385 |
label | 75.0 |
las | 57.692 |
pp_uas_score | 0.0 |
pp_label_score | 0.0 |
pp_las_score | 0.0 |
Using the arguments x (punctuation), n (numbers), and a (alef, yeh, and ta marbuta), the evaluation normalizes the corresponding characters, so differences between them are ignored when comparing tokens. With these arguments, the following pairs compare as equal: 1 and ١, the Latin comma and the Arabic comma (،), and ي and ى.
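One plausible implementation of this normalization is sketched below; the character tables here are illustrative assumptions, and the evaluator's actual mappings may be more extensive:

```python
# Illustrative normalization for the x/n/a flags; the evaluator's actual
# character tables may differ.
DIGITS = str.maketrans("٠١٢٣٤٥٦٧٨٩", "0123456789")      # n: Arabic-Indic digits
PUNCT = str.maketrans({"،": ",", "؛": ";", "؟": "?"})    # x: punctuation variants
LETTERS = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا",   # a: alef variants
                         "ى": "ي",                       #    alef maksura -> yeh
                         "ة": "ه"})                      #    ta marbuta -> heh


def normalize(token: str, x: bool = False, n: bool = False, a: bool = False) -> str:
    """Normalize a token before comparison, mirroring the -x/-n/-a flags."""
    if n:
        token = token.translate(DIGITS)
    if x:
        token = token.translate(PUNCT)
    if a:
        token = token.translate(LETTERS)
    return token
```

For example, with n and a enabled, ١ normalizes to 1 and ى to ي, so the pairs above compare as equal.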
python src/main.py -g data/samples_gold/sample_4_norm.conllx -p data/samples_parsed/sample_4_norm.conllx
tokenization_f_score | 80.0 |
tokenization_precision | 80.0 |
tokenization_recall | 80.0 |
word_accuracy | 75.0 |
pos | 80.0 |
uas | 80.0 |
label | 80.0 |
las | 80.0 |
pp_uas_score | 50.0 |
pp_label_score | 50.0 |
pp_las_score | 50.0 |
python src/main.py -g data/samples_gold/sample_4_norm.conllx -p data/samples_parsed/sample_4_norm.conllx -xn
tokenization_f_score | 100.0 |
tokenization_precision | 100.0 |
tokenization_recall | 100.0 |
word_accuracy | 100.0 |
pos | 100.0 |
uas | 100.0 |
label | 100.0 |
las | 100.0 |
pp_uas_score | 100.0 |
pp_label_score | 100.0 |
pp_las_score | 100.0 |
python src/main.py --gold_dir=data/samples_gold --parsed_dir=data/samples_parsed
| file | tokenization_f_score | tokenization_precision | tokenization_recall | word_accuracy | pos | uas | label | las | pp_uas_score | pp_label_score | pp_las_score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sample_4_norm | 80.0 | 80.0 | 80.0 | 75.0 | 80.0 | 80.0 | 80.0 | 80.0 | 50.0 | 50.0 | 50.0 |
| sample_2 | 90.385 | 90.385 | 90.385 | 97.222 | 86.538 | 65.385 | 75.0 | 57.692 | 0.0 | 0.0 | 0.0 |
| sample_1 | 100.0 | 100.0 | 100.0 | 100.0 | 81.579 | 55.263 | 65.789 | 44.737 | 0 | 0 | 0 |
| sample_3 | 80.0 | 80.0 | 80.0 | 75.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | | |
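Since directory mode pairs files by name, the matching logic amounts to something like the sketch below (paths and error handling are illustrative):

```python
# Illustrative sketch of pairing gold/parsed files by name in directory mode.
from pathlib import Path
from typing import List, Tuple


def paired_files(gold_dir: str, parsed_dir: str) -> List[Tuple[Path, Path]]:
    """Return (gold, parsed) path pairs, requiring identical file names."""
    gold = {p.name: p for p in Path(gold_dir).glob("*.conllx")}
    parsed = {p.name: p for p in Path(parsed_dir).glob("*.conllx")}
    unmatched = gold.keys() ^ parsed.keys()  # names present in only one side
    if unmatched:
        raise ValueError(f"unmatched file names: {sorted(unmatched)}")
    return [(gold[name], parsed[name]) for name in sorted(gold)]
```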
conllx_evaluator is available under the MIT license. See the LICENSE file for more info.