A PKU course project based on the "SemEval-2018 Task 7: Semantic Relation Extraction and Classification in Scientific Papers" competition.
Subtask 1
- 1.1 Relation classification on clean data
- 1.2 Relation classification on noisy data
- SemEval-2018 Task 7 Subtask 1 - Relation Classification
Relation instances are to be classified into one of the following relations: USAGE, RESULT, MODEL-FEATURE, PART_WHOLE, TOPIC, COMPARE.
- USAGE is an asymmetrical relation. It holds between two entities X and Y, where, for example:
  - X is used for Y
  - X is a method used to perform a task Y
  - X is a tool used to process data Y
  - X is a type of information/representation of information used by/in a system Y
- RESULT is an asymmetrical relation. It holds between two entities X and Y, where, for example:
  - X gives as a result Y (where Y is typically a measure of evaluation)
  - X yields Y (where Y is an improvement or decrease)
  - a feature of a system or a phenomenon X yields Y (where Y is an improvement or decrease)
- MODEL-FEATURE is an asymmetrical relation. It holds between two entities X and Y, where, for example:
  - X is a feature/an observed characteristic of Y
  - X is a model of Y
  - X is a tag(set) used to represent Y
- PART_WHOLE is an asymmetrical relation. It holds between two entities X and Y, where, for example:
  - X is a part, a component of Y
  - X is found in Y
  - Y is built from/composed of X
- TOPIC is an asymmetrical relation. It holds between two entities X and Y, where, for example:
  - X deals with topic Y
  - X (author, paper) puts forward Y (an idea, an approach)
- COMPARE is a symmetrical relation. It holds between two entities X and Y, where:
  - X is compared to Y (e.g. two systems, two feature sets or two results)
The counts for each relation (training set):
- USAGE: 483
- TOPIC: 18
- RESULT: 72
- PART_WHOLE: 234
- MODEL-FEATURE: 326
- COMPARE: 95
In the test set:
- USAGE: 175
- TOPIC: 3
- RESULT: 20
- PART_WHOLE: 70
- MODEL-FEATURE: 66
- COMPARE: 21
For each subtask, training and test data include abstracts of papers from the ACL Anthology Corpus with pre-annotated entities that represent concepts. Two types of tasks are proposed:
identifying pairs of entities that are instances of any of the six semantic relations (extraction task),
classifying instances into one of the six semantic relation types (classification task).
The subtask is decomposed into two scenarios according to the data used: classification on clean data and classification on noisy data. The task is identical for both scenarios: given a relation instance consisting of two entities in context, predict the semantic relation between the entities. A relation instance is identified by the unique ID of the two entities.
For Subtask 1, instances with directionality are provided in both the training and the test data; they are not to be modified or completed in the test data. The relation label is provided in the training data and has to be predicted for the test data.
The classification task is performed on data where entities are manually annotated, following the ACL RD-TEC 2.0 guidelines. Entities represent domain concepts specific to NLP, while high-level scientific terms (e.g. "hypothesis", "experiment") are not annotated.
Example (annotated text):
<abstract>
The key features of the system include: (i) Robust efficient <entity id="H01-1041.8">parsing</entity> of <entity id="H01-1041.9">Korean</entity> (a <entity id="H01-1041.10">verb final language</entity> with <entity id="H01-1041.11">overt case markers</entity> , relatively <entity id="H01-1041.12">free word order</entity> , and frequent omissions of <entity id="H01-1041.13">arguments</entity> ). (ii) High quality <entity id="H01-1041.14">translation</entity> via <entity id="H01-1041.15">word sense disambiguation</entity> and accurate <entity id="H01-1041.16">word order generation</entity> of the <entity id="H01-1041.17">target language</entity> .(iii) <entity id="H01-1041.18">Rapid system development</entity> and porting to new <entity id="H01-1041.19">domains</entity> via <entity id="H01-1041.20">knowledge-based automated acquisition of grammars</entity> .
</abstract>
Relation instances in the annotated text (provided for test data):
(H01-1041.8,H01-1041.9)
(H01-1041.10,H01-1041.11,REVERSE)
(H01-1041.14,H01-1041.15,REVERSE)
Submission format with predictions:
USAGE(H01-1041.8, H01-1041.9)
MODEL-FEATURE(H01-1041.10, H01-1041.11,REVERSE)
USAGE(H01-1041.14, H01-1041.15,REVERSE)
The task is identical to 1.1, but the entities are annotated automatically and contain noise. The annotation comes from the ACL-RelAcS corpus and is based on a combination of automatic terminology extraction and external ontologies. Entities are therefore terms specific to the given corpus, and include high-level terms (e.g. "algorithm", "paper", "method"). They are not always full NPs and they may include noise (verbs, irrelevant words). Relations were manually annotated in the training data and in the gold standard, between automatically annotated entities. Do not try to correct entity annotation in any way in your submission.
Example (annotated text):
<abstract>
This <entity id="L08-1203.8">paper</entity> introduces a new <entity id="L08-1203.9">architecture</entity> that aims at combining molecular <entity id="L08-1203.10">biology</entity> <entity id="L08-1203.11">data</entity> with <entity id="L08-1203.12">information</entity> automatically <entity id="L08-1203.13">extracted</entity> from relevant <entity id="L08-1203.14">scientific literature</entity>
</abstract>
Relation instances in the annotated text (provided for test data):
(L08-1203.8,L08-1203.9)
(L08-1203.12,L08-1203.14)
Submission format for predictions:
TOPIC(L08-1203.8,L08-1203.9)
PART_WHOLE(L08-1203.12,L08-1203.14)
For subtasks 1.1 and 1.2, which are standard classification tasks, the following class-based evaluation metrics are used:
- for every distinct class: precision, recall and F1-measure (β=1)
- global evaluation, for the set of classes:
- macro-average of the F1-measures of every distinct class
- micro-average of the F1-measures of every distinct class
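For a quick local check (independent of the official scorer), these metrics can be reproduced with scikit-learn; a minimal sketch, assuming the gold and predicted labels are available as plain lists:

from sklearn.metrics import precision_recall_fscore_support, f1_score

gold = ['USAGE', 'RESULT', 'USAGE', 'COMPARE']   # hypothetical gold labels
pred = ['USAGE', 'USAGE', 'USAGE', 'COMPARE']    # hypothetical predictions

# per-class precision, recall and F1 (beta = 1)
p, r, f1, support = precision_recall_fscore_support(gold, pred, average=None)

# global scores over the set of classes
macro_f1 = f1_score(gold, pred, average='macro')
micro_f1 = f1_score(gold, pred, average='micro')
print(macro_f1, micro_f1)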
Approach and Results
# Prepared the dataset and embedding
python3 text_processing.py
# Training and result
python3 ml_model.py
- Load dataset
- Load the relations into a hash map of (entity 1, entity 2) -> relation (a sketch follows this list)
- Load the text into a Paper class object
- paper id
- title
- abstract (plain text)
- entity id and text
- Sentence embedding => get the training features with labels
- Training model
- SVM
- Logistic Regression
- Test the result
- Using the SemEval 2018 task 7 scorer
- Calculate it ourselves (5-fold cross-validation)
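A minimal sketch of the relation-loading step, assuming the relation file uses lines like the submission format above (e.g. "USAGE(H01-1041.8,H01-1041.9)" with an optional ",REVERSE"); the actual text_processing.py may differ:

import re

# Load relations into a (entity 1, entity 2) -> relation label map.
def load_relations(path):
    relations = {}
    pattern = re.compile(r'(\w[\w-]*)\(([^,]+),([^,)]+)(?:,REVERSE)?\)')
    with open(path) as f:
        for line in f:
            match = pattern.match(line.strip())
            if match:
                label, e1, e2 = match.groups()
                relations[(e1.strip(), e2.strip())] = label
    return relations

relations = load_relations('data/1.1.relations.txt')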
Combine the two entities with some relation words to form a sentence, and use that sentence to classify whether the instance belongs to the class (a sketch follows the example below).
Example:
USAGE(H01-1001.9,H01-1001.10)
<entity id="H01-1001.9">oral communication</entity>
<entity id="H01-1001.10">indices</entity>
USAGE(oral communication, indices) =>
oral communication used by indices
oral communication used for indices
oral communication applied to indices
oral communication performed on indices
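A rough sketch of this template expansion; the relation words shown here are illustrative, and the actual templates in text_processing.py may differ:

# Hypothetical templates for one relation class (USAGE).
USAGE_TEMPLATES = ['{e1} used by {e2}', '{e1} used for {e2}',
                   '{e1} applied to {e2}', '{e1} performed on {e2}']

def make_sentences(e1_text, e2_text, templates=USAGE_TEMPLATES):
    return [t.format(e1=e1_text, e2=e2_text) for t in templates]

# make_sentences('oral communication', 'indices')
# -> ['oral communication used by indices', 'oral communication used for indices', ...]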
We multiply each word's embedding element-wise and then normalize the result (otherwise its value would depend on the sentence length).
But the result is not quite ideal.
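For reference, a minimal sketch of that composition, assuming a dict-like word2vec model mapping tokens to vectors:

import numpy as np

# Element-wise product of the word vectors, then L2 normalization,
# so the magnitude does not depend on the sentence length.
def sentence_embedding(tokens, word_vectors, dim=300):
    vec = np.ones(dim)
    for token in tokens:
        if token in word_vectors:
            vec *= word_vectors[token]
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec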
TODO
Calculate three embeddings: entity 1, the relation words, and entity 2. Take the dot product between two of them at a time to get 2 numbers, and use those numbers to form the result.
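A rough sketch of this planned feature, assuming each of the three parts is already embedded as a vector:

import numpy as np

# TODO idea: two dot-product scores, (entity 1 . relation) and (relation . entity 2),
# concatenated as the final feature.
def pair_scores(e1_vec, rel_vec, e2_vec):
    return np.array([np.dot(e1_vec, rel_vec), np.dot(rel_vec, e2_vec)])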
Using LightRel as Baseline - Forked Project
- LightRel trained the dictionary for SemEval 2010 Task 8 and reused it for SemEval 2018 Task 7
LightRel used two different corpora from the Citation Network Dataset:
only the abstract part of ACM-Citation-network V9 and DBLP-Citation-network V5.
Because the abstract field is marked with a leading #!, a regular expression is used to extract it:
time grep -P "^#\!" acm.txt | sed 's/^#\!//' > acm_abstracts.txt
And trained the embedding using word2vec
time word2vec -train abstracts-dblp-semeval2018.txt -output abstracts-dblp-semeval2018.wcs.txt -size 300 -min-count 5 -binary 0
- 300-dimension vector
- leaving out tokens occurring fewer than 5 times
- Index a unique vocabulary of the word immediately following entity 1 and the word immediately preceding entity 2.
  - ['information_retrieval_techniques','use','a','histogram','of','keywords'] => 'use' and 'of'
- Word-shape feature: a unique vector representing certain character-level features found in a word
- any character is capitalized
- a comma is present
- the first character is capitalized and the word is the first in the relation
- representing the beginning of a sentence
- the first character is lower-case
- there is an underscore present (representing a multi-word entity)
- if quotes are present in the token
These features are carried over from LightRel's SemEval 2010 Task 8 system; a rough sketch of the word-shape feature follows.
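A minimal sketch of the word-shape flags listed above; LightRel's exact encoding may differ:

# Returns a tuple of booleans, one per character-level flag, for a single token.
def word_shape(token, is_first_in_relation=False):
    return (
        any(c.isupper() for c in token),               # any character is capitalized
        ',' in token,                                  # a comma is present
        token[:1].isupper() and is_first_in_relation,  # capitalized and first in the relation
        token[:1].islower(),                           # first character is lower-case
        '_' in token,                                  # underscore (multi-word entity)
        '"' in token or "'" in token,                  # quotes are present
    )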
Additional libraries:
- liblinear
- ast - Abstract Syntax Trees
- unicodedata
- svn2github/word2vec
Embedding corpus:
The model used by LightRel is not complicated; the performance depends more on the features (embeddings).
These options can be switched on/off in parameters.py:
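Illustration only (based on the feature names reported below, not the verbatim contents of parameters.py):

# Boolean switches of the kind exposed in parameters.py.
fire_word = True        # one-hot word feature
fire_shape = False      # word-shape feature
fire_embedding = True   # word-embedding feature
fire_cluster = False    # cluster feature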
One feature at a time (other parameters: L2-regularized logistic regression, LR=1, epoch=0.1, cost=0.05):
- fire_word: 25.94%
- fire_shape: 19.19%
- fire_embedding: 49.84% (the most important feature)
- fire_cluster: 27.41%
- e1: 37.29%
- e2: 31.92%
- before_e2: 9.39%
- null: 9.39%
Multiple features:
- embedding + e1 + e2: 49.90%
The original embedding was created with word2vec. We use different corpora: ACM-Citation-network V9 and DBLP-Citation-network V10.
Extract the abstract from ACM using:
time grep -P "^#\!" acm.txt | sed -e '/^#\!First Page of the Article/d' | sed 's/^#\!//' >> abstracts.txt
And extract the abstract from DBLP with jq using:
for file in `ls dblp-ref`
do
time jq '.abstract' dblp-ref/$file | sed -e '/null/d' | sed 's/\"//g' >> abstracts.txt
done
or
for file in dblp-ref/*
do
time jq '.abstract' $file | sed -e '/null/d' | sed 's/\"//g' >> abstracts.txt
done
We put all the content into the same file, abstracts.txt.
# Done these things in one command
cd LightRel/embedding
bash corpusPreprocessing.sh
Try word2vec:
time word2vec -train abstracts.txt -output abstracts.wcs.txt -size 300 -min-count 5 -binary 0
Try BERT:
using the BERT-Large, Uncased model (24-layer, 1024-hidden, 16-heads, 340M parameters)
cd LightRel/embedding
bash trainBERT.sh
Try fastText:
cd LightRel/embedding
bash trainFastText.sh
Performance:
LibLinear (version 2.30) LR: folds=5, cost=0.05, epoch=0.1
cd LightRel
bash lightRel.sh 5
model = liblinear LR, feature = embedding + cluster + one-hot
Embedding | Data | Test F1 | Test P | Test R | Train F1 (%) | Train P | Train R | USAGE | TOPIC | RESULT | PART_WHOLE | MODEL-FEATURE | COMPARE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
word2vec | preTrain dblp v5 | 44.61 | 45.2 | 44.05 | 50.79 | 55.11 | 47.37 | 73.32 | 0.00 | 56.87 | 47.4 | 57.52 | 59.82 |
word2vec | ACM v9 + dblp v10 | 47.24 | 48.04 | 46.46 | 49.45 | 52.69 | 46.83 | 72.31 | 0.00 | 58.50 | 48.11 | 56.60 | 53.67 |
word2vec | dblp v5 | 46.27 | 47.76 | 44.87 | 50.08 | 54.32 | 46.73 | 72.42 | 0.00 | 53.95 | 48.40 | 56.84 | 58.79 |
word2vec | dblp v10 | 47.28 | 47.28 | 47.27 | 50.28 | 53.53 | 47.62 | 73.46 | 0.00 | 59.01 | 49.54 | 57.34 | 54.90 |
fastText | dblp v5 | 49.12 | 63.54 | 40.03 | 49.3 | 61.64 | 41.2 | 72.08 | 0.00 | 47.31 | 46.27 | 59.88 | 39.12 |
fastText | acm v9 + dblp v10 | 50.21 | 64.11 | 41.27 | 50.73 | 62.45 | 42.79 | 72.35 | 0.00 | 48.74 | 48.97 | 60.93 | 45.76 |
fastText | dblp v10 | 50.75 | 63.85 | 42.12 | 49.72 | 61.29 | 41.93 | 72.33 | 0.00 | 49.84 | 46.00 | 60.80 | 41.14 |
bert | acm v9 + dblp v10 | 26.02 | 26.02 | 26.02 | 32.82 | 37.23 | 29.74 | 67.18 | 0.00 | 0.00 | 41.39 | 55.09 | 6.92 |
We found that the fastText embedding trained on dblp v10 gives the best performance, so we run further experiments with other models.
cd LightRel
python3 loadFeatureAndTrain.py
scikit-learn v0.20.3
embedding = fastText + dblp v10, feature = embedding + cluster + one-hot
Model | Test F1 | micro_f1 | Test P | Test R | Train F1 (%) | Train P | Train R | USAGE | TOPIC | RESULT | PART_WHOLE | MODEL-FEATURE | COMPARE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LinearSVC | 51.45 | - | 51.47 | 51.43 | 50.24 | 53.37 | 47.77 | 70.99 | 0.00 | 55.84 | 47.09 | 58.10 | 61.47 |
Sklearn LR | 52.77 | - | 56.32 | 49.64 | 53.09 | 58.03 | 49.06 | 74.70 | 0.00 | 60.06 | 50.35 | 59.79 | 62.96 |
DecisionTree | 36.65 | - | 35.37 | 38.03 | 39.26 | 38.29 | 40.54 | 61.21 | 6.67 | 34.17 | 38.69 | 48.24 | 38.38 |
textCNN | 51.20 | 65.5 | 54.0 | 48.6 | 51.2 | 54.0 | 48.6 | 78.0 | 0.00 | 61.0 | 54.0 | 54.0 | 56.0 |
model = 'liblinear LR', embedding = fastText + dblp v10, basic_feature = embedding
feature | Test F1 | Test P | Test R | Train F1 (%) | Train P | Train R | USAGE | TOPIC | RESULT | PART_WHOLE | MODEL-FEATURE | COMPARE |
---|---|---|---|---|---|---|---|---|---|---|---|---|
embedding | 52.01 | 64.58 | 43.54 | 49.97 | 55.46 | 46.0 | 71.64 | 0.00 | 54.53 | 46.78 | 57.38 | 54.55 |
+one-hot | 51.43 | 64.54 | 42.7 | 49.25 | 61.23 | 41.28 | 71.64 | 0.00 | 51.21 | 44.85 | 59.25 | 39.17 |
+shape | 50.61 | 64.87 | 41.4 | 49.67 | 61.97 | 41.56 | 71.62 | 0.00 | 51.68 | 45.60 | 59.84 | 39.9 |
+cluster | 51.55 | 64.69 | 42.8 | 49.31 | 61.10 | 41.40 | 71.93 | 0.00 | 50.11 | 45.63 | 59.65 | 39.10 |
+e1,e2 | 49.93 | 62.91 | 41.39 | 49.97 | 61.07 | 42.36 | 72.45 | 0.00 | 48.99 | 48.44 | 60.93 | 43.3 |
model = 'sklearn LR', embedding = fastText + dblp v10, basic_feature = embedding
feature | Test F1 | Test P | Test R | Train F1 (%) | Train P | Train R | USAGE | TOPIC | RESULT | PART_WHOLE | MODEL-FEATURE | COMPARE |
---|---|---|---|---|---|---|---|---|---|---|---|---|
embedding | 50.37 | 52.73 | 48.22 | 52.12 | 56.85 | 48.37 | 74.09 | 0.00 | 60.48 | 49.81 | 60.18 | 58.12 |
+one-hot | 51.1 | 53.07 | 49.26 | 52.37 | 57.39 | 48.37 | 74.60 | 0.00 | 60.82 | 50.38 | 60.53 | 57.0 |
+shape | 51.29 | 53.38 | 49.35 | 51.81 | 56.69 | 47.94 | 74.16 | 0.00 | 59.38 | 49.43 | 60.34 | 57.42 |
+cluster | 51.9 | 54.28 | 49.79 | 51.97 | 56.57 | 48.2 | 73.51 | 0.00 | 61.21 | 49.25 | 60.42 | 57.32 |
+e1,e2 | 50.92 | 53.34 | 48.71 | 52.99 | 57.52 | 49.29 | 74.35 | 0.00 | 60.25 | 50.98 | 60.13 | 62.4 |
basic feature = 'embedding + e1 + e2', embedding = fastText + dblp v10
feature | Test F1 | Test P | Test R | Train F1 (%) | Train P | Train R | USAGE | TOPIC | RESULT | PART_WHOLE | MODEL-FEATURE | COMPARE |
---|---|---|---|---|---|---|---|---|---|---|---|---|
basic | 50.92 | 53.34 | 48.71 | 52.99 | 57.52 | 49.29 | 74.35 | 0.00 | 60.25 | 50.98 | 60.13 | 62.4 |
+one-hot | 49.6 | 52.99 | 46.65 | 52.96 | 57.94 | 48.91 | 74.71 | 0.00 | 60.06 | 50.35 | 59.80 | 62.07 |
+shape | 51.66 | 53.88 | 49.61 | 52.50 | 57.63 | 48.40 | 73.56 | 0.00 | 60.06 | 49.79 | 59.78 | 61.0 |
+cluster | 49.39 | 52.03 | 47.01 | 52.60 | 57.18 | 48.86 | 73.82 | 0.00 | 60.79 | 50.23 | 59.97 | 60.72 |
model = sklearn LR, embedding = fastText + dblp v10, feature = 'embedding + e1 + e2 + one-hot + cluster'
feature | Model | Test macro_F1 | micro_f1 | Test P | Test R | Train F1 (%) | Train P | Train R | USAGE | TOPIC | RESULT | PART_WHOLE | MODEL-FEATURE | COMPARE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
basic | LR | 50.92 | 65.17 | 53.34 | 48.71 | 52.99 | 57.52 | 49.29 | 74.35 | 0.00 | 60.25 | 50.98 | 60.13 | 62.4 |
+1.2Top | LR | 50.20 | 64.41 | 54.14 | 47.87 | 52.62 | 49.79 | 50.05 | 60.97 | 9.20 | 46.61 | 46.44 | 51.44 | 55.04 |
+Top | LR | 50.5 | 68.12 | 53.96 | 47.61 | 46.22 | 50.50 | 42.7 | 57.96 | 13.95 | 46.85 | 39.60 | 45.55 | 47.64 |
+Train | LR | 53.73 | 71.07 | 56.41 | 51.2 | 67.20 | 71.61 | 63.31 | 75.36 | 85.30 | 67.84 | 53.65 | 56.30 | 60.86 |
+Test | LR | 52.82 | 68.26 | 56.40 | 49.68 | 66.73 | 71.52 | 62.58 | 75.49 | 80.82 | 64.41 | 56.24 | 58.24 | 59.49 |
+All | LR | 52.82 | 70.22 | 56.40 | 49.69 | 66.42 | 71.62 | 62.21 | 74.39 | 78.53 | 62.73 | 56.18 | 56.92 | 61.99 |
+Train | TextCNN | 67.741 | 72.033 | 65.775 | 69.829 | 66.626 | 70.990 | 62.767 | 78.44 | 81.67 | 61.86 | 59.50 | 53.45 | 60.00 |
model = textCNN, data = 1.1 + 1.2 Train, embedding = fastText + dblp v10, feature = embedding + cluster + one-hot
pad | Test macro_F1 | micro_f1 | Test P | Test R | Train F1 (%) | Train P | Train R | USAGE | TOPIC | RESULT | PART_WHOLE | MODEL-FEATURE | COMPARE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
front | 59.742 | 66.947 | 60.849 | 58.674 | 66.984 | 67.840 | 66.149 | 77.21 | 87.50 | 67.72 | 55.81 | 60.09 | 47.62 |
behind entity1 | 56.791 | 66.666 | 56.295 | 57.296 | 66.358 | 72.544 | 61.144 | 73.88 | 85.71 | 53.06 | 55.07 | 57.14 | 63.64 |
before entity2 | 67.741 | 72.033 | 65.775 | 69.829 | 66.626 | 70.990 | 62.767 | 78.44 | 81.67 | 61.86 | 59.50 | 53.45 | 60.00 |
after | 57.696 | 65.633 | 57.820 | 57.573 | 69.918 | 76.169 | 64.616 | 74.35 | 87.10 | 69.44 | 62.10 | 56.03 | 62.96 |
data = 1.2 Train, embedding = fastText + dblp v10, feature = embedding + cluster + one-hot
Model | Test macro_f1 | Test micro_f1 | Test P | Test R | Train F1 (%) | Train P | Train R | USAGE | TOPIC | RESULT | PART_WHOLE | MODEL-FEATURE | COMPARE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sklearn LR | 75.06 | 76.04 | 81.61 | 69.47 | 66.05 | 72.33 | 60.86 | 79.48 | 90.61 | 59.81 | 64.00 | 50.52 | 35.29 |
liblinear LR | 69.15 | 81.41 | 70.42 | 67.93 | 61.36 | 68.23 | 56.03 | 77.91 | 92.02 | 55.28 | 64.72 | 50.84 | 8.00 |
LinearSVC | 70.29 | 75.56 | 71.49 | 69.12 | 63.28 | 66.18 | 60.66 | 78.61 | 90.53 | 58.32 | 61.08 | 49.94 | 33.59 |
textCNN | 77.35 | 80.53 | 84.62 | 71.24 | 68.15 | 75.15 | 62.35 | 81.23 | 95.10 | 74.07 | 72.90 | 71.43 | 0.00 |
model = textCNN, data = 1.2 + 1.1 Train, embedding = fastText + dblp v10, feature = embedding + cluster + one-hot
pad | Test macro_F1 | micro_f1 | Test P | Test R | Train F1 (%) | Train P | Train R | USAGE | TOPIC | RESULT | PART_WHOLE | MODEL-FEATURE | COMPARE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
front | 71.221 | 77.291 | 74.906 | 67.882 | 64.908 | 69.339 | 61.010 | 76.95 | 81.90 | 68.97 | 57.87 | 57.14 | 37.04 |
behind entity1 | 72.768 | 80.790 | 74.359 | 71.244 | 64.224 | 70.335 | 59.089 | 78.48 | 82.64 | 53.52 | 52.22 | 65.17 | 44.90 |
before entity2 | 73.032 | 81.012 | 74.235 | 71.868 | 66.158 | 71.011 | 61.925 | 75.69 | 83.72 | 53.33 | 58.94 | 58.46 | 59.02 |
after | 72.477 | 80.225 | 72.889 | 72.069 | 66.853 | 71.084 | 63.098 | 78.71 | 81.74 | 60.98 | 55.34 | 59.42 | 61.02 |
First, because there are only 18 training samples for TOPIC in the training set, we currently get an F1-score of 0 for that class.
We list the TOPIC relations in the training set below.
from util import loadRelation
from constant import rela2id
relations = loadRelation('data/1.1.relations.txt')
topics_entities = [k for k, v in relations.items() if v == rela2id['TOPIC']]
# [('P01-1009.1', 'P01-1009.3'),
# ('N03-1026.14', 'N03-1026.15'),
# ('N04-1024.18', 'N04-1024.19'),
# ('H01-1055.7', 'H01-1055.9'),
# ('E06-1004.1', 'E06-1004.2'),
# ('C80-1073.4', 'C80-1073.5'),
# ('A92-1023.4', 'A92-1023.5'),
# ('A92-1023.7', 'A92-1023.8'),
# ('P06-1053.1', 'P06-1053.2'),
# ('N06-1007.7', 'N06-1007.8'),
# ('E89-1016.1', 'E89-1016.3'),
# ('E93-1013.1', 'E93-1013.2'),
# ('E95-1036.11', 'E95-1036.12'),
# ('H93-1076.3', 'H93-1076.4'),
# ('A97-1028.11', 'A97-1028.12'),
# ('P99-1058.11', 'P99-1058.12'),
# ('X96-1041.6', 'X96-1041.7'),
# ('J87-3001.16', 'J87-3001.17')]
- ('P01-1009.1', 'P01-1009.3')
- Title: Alternative Phrases and Natural Language Information Retrieval
- formal analysis for a large class of words called alternative markers
- ('N03-1026.14', 'N03-1026.15')
- Title: Statistical Sentence Condensation using Ambiguity Packing and Stochastic Disambiguation Methods for Lexical-Functional Grammar
- An experimental evaluation of summarization
- ('N04-1024.18', 'N04-1024.19')
- Title: Evaluating Multiple Aspects of Coherence in Student Essays
- Intra-sentential quality is evaluated with rule-based heuristics
- ('H01-1055.7', 'H01-1055.9')
- Title: Natural Language Generation in Dialog Systems
- system response to users has been extensively studied by the natural language generation community
TextCNN: CNN -> BatchNorm -> ReLU (a rough sketch below)
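A minimal sketch of that block, assuming tf.keras; the actual text_cnn.py may be organized differently:

import tensorflow as tf

# Sketch of a TextCNN classifier: Conv1D -> BatchNorm -> ReLU -> global max-pool -> softmax.
# Input: pre-embedded token sequences of shape (seq_len, emb_dim); 6 relation classes.
def build_text_cnn(seq_len=50, emb_dim=300, num_classes=6):
    inputs = tf.keras.Input(shape=(seq_len, emb_dim))
    x = tf.keras.layers.Conv1D(filters=128, kernel_size=3, padding='same')(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.GlobalMaxPooling1D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
    return tf.keras.Model(inputs, outputs)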
$ python3 text_cnn.py
$ python3 text_cnn_train.py
$ python3 random_test.py
<<< RELATION EXTRACTION EVALUATION >>>
Precision = 355/355 = 100.00%
Recall = 355/355 = 100.00%
F1 = 100.00%
<<< The official score for the extraction scenario is F1 = 100.00% >>>
<<< RELATION CLASSIFICATION EVALUATION >>>:
Number of instances in submission: 355
Number of instances in submission missing from gold standard: 0
Number of instances in gold standard: 355
Number of instances in gold standard missing from submission: 0
Coverage (predictions for a correctly extracted instance with correct directionality) = 355/355 = 100.00%
Results for the individual relations:
COMPARE : P = 7/ 71 = 9.86% R = 7/ 21 = 33.33% F1 = 15.22%
MODEL-FEATURE : P = 13/ 57 = 22.81% R = 13/ 66 = 19.70% F1 = 21.14%
PART_WHOLE : P = 5/ 54 = 9.26% R = 5/ 70 = 7.14% F1 = 8.06%
RESULT : P = 6/ 60 = 10.00% R = 6/ 20 = 30.00% F1 = 15.00%
TOPIC : P = 0/ 57 = 0.00% R = 0/ 3 = 0.00% F1 = 0.00%
USAGE : P = 26/ 56 = 46.43% R = 26/ 175 = 14.86% F1 = 22.51%
Micro-averaged result :
P = 57/ 355 = 16.06% R = 57/ 355 = 16.06% F1 = 16.06%
Macro-averaged result :
P = 16.39% R = 17.51% F1 = 16.93%
<<< The official score for the classification scenario is macro-averaged F1 = 16.93% >>>
import subprocess
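The subprocess import is presumably for calling external tools such as the official Perl scorer from Python; a sketch of such a call (the scorer filename and the prediction/gold paths here are assumptions):

import subprocess

# Hypothetical invocation of the official SemEval-2018 Task 7 scorer script.
result = subprocess.run(
    ['perl', 'semeval2018_task7_scorer-v1.2.pl', 'predictions.txt', 'keys.txt'],
    capture_output=True, text=True)
print(result.stdout)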
- How do I parse XML in Python?
- How to extract the text between some anchor tags?
- Remove a tag using BeautifulSoup but keep its contents
- google-research/bert issues - How to get the word embedding after pre-training?
- with variable name bert/embeddings/word_embeddings
- imgarylai/bert-embedding
- Stackoverflow - How do I find the variable names and values that are saved in a checkpoint?
import tensorflow as tf

model_path = '.'
# Get latest checkpoint
latest_ckpt = tf.train.latest_checkpoint(model_path)
# just print
from tensorflow.python.tools.inspect_checkpoint import print_tensors_in_checkpoint_file
print_tensors_in_checkpoint_file(latest_ckpt, all_tensors=False, tensor_name='bert/embeddings/word_embeddings')
# get the embedding tensor
tensor_name = 'bert/embeddings/word_embeddings'
file_name = latest_ckpt
from tensorflow.python import pywrap_tensorflow
reader = pywrap_tensorflow.NewCheckpointReader(file_name)
embedding = reader.get_tensor(tensor_name) # np.array (default dimension 768)
# The length should be the same as vocab.txt
len(embedding) # 30522 (default)
# combine into embedding file (vocabulary: tensor)
with open('vocab.txt', 'r') as vocab:
    words = vocab.readlines()
test = [' '.join([str(combined) for combined in [word.strip(), *embedding[i]]]) for i, word in enumerate(words)]
import module
from imp import reload
reload(module) # module updated
- feature dimension problem in 5-fold evaluation on version 1.2 dataset
- Competition
- Data