1605BigVocab
Changing the concept of the vocabulary. Rather than building a trainable embedding matrix from all words in the training set, build it only from the top 100 tokens of the training set and substitute non-trainable GloVe embeddings for all other words (even words not seen in the training set). See also
https://github.com/brmson/dataset-sts/issues/20
This was mainly motivated by the observation that argus hypev works much better when (erroneously) using a vocabulary built on a different split than the current training set.
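As a minimal sketch of the mechanism (this is not the actual pysts API; the `glove` dict, the helper names and the dimensionality below are assumptions): only the top-N training tokens get rows in a small trainable embedding matrix, while everything else is looked up in a frozen GloVe table.

```python
from collections import Counter
import numpy as np

def build_vocab_embeddings(train_sents, glove, top_n=100, dim=300):
    """Trainable embeddings for the top_n training tokens only;
    all other words fall back to frozen GloVe vectors."""
    counts = Counter(w for sent in train_sents for w in sent)
    top = [w for w, _ in counts.most_common(top_n)]
    word2idx = {w: i + 1 for i, w in enumerate(top)}  # index 0 reserved for padding

    # Small trainable matrix covering just the pruned vocabulary.
    trainable = np.random.uniform(-0.25, 0.25, (top_n + 1, dim)).astype('float32')

    def lookup(word):
        """('trainable', idx) for pruned-vocab words,
        ('fixed', vector) for everything else, including unseen words."""
        if word in word2idx:
            return ('trainable', word2idx[word])
        return ('fixed', glove.get(word, np.zeros(dim, dtype='float32')))

    return trainable, lookup
```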
Before:
Model | trn QAcc | val QAcc | val QF1 | tst QAcc | tst QF1 | settings |
---|---|---|---|---|---|---|
avg | 0.931244 | 0.797530 | 0.728479 | 0.731408 | 0.649600 | (defaults) |
 | ±0.012570 | ±0.006695 | ±0.012416 | ±0.007907 | ±0.013410 | |
DAN | 0.949085 | 0.827096 | 0.750504 | 0.742484 | 0.666239 | `inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu' l2reg=1e-5` |
 | ±0.013475 | ±0.015297 | ±0.028354 | ±0.008980 | ±0.018475 | |
-------------------------- | ---------- | ---------- | ---------- | ---------- | ----------- | ---------- |
rnn | 0.901008 | 0.854416 | 0.782354 | 0.798259 | 0.742293 | (defaults) |
 | ±0.018453 | ±0.009075 | ±0.015912 | ±0.011856 | ±0.018040 | |
cnn | 0.902398 | 0.857410 | 0.791902 | 0.796677 | 0.741328 | (defaults) |
 | ±0.019215 | ±0.005197 | ±0.009707 | ±0.010855 | ±0.019413 | |
rnncnn | 0.915025 | 0.852171 | 0.782774 | 0.779668 | 0.708510 | (defaults) |
 | ±0.023084 | ±0.009620 | ±0.016334 | ±0.014759 | ±0.022262 | |
attn1511 | 0.853626 | 0.842066 | 0.772648 | 0.812500 | 0.770903 | sdim=2 |
 | ±0.010105 | ±0.006757 | ±0.011771 | ±0.008588 | ±0.017540 | |
After:
Model | trn QAcc | val QAcc | val QF1 | tst QAcc | tst QF1 | settings |
---|---|---|---|---|---|---|
avg | 0.626815 | 0.670659 | nan | 0.621308 | nan | (defaults) |
 | ±0.024750 | ±0.020524 | ±nan | ±0.026126 | ±nan | |
DAN | 0.913809 | 0.848303 | 0.787701 | 0.799578 | 0.754505 | `inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu' l2reg=1e-5` |
 | ±0.020632 | ±0.012912 | ±0.028049 | ±0.019803 | ±0.029174 | |
-------------------------- | ---------- | ---------- | ---------- | ---------- | ----------- | ---------- |
rnn | 0.930491 | 0.859281 | 0.793455 | 0.806962 | 0.763644 | (defaults) |
 | ±0.044004 | ±0.007907 | ±0.009231 | ±0.020562 | ±0.030651 | |
cnn | 0.920297 | 0.862275 | 0.801795 | 0.819620 | 0.763181 | (defaults) |
 | ±0.030030 | ±0.017017 | ±0.033116 | ±0.024479 | ±0.061008 | |
rnncnn | 0.922768 | 0.861277 | 0.804602 | 0.812236 | 0.765567 | (defaults) |
 | ±0.040065 | ±0.009881 | ±0.017815 | ±0.014686 | ±0.025850 | |
attn1511 | 0.869632 | 0.841317 | 0.787862 | 0.812236 | 0.777503 | (defaults) |
 | ±0.011013 | ±0.009426 | ±0.015477 | ±0.009129 | ±0.022301 | |
There is a slight improvement, though it does not match what we observed with the original (erroneous) vocabulary mismatch.
Pruning size (the EPn suffix denotes keeping the top n training-set tokens in the trainable vocabulary):
6x R_rg_2a51BV_EP100_mask - 0.836327 (95% [0.827690, 0.844964]):
6x R_rg_2a51BV_EP1000_mask - 0.818363 (95% [0.799982, 0.836744]):
11290398.arien.ics.muni.cz.R_rg_2a51BV_EP1000_mask etc.
[0.838323, 0.802395, 0.844311, 0.814371, 0.796407, 0.814371, ]
6x R_rg_2a51BV_EP20_mask - 0.836327 (95% [0.833365, 0.839289]):
11290400.arien.ics.muni.cz.R_rg_2a51BV_EP20_mask etc.
[0.838323, 0.838323, 0.838323, 0.838323, 0.832335, 0.832335, ]
No effect.
Other experiments done with BV_EP100 on hypev are documented in 1605EightGrade.
The popular sanity check:
Baseline R_ss_2rnncnn val 0.705950 ±0.005099.
16x R_ss_2rnncnnBV_EP100 - 0.703722 (95% [0.699193, 0.708251]):
11297565.arien.ics.muni.cz.R_ss_2rnncnnBV_EP100 etc.
[0.714254, 0.701655, 0.705811, 0.696601, 0.694212, 0.695169, 0.701795, 0.706860, 0.703687, 0.711265, 0.698890, 0.694599, 0.718736, 0.716835, 0.689257, 0.709924, ]
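For reference, the reported means and 95% intervals on this page are consistent with a Student-t interval over the per-run accuracies; a minimal sketch of that computation (the exact estimator used by the repo's eval tooling is an assumption here):

```python
import numpy as np
from scipy import stats

def mean_ci(accs, alpha=0.95):
    """Mean and Student-t 95% interval over per-run accuracies."""
    accs = np.asarray(accs)
    mean = accs.mean()
    # population std over runs, scaled by the t-quantile for n-1 degrees of freedom
    halfwidth = stats.t.isf((1 - alpha) / 2, len(accs) - 1) * accs.std() / np.sqrt(len(accs))
    return mean, (mean - halfwidth, mean + halfwidth)

# e.g. the six EP1000 runs listed above:
print(mean_ci([0.838323, 0.802395, 0.844311, 0.814371, 0.796407, 0.814371]))
# -> roughly (0.818363, (0.799982, 0.836744))
```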