
1605EightGrade

Petr Baudis edited this page May 13, 2016 · 5 revisions

1605 ai2-8grade HypEv Experiments

All of these experiments use the new BV_EP100 vocabulary mode explored in 1605BigVocab.

We have three dataset variants: r8c uses ck12 memory snippet sources, r8e uses enwiki memory snippet sources, and r8 merges the two.

Baselines

(Acc is AbcdAccuracy.)

r8c:

| Model | trn Acc | val Acc | val MRR | tst Acc | tst MRR | settings |
|---|---|---|---|---|---|---|
| avg | 0.290301 | 0.290850 | 0.638889 | 0.293686 | 0.592443 | (defaults) |
| | ±0.033995 | ±0.024967 | ±0.019934 | ±0.011219 | ±0.011900 | |
| DAN | 0.245902 | 0.303922 | 0.408565 | 0.287812 | 0.427867 | inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu' |
| | ±0.013140 | ±0.030866 | ±0.057874 | ±0.015715 | ±0.046143 | |
| ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| rnn | 0.722678 | 0.424837 | 0.769676 | 0.390602 | 0.680556 | (defaults) |
| | ±0.156078 | ±0.049935 | ±0.041505 | ±0.033515 | ±0.028432 | |
| cnn | 0.300546 | 0.271242 | 0.618056 | 0.284141 | 0.590576 | (defaults) |
| | ±0.040626 | ±0.030094 | ±0.028330 | ±0.021643 | ±0.011880 | |
| rnncnn | 0.353825 | 0.323529 | 0.668981 | 0.298091 | 0.608498 | (defaults) |
| | ±0.092234 | ±0.044053 | ±0.043229 | ±0.021297 | ±0.019228 | |
| attn1511 | 0.477459 | 0.339869 | 0.687114 | 0.326725 | 0.630003 | (defaults) |
| | ±0.193444 | ±0.038801 | ±0.029131 | ±0.037975 | ±0.030858 | |

r8e:

| Model | trn Acc | val Acc | val MRR | tst Acc | tst MRR | settings |
|---|---|---|---|---|---|---|
| avg | 0.280330 | 0.348958 | 0.636616 | 0.266049 | 0.547683 | (defaults) |
| | ±0.044841 | ±0.037472 | ±0.028759 | ±0.018288 | ±0.009624 | |
| DAN | 0.244994 | 0.296875 | 0.534848 | 0.266049 | 0.513158 | inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu' |
| | ±0.027805 | ±0.025047 | ±0.082815 | ±0.008811 | ±0.056038 | |
| ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| rnn | 0.463486 | 0.361979 | 0.642424 | 0.295062 | 0.575405 | (defaults) |
| | ±0.193759 | ±0.029051 | ±0.018036 | ±0.025256 | ±0.018475 | |
| cnn | 0.219670 | 0.299479 | 0.596212 | 0.261111 | 0.548752 | (defaults) |
| | ±0.014743 | ±0.033359 | ±0.022407 | ±0.013417 | ±0.009095 | |
| rnncnn | 0.316254 | 0.335938 | 0.618182 | 0.258025 | 0.547402 | (defaults) |
| | ±0.068402 | ±0.024596 | ±0.010901 | ±0.014658 | ±0.018273 | |
| attn1511 | 0.397527 | 0.390625 | 0.664141 | 0.236420 | 0.532276 | (defaults) |
| | ±0.069729 | ±0.042338 | ±0.027718 | ±0.028422 | ±0.022669 | |

r8:

| Model | trn Acc | val Acc | val MRR | tst Acc | tst MRR | settings |
|---|---|---|---|---|---|---|
| avg | 0.261484 | 0.315104 | 0.594303 | 0.275309 | 0.537929 | (defaults) |
| | ±0.040622 | ±0.029051 | ±0.018859 | ±0.027908 | ±0.019492 | |
| DAN | 0.290342 | 0.302083 | 0.575893 | 0.289506 | 0.542025 | inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu' |
| | ±0.017316 | ±0.024444 | ±0.016973 | ±0.010625 | ±0.009295 | |
| ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| rnn | 0.392815 | 0.341146 | 0.611529 | 0.300617 | 0.558611 | (defaults) |
| | ±0.165470 | ±0.025782 | ±0.035079 | ±0.014786 | ±0.010929 | |
| cnn | 0.213781 | 0.328125 | 0.568729 | 0.267284 | 0.515060 | (defaults) |
| | ±0.009753 | ±0.051853 | ±0.039688 | ±0.034358 | ±0.066781 | |
| rnncnn | 0.285630 | 0.351562 | 0.620571 | 0.273457 | 0.546830 | (defaults) |
| | ±0.041758 | ±0.045155 | ±0.037842 | ±0.019228 | ±0.011815 | |
| attn1511 | 0.476443 | 0.359375 | 0.619522 | 0.293210 | 0.551151 | (defaults) |
| | ±0.187898 | ±0.035423 | ±0.027051 | ±0.044679 | ±0.024191 | |

It seems that r8c is the best dataset to use; neither r8e nor the merged r8 looks any better.

Model Tuning

l2reg=1e-4

6x R_r8_2avgBV_EP100_L1e-4 - 0.463542 (95% [0.434107, 0.492976]):

11288548.arien.ics.muni.cz.R_r8_2avgBV_EP100_L1e-4 etc.
[0.468750, 0.468750, 0.468750, 0.468750, 0.500000, 0.406250, ]

6x R_r8_2danBV_EP100_L1e-4 - 0.460938 (95% [0.429898, 0.491977]):

11288549.arien.ics.muni.cz.R_r8_2danBV_EP100_L1e-4 etc.
[0.453125, 0.500000, 0.468750, 0.484375, 0.453125, 0.406250, ]

6x R_r8_2rnnBV_EP100_L1e-4 - 0.354167 (95% [0.326296, 0.382037]):

11288550.arien.ics.muni.cz.R_r8_2rnnBV_EP100_L1e-4 etc.
[0.312500, 0.328125, 0.359375, 0.390625, 0.359375, 0.375000, ]

5x R_r8_2cnnBV_EP100_L1e-4 - 0.400000 (95% [0.304481, 0.495519]):

11288551.arien.ics.muni.cz.R_r8_2cnnBV_EP100_L1e-4 etc.
[0.406250, 0.359375, 0.406250, 0.296875, 0.531250, ]

6x R_r8_2rnncnnBV_EP100_L1e-4 - 0.395833 (95% [0.319117, 0.472550]):

11288552.arien.ics.muni.cz.R_r8_2rnncnnBV_EP100_L1e-4 etc.
[0.484375, 0.406250, 0.437500, 0.421875, 0.375000, 0.250000, ]

6x R_r8_2a51BV_EP100_L1e-4 - 0.361979 (95% [0.324808, 0.399151]):

11288553.arien.ics.muni.cz.R_r8_2a51BV_EP100_L1e-4 etc.
[0.390625, 0.312500, 0.406250, 0.328125, 0.390625, 0.343750, ]

Seems like a good idea, at least for the avg and DAN models, which both reach about 0.46 with l2reg=1e-4!
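For readers who want to check the run summaries above, here is a minimal sketch of how the mean and the 95% interval can be recomputed from a list of per-run scores. It is not the script that produced these lines. It assumes a Student-t interval built on the population standard deviation (ddof=0), a convention inferred only from the fact that it reproduces the printed numbers. The helper name `mean_ci95` and the table `T_CRIT_95` are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
# Only the run counts that appear on this page (5 and 6 runs) are covered.
T_CRIT_95 = {4: 2.776445, 5: 2.570582}


def mean_ci95(scores):
    """Return (mean, low, high) for a list of per-run scores."""
    n = len(scores)
    mean = statistics.mean(scores)
    # Assumption: population standard deviation (ddof=0), chosen because it
    # matches the intervals printed on this page; not confirmed by the source.
    half_width = T_CRIT_95[n - 1] * statistics.pstdev(scores) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Per-run scores copied from the 2avg and 2cnn listings above.
avg_runs = [0.468750, 0.468750, 0.468750, 0.468750, 0.500000, 0.406250]
cnn_runs = [0.406250, 0.359375, 0.406250, 0.296875, 0.531250]

for name, runs in [("2avg", avg_runs), ("2cnn", cnn_runs)]:
    mean, low, high = mean_ci95(runs)
    print(f"{len(runs)}x {name} - {mean:.6f} (95% [{low:.6f}, {high:.6f}])")
```

With only five or six runs per configuration the intervals are wide, so differences of a few points between models (for example rnncnn versus cnn) fall well inside the noise.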

CNN checks

6x R_r8_2cnnBV_EP100_L1e-4_c121212 - 0.403646 (95% [0.356866, 0.450426]):

11288556.arien.ics.muni.cz.R_r8_2cnnBV_EP100_L1e-4_c121212 etc.
[0.406250, 0.312500, 0.453125, 0.437500, 0.406250, 0.406250, ]