
1602Stats

Petr Baudis edited this page Mar 5, 2016 · 6 revisions

1602 Statistical Stability Experiments

We have noticed that results on some of the tasks are remarkably unstable, and we are investigating why.

Ubuntu dataset

We do not see a very large variance with the Ubuntu dataset!

(Still enough to have statistical considerations enter comparisons between similar-performing systems, though.)

very small model:

python tools/ubuntu_train.py rnn data/anssel/ubuntu/v2-vocab.pickle data/anssel/ubuntu/v2-trainset.pickle data/anssel/ubuntu/v2-valset.pickle "pact='tanh'" sdim=1/6 pdim=1/6 ptscorer=B.dot_ptscorer dropout=1/2
Epoch 17/32
200064/200000 [==============================] - 1348s - loss: 0.1473                                                         val mrr 0.716158
Val MRR: 0.760466
Val 2-R@1: 0.894683
Val 10-R@1: 0.637628  10-R@2: 0.775256  10-R@5: 0.941053
rnn--14befea563038897
Val MRR: 0.739525
Val 2-R@1: 0.876943
Val 10-R@1: 0.611708  10-R@2: 0.749387  10-R@5: 0.924489
rnn--15045270c51250bf
Epoch 17/32
200064/200000 [==============================] - 1324s - loss: 0.1491                                                         val mrr 0.704187--
Val MRR: 0.750804
Val 2-R@1: 0.887117
Val 10-R@1: 0.626074  10-R@2: 0.763344  10-R@5: 0.931084
rnn--48e4d70df2db9111
Val MRR: 0.753133
Val 2-R@1: 0.888241
Val 10-R@1: 0.628067  10-R@2: 0.766360  10-R@5: 0.935992
rnn-619a7067ff9c2f1f
Val MRR: 0.760466
Val 2-R@1: 0.894683
Val 10-R@1: 0.637628  10-R@2: 0.775256  10-R@5: 0.941053

very small model + padding experiment (80):

python tools/ubuntu_train2.py rnn data/anssel/ubuntu/v2-vocab.pickle data/anssel/ubuntu/v2-trainset.pickle data/anssel/ubuntu/v2-valset.pickle "pact='tanh'" sdim=1/6 pdim=1/6 ptscorer=B.dot_ptscorer dropout=0
Epoch 11/32
200064/200000 [==============================] - 767s - loss: 0.2440                                                          val mrr 0.729863
data/anssel/ubuntu/v2-valset.pickle MRR: 0.748693
data/anssel/ubuntu/v2-valset.pickle 2-R@1: 0.885123
data/anssel/ubuntu/v2-valset.pickle 10-R@1: 0.622342  10-R@2: 0.762321  10-R@5: 0.930930
rnn-13309de338ed5ca9
Val MRR: 0.749029
Val 2-R@1: 0.888855
Val 10-R@1: 0.620961  10-R@2: 0.764417  10-R@5: 0.935583
rnn--503416baa4bf7609
Val MRR: 0.753204
Val 2-R@1: 0.889008
Val 10-R@1: 0.628885  10-R@2: 0.766667  10-R@5: 0.933947
rnn--58f30ee70695ae87
Epoch 17/32
200064/200000 [==============================] - 763s - loss: 0.1375                                                          val mrr 0.710547
Val MRR: 0.752327
Val 2-R@1: 0.890593
Val 10-R@1: 0.626329  10-R@2: 0.765337  10-R@5: 0.936094
rnn-3b38f6cc1e91ec9e
TODO re-eval (0.748693)
rnn-13309de338ed5ca9
Val MRR: 0.749029
Val 2-R@1: 0.888855
Val 10-R@1: 0.620961  10-R@2: 0.764417  10-R@5: 0.935583
rnn--59309a454c87d012
Val MRR: 0.750123
Val 2-R@1: 0.887065
Val 10-R@1: 0.623466  10-R@2: 0.762474  10-R@5: 0.934254
rnn-10c244d2f57cafd4
Val MRR: 0.746547--
Val 2-R@1: 0.885583
Val 10-R@1: 0.619939  10-R@2: 0.758640  10-R@5: 0.929294
rnn-2df03f10a8cf5003
Val MRR: 0.756253
Val 2-R@1: 0.891258
Val 10-R@1: 0.631544  10-R@2: 0.769888  10-R@5: 0.939264
rnn--503416baa4bf7609
Val MRR: 0.753204
Val 2-R@1: 0.889008
Val 10-R@1: 0.628885  10-R@2: 0.766667  10-R@5: 0.933947
rnn-785a86a0677c3731
Val MRR: 0.752015
Val 2-R@1: 0.892280
Val 10-R@1: 0.626483  10-R@2: 0.765644  10-R@5: 0.933640
rnn--4644a47ffc98cec5
Val MRR: 0.753015
Val 2-R@1: 0.887474
Val 10-R@1: 0.629294  10-R@2: 0.764417  10-R@5: 0.934049
rnn--7ea9ca0ecca48708
Val MRR: 0.752862
Val 2-R@1: 0.887986
Val 10-R@1: 0.629550  10-R@2: 0.763037  10-R@5: 0.933742
rnn-9bab77e3444e76e
Val MRR: 0.747036
Val 2-R@1: 0.886043
Val 10-R@1: 0.619530  10-R@2: 0.760020  10-R@5: 0.932311
rnn--5566ac149c1df702
Val MRR: 0.746756
Val 2-R@1: 0.881391
Val 10-R@1: 0.621472  10-R@2: 0.756391  10-R@5: 0.928425
rnn-5ca553289a85ec18
data/anssel/ubuntu/v2-valset.pickle MRR: 0.755227
data/anssel/ubuntu/v2-valset.pickle 2-R@1: 0.891973
data/anssel/ubuntu/v2-valset.pickle 10-R@1: 0.631288  10-R@2: 0.768405  10-R@5: 0.937014
rnn-23d33f7c865857d2
data/anssel/ubuntu/v2-valset.pickle MRR: 0.755279
data/anssel/ubuntu/v2-valset.pickle 2-R@1: 0.889519
data/anssel/ubuntu/v2-valset.pickle 10-R@1: 0.630010  10-R@2: 0.769836  10-R@5: 0.938753
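
To put a number on whether two configurations really differ, one option is Welch's t statistic over the per-run Val MRR values. The sketch below uses only the standard library. The two lists are one reading of the distinct Val MRR values in the two "very small model" logs above (repeated entries dropped, each distinct value assumed to be a separate run), and `welch_t` is a hypothetical helper, not part of the repository tools. Note that the two configurations also differ in dropout, so this is an illustration of the method rather than a clean ablation.

```python
import math
import statistics

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples.

    Assumes each sample has at least two values and that the samples
    are not both constant (otherwise the standard error is zero).
    """
    va = statistics.variance(a) / len(a)   # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)   # squared standard error of mean(b)
    se2 = va + vb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Distinct Val MRR values read off the logs above (assumption: one per run).
small = [0.760466, 0.739525, 0.750804, 0.753133]
small_pad80 = [0.748693, 0.749029, 0.753204, 0.752327, 0.750123, 0.746547,
               0.756253, 0.752015, 0.753015, 0.752862, 0.747036, 0.746756,
               0.755227, 0.755279]

t, df = welch_t(small, small_pad80)
print("t = %.3f, df = %.1f" % (t, df))
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with a p-value.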

large model with spad=80:

sdim=2 pdim=1 "pact='tanh'" ptscorer=B.dot_ptscorer

Epoch 17/32
200064/200000 [==============================] - 2448s - loss: 0.1755                                                         val mrr 0.738215
Predict&Eval (best epoch)
Val MRR: 0.766637
Val 2-R@1: 0.898160
Val 10-R@1: 0.644121  10-R@2: 0.786196  10-R@5: 0.943405

Epoch 4/32
200064/200000 [==============================] - 2423s - loss: 0.4320                                                         val mrr 0.761111

Epoch 9/32
200064/200000 [==============================] - 2389s - loss: 0.3566                                                         val mrr 0.773375

Epoch 10/32
200064/200000 [==============================] - 2467s - loss: 0.3566                                                         val mrr 0.778220

Epoch 4/32
200064/200000 [==============================] - 2436s - loss: 0.4363                                                         val mrr 0.760970

Epoch 4/32
200064/200000 [==============================] - 2421s - loss: 0.4337                                                         val mrr 0.758342

wang dataset

Old Keras experiments with dropout partially applied, but it shouldn't influence relative differences between these results.

The initial attn1511 implementation, 64 runs:

mrrv, mapv:
[0.87948717948717947, 0.85974358974358978, 0.85564102564102562, 0.86384615384615382, 0.87820512820512808, 0.86717948717948712, 0.86538461538461542, 0.88256410256410256, 0.87846153846153852, 0.88179487179487182, 0.85954415954415953, 0.88051282051282054, 0.88717948717948725, 0.87948717948717958, 0.87435897435897425, 0.85783882783882781, 0.87948717948717958, 0.85256410256410253, 0.88, 0.88705128205128203, 0.88205128205128203, 0.86692307692307691, 0.87282051282051287, 0.87358974358974351, 0.89205128205128204, 0.87230769230769234, 0.88256410256410256, 0.87358974358974351, 0.86040293040293037, 0.87102564102564106, 0.8684615384615384, 0.86564102564102563, 0.88717948717948714, 0.875, 0.87743589743589745, 0.87358974358974351, 0.88860805860805858, 0.86769230769230776, 0.87435897435897425, 0.88076923076923075, 0.88897435897435895, 0.86846153846153851, 0.88512820512820511, 0.86527472527472526, 0.87923076923076937, 0.87743589743589745, 0.86589743589743584, 0.8684615384615384, 0.88230769230769235, 0.86923076923076925, 0.86076923076923084, 0.86333333333333329, 0.85999999999999999, 0.90897435897435896, 0.87410256410256415, 0.88769230769230778, 0.8527106227106227, 0.8682051282051283, 0.8682051282051283, 0.8682051282051283, 0.8682051282051283, 0.88769230769230778, 0.88329670329670329, 0.87871794871794862]
[0.8089, 0.7863, 0.7897, 0.79, 0.8051, 0.8001, 0.7966, 0.7932, 0.7988, 0.7984, 0.7801, 0.8052, 0.7923, 0.7944, 0.8007, 0.7844, 0.7923, 0.7934, 0.8042, 0.8006, 0.8026, 0.7929, 0.7856, 0.7944, 0.814, 0.7883, 0.7891, 0.8013, 0.7758, 0.7953, 0.7699, 0.7879, 0.8106, 0.7961, 0.7903, 0.7957, 0.7793, 0.7742, 0.7804, 0.8069, 0.8027, 0.7902, 0.7996, 0.8015, 0.7918, 0.7874, 0.7879, 0.79, 0.7897, 0.7946, 0.7771, 0.7826, 0.7955, 0.8195, 0.7901, 0.8071, 0.7805, 0.8001, 0.8001, 0.8001, 0.8001, 0.8093, 0.7909, 0.7959]
mrrt, mapt:
[0.81044494720965299, 0.81560457516339868, 0.74914852304558188, 0.80479691876750703, 0.78295454545454557, 0.79157239819004521, 0.79202317290552593, 0.81607142857142867, 0.82686651583710413, 0.82377450980392164, 0.79301470588235301, 0.7760854341736696, 0.80882352941176483, 0.77327317290552589, 0.78582516339869291, 0.78707893413775765, 0.79369658119658126, 0.79966063348416305, 0.80870098039215688, 0.79944614209320097, 0.78984593837535011, 0.77916666666666667, 0.7890819964349377, 0.78504901960784323, 0.79369747899159671, 0.81815476190476188, 0.77254901960784317, 0.77761437908496733, 0.77832633053221301, 0.83455882352941191, 0.79267533936651591, 0.77638888888888902, 0.78951914098972942, 0.82486631016042788, 0.7948418003565062, 0.79888591800356501, 0.78490896358543427, 0.78050356506238849, 0.78731325863678803, 0.82599753187988489, 0.83054298642533941, 0.79888591800356512, 0.81350867269984928, 0.81093514328808458, 0.80493697478991599, 0.80857843137254903, 0.80453431372549022, 0.78984593837535022, 0.8093137254901962, 0.78242296918767518, 0.80539215686274523, 0.83002450980392151, 0.82933006535947718, 0.81126336898395734, 0.80318627450980395, 0.76183473389355738, 0.78371848739495797, 0.79296218487394965, 0.79296218487394965, 0.79296218487394965, 0.79296218487394965, 0.83946078431372551, 0.82023809523809543, 0.80526960784313728]
[0.7277, 0.7349, 0.7019, 0.7292, 0.7207, 0.7283, 0.7239, 0.7373, 0.7296, 0.7405, 0.7241, 0.7137, 0.7358, 0.7142, 0.7297, 0.7054, 0.7127, 0.7228, 0.7152, 0.7238, 0.7124, 0.7192, 0.7116, 0.7325, 0.7267, 0.7418, 0.7147, 0.7072, 0.7216, 0.7441, 0.7199, 0.7047, 0.7286, 0.7426, 0.72, 0.7236, 0.7203, 0.7213, 0.7201, 0.7403, 0.7348, 0.7298, 0.7227, 0.7189, 0.7208, 0.7365, 0.74, 0.7124, 0.7429, 0.7198, 0.7385, 0.7337, 0.7535, 0.747, 0.7303, 0.7098, 0.7161, 0.7258, 0.7258, 0.7258, 0.7258, 0.7539, 0.7274, 0.7453]

In [17]: ss.pearsonr(mrrv, mrrt)
Out[17]: (0.2044974133763463, 0.10503569612821166)
In [18]: ss.pearsonr(mapv, mapt)
Out[18]: (0.17587090427082455, 0.16449885001585987)
In [8]: np.mean(mrrv)
Out[8]: 0.8740141687016687
In [9]: np.std(mrrv)
Out[9]: 0.010561231563509519
In [11]: np.mean(mrrt)
Out[11]: 0.79887312251167963
In [10]: np.std(mrrt)
Out[10]: 0.018295636088356944
In [12]: np.max(mrrt)
Out[12]: 0.83946078431372551
In [13]: np.min(mrrt)
Out[13]: 0.74914852304558188
In [14]: np.max(mapt)
Out[14]: 0.75390000000000001
In [15]: np.mean(mapt)
Out[15]: 0.72627968749999994
In [16]: np.std(mapt)
Out[16]: 0.011727240986367755

This is awful!
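
One way to read these numbers: with a test MRR standard deviation of about 0.018, the mean over n independent runs has a standard error of std/sqrt(n). The sketch below (standard library only) estimates how many runs it takes to pin the mean down to a given precision. `standard_error` and `runs_needed` are hypothetical helpers, and the 0.005 target is an arbitrary value picked for illustration.

```python
import math

def standard_error(std, n_runs):
    """Standard error of the mean of n_runs independent runs."""
    return std / math.sqrt(n_runs)

def runs_needed(std, target_se):
    """Smallest number of runs whose mean has standard error <= target_se."""
    return math.ceil((std / target_se) ** 2)

std_mrrt = 0.018295636088356944   # np.std(mrrt) from the session above

print(standard_error(std_mrrt, 64))   # spread of the 64-run mean
print(runs_needed(std_mrrt, 0.005))   # -> 14
```

Under the same assumption, a single run can easily land a couple of standard deviations (roughly 0.036 test MRR) away from the mean, which is larger than many of the differences between models one would like to measure.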

CNN:

==> 10607226.arien.ics.muni.cz.aw_cnn <==
20000/20000 [==============================] - 178s - loss: 0.1354 - val_loss: 0.9067                                       val mrr 0.766916
Predict&Eval (best epoch)
Train Accuracy: raw 0.947806 (y=0 0.992798, y=1 0.586360), bal 0.789579
Train MRR: 0.941199  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.825425 (y=0 0.995614, y=1 0.068293), bal 0.531953
Val MRR: 0.851795

==> 10607227.arien.ics.muni.cz.aw_cnn <==
20000/20000 [==============================] - 177s - loss: 0.1628 - val_loss: 0.6934                                       val mrr 0.841966
Predict&Eval (best epoch)
Train Accuracy: raw 0.958231 (y=0 0.984861, y=1 0.744299), bal 0.864580
Train MRR: 0.944445  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.843330 (y=0 0.987939, y=1 0.200000), bal 0.593969
Val MRR: 0.860769

==> 10607228.arien.ics.muni.cz.aw_cnn <==
20000/20000 [==============================] - 178s - loss: 0.1366 - val_loss: 0.7379                                       val mrr 0.805531
Predict&Eval (best epoch)
Train Accuracy: raw 0.953767 (y=0 0.991300, y=1 0.652238), bal 0.821769
Train MRR: 0.954174  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.828111 (y=0 0.997807, y=1 0.073171), bal 0.535489
Val MRR: 0.863333

==> 10610660.arien.ics.muni.cz.aw_cnn <==
20000/20000 [==============================] - 179s - loss: 0.1441 - val_loss: 0.8120                                       val mrr 0.817308
Predict&Eval (best epoch)
Train Accuracy: raw 0.968703 (y=0 0.987069, y=1 0.821157), bal 0.904113
Train MRR: 0.955313  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.844226 (y=0 0.993421, y=1 0.180488), bal 0.586954
Val MRR: 0.888974

==> 10610661.arien.ics.muni.cz.aw_cnn <==
20000/20000 [==============================] - 177s - loss: 0.1261 - val_loss: 0.7154                                       val mrr 0.811282
Predict&Eval (best epoch)
Train Accuracy: raw 0.959283 (y=0 0.992956, y=1 0.688767), bal 0.840861
Train MRR: 0.960245  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.839749 (y=0 0.989035, y=1 0.175610), bal 0.582322
Val MRR: 0.872949

==> 10610662.arien.ics.muni.cz.aw_cnn <==
20000/20000 [==============================] - 176s - loss: 0.1471 - val_loss: 0.6331                                       val mrr 0.836044
Predict&Eval (best epoch)
Train Accuracy: raw 0.957951 (y=0 0.986333, y=1 0.729941), bal 0.858137
Train MRR: 0.948060  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.846016 (y=0 0.982456, y=1 0.239024), bal 0.610740
Val MRR: 0.869872

==> 10610663.arien.ics.muni.cz.aw_cnn <==
20000/20000 [==============================] - 176s - loss: 0.1803 - val_loss: 0.7234                                       val mrr 0.834762
Predict&Eval (best epoch)
Train Accuracy: raw 0.889302 (y=0 0.999947, y=1 0.000422), bal 0.500185
Train MRR: 0.866786  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.816473 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.862564

==> 10610669.arien.ics.muni.cz.aw_cnn <==
20000/20000 [==============================] - 177s - loss: 0.1798 - val_loss: 0.7526                                       val mrr 0.846325
Predict&Eval (best epoch)
Train Accuracy: raw 0.920856 (y=0 0.987331, y=1 0.386824), bal 0.687078
Train MRR: 0.884897  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.826321 (y=0 0.994518, y=1 0.078049), bal 0.536283
Val MRR: 0.866667

attn1511:

==> 10607198.arien.ics.muni.cz.aw_a1511 <==
20000/20000 [==============================] - 276s - loss: 0.1325 - val_loss: 0.6001                                       val mrr 0.791867
Predict&Eval (best epoch)
Train Accuracy: raw 0.913681 (y=0 0.979683, y=1 0.383446), bal 0.681564
Train MRR: 0.864677  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.825425 (y=0 0.991228, y=1 0.087805), bal 0.539516
Val MRR: 0.866667

==> 10607199.arien.ics.muni.cz.aw_a1511 <==
20000/20000 [==============================] - 276s - loss: 0.1069 - val_loss: 0.6251                                       val mrr 0.764676
Predict&Eval (best epoch)
Train Accuracy: raw 0.935442 (y=0 0.972876, y=1 0.634713), bal 0.803794
Train MRR: 0.890238  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.840645 (y=0 0.967105, y=1 0.278049), bal 0.622577
Val MRR: 0.865385

==> 10607200.arien.ics.muni.cz.aw_a1511 <==
20000/20000 [==============================] - 277s - loss: 0.1134 - val_loss: 0.5559                                       val mrr 0.793007
Predict&Eval (best epoch)
Train Accuracy: raw 0.940514 (y=0 0.969801, y=1 0.705236), bal 0.837518
Train MRR: 0.896228  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.840645 (y=0 0.958333, y=1 0.317073), bal 0.637703
Val MRR: 0.890769

==> 10610671.arien.ics.muni.cz.aw_a1511 <==
20000/20000 [==============================] - 275s - loss: 0.1196 - val_loss: 0.5428                                       val mrr 0.843846
Predict&Eval (best epoch)
Train Accuracy: raw 0.924105 (y=0 0.993324, y=1 0.368032), bal 0.680678
Train MRR: 0.911378  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.816473 (y=0 0.991228, y=1 0.039024), bal 0.515126
Val MRR: 0.886410

==> 10610672.arien.ics.muni.cz.aw_a1511 <==
20000/20000 [==============================] - 274s - loss: 0.1065 - val_loss: 0.6153                                       val mrr 0.825897
Predict&Eval (best epoch)
Train Accuracy: raw 0.933525 (y=0 0.976503, y=1 0.588260), bal 0.782381
Train MRR: 0.898266  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.842435 (y=0 0.970395, y=1 0.273171), bal 0.621783
Val MRR: 0.888462

==> 10610673.arien.ics.muni.cz.aw_a1511 <==
20000/20000 [==============================] - 275s - loss: 0.1062 - val_loss: 0.4998                                       val mrr 0.793333
Predict&Eval (best epoch)
Train Accuracy: raw 0.936400 (y=0 0.976634, y=1 0.613176), bal 0.794905
Train MRR: 0.894604  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.845121 (y=0 0.981360, y=1 0.239024), bal 0.610192
Val MRR: 0.865128

==> 10610674.arien.ics.muni.cz.aw_a1511 <==
20000/20000 [==============================] - 275s - loss: 0.0871 - val_loss: 1.0442                                       val mrr 0.745128
Predict&Eval (best epoch)
Train Accuracy: raw 0.943319 (y=0 0.975504, y=1 0.684755), bal 0.830130
Train MRR: 0.903599  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.842435 (y=0 0.967105, y=1 0.287805), bal 0.627455
Val MRR: 0.873333

==> 10610675.arien.ics.muni.cz.aw_a1511 <==
20000/20000 [==============================] - 276s - loss: 0.1012 - val_loss: 0.6416                                       val mrr 0.808974
Predict&Eval (best epoch)
Train Accuracy: raw 0.937873 (y=0 0.987910, y=1 0.535895), bal 0.761902
Train MRR: 0.910429  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.826321 (y=0 0.984649, y=1 0.121951), bal 0.553300
Val MRR: 0.873590

New Keras

64x RNN with binary_crossentropy loss (bounded 0,1 predictions will make it easier to compute per-sample variances):

10675999.arien.ics.muni.cz.aw_1rnnd0_lbc etc.
[0.862821, 0.877875, 0.882198, 0.850040, 0.851026, 0.863150, 0.873706, 0.860769, 0.848681, 0.851474, 0.847326, 0.854231, 0.838462, 0.855748, 0.880962, 0.866597, 0.879359, 0.887436, 0.878246, 0.843462, 0.881685, 0.875442, 0.856410, 0.850414, 0.886154, 0.874359, 0.866630, 0.862051, 0.861709, 0.839524, 0.859134, 0.876703, 0.858132, 0.877179, 0.841931, 0.860952, 0.844017, 0.861951, 0.872977, 0.858065, 0.851874, 0.846325, 0.846044, 0.831757, 0.850962, 0.855275, 0.856581, 0.857040, 0.839744, 0.860073, 0.847051, 0.838205, 0.857179, 0.852051, 0.876282, 0.859121, 0.865128, 0.884951, 0.871429, 0.892051, 0.885425, 0.853746, 0.861795, 0.854231, ]

yodaqa-curatedv2 dataset

Old Keras experiments with dropout partially applied.

CNN:

==> 10607229.arien.ics.muni.cz.ay_cnn <==
20000/20000 [==============================] - 221s - loss: 0.1166 - val_loss: 0.3841                                       val mrr 0.265574
Predict&Eval (best epoch)
Train Accuracy: raw 0.964884 (y=0 0.998043, y=1 0.384102), bal 0.691072
Train MRR: 0.700279  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.934229 (y=0 0.988121, y=1 0.023544), bal 0.505832
Val MRR: 0.364058

==> 10607230.arien.ics.muni.cz.ay_cnn <==
20000/20000 [==============================] - 219s - loss: 0.1115 - val_loss: 0.3619                                       val mrr 0.259407
Predict&Eval (best epoch)
Train Accuracy: raw 0.962491 (y=0 0.996293, y=1 0.370449), bal 0.683371
Train MRR: 0.699684  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.930975 (y=0 0.983207, y=1 0.048327), bal 0.515767
Val MRR: 0.334033

==> 10607231.arien.ics.muni.cz.ay_cnn <==
20000/20000 [==============================] - 221s - loss: 0.1256 - val_loss: 0.3087                                       val mrr 0.255388
Predict&Eval (best epoch)
Train Accuracy: raw 0.953725 (y=0 0.998389, y=1 0.171420), bal 0.584904
Train MRR: 0.593810  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.943991 (y=0 0.999853, y=1 0.000000), bal 0.499927
Val MRR: 0.380828

==> 10610664.arien.ics.muni.cz.ay_cnn <==
20000/20000 [==============================] - 219s - loss: 0.1383 - val_loss: 0.3556                                       val mrr 0.271949
Predict&Eval (best epoch)
Train Accuracy: raw 0.949972 (y=0 0.999965, y=1 0.074333), bal 0.537149
Train MRR: 0.503931  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944060 (y=0 0.999927, y=1 0.000000), bal 0.499963
Val MRR: 0.344283

==> 10610665.arien.ics.muni.cz.ay_cnn <==
20000/20000 [==============================] - 222s - loss: 0.1264 - val_loss: 0.3648                                       val mrr 0.288607
Predict&Eval (best epoch)
Train Accuracy: raw 0.957494 (y=0 0.997627, y=1 0.254551), bal 0.626089
Train MRR: 0.668582  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.941567 (y=0 0.996847, y=1 0.007435), bal 0.502141
Val MRR: 0.367982

==> 10610666.arien.ics.muni.cz.ay_cnn <==
20000/20000 [==============================] - 220s - loss: 0.1204 - val_loss: 0.3388                                       val mrr 0.269678
Predict&Eval (best epoch)
Train Accuracy: raw 0.955576 (y=0 0.998822, y=1 0.198119), bal 0.598471
Train MRR: 0.646638  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.943506 (y=0 0.999193, y=1 0.002478), bal 0.500836
Val MRR: 0.370531

==> 10610667.arien.ics.muni.cz.ay_cnn <==
20000/20000 [==============================] - 219s - loss: 0.1241 - val_loss: 0.3724                                       val mrr 0.255753
Predict&Eval (best epoch)
Train Accuracy: raw 0.950382 (y=0 0.999965, y=1 0.081917), bal 0.540941
Train MRR: 0.598393  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.399154

==> 10610668.arien.ics.muni.cz.ay_cnn <==
20000/20000 [==============================] - 220s - loss: 0.1301 - val_loss: 0.2897                                       val mrr 0.250608
Predict&Eval (best epoch)
Train Accuracy: raw 0.955445 (y=0 0.997177, y=1 0.224515), bal 0.610846
Train MRR: 0.572635  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.941291 (y=0 0.996480, y=1 0.008674), bal 0.502577
Val MRR: 0.345531

attn1511:

==> 10607195.arien.ics.muni.cz.ay_a1511 <==
20000/20000 [==============================] - 337s - loss: 0.1076 - val_loss: 0.3202                                       val mrr 0.247784
Predict&Eval (best epoch)
Train Accuracy: raw 0.953479 (y=0 0.997713, y=1 0.178701), bal 0.588207
Train MRR: 0.501824  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944337 (y=0 0.999927, y=1 0.004957), bal 0.502442
Val MRR: 0.483186

==> 10607196.arien.ics.muni.cz.ay_a1511 <==
20000/20000 [==============================] - 338s - loss: 0.1323 - val_loss: 0.3089                                       val mrr 0.211629
Predict&Eval (best epoch)
Train Accuracy: raw 0.951627 (y=0 0.999307, y=1 0.116505), bal 0.557906
Train MRR: 0.446953  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.489259

==> 10607197.arien.ics.muni.cz.ay_a1511 <==
20000/20000 [==============================] - 337s - loss: 0.1334 - val_loss: 0.3269                                       val mrr 0.274479
Predict&Eval (best epoch)
Train Accuracy: raw 0.951709 (y=0 0.999117, y=1 0.121359), bal 0.560238
Train MRR: 0.456003  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.506934

==> 10610676.arien.ics.muni.cz.ay_a1511 <==
20000/20000 [==============================] - 336s - loss: 0.1322 - val_loss: 0.3004                                       val mrr 0.354856
Predict&Eval (best epoch)
Train Accuracy: raw 0.950759 (y=0 0.999463, y=1 0.097694), bal 0.548579
Train MRR: 0.452666  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.483744

==> 10610677.arien.ics.muni.cz.ay_a1511 <==
20000/20000 [==============================] - 336s - loss: 0.1306 - val_loss: 0.2994                                       val mrr 0.273350
Predict&Eval (best epoch)
Train Accuracy: raw 0.951660 (y=0 0.999307, y=1 0.117112), bal 0.558209
Train MRR: 0.455662  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.457244

==> 10610678.arien.ics.muni.cz.ay_a1511 <==
20000/20000 [==============================] - 336s - loss: 0.1320 - val_loss: 0.3083                                       val mrr 0.332179
Predict&Eval (best epoch)
Train Accuracy: raw 0.951742 (y=0 0.999307, y=1 0.118629), bal 0.558968
Train MRR: 0.473461  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.454481

==> 10610679.arien.ics.muni.cz.ay_a1511 <==
20000/20000 [==============================] - 336s - loss: 0.1461 - val_loss: 0.2929                                       val mrr 0.243183
Predict&Eval (best epoch)
Train Accuracy: raw 0.949841 (y=0 0.999965, y=1 0.071905), bal 0.535935
Train MRR: 0.443360  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.473662

==> 10610680.arien.ics.muni.cz.ay_a1511 <==
20000/20000 [==============================] - 337s - loss: 0.1136 - val_loss: 0.2985                                       val mrr 0.293237
Predict&Eval (best epoch)
Train Accuracy: raw 0.952102 (y=0 0.998874, y=1 0.132888), bal 0.565881
Train MRR: 0.502078  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.527090

New Keras

With dropout fully fixed... (and not applied at all :)

attn1511:

==> 10658672.arien.ics.muni.cz.ay_1a51d0 <==
15256/15256 [==============================] - 271s - loss: 0.1090 - val_loss: 0.2547                                       val mrr 0.351885
Predict&Eval (best epoch)
Train Accuracy: raw 0.949612 (y=0 0.999723, y=1 0.071905), bal 0.535814
Train MRR: 0.436671  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.477024

==> 10658673.arien.ics.muni.cz.ay_1a51d0 <==
15256/15256 [==============================] - 268s - loss: 0.1167 - val_loss: 0.2528                                       val mrr 0.368745
Predict&Eval (best epoch)
Train Accuracy: raw 0.951676 (y=0 0.999238, y=1 0.118629), bal 0.558933
Train MRR: 0.476874  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.455434

==> 10658674.arien.ics.muni.cz.ay_1a51d0 <==
15256/15256 [==============================] - 269s - loss: 0.1627 - val_loss: 0.2321                                       val mrr 0.285615
Predict&Eval (best epoch)
Train Accuracy: raw 0.951922 (y=0 0.996986, y=1 0.162621), bal 0.579804
Train MRR: 0.449095  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.449533

==> 10658675.arien.ics.muni.cz.ay_1a51d0 <==
15256/15256 [==============================] - 267s - loss: 0.1071 - val_loss: 0.3014                                       val mrr 0.274728
Predict&Eval (best epoch)
Train Accuracy: raw 0.952643 (y=0 0.996432, y=1 0.185680), bal 0.591056
Train MRR: 0.468218  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.483179

(attn1511 could apparently make do with some dropout after all).

RNN:

==> 10658668.arien.ics.muni.cz.ay_1rnnd0 <==
20000/20000 [==============================] - 183s - loss: 0.0599 - val_loss: 0.4257                                       val mrr 0.258471
Predict&Eval (best epoch)
Train Accuracy: raw 0.955101 (y=0 0.998112, y=1 0.201760), bal 0.599936
Train MRR: 0.548954  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.943091 (y=0 0.998827, y=1 0.001239), bal 0.500033
Val MRR: 0.307231

==> 10658669.arien.ics.muni.cz.ay_1rnnd0 <==
15256/15256 [==============================] - 150s - loss: 0.0247 - val_loss: 0.4625                                       val mrr 0.266685
Predict&Eval (best epoch)
Train Accuracy: raw 0.990906 (y=0 0.997523, y=1 0.875000), bal 0.936261
Train MRR: 0.955644  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.921698 (y=0 0.973235, y=1 0.050805), bal 0.512020
Val MRR: 0.322597

==> 10658670.arien.ics.muni.cz.ay_1rnnd0 <==
15256/15256 [==============================] - 150s - loss: 0.0833 - val_loss: 0.4153                                       val mrr 0.316059
Predict&Eval (best epoch)
Train Accuracy: raw 0.957690 (y=0 0.998233, y=1 0.247573), bal 0.622903
Train MRR: 0.616726  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.940044 (y=0 0.995160, y=1 0.008674), bal 0.501917
Val MRR: 0.347030

==> 10658671.arien.ics.muni.cz.ay_1rnnd0 <==
15256/15256 [==============================] - 149s - loss: 0.0864 - val_loss: 0.3491                                       val mrr 0.280607
Predict&Eval (best epoch)
Train Accuracy: raw 0.951545 (y=0 0.999896, y=1 0.104672), bal 0.552284
Train MRR: 0.531235  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.309661

64x RNN with binary_crossentropy loss (bounded 0,1 predictions will make it easier to compute per-sample variances):

10675927.arien.ics.muni.cz.ay_1rnnd0_lbc etc.
[0.333701, 0.326125, 0.336284, 0.345328, 0.339932, 0.299496, 0.308057, 0.294640, 0.371689, 0.358677, 0.316229, 0.351746, 0.389020, 0.344435, 0.319640, 0.335084, 0.350999, 0.344522, 0.330878, 0.316592, 0.392825, 0.325323, 0.376527, 0.350433, 0.353456, 0.291590, 0.317691, 0.352287, 0.365691, 0.307232, 0.327808, 0.327562, 0.358058, 0.371684, 0.373486, 0.341047, 0.339725, 0.340547, 0.334846, 0.342750, 0.291530, 0.323766, 0.334265, 0.358144, 0.341592, 0.343868, 0.310702, 0.326541, 0.319661, 0.339354, 0.299108, 0.331717, 0.310318, 0.362667, 0.327149, 0.355891, 0.305087, 0.359355, 0.357987, 0.331698, 0.316316, 0.347250, 0.334750, 0.308903, ]

Observation: Disregarding the number of pairs, the number of questions (on which MRR is measured) is actually pretty small on the val set (88)!
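
Since MRR is a plain average of per-question reciprocal ranks, each question carries a weight of 1/N. As a rough illustration that assumes nothing beyond the MRR definition (`flip_effect` is a name made up here), this is how far the score moves when a single question's correct answer slips from rank 1 to rank 2:

```python
def flip_effect(n_questions, rank_before=1, rank_after=2):
    """Change in MRR when one question's best correct answer changes rank."""
    return (1.0 / rank_after - 1.0 / rank_before) / n_questions

for n in (88, 333):
    print(n, round(flip_effect(n), 5))
# 88 -0.00568
# 333 -0.0015
```

So on 88 questions, a handful of such flips already accounts for a change of a few hundredths in MRR.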

32x RNN with binary_crossentropy loss, using large2470-val for validation (MRR computed from 333 questions):

10676451.arien.ics.muni.cz.ayl_1rnnd0_lbc etc.
[0.354874, 0.318208, 0.317254, 0.393136, 0.326733, 0.371319, 0.338579, 0.367860, 0.343624, 0.375447, 0.349652, 0.365866, 0.336435, 0.345256, 0.341385, 0.357561, 0.353628, 0.344108, 0.324256, 0.351309, 0.369522, 0.348427, 0.334746, 0.354037, 0.362202, 0.356219, 0.363204, 0.348845, 0.366015, 0.359293, 0.337874, 0.330425, ]

Ok, that didn't help. Not so surprising given the per-pair accuracy reported above also fluctuating.

Observation: Tiny changes in rank may translate into huge changes in MRR near the top. What if we try to alleviate this by looking only at hard questions with many alternatives?
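
The sensitivity near the top follows directly from the reciprocal: moving the correct answer from rank r to rank r+1 costs 1/r - 1/(r+1) = 1/(r(r+1)). A small sketch in plain Python (`rr_drop` is a hypothetical name):

```python
def rr_drop(rank):
    """Reciprocal-rank loss when the correct answer slips from rank to rank + 1."""
    return 1.0 / rank - 1.0 / (rank + 1)

for r in (1, 2, 10, 100):
    print(r, round(rr_drop(r), 5))
```

A slip from rank 1 costs 0.5, while the same one-position slip from rank 10 costs only about 0.009.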

32x RNN with binary_crossentropy loss, using large2470-val for validation, but only questions with 100 or more pairs (MRR computed from ~230 questions):

10676483.arien.ics.muni.cz.ayl_1rnnd0_lbc_mq100 etc.
[0.261798, 0.250904, 0.254477, 0.226031, 0.243203, 0.254298, 0.296402, 0.225364, 0.254544, 0.270242, 0.225916, 0.247560, 0.250359, 0.256854, 0.245027, 0.234488, 0.246186, 0.265761, 0.259943, 0.267567, 0.247762, 0.277862, 0.259452, 0.268017, 0.264453, 0.218607, 0.239640, 0.244359, 0.243594, 0.302583, 0.265366, 0.240797, ]
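
For reference, the filtering described above amounts to dropping questions with fewer than 100 candidate pairs before averaging. A minimal sketch follows. It assumes a hypothetical per-question layout of `(scores, labels)` lists rather than the repository's actual data structures, and it scores a question with no correct candidate as 0.0, which is also an assumption.

```python
def reciprocal_rank(scores, labels):
    """1/rank of the highest-scored correct candidate, or 0.0 if none is correct."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            return 1.0 / rank
    return 0.0

def mrr(questions, min_pairs=0):
    """Mean reciprocal rank over questions with at least min_pairs candidates."""
    kept = [(s, l) for s, l in questions if len(s) >= min_pairs]
    if not kept:
        raise ValueError("no question has >= %d pairs" % min_pairs)
    return sum(reciprocal_rank(s, l) for s, l in kept) / len(kept)

# Toy data: (scores, labels) per question; the values are made up.
toy = [
    ([0.9, 0.2, 0.1], [1, 0, 0]),          # correct answer ranked 1st
    ([0.3, 0.8, 0.5, 0.1], [1, 0, 0, 0]),  # correct answer ranked 3rd
]
print(mrr(toy))               # both questions
print(mrr(toy, min_pairs=4))  # only the 4-candidate question
```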

yodaqa-large2470 dataset

New Keras only...

attn1511:

==> 10658677.arien.ics.muni.cz.al_1a51d0 <==
45136/45136 [==============================] - 831s - loss: 0.1019 - val_loss: 0.2926                                       val mrr 0.348215
Predict&Eval (best epoch)
Train Accuracy: raw 0.944385 (y=0 0.999248, y=1 0.144755), bal 0.572002
Train MRR: 0.495441  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.939602 (y=0 0.998748, y=1 0.015060), bal 0.506904
Val MRR: 0.406924

==> 10658678.arien.ics.muni.cz.al_1a51d0 <==
45136/45136 [==============================] - 831s - loss: 0.1059 - val_loss: 0.2699                                       val mrr 0.399776
Predict&Eval (best epoch)
Train Accuracy: raw 0.941538 (y=0 0.999307, y=1 0.099551), bal 0.549429
Train MRR: 0.440954  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.940055 (y=0 0.999518, y=1 0.010542), bal 0.505030
Val MRR: 0.416866

==> 10658679.arien.ics.muni.cz.al_1a51d0 <==
45136/45136 [==============================] - 808s - loss: 0.1054 - val_loss: 0.2854                                       val mrr 0.345488
Predict&Eval (best epoch)
Train Accuracy: raw 0.944767 (y=0 0.996088, y=1 0.196774), bal 0.596431
Train MRR: 0.475772  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.933717 (y=0 0.990606, y=1 0.044428), bal 0.517517
Val MRR: 0.429878

==> 10658680.arien.ics.muni.cz.al_1a51d0 <==
45136/45136 [==============================] - 807s - loss: 0.1189 - val_loss: 0.2734                                       val mrr 0.368531
Predict&Eval (best epoch)
Train Accuracy: raw 0.941676 (y=0 0.989500, y=1 0.244651), bal 0.617076
Train MRR: 0.472865  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.907140 (y=0 0.960909, y=1 0.066642), bal 0.513775
Val MRR: 0.445978

RNN:

==> 10658681.arien.ics.muni.cz.al_1rnnd0 <==
45136/45136 [==============================] - 453s - loss: 0.0877 - val_loss: 0.2444                                       val mrr 0.390329
Predict&Eval (best epoch)
Train Accuracy: raw 0.940995 (y=0 0.999663, y=1 0.085921), bal 0.542792
Train MRR: 0.485474  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.940576 (y=0 0.999326, y=1 0.022214), bal 0.510770
Val MRR: 0.428687

==> 10658682.arien.ics.muni.cz.al_1rnnd0 <==
45136/45136 [==============================] - 456s - loss: 0.0843 - val_loss: 0.3327                                       val mrr 0.404100
Predict&Eval (best epoch)
Train Accuracy: raw 0.944163 (y=0 0.997751, y=1 0.163130), bal 0.580440
Train MRR: 0.474562  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.935686 (y=0 0.993208, y=1 0.036521), bal 0.514864
Val MRR: 0.426952

==> 10658683.arien.ics.muni.cz.al_1rnnd0 <==
45136/45136 [==============================] - 457s - loss: 0.2585 - val_loss: 0.2296                                       val mrr 0.076974
Predict&Eval (best epoch)
Train Accuracy: raw 0.984790 (y=0 0.995857, y=1 0.823499), bal 0.909678
Train MRR: 0.926491  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.923213 (y=0 0.972157, y=1 0.158133), bal 0.565145
Val MRR: 0.402299

==> 10658684.arien.ics.muni.cz.al_1rnnd0 <==
45136/45136 [==============================] - 451s - loss: 0.0227 - val_loss: 0.3588                                       val mrr 0.403606
Predict&Eval (best epoch)
Train Accuracy: raw 0.993808 (y=0 0.998621, y=1 0.923654), bal 0.961138
Train MRR: 0.974894  (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.927174 (y=0 0.976877, y=1 0.150226), bal 0.563552
Val MRR: 0.414855

8x RNN with binary_crossentropy loss (bounded 0,1 predictions will make it easier to compute per-sample variances):

10676209.arien.ics.muni.cz.al_1rnnd0_lbc etc.
[0.423496, 0.396040, 0.409522, 0.437343, 0.417056, 0.421266, 0.417228, 0.417645, ]

Ideas

  • Are some questions more variable than others? Per-question RR variability.

  • Maybe the validation performance is bound to train performance because we sometimes tend to overfit too drastically; finer epochs could enable us to catch a non-overfit state more reliably. Try with a much smaller epoch_fract. (wip)
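
The first idea could be prototyped roughly as follows: collect the reciprocal rank of every question in every run and sort questions by their spread. This is only a sketch. The `rr_by_run` layout (one dict per run, mapping a question id to its RR) and the toy values are assumptions, not something the tools are known to emit.

```python
import statistics

def per_question_spread(rr_by_run):
    """Map question id -> population std of its reciprocal rank across runs."""
    qids = set(rr_by_run[0])
    for run in rr_by_run[1:]:
        qids &= set(run)          # keep only questions present in every run
    return {q: statistics.pstdev([run[q] for run in rr_by_run]) for q in qids}

# Toy example with three runs and two questions (made-up values).
rr_by_run = [
    {"q1": 1.0, "q2": 0.25},
    {"q1": 0.5, "q2": 0.25},
    {"q1": 1.0, "q2": 0.25},
]
spread = per_question_spread(rr_by_run)
for q, s in sorted(spread.items(), key=lambda kv: kv[1], reverse=True):
    print(q, round(s, 4))
```

Questions at the top of this list are the ones whose rank flips between runs; if a few of them dominate, the overall MRR variance is mostly their doing.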
