1602Stats
We have noticed that results on some of the tasks are remarkably unstable, and we are investigating this.
We do not see a very large variance with the Ubuntu dataset!
(Still enough to have statistical considerations enter comparisons between similar-performing systems, though.)
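One way to let statistics enter such a comparison is to train each system several times and compare the per-run validation MRRs with Welch's t-test. The sketch below is a minimal stdlib-only illustration; the function name `welch_t` and the sample numbers are hypothetical and are not taken from the runs on this page.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical per-run MRRs of two systems (illustrative values only)
sys_a = [0.750, 0.753, 0.760, 0.740]
sys_b = [0.749, 0.753, 0.752, 0.756]
t, df = welch_t(sys_a, sys_b)
print("t = %.3f, df = %.1f" % (t, df))
```

The t statistic and degrees of freedom can then be looked up in a t table (or passed to a statistics package) to judge whether the difference between the two systems exceeds run-to-run noise.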
very small model:
python tools/ubuntu_train.py rnn data/anssel/ubuntu/v2-vocab.pickle data/anssel/ubuntu/v2-trainset.pickle data/anssel/ubuntu/v2-valset.pickle "pact='tanh'" sdim=1/6 pdim=1/6 ptscorer=B.dot_ptscorer dropout=1/2
Epoch 17/32
200064/200000 [==============================] - 1348s - loss: 0.1473 val mrr 0.716158
Val MRR: 0.760466
Val 2-R@1: 0.894683
Val 10-R@1: 0.637628 10-R@2: 0.775256 10-R@5: 0.941053
rnn--14befea563038897
Val MRR: 0.739525
Val 2-R@1: 0.876943
Val 10-R@1: 0.611708 10-R@2: 0.749387 10-R@5: 0.924489
rnn--15045270c51250bf
Epoch 17/32
200064/200000 [==============================] - 1324s - loss: 0.1491 val mrr 0.704187--
Val MRR: 0.750804
Val 2-R@1: 0.887117
Val 10-R@1: 0.626074 10-R@2: 0.763344 10-R@5: 0.931084
rnn--48e4d70df2db9111
Val MRR: 0.753133
Val 2-R@1: 0.888241
Val 10-R@1: 0.628067 10-R@2: 0.766360 10-R@5: 0.935992
rnn-619a7067ff9c2f1f
Val MRR: 0.760466
Val 2-R@1: 0.894683
Val 10-R@1: 0.637628 10-R@2: 0.775256 10-R@5: 0.941053
very small model + padding experiment (80):
python tools/ubuntu_train2.py rnn data/anssel/ubuntu/v2-vocab.pickle data/anssel/ubuntu/v2-trainset.pickle data/anssel/ubuntu/v2-valset.pickle "pact='tanh'" sdim=1/6 pdim=1/6 ptscorer=B.dot_ptscorer dropout=0
Epoch 11/32
200064/200000 [==============================] - 767s - loss: 0.2440 val mrr 0.729863
data/anssel/ubuntu/v2-valset.pickle MRR: 0.748693
data/anssel/ubuntu/v2-valset.pickle 2-R@1: 0.885123
data/anssel/ubuntu/v2-valset.pickle 10-R@1: 0.622342 10-R@2: 0.762321 10-R@5: 0.930930
rnn-13309de338ed5ca9
Val MRR: 0.749029
Val 2-R@1: 0.888855
Val 10-R@1: 0.620961 10-R@2: 0.764417 10-R@5: 0.935583
rnn--503416baa4bf7609
Val MRR: 0.753204
Val 2-R@1: 0.889008
Val 10-R@1: 0.628885 10-R@2: 0.766667 10-R@5: 0.933947
rnn--58f30ee70695ae87
Epoch 17/32
200064/200000 [==============================] - 763s - loss: 0.1375 val mrr 0.710547
Val MRR: 0.752327
Val 2-R@1: 0.890593
Val 10-R@1: 0.626329 10-R@2: 0.765337 10-R@5: 0.936094
rnn-3b38f6cc1e91ec9e
TODO re-eval (0.748693)
rnn-13309de338ed5ca9
Val MRR: 0.749029
Val 2-R@1: 0.888855
Val 10-R@1: 0.620961 10-R@2: 0.764417 10-R@5: 0.935583
rnn--59309a454c87d012
Val MRR: 0.750123
Val 2-R@1: 0.887065
Val 10-R@1: 0.623466 10-R@2: 0.762474 10-R@5: 0.934254
rnn-10c244d2f57cafd4
Val MRR: 0.746547--
Val 2-R@1: 0.885583
Val 10-R@1: 0.619939 10-R@2: 0.758640 10-R@5: 0.929294
rnn-2df03f10a8cf5003
Val MRR: 0.756253
Val 2-R@1: 0.891258
Val 10-R@1: 0.631544 10-R@2: 0.769888 10-R@5: 0.939264
rnn--503416baa4bf7609
Val MRR: 0.753204
Val 2-R@1: 0.889008
Val 10-R@1: 0.628885 10-R@2: 0.766667 10-R@5: 0.933947
rnn-785a86a0677c3731
Val MRR: 0.752015
Val 2-R@1: 0.892280
Val 10-R@1: 0.626483 10-R@2: 0.765644 10-R@5: 0.933640
rnn--4644a47ffc98cec5
Val MRR: 0.753015
Val 2-R@1: 0.887474
Val 10-R@1: 0.629294 10-R@2: 0.764417 10-R@5: 0.934049
rnn--7ea9ca0ecca48708
Val MRR: 0.752862
Val 2-R@1: 0.887986
Val 10-R@1: 0.629550 10-R@2: 0.763037 10-R@5: 0.933742
rnn-9bab77e3444e76e
Val MRR: 0.747036
Val 2-R@1: 0.886043
Val 10-R@1: 0.619530 10-R@2: 0.760020 10-R@5: 0.932311
rnn--5566ac149c1df702
Val MRR: 0.746756
Val 2-R@1: 0.881391
Val 10-R@1: 0.621472 10-R@2: 0.756391 10-R@5: 0.928425
rnn-5ca553289a85ec18
data/anssel/ubuntu/v2-valset.pickle MRR: 0.755227
data/anssel/ubuntu/v2-valset.pickle 2-R@1: 0.891973
data/anssel/ubuntu/v2-valset.pickle 10-R@1: 0.631288 10-R@2: 0.768405 10-R@5: 0.937014
rnn-23d33f7c865857d2
data/anssel/ubuntu/v2-valset.pickle MRR: 0.755279
data/anssel/ubuntu/v2-valset.pickle 2-R@1: 0.889519
data/anssel/ubuntu/v2-valset.pickle 10-R@1: 0.630010 10-R@2: 0.769836 10-R@5: 0.938753
large model with spad=80:
sdim=2 pdim=1 "pact='tanh'" ptscorer=B.dot_ptscorer
Epoch 17/32
200064/200000 [==============================] - 2448s - loss: 0.1755 val mrr 0.738215
Predict&Eval (best epoch)
Val MRR: 0.766637
Val 2-R@1: 0.898160
Val 10-R@1: 0.644121 10-R@2: 0.786196 10-R@5: 0.943405
Epoch 4/32
200064/200000 [==============================] - 2423s - loss: 0.4320 val mrr 0.761111
Epoch 9/32
200064/200000 [==============================] - 2389s - loss: 0.3566 val mrr 0.773375
Epoch 10/32
200064/200000 [==============================] - 2467s - loss: 0.3566 val mrr 0.778220
Epoch 4/32
200064/200000 [==============================] - 2436s - loss: 0.4363 val mrr 0.760970
Epoch 4/32
200064/200000 [==============================] - 2421s - loss: 0.4337 val mrr 0.758342
Old Keras experiments with dropout partially applied, but it shouldn't influence relative differences between these results.
The initial attn1511 implementation, 64 runs:
mrrv, mapv:
[0.87948717948717947, 0.85974358974358978, 0.85564102564102562, 0.86384615384615382, 0.87820512820512808, 0.86717948717948712, 0.86538461538461542, 0.88256410256410256, 0.87846153846153852, 0.88179487179487182, 0.85954415954415953, 0.88051282051282054, 0.88717948717948725, 0.87948717948717958, 0.87435897435897425, 0.85783882783882781, 0.87948717948717958, 0.85256410256410253, 0.88, 0.88705128205128203, 0.88205128205128203, 0.86692307692307691, 0.87282051282051287, 0.87358974358974351, 0.89205128205128204, 0.87230769230769234, 0.88256410256410256, 0.87358974358974351, 0.86040293040293037, 0.87102564102564106, 0.8684615384615384, 0.86564102564102563, 0.88717948717948714, 0.875, 0.87743589743589745, 0.87358974358974351, 0.88860805860805858, 0.86769230769230776, 0.87435897435897425, 0.88076923076923075, 0.88897435897435895, 0.86846153846153851, 0.88512820512820511, 0.86527472527472526, 0.87923076923076937, 0.87743589743589745, 0.86589743589743584, 0.8684615384615384, 0.88230769230769235, 0.86923076923076925, 0.86076923076923084, 0.86333333333333329, 0.85999999999999999, 0.90897435897435896, 0.87410256410256415, 0.88769230769230778, 0.8527106227106227, 0.8682051282051283, 0.8682051282051283, 0.8682051282051283, 0.8682051282051283, 0.88769230769230778, 0.88329670329670329, 0.87871794871794862]
[0.8089, 0.7863, 0.7897, 0.79, 0.8051, 0.8001, 0.7966, 0.7932, 0.7988, 0.7984, 0.7801, 0.8052, 0.7923, 0.7944, 0.8007, 0.7844, 0.7923, 0.7934, 0.8042, 0.8006, 0.8026, 0.7929, 0.7856, 0.7944, 0.814, 0.7883, 0.7891, 0.8013, 0.7758, 0.7953, 0.7699, 0.7879, 0.8106, 0.7961, 0.7903, 0.7957, 0.7793, 0.7742, 0.7804, 0.8069, 0.8027, 0.7902, 0.7996, 0.8015, 0.7918, 0.7874, 0.7879, 0.79, 0.7897, 0.7946, 0.7771, 0.7826, 0.7955, 0.8195, 0.7901, 0.8071, 0.7805, 0.8001, 0.8001, 0.8001, 0.8001, 0.8093, 0.7909, 0.7959]
mrrt, mapt:
[0.81044494720965299, 0.81560457516339868, 0.74914852304558188, 0.80479691876750703, 0.78295454545454557, 0.79157239819004521, 0.79202317290552593, 0.81607142857142867, 0.82686651583710413, 0.82377450980392164, 0.79301470588235301, 0.7760854341736696, 0.80882352941176483, 0.77327317290552589, 0.78582516339869291, 0.78707893413775765, 0.79369658119658126, 0.79966063348416305, 0.80870098039215688, 0.79944614209320097, 0.78984593837535011, 0.77916666666666667, 0.7890819964349377, 0.78504901960784323, 0.79369747899159671, 0.81815476190476188, 0.77254901960784317, 0.77761437908496733, 0.77832633053221301, 0.83455882352941191, 0.79267533936651591, 0.77638888888888902, 0.78951914098972942, 0.82486631016042788, 0.7948418003565062, 0.79888591800356501, 0.78490896358543427, 0.78050356506238849, 0.78731325863678803, 0.82599753187988489, 0.83054298642533941, 0.79888591800356512, 0.81350867269984928, 0.81093514328808458, 0.80493697478991599, 0.80857843137254903, 0.80453431372549022, 0.78984593837535022, 0.8093137254901962, 0.78242296918767518, 0.80539215686274523, 0.83002450980392151, 0.82933006535947718, 0.81126336898395734, 0.80318627450980395, 0.76183473389355738, 0.78371848739495797, 0.79296218487394965, 0.79296218487394965, 0.79296218487394965, 0.79296218487394965, 0.83946078431372551, 0.82023809523809543, 0.80526960784313728]
[0.7277, 0.7349, 0.7019, 0.7292, 0.7207, 0.7283, 0.7239, 0.7373, 0.7296, 0.7405, 0.7241, 0.7137, 0.7358, 0.7142, 0.7297, 0.7054, 0.7127, 0.7228, 0.7152, 0.7238, 0.7124, 0.7192, 0.7116, 0.7325, 0.7267, 0.7418, 0.7147, 0.7072, 0.7216, 0.7441, 0.7199, 0.7047, 0.7286, 0.7426, 0.72, 0.7236, 0.7203, 0.7213, 0.7201, 0.7403, 0.7348, 0.7298, 0.7227, 0.7189, 0.7208, 0.7365, 0.74, 0.7124, 0.7429, 0.7198, 0.7385, 0.7337, 0.7535, 0.747, 0.7303, 0.7098, 0.7161, 0.7258, 0.7258, 0.7258, 0.7258, 0.7539, 0.7274, 0.7453]
In [17]: ss.pearsonr(mrrv, mrrt)
Out[17]: (0.2044974133763463, 0.10503569612821166)
In [18]: ss.pearsonr(mapv, mapt)
Out[18]: (0.17587090427082455, 0.16449885001585987)
In [8]: np.mean(mrrv)
Out[8]: 0.8740141687016687
In [9]: np.std(mrrv)
Out[9]: 0.010561231563509519
In [11]: np.mean(mrrt)
Out[11]: 0.79887312251167963
In [10]: np.std(mrrt)
Out[10]: 0.018295636088356944
In [12]: np.max(mrrt)
Out[12]: 0.83946078431372551
In [13]: np.min(mrrt)
Out[13]: 0.74914852304558188
In [14]: np.max(mapt)
Out[14]: 0.75390000000000001
In [15]: np.mean(mapt)
Out[15]: 0.72627968749999994
In [16]: np.std(mapt)
Out[16]: 0.011727240986367755
This is awful!
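To put the correlations above in perspective: a Pearson r of about 0.2 corresponds to an r² of about 0.04, so validation MRR linearly accounts for only around 4% of the variance in test MRR across runs. The following is a stdlib-only sketch of the coefficient that `ss.pearsonr` (presumably `scipy.stats.pearsonr`) reports as its first value; the helper name `pearson_r` and the toy inputs are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data (not from the runs above): perfectly correlated, then uncorrelated
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [1, -1, -1, 1]))
```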
CNN:
==> 10607226.arien.ics.muni.cz.aw_cnn <==
20000/20000 [==============================] - 178s - loss: 0.1354 - val_loss: 0.9067 val mrr 0.766916
Predict&Eval (best epoch)
Train Accuracy: raw 0.947806 (y=0 0.992798, y=1 0.586360), bal 0.789579
Train MRR: 0.941199 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.825425 (y=0 0.995614, y=1 0.068293), bal 0.531953
Val MRR: 0.851795
==> 10607227.arien.ics.muni.cz.aw_cnn <==
20000/20000 [==============================] - 177s - loss: 0.1628 - val_loss: 0.6934 val mrr 0.841966
Predict&Eval (best epoch)
Train Accuracy: raw 0.958231 (y=0 0.984861, y=1 0.744299), bal 0.864580
Train MRR: 0.944445 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.843330 (y=0 0.987939, y=1 0.200000), bal 0.593969
Val MRR: 0.860769
==> 10607228.arien.ics.muni.cz.aw_cnn <==
20000/20000 [==============================] - 178s - loss: 0.1366 - val_loss: 0.7379 val mrr 0.805531
Predict&Eval (best epoch)
Train Accuracy: raw 0.953767 (y=0 0.991300, y=1 0.652238), bal 0.821769
Train MRR: 0.954174 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.828111 (y=0 0.997807, y=1 0.073171), bal 0.535489
Val MRR: 0.863333
==> 10610660.arien.ics.muni.cz.aw_cnn <==
20000/20000 [==============================] - 179s - loss: 0.1441 - val_loss: 0.8120 val mrr 0.817308
Predict&Eval (best epoch)
Train Accuracy: raw 0.968703 (y=0 0.987069, y=1 0.821157), bal 0.904113
Train MRR: 0.955313 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.844226 (y=0 0.993421, y=1 0.180488), bal 0.586954
Val MRR: 0.888974
==> 10610661.arien.ics.muni.cz.aw_cnn <==
20000/20000 [==============================] - 177s - loss: 0.1261 - val_loss: 0.7154 val mrr 0.811282
Predict&Eval (best epoch)
Train Accuracy: raw 0.959283 (y=0 0.992956, y=1 0.688767), bal 0.840861
Train MRR: 0.960245 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.839749 (y=0 0.989035, y=1 0.175610), bal 0.582322
Val MRR: 0.872949
==> 10610662.arien.ics.muni.cz.aw_cnn <==
20000/20000 [==============================] - 176s - loss: 0.1471 - val_loss: 0.6331 val mrr 0.836044
Predict&Eval (best epoch)
Train Accuracy: raw 0.957951 (y=0 0.986333, y=1 0.729941), bal 0.858137
Train MRR: 0.948060 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.846016 (y=0 0.982456, y=1 0.239024), bal 0.610740
Val MRR: 0.869872
==> 10610663.arien.ics.muni.cz.aw_cnn <==
20000/20000 [==============================] - 176s - loss: 0.1803 - val_loss: 0.7234 val mrr 0.834762
Predict&Eval (best epoch)
Train Accuracy: raw 0.889302 (y=0 0.999947, y=1 0.000422), bal 0.500185
Train MRR: 0.866786 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.816473 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.862564
==> 10610669.arien.ics.muni.cz.aw_cnn <==
20000/20000 [==============================] - 177s - loss: 0.1798 - val_loss: 0.7526 val mrr 0.846325
Predict&Eval (best epoch)
Train Accuracy: raw 0.920856 (y=0 0.987331, y=1 0.386824), bal 0.687078
Train MRR: 0.884897 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.826321 (y=0 0.994518, y=1 0.078049), bal 0.536283
Val MRR: 0.866667
attn1511:
==> 10607198.arien.ics.muni.cz.aw_a1511 <==
20000/20000 [==============================] - 276s - loss: 0.1325 - val_loss: 0.6001 val mrr 0.791867
Predict&Eval (best epoch)
Train Accuracy: raw 0.913681 (y=0 0.979683, y=1 0.383446), bal 0.681564
Train MRR: 0.864677 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.825425 (y=0 0.991228, y=1 0.087805), bal 0.539516
Val MRR: 0.866667
==> 10607199.arien.ics.muni.cz.aw_a1511 <==
20000/20000 [==============================] - 276s - loss: 0.1069 - val_loss: 0.6251 val mrr 0.764676
Predict&Eval (best epoch)
Train Accuracy: raw 0.935442 (y=0 0.972876, y=1 0.634713), bal 0.803794
Train MRR: 0.890238 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.840645 (y=0 0.967105, y=1 0.278049), bal 0.622577
Val MRR: 0.865385
==> 10607200.arien.ics.muni.cz.aw_a1511 <==
20000/20000 [==============================] - 277s - loss: 0.1134 - val_loss: 0.5559 val mrr 0.793007
Predict&Eval (best epoch)
Train Accuracy: raw 0.940514 (y=0 0.969801, y=1 0.705236), bal 0.837518
Train MRR: 0.896228 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.840645 (y=0 0.958333, y=1 0.317073), bal 0.637703
Val MRR: 0.890769
==> 10610671.arien.ics.muni.cz.aw_a1511 <==
20000/20000 [==============================] - 275s - loss: 0.1196 - val_loss: 0.5428 val mrr 0.843846
Predict&Eval (best epoch)
Train Accuracy: raw 0.924105 (y=0 0.993324, y=1 0.368032), bal 0.680678
Train MRR: 0.911378 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.816473 (y=0 0.991228, y=1 0.039024), bal 0.515126
Val MRR: 0.886410
==> 10610672.arien.ics.muni.cz.aw_a1511 <==
20000/20000 [==============================] - 274s - loss: 0.1065 - val_loss: 0.6153 val mrr 0.825897
Predict&Eval (best epoch)
Train Accuracy: raw 0.933525 (y=0 0.976503, y=1 0.588260), bal 0.782381
Train MRR: 0.898266 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.842435 (y=0 0.970395, y=1 0.273171), bal 0.621783
Val MRR: 0.888462
==> 10610673.arien.ics.muni.cz.aw_a1511 <==
20000/20000 [==============================] - 275s - loss: 0.1062 - val_loss: 0.4998 val mrr 0.793333
Predict&Eval (best epoch)
Train Accuracy: raw 0.936400 (y=0 0.976634, y=1 0.613176), bal 0.794905
Train MRR: 0.894604 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.845121 (y=0 0.981360, y=1 0.239024), bal 0.610192
Val MRR: 0.865128
==> 10610674.arien.ics.muni.cz.aw_a1511 <==
20000/20000 [==============================] - 275s - loss: 0.0871 - val_loss: 1.0442 val mrr 0.745128
Predict&Eval (best epoch)
Train Accuracy: raw 0.943319 (y=0 0.975504, y=1 0.684755), bal 0.830130
Train MRR: 0.903599 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.842435 (y=0 0.967105, y=1 0.287805), bal 0.627455
Val MRR: 0.873333
==> 10610675.arien.ics.muni.cz.aw_a1511 <==
20000/20000 [==============================] - 276s - loss: 0.1012 - val_loss: 0.6416 val mrr 0.808974
Predict&Eval (best epoch)
Train Accuracy: raw 0.937873 (y=0 0.987910, y=1 0.535895), bal 0.761902
Train MRR: 0.910429 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.826321 (y=0 0.984649, y=1 0.121951), bal 0.553300
Val MRR: 0.873590
64x RNN with binary_crossentropy loss (bounded 0,1 predictions will make it easier to compute per-sample variances):
10675999.arien.ics.muni.cz.aw_1rnnd0_lbc etc.
[0.862821, 0.877875, 0.882198, 0.850040, 0.851026, 0.863150, 0.873706, 0.860769, 0.848681, 0.851474, 0.847326, 0.854231, 0.838462, 0.855748, 0.880962, 0.866597, 0.879359, 0.887436, 0.878246, 0.843462, 0.881685, 0.875442, 0.856410, 0.850414, 0.886154, 0.874359, 0.866630, 0.862051, 0.861709, 0.839524, 0.859134, 0.876703, 0.858132, 0.877179, 0.841931, 0.860952, 0.844017, 0.861951, 0.872977, 0.858065, 0.851874, 0.846325, 0.846044, 0.831757, 0.850962, 0.855275, 0.856581, 0.857040, 0.839744, 0.860073, 0.847051, 0.838205, 0.857179, 0.852051, 0.876282, 0.859121, 0.865128, 0.884951, 0.871429, 0.892051, 0.885425, 0.853746, 0.861795, 0.854231, ]
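The remark about bounded predictions can be made concrete: when every prediction lies in [0, 1], the variance of one sample's prediction across runs can never exceed 0.25, so per-sample variances share a common scale. A minimal sketch, assuming a hypothetical `preds_by_run` layout (one list of predictions per run, all in the same sample order):

```python
import statistics

def per_sample_variance(preds_by_run):
    """Population variance of each sample's prediction across runs."""
    # zip(*...) transposes runs x samples into samples x runs
    return [statistics.pvariance(col) for col in zip(*preds_by_run)]

# Hypothetical predictions from three runs on four samples
preds_by_run = [
    [0.9, 0.1, 0.5, 0.0],
    [0.8, 0.2, 0.5, 1.0],
    [0.7, 0.0, 0.5, 0.0],
]
variances = per_sample_variance(preds_by_run)
# Values bounded in [0, 1] cannot have variance above 0.25
assert all(0.0 <= v <= 0.25 for v in variances)
print(variances)
```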
Old Keras experiments with dropout partially applied.
CNN:
==> 10607229.arien.ics.muni.cz.ay_cnn <==
20000/20000 [==============================] - 221s - loss: 0.1166 - val_loss: 0.3841 val mrr 0.265574
Predict&Eval (best epoch)
Train Accuracy: raw 0.964884 (y=0 0.998043, y=1 0.384102), bal 0.691072
Train MRR: 0.700279 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.934229 (y=0 0.988121, y=1 0.023544), bal 0.505832
Val MRR: 0.364058
==> 10607230.arien.ics.muni.cz.ay_cnn <==
20000/20000 [==============================] - 219s - loss: 0.1115 - val_loss: 0.3619 val mrr 0.259407
Predict&Eval (best epoch)
Train Accuracy: raw 0.962491 (y=0 0.996293, y=1 0.370449), bal 0.683371
Train MRR: 0.699684 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.930975 (y=0 0.983207, y=1 0.048327), bal 0.515767
Val MRR: 0.334033
==> 10607231.arien.ics.muni.cz.ay_cnn <==
20000/20000 [==============================] - 221s - loss: 0.1256 - val_loss: 0.3087 val mrr 0.255388
Predict&Eval (best epoch)
Train Accuracy: raw 0.953725 (y=0 0.998389, y=1 0.171420), bal 0.584904
Train MRR: 0.593810 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.943991 (y=0 0.999853, y=1 0.000000), bal 0.499927
Val MRR: 0.380828
==> 10610664.arien.ics.muni.cz.ay_cnn <==
20000/20000 [==============================] - 219s - loss: 0.1383 - val_loss: 0.3556 val mrr 0.271949
Predict&Eval (best epoch)
Train Accuracy: raw 0.949972 (y=0 0.999965, y=1 0.074333), bal 0.537149
Train MRR: 0.503931 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944060 (y=0 0.999927, y=1 0.000000), bal 0.499963
Val MRR: 0.344283
==> 10610665.arien.ics.muni.cz.ay_cnn <==
20000/20000 [==============================] - 222s - loss: 0.1264 - val_loss: 0.3648 val mrr 0.288607
Predict&Eval (best epoch)
Train Accuracy: raw 0.957494 (y=0 0.997627, y=1 0.254551), bal 0.626089
Train MRR: 0.668582 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.941567 (y=0 0.996847, y=1 0.007435), bal 0.502141
Val MRR: 0.367982
==> 10610666.arien.ics.muni.cz.ay_cnn <==
20000/20000 [==============================] - 220s - loss: 0.1204 - val_loss: 0.3388 val mrr 0.269678
Predict&Eval (best epoch)
Train Accuracy: raw 0.955576 (y=0 0.998822, y=1 0.198119), bal 0.598471
Train MRR: 0.646638 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.943506 (y=0 0.999193, y=1 0.002478), bal 0.500836
Val MRR: 0.370531
==> 10610667.arien.ics.muni.cz.ay_cnn <==
20000/20000 [==============================] - 219s - loss: 0.1241 - val_loss: 0.3724 val mrr 0.255753
Predict&Eval (best epoch)
Train Accuracy: raw 0.950382 (y=0 0.999965, y=1 0.081917), bal 0.540941
Train MRR: 0.598393 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.399154
==> 10610668.arien.ics.muni.cz.ay_cnn <==
20000/20000 [==============================] - 220s - loss: 0.1301 - val_loss: 0.2897 val mrr 0.250608
Predict&Eval (best epoch)
Train Accuracy: raw 0.955445 (y=0 0.997177, y=1 0.224515), bal 0.610846
Train MRR: 0.572635 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.941291 (y=0 0.996480, y=1 0.008674), bal 0.502577
Val MRR: 0.345531
attn1511:
==> 10607195.arien.ics.muni.cz.ay_a1511 <==
20000/20000 [==============================] - 337s - loss: 0.1076 - val_loss: 0.3202 val mrr 0.247784
Predict&Eval (best epoch)
Train Accuracy: raw 0.953479 (y=0 0.997713, y=1 0.178701), bal 0.588207
Train MRR: 0.501824 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944337 (y=0 0.999927, y=1 0.004957), bal 0.502442
Val MRR: 0.483186
==> 10607196.arien.ics.muni.cz.ay_a1511 <==
20000/20000 [==============================] - 338s - loss: 0.1323 - val_loss: 0.3089 val mrr 0.211629
Predict&Eval (best epoch)
Train Accuracy: raw 0.951627 (y=0 0.999307, y=1 0.116505), bal 0.557906
Train MRR: 0.446953 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.489259
==> 10607197.arien.ics.muni.cz.ay_a1511 <==
20000/20000 [==============================] - 337s - loss: 0.1334 - val_loss: 0.3269 val mrr 0.274479
Predict&Eval (best epoch)
Train Accuracy: raw 0.951709 (y=0 0.999117, y=1 0.121359), bal 0.560238
Train MRR: 0.456003 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.506934
==> 10610676.arien.ics.muni.cz.ay_a1511 <==
20000/20000 [==============================] - 336s - loss: 0.1322 - val_loss: 0.3004 val mrr 0.354856
Predict&Eval (best epoch)
Train Accuracy: raw 0.950759 (y=0 0.999463, y=1 0.097694), bal 0.548579
Train MRR: 0.452666 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.483744
==> 10610677.arien.ics.muni.cz.ay_a1511 <==
20000/20000 [==============================] - 336s - loss: 0.1306 - val_loss: 0.2994 val mrr 0.273350
Predict&Eval (best epoch)
Train Accuracy: raw 0.951660 (y=0 0.999307, y=1 0.117112), bal 0.558209
Train MRR: 0.455662 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.457244
==> 10610678.arien.ics.muni.cz.ay_a1511 <==
20000/20000 [==============================] - 336s - loss: 0.1320 - val_loss: 0.3083 val mrr 0.332179
Predict&Eval (best epoch)
Train Accuracy: raw 0.951742 (y=0 0.999307, y=1 0.118629), bal 0.558968
Train MRR: 0.473461 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.454481
==> 10610679.arien.ics.muni.cz.ay_a1511 <==
20000/20000 [==============================] - 336s - loss: 0.1461 - val_loss: 0.2929 val mrr 0.243183
Predict&Eval (best epoch)
Train Accuracy: raw 0.949841 (y=0 0.999965, y=1 0.071905), bal 0.535935
Train MRR: 0.443360 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.473662
==> 10610680.arien.ics.muni.cz.ay_a1511 <==
20000/20000 [==============================] - 337s - loss: 0.1136 - val_loss: 0.2985 val mrr 0.293237
Predict&Eval (best epoch)
Train Accuracy: raw 0.952102 (y=0 0.998874, y=1 0.132888), bal 0.565881
Train MRR: 0.502078 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.527090
With dropout fully fixed... (and not applied at all :)
attn1511:
==> 10658672.arien.ics.muni.cz.ay_1a51d0 <==
15256/15256 [==============================] - 271s - loss: 0.1090 - val_loss: 0.2547 val mrr 0.351885
Predict&Eval (best epoch)
Train Accuracy: raw 0.949612 (y=0 0.999723, y=1 0.071905), bal 0.535814
Train MRR: 0.436671 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.477024
==> 10658673.arien.ics.muni.cz.ay_1a51d0 <==
15256/15256 [==============================] - 268s - loss: 0.1167 - val_loss: 0.2528 val mrr 0.368745
Predict&Eval (best epoch)
Train Accuracy: raw 0.951676 (y=0 0.999238, y=1 0.118629), bal 0.558933
Train MRR: 0.476874 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.455434
==> 10658674.arien.ics.muni.cz.ay_1a51d0 <==
15256/15256 [==============================] - 269s - loss: 0.1627 - val_loss: 0.2321 val mrr 0.285615
Predict&Eval (best epoch)
Train Accuracy: raw 0.951922 (y=0 0.996986, y=1 0.162621), bal 0.579804
Train MRR: 0.449095 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.449533
==> 10658675.arien.ics.muni.cz.ay_1a51d0 <==
15256/15256 [==============================] - 267s - loss: 0.1071 - val_loss: 0.3014 val mrr 0.274728
Predict&Eval (best epoch)
Train Accuracy: raw 0.952643 (y=0 0.996432, y=1 0.185680), bal 0.591056
Train MRR: 0.468218 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.483179
(attn1511 could apparently make do with some dropout after all).
RNN:
==> 10658668.arien.ics.muni.cz.ay_1rnnd0 <==
20000/20000 [==============================] - 183s - loss: 0.0599 - val_loss: 0.4257 val mrr 0.258471
Predict&Eval (best epoch)
Train Accuracy: raw 0.955101 (y=0 0.998112, y=1 0.201760), bal 0.599936
Train MRR: 0.548954 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.943091 (y=0 0.998827, y=1 0.001239), bal 0.500033
Val MRR: 0.307231
==> 10658669.arien.ics.muni.cz.ay_1rnnd0 <==
15256/15256 [==============================] - 150s - loss: 0.0247 - val_loss: 0.4625 val mrr 0.266685
Predict&Eval (best epoch)
Train Accuracy: raw 0.990906 (y=0 0.997523, y=1 0.875000), bal 0.936261
Train MRR: 0.955644 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.921698 (y=0 0.973235, y=1 0.050805), bal 0.512020
Val MRR: 0.322597
==> 10658670.arien.ics.muni.cz.ay_1rnnd0 <==
15256/15256 [==============================] - 150s - loss: 0.0833 - val_loss: 0.4153 val mrr 0.316059
Predict&Eval (best epoch)
Train Accuracy: raw 0.957690 (y=0 0.998233, y=1 0.247573), bal 0.622903
Train MRR: 0.616726 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.940044 (y=0 0.995160, y=1 0.008674), bal 0.501917
Val MRR: 0.347030
==> 10658671.arien.ics.muni.cz.ay_1rnnd0 <==
15256/15256 [==============================] - 149s - loss: 0.0864 - val_loss: 0.3491 val mrr 0.280607
Predict&Eval (best epoch)
Train Accuracy: raw 0.951545 (y=0 0.999896, y=1 0.104672), bal 0.552284
Train MRR: 0.531235 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.944129 (y=0 1.000000, y=1 0.000000), bal 0.500000
Val MRR: 0.309661
64x RNN with binary_crossentropy loss (bounded 0,1 predictions will make it easier to compute per-sample variances):
10675927.arien.ics.muni.cz.ay_1rnnd0_lbc etc.
[0.333701, 0.326125, 0.336284, 0.345328, 0.339932, 0.299496, 0.308057, 0.294640, 0.371689, 0.358677, 0.316229, 0.351746, 0.389020, 0.344435, 0.319640, 0.335084, 0.350999, 0.344522, 0.330878, 0.316592, 0.392825, 0.325323, 0.376527, 0.350433, 0.353456, 0.291590, 0.317691, 0.352287, 0.365691, 0.307232, 0.327808, 0.327562, 0.358058, 0.371684, 0.373486, 0.341047, 0.339725, 0.340547, 0.334846, 0.342750, 0.291530, 0.323766, 0.334265, 0.358144, 0.341592, 0.343868, 0.310702, 0.326541, 0.319661, 0.339354, 0.299108, 0.331717, 0.310318, 0.362667, 0.327149, 0.355891, 0.305087, 0.359355, 0.357987, 0.331698, 0.316316, 0.347250, 0.334750, 0.308903, ]
Observation: Disregarding the number of pairs, the number of questions (on which MRR is measured) is actually pretty small on the val set (88)!
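To see why a small question count could matter: MRR is a mean over questions, so its sampling error shrinks only as 1/sqrt(n). A rough sketch, assuming reciprocal ranks are independent across questions (the helper names and the example ranks are hypothetical):

```python
import math
import statistics

def mrr(ranks):
    """Mean reciprocal rank; ranks are 1-based ranks of the first correct answer."""
    return statistics.fmean([1.0 / r for r in ranks])

def mrr_standard_error(ranks):
    """Standard error of the MRR, treating questions as independent draws."""
    rrs = [1.0 / r for r in ranks]
    return statistics.stdev(rrs) / math.sqrt(len(rrs))

# Hypothetical ranks for a handful of questions
ranks = [1, 3, 2, 10, 1, 5, 1, 4]
print("MRR = %.4f +- %.4f" % (mrr(ranks), mrr_standard_error(ranks)))

# With the per-question spread held fixed, going from 88 to e.g. 333
# questions would shrink the standard error by sqrt(333 / 88).
print(math.sqrt(333 / 88))
```

This only covers noise from which questions happen to be in the set; variance coming from training itself (random initialization, data order) is not reduced by a larger validation set.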
32x RNN with binary_crossentropy loss, using large2470-val for validation (MRR computed from 333 questions):
10676451.arien.ics.muni.cz.ayl_1rnnd0_lbc etc.
[0.354874, 0.318208, 0.317254, 0.393136, 0.326733, 0.371319, 0.338579, 0.367860, 0.343624, 0.375447, 0.349652, 0.365866, 0.336435, 0.345256, 0.341385, 0.357561, 0.353628, 0.344108, 0.324256, 0.351309, 0.369522, 0.348427, 0.334746, 0.354037, 0.362202, 0.356219, 0.363204, 0.348845, 0.366015, 0.359293, 0.337874, 0.330425, ]
Ok, that didn't help. Not so surprising, given that the per-pair accuracy reported above also fluctuates.
Observation: Tiny changes in rank may translate into huge changes in MRR near the top. What if we try to alleviate this by looking only at hard questions with many alternatives?
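The sensitivity near the top follows directly from the reciprocal: moving the correct answer from rank 1 to rank 2 costs 0.5 in that question's reciprocal rank, while moving it from rank 10 to rank 11 costs under 0.01. A small illustration (the helper `rr_change` is hypothetical):

```python
def rr_change(rank_before, rank_after):
    """Change in one question's reciprocal rank when its rank shifts."""
    return 1.0 / rank_after - 1.0 / rank_before

for before in (1, 2, 5, 10, 100):
    # Effect of the correct answer slipping down by a single position
    print(before, "->", before + 1, "%.4f" % rr_change(before, before + 1))
```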
32x RNN with binary_crossentropy loss, using large2470-val for validation, but only questions with 100 or more pairs (MRR computed from ~230 questions):
10676483.arien.ics.muni.cz.ayl_1rnnd0_lbc_mq100 etc.
[0.261798, 0.250904, 0.254477, 0.226031, 0.243203, 0.254298, 0.296402, 0.225364, 0.254544, 0.270242, 0.225916, 0.247560, 0.250359, 0.256854, 0.245027, 0.234488, 0.246186, 0.265761, 0.259943, 0.267567, 0.247762, 0.277862, 0.259452, 0.268017, 0.264453, 0.218607, 0.239640, 0.244359, 0.243594, 0.302583, 0.265366, 0.240797, ]
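The filtering used for this experiment can be sketched as follows. This is an illustration only, not the repository's actual evaluation code; the `(n_pairs, rank)` tuple layout and the function name `mrr_min_pairs` are assumptions.

```python
def mrr_min_pairs(questions, min_pairs=100):
    """MRR over questions that have at least `min_pairs` candidate pairs.

    `questions` is a list of (n_pairs, rank) tuples, where rank is the
    1-based rank of the first correct answer for that question.
    """
    kept = [1.0 / rank for n_pairs, rank in questions if n_pairs >= min_pairs]
    if not kept:
        raise ValueError("no question has at least %d pairs" % min_pairs)
    return sum(kept) / len(kept)

# Hypothetical questions: (number of candidate pairs, rank of correct answer)
questions = [(20, 1), (150, 4), (300, 2), (99, 1)]
print(mrr_min_pairs(questions))     # uses only the 150- and 300-pair questions
print(mrr_min_pairs(questions, 0))  # uses all four questions
```

Note that restricting to questions with many candidates also lowers the absolute MRR, so the filtered numbers are only comparable with each other, not with the unfiltered runs.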
New Keras only...
attn1511:
==> 10658677.arien.ics.muni.cz.al_1a51d0 <==
45136/45136 [==============================] - 831s - loss: 0.1019 - val_loss: 0.2926 val mrr 0.348215
Predict&Eval (best epoch)
Train Accuracy: raw 0.944385 (y=0 0.999248, y=1 0.144755), bal 0.572002
Train MRR: 0.495441 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.939602 (y=0 0.998748, y=1 0.015060), bal 0.506904
Val MRR: 0.406924
==> 10658678.arien.ics.muni.cz.al_1a51d0 <==
45136/45136 [==============================] - 831s - loss: 0.1059 - val_loss: 0.2699 val mrr 0.399776
Predict&Eval (best epoch)
Train Accuracy: raw 0.941538 (y=0 0.999307, y=1 0.099551), bal 0.549429
Train MRR: 0.440954 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.940055 (y=0 0.999518, y=1 0.010542), bal 0.505030
Val MRR: 0.416866
==> 10658679.arien.ics.muni.cz.al_1a51d0 <==
45136/45136 [==============================] - 808s - loss: 0.1054 - val_loss: 0.2854 val mrr 0.345488
Predict&Eval (best epoch)
Train Accuracy: raw 0.944767 (y=0 0.996088, y=1 0.196774), bal 0.596431
Train MRR: 0.475772 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.933717 (y=0 0.990606, y=1 0.044428), bal 0.517517
Val MRR: 0.429878
==> 10658680.arien.ics.muni.cz.al_1a51d0 <==
45136/45136 [==============================] - 807s - loss: 0.1189 - val_loss: 0.2734 val mrr 0.368531
Predict&Eval (best epoch)
Train Accuracy: raw 0.941676 (y=0 0.989500, y=1 0.244651), bal 0.617076
Train MRR: 0.472865 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.907140 (y=0 0.960909, y=1 0.066642), bal 0.513775
Val MRR: 0.445978
RNN:
==> 10658681.arien.ics.muni.cz.al_1rnnd0 <==
45136/45136 [==============================] - 453s - loss: 0.0877 - val_loss: 0.2444 val mrr 0.390329
Predict&Eval (best epoch)
Train Accuracy: raw 0.940995 (y=0 0.999663, y=1 0.085921), bal 0.542792
Train MRR: 0.485474 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.940576 (y=0 0.999326, y=1 0.022214), bal 0.510770
Val MRR: 0.428687
==> 10658682.arien.ics.muni.cz.al_1rnnd0 <==
45136/45136 [==============================] - 456s - loss: 0.0843 - val_loss: 0.3327 val mrr 0.404100
Predict&Eval (best epoch)
Train Accuracy: raw 0.944163 (y=0 0.997751, y=1 0.163130), bal 0.580440
Train MRR: 0.474562 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.935686 (y=0 0.993208, y=1 0.036521), bal 0.514864
Val MRR: 0.426952
==> 10658683.arien.ics.muni.cz.al_1rnnd0 <==
45136/45136 [==============================] - 457s - loss: 0.2585 - val_loss: 0.2296 val mrr 0.076974
Predict&Eval (best epoch)
Train Accuracy: raw 0.984790 (y=0 0.995857, y=1 0.823499), bal 0.909678
Train MRR: 0.926491 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.923213 (y=0 0.972157, y=1 0.158133), bal 0.565145
Val MRR: 0.402299
==> 10658684.arien.ics.muni.cz.al_1rnnd0 <==
45136/45136 [==============================] - 451s - loss: 0.0227 - val_loss: 0.3588 val mrr 0.403606
Predict&Eval (best epoch)
Train Accuracy: raw 0.993808 (y=0 0.998621, y=1 0.923654), bal 0.961138
Train MRR: 0.974894 (on training set, y=0 may be subsampled!)
Val Accuracy: raw 0.927174 (y=0 0.976877, y=1 0.150226), bal 0.563552
Val MRR: 0.414855
8x RNN with binary_crossentropy loss (bounded 0,1 predictions will make it easier to compute per-sample variances):
10676209.arien.ics.muni.cz.al_1rnnd0_lbc etc.
[0.423496, 0.396040, 0.409522, 0.437343, 0.417056, 0.421266, 0.417228, 0.417645, ]
- Are some questions more variable than others? Look at per-question RR variability.
- Maybe the validation performance is tied to train performance because we sometimes tend to overfit too drastically; finer epochs could enable us to catch a non-overfit state more reliably. Try a much smaller epoch_fract. (wip)
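For the per-question RR variability item above, one possible measurement is to collect each question's rank in every run and take the spread of its reciprocal rank. A minimal sketch, assuming a hypothetical `ranks_by_run` layout (one list of 1-based ranks per run, with questions in the same order in every run):

```python
import statistics

def per_question_rr_std(ranks_by_run):
    """Population std of each question's reciprocal rank across runs."""
    stds = []
    for ranks in zip(*ranks_by_run):  # one tuple of ranks per question
        stds.append(statistics.pstdev([1.0 / r for r in ranks]))
    return stds

# Hypothetical ranks from three runs on three questions
ranks_by_run = [
    [1, 5, 2],
    [2, 5, 1],
    [1, 5, 3],
]
stds = per_question_rr_std(ranks_by_run)
# Question indices sorted from most to least variable
order = sorted(range(len(stds)), key=lambda q: stds[q], reverse=True)
print(order, stds)
```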