1604KWWeights

Keyword Weights as Inputs

The YodaQA type of anssel task datasets includes an additional feature for the input pairs: the weights of keywords and about-keywords of s0 that are matched in s1.

They are pretty strong predictors on their own (curatedv2 devMRR 0.337348, large2470 devMRR 0.318246).
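
As an illustration, a minimal sketch of computing such a feature; the function and the weight dict here are hypothetical, not the actual YodaQA pipeline code:

```python
# Hypothetical sketch: sum the weights of s0 (about-)keywords that
# reappear in s1, yielding one scalar input feature per pair.
def kw_match_weight(s0_kw_weights, s1_tokens):
    s1_set = set(t.lower() for t in s1_tokens)
    return sum(w for kw, w in s0_kw_weights.items() if kw.lower() in s1_set)

# one feature for keywords, another for about-keywords:
kw = kw_match_weight({'capital': 1.2, 'norway': 2.1},
                     'Oslo is the capital of Norway'.split())
```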

TODO: We could also augment this with (or replace it by) BM25 weights. That could work for other datasets as well, and would be an alternative use for the prescoring logic.
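
A sketch of what the BM25 variant could look like, using the standard Okapi BM25 term weight; the corpus statistics df, n_docs and avgdl are assumed to be given:

```python
import math

def bm25_term_weight(tf, df, n_docs, dl, avgdl, k1=1.5, b=0.75):
    # standard Okapi BM25 weight of a term with tf occurrences in a
    # document of length dl; df = document frequency of the term
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))

def bm25_match_weight(s0_tokens, s1_tokens, df, n_docs, avgdl):
    # sum BM25 weights of s0 terms matched in s1
    score = 0.0
    for t in set(s0_tokens):
        tf = s1_tokens.count(t)
        if tf > 0:
            score += bm25_term_weight(tf, df.get(t, 0), n_docs,
                                      len(s1_tokens), avgdl)
    return score
```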

KWWeights

Baselines (we did these measurements with the vocabcase setting):

8x R_ay_3rnn - 0.419903 (95% [0.399927, 0.439880])

4x R_al_3rnn - 0.395602 (95% [0.383595, 0.407609])

4x R_al_3a51 - 0.404151 (95% [0.382397, 0.425904])

8x R_ay_3rnn_kw - 0.452198 (95% [0.436496, 0.467899]):

10884109.arien.ics.muni.cz.R_ay_3rnn_kw etc.
[0.467730, 0.466489, 0.458678, 0.480130, 0.427241, 0.423624, 0.452207, 0.441481, ]

4x R_al_3rnn_kw - 0.411832 (95% [0.388420, 0.435244]):

10884136.arien.ics.muni.cz.R_al_3rnn_kw etc.
[0.400349, 0.424932, 0.427774, 0.394274, ]

4x R_al_3a51_kw - 0.465138 (95% [0.461127, 0.469148]):

10884138.arien.ics.muni.cz.R_al_3a51_kw etc.
[0.465793, 0.468988, 0.462912, 0.462857, ]
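
Each entry above follows the pattern Nx MODEL - mean (95% [lo, hi]): the mean dev MRR over N training runs with a 95% Student-t confidence interval. A minimal Python sketch that reproduces the quoted intervals from the per-run lists:

```python
import numpy as np
from scipy import stats

def mrr_ci(mrrs, conf=0.95):
    mrrs = np.asarray(mrrs)
    n = len(mrrs)
    # np.std() uses ddof=0; combined with the t quantile, this
    # matches the intervals quoted on this page
    hw = stats.t.ppf((1 + conf) / 2, n - 1) * mrrs.std() / np.sqrt(n)
    return mrrs.mean(), mrrs.mean() - hw, mrrs.mean() + hw

print(mrr_ci([0.400349, 0.424932, 0.427774, 0.394274]))
# -> (0.411832, 0.388420, 0.435244), matching 4x R_al_3rnn_kw above
```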

KWWeights on master

With respect to the master baselines:

8x R_ay_2rnn_kw - 0.470143 (95% [0.444607, 0.495678]):

10911926.arien.ics.muni.cz.R_ay_2rnn_kw etc.
[0.432749, 0.442759, 0.479331, 0.504750, 0.480979, 0.422751, 0.501615, 0.496206, ]

4x R_al_2rnn_kw - 0.423874 (95% [0.406368, 0.441380]):

10911924.arien.ics.muni.cz.R_al_2rnn_kw etc.
[0.418824, 0.418950, 0.442729, 0.414993, ]

4x R_al_2a51_kw - 0.457016 (95% [0.434550, 0.479482]):

10930683.arien.ics.muni.cz.R_al_2a51_kw etc.
[0.469147, 0.470216, 0.453156, 0.435544, ]

Zero-dropout experiment

8x R_ay_2rnnd0_kw - 0.434593 (95% [0.420952, 0.448234]):

10911927.arien.ics.muni.cz.R_ay_2rnnd0_kw etc.
[0.446196, 0.467435, 0.432201, 0.426130, 0.432023, 0.435823, 0.405995, 0.430943, ]

4x R_al_2rnnd0_kw - 0.446685 (95% [0.438853, 0.454517]):

10911925.arien.ics.muni.cz.R_al_2rnnd0_kw etc.
[0.439836, 0.452840, 0.444595, 0.449469, ]

Same trend as with Ubuntu: with a large dataset, the dropout advantage tapers off.

Zero-dropout-zero-L2reg experiment

TODO: transfer learning check

4x R_al_2rnnd0L0_kw - 0.442456 (95% [0.431101, 0.453812]):

10930681.arien.ics.muni.cz.R_al_2rnnd0L0_kw etc.
[0.440065, 0.441733, 0.453828, 0.434200, ]

4x R_al_2a51d0L0_kw - 0.441782 (95% [0.415093, 0.468470]):

10930684.arien.ics.muni.cz.R_al_2a51d0L0_kw etc.
[0.467851, 0.422707, 0.432930, 0.443639, ]

Zero-padding experiment

Let's check if forcing _PAD_ to zero (part of the argus clasrel pull request) is harmless.
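
For reference, what pad0 amounts to, in a minimal hypothetical sketch: the embedding row of the _PAD_ token is forced to all-zeros, so padded positions contribute nothing downstream:

```python
import numpy as np

def zero_pad_embedding(emb, pad_idx=0):
    # force the _PAD_ row of an embedding matrix to all-zeros, so that
    # padded positions contribute nothing to e.g. averaging or convolutions
    emb = emb.copy()
    emb[pad_idx, :] = 0.0
    return emb

emb = zero_pad_embedding(np.random.randn(1000, 300))  # vocab x dim
assert not emb[0].any()
```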

16x R_ay_2rnn_kw - 0.469423 (95% [0.455889, 0.482957]):

10911926.arien.ics.muni.cz.R_ay_2rnn_kw etc.
[0.432749, 0.442759, 0.479331, 0.504750, 0.480979, 0.422751, 0.501615, 0.496206, 0.485784, 0.452306, 0.443067, 0.458959, 0.505849, 0.474536, 0.471225, 0.457896, ]

16x R_ay_2a51_kw - 0.485543 (95% [0.476930, 0.494155]):

11123025.arien.ics.muni.cz.R_ay_2a51_kw etc.
[0.477314, 0.495001, 0.475405, 0.482468, 0.470529, 0.492458, 0.498071, 0.475183, 0.461710, 0.480581, 0.484353, 0.524057, 0.514815, 0.473767, 0.469940, 0.493029, ]

8x R_ay_2rnn_kw_pad0 - 0.450365 (95% [0.430197, 0.470533]):

11141639.arien.ics.muni.cz.R_ay_2rnn_kw_pad0 etc.
[0.441757, 0.446003, 0.430938, 0.493640, 0.425181, 0.430856, 0.448221, 0.486323, ]

8x R_ay_2a51_kw_pad0 - 0.496796 (95% [0.489058, 0.504534]):

11141678.arien.ics.muni.cz.R_ay_2a51_kw_pad0 etc.
[0.502875, 0.506105, 0.484980, 0.494001, 0.479094, 0.504348, 0.501516, 0.501447, ]

Ok, looks harmless enough.

Final anssel results

curatedv2

| Model | trainAllMRR | devMRR | testMAP | testMRR | settings |
|-------|-------------|--------|---------|---------|----------|
| yodaqakw | 0.368773 | 0.337348 | 0.284100 | 0.383238 | (defaults) |
| termfreq BM25 #w | 0.483538 | 0.452647 | 0.294300 | 0.484530 | (defaults) |
| avg | 0.422881 ±0.024685 | 0.402618 ±0.006664 | 0.229694 ±0.001715 | 0.329356 ±0.003511 | (defaults) |
| DAN | 0.437119 ±0.014494 | 0.430754 ±0.014477 | 0.233000 ±0.002657 | 0.354075 ±0.010307 | inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu' |
| rnn | 0.459869 ±0.035981 | 0.429780 ±0.015609 | 0.228869 ±0.005554 | 0.341706 ±0.010643 | (defaults) |
| cnn | 0.544067 ±0.037730 | 0.363028 ±0.011041 | 0.228538 ±0.004791 | 0.309165 ±0.009649 | (defaults) |
| rnncnn | 0.578608 ±0.044228 | 0.374195 ±0.023533 | 0.238200 ±0.007741 | 0.344659 ±0.014747 | (defaults) |
| attn1511 | 0.432403 ±0.016183 | 0.475125 ±0.012810 | 0.275219 ±0.006562 | 0.468555 ±0.014433 | (defaults) |
| avg | 0.487246 ±0.046523 | 0.451062 ±0.007836 | 0.250563 ±0.005624 | 0.370919 ±0.008380 | f_add_kw=True |
| DAN | 0.492934 ±0.037740 | 0.483218 ±0.007931 | 0.279650 ±0.004544 | 0.441829 ±0.009156 | inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu' f_add_kw=True |
| rnn | 0.488602 ±0.030025 | 0.469423 ±0.013534 | 0.255750 ±0.005382 | 0.403185 ±0.010489 | f_add_kw=True |
| cnn | 0.572758 ±0.025883 | 0.410014 ±0.012990 | 0.248494 ±0.005084 | 0.350063 ±0.010683 | f_add_kw=True |
| rnncnn | 0.555559 ±0.035131 | 0.419693 ±0.019007 | 0.259669 ±0.005742 | 0.386323 ±0.015534 | f_add_kw=True |
| attn1511 | 0.475656 ±0.014700 | 0.485543 ±0.008612 | 0.299025 ±0.004635 | 0.473519 ±0.004926 | f_add_kw=True |

16x R_ay_2avg_preBM25f - 0.456198 (95% [0.441375, 0.471021]):

11164666.arien.ics.muni.cz.R_ay_2avg_preBM25f etc.
[0.498219, 0.479567, 0.484712, 0.504871, 0.490978, 0.437424, 0.426823, 0.441328, 0.457319, 0.437506, 0.418024, 0.427199, 0.428851, 0.431862, 0.469301, 0.465179, ]

16x R_ay_2dan_preBM25f - 0.529542 (95% [0.518664, 0.540419]):

11164667.arien.ics.muni.cz.R_ay_2dan_preBM25f etc.
[0.550406, 0.532501, 0.501926, 0.502845, 0.524899, 0.518520, 0.554686, 0.509359, 0.533622, 0.554307, 0.534756, 0.531446, 0.509179, 0.565314, 0.500386, 0.548515, ]

15x R_ay_2rnn_preBM25f - 0.495364 (95% [0.487221, 0.503508]):

11164668.arien.ics.muni.cz.R_ay_2rnn_preBM25f etc.
[0.488752, 0.493261, 0.513562, 0.466224, 0.497205, 0.477107, 0.498736, 0.486709, 0.527588, 0.479961, 0.497441, 0.502171, 0.502466, 0.509093, 0.490191, ]

16x R_ay_2cnn_preBM25f - 0.483106 (95% [0.472069, 0.494143]):

11164669.arien.ics.muni.cz.R_ay_2cnn_preBM25f etc.
[0.488898, 0.472309, 0.493535, 0.506969, 0.505426, 0.470859, 0.468073, 0.483682, 0.480535, 0.478417, 0.486767, 0.454820, 0.473261, 0.441064, 0.493540, 0.531538, ]

16x R_ay_2rnncnn_preBM25f - 0.481229 (95% [0.465358, 0.497100]):

11164670.arien.ics.muni.cz.R_ay_2rnncnn_preBM25f etc.
[0.505181, 0.483252, 0.451131, 0.454070, 0.433947, 0.536033, 0.446546, 0.537421, 0.490100, 0.490110, 0.461282, 0.499851, 0.476747, 0.468474, 0.457429, 0.508093, ]

16x R_ay_2a51_preBM25f - 0.475746 (95% [0.464648, 0.486844]):

11164671.arien.ics.muni.cz.R_ay_2a51_preBM25f etc.
[0.427083, 0.514587, 0.464229, 0.494987, 0.466147, 0.464223, 0.487800, 0.455298, 0.512902, 0.482585, 0.471736, 0.463600, 0.475212, 0.473154, 0.487878, 0.470510, ]

BM25 prescoring is even better than kw prescoring! (Also, high DAN performance is intriguing in principle...)

large2470

| Model | trainAllMRR | devMRR | testMAP | testMRR | settings |
|-------|-------------|--------|---------|---------|----------|
| yodaqakw | 0.332693 | 0.318246 | 0.303900 | 0.376465 | (defaults) |
| termfreq BM25 #w | 0.441573 | 0.432115 | 0.313900 | 0.490822 | (defaults) |
| avg | 0.798883 ±0.026554 | 0.408034 ±0.004656 | 0.262569 ±0.002054 | 0.362190 ±0.005725 | (defaults) |
| DAN | 0.646481 ±0.070994 | 0.404210 ±0.005378 | 0.272675 ±0.003028 | 0.386522 ±0.007627 | inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu' |
| rnn | 0.460984 ±0.023715 | 0.382949 ±0.006451 | 0.262463 ±0.002641 | 0.381298 ±0.007643 | (defaults) |
| cnn | 0.550441 ±0.069701 | 0.348247 ±0.006217 | 0.264476 ±0.002918 | 0.353243 ±0.009620 | (defaults) |
| rnncnn | 0.681908 ±0.114967 | 0.408662 ±0.008659 | 0.286118 ±0.003501 | 0.394865 ±0.011895 | (defaults) |
| attn1511 | 0.445635 ±0.056352 | 0.408495 ±0.008744 | 0.288100 ±0.005601 | 0.430892 ±0.017858 | (defaults) |
| avg | 0.647144 ±0.068187 | 0.420943 ±0.004745 | 0.289044 ±0.002541 | 0.419559 ±0.011235 | f_add_kw=True |
| DAN | 0.578884 ±0.051564 | 0.454751 ±0.005778 | 0.316606 ±0.004260 | 0.472173 ±0.006205 | inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu' f_add_kw=True |
| rnn | 0.471287 ±0.021866 | 0.423417 ±0.007853 | 0.296419 ±0.005486 | 0.446478 ±0.011307 | f_add_kw=True |
| cnn | 0.532295 ±0.052085 | 0.375244 ±0.006402 | 0.285288 ±0.002901 | 0.398820 ±0.009145 | f_add_kw=True |
| rnncnn | 0.595107 ±0.091860 | 0.430172 ±0.010868 | 0.308475 ±0.005538 | 0.444440 ±0.013976 | f_add_kw=True |
| attn1511 | 0.488763 ±0.015243 | 0.455023 ±0.006933 | 0.330781 ±0.002899 | 0.492604 ±0.005126 | f_add_kw=True |

wang

| Model | trainAllMRR | devMRR | testMAP | testMRR | settings |
|-------|-------------|--------|---------|---------|----------|
| termfreq BM25 #w | 0.813992 | 0.829004 | 0.630100 | 0.765363 | (defaults) |
| avg | 0.786983 ±0.019449 | 0.799939 ±0.007218 | 0.607031 ±0.005516 | 0.689948 ±0.009912 | (defaults) |
| DAN | 0.838842 ±0.013775 | 0.828035 ±0.007839 | 0.643288 ±0.009993 | 0.734727 ±0.008747 | inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu' |
| rnn | 0.791770 ±0.017036 | 0.842155 ±0.009447 | 0.648863 ±0.010918 | 0.742747 ±0.009896 | (defaults) |
| cnn | 0.845162 ±0.015552 | 0.841343 ±0.005409 | 0.690906 ±0.006910 | 0.770042 ±0.010381 | (defaults) |
| rnncnn | 0.922721 ±0.019407 | 0.849363 ±0.006259 | 0.716519 ±0.007169 | 0.797826 ±0.011460 | (defaults) |
| attn1511 | 0.852364 ±0.017280 | 0.851368 ±0.005533 | 0.708163 ±0.008958 | 0.789822 ±0.013308 | (defaults) |

32x R_aw_2avg_preBM25f - 0.859210 (95% [0.855743, 0.862678]):

11165383.arien.ics.muni.cz.R_aw_2avg_preBM25f etc.
[0.860256, 0.865385, 0.866667, 0.860256, 0.844872, 0.861538, 0.866667, 0.862821, 0.869231, 0.851282, 0.880769, 0.857692, 0.855128, 0.852564, 0.844219, 0.855128, 0.869231, 0.860256, 0.856410, 0.862821, 0.837179, 0.876154, 0.857692, 0.844872, 0.869231, 0.862821, 0.851282, 0.870513, 0.858974, 0.843590, 0.860256, 0.858974, ]

32x R_aw_2dan_preBM25f - 0.850233 (95% [0.846307, 0.854159]):

11165384.arien.ics.muni.cz.R_aw_2dan_preBM25f etc.
[0.836923, 0.842821, 0.840769, 0.856410, 0.863333, 0.864103, 0.853077, 0.830513, 0.858205, 0.847949, 0.831282, 0.841026, 0.862051, 0.849231, 0.852308, 0.850513, 0.855641, 0.868462, 0.864615, 0.840886, 0.859487, 0.835128, 0.846795, 0.847949, 0.846154, 0.840897, 0.839744, 0.852465, 0.865385, 0.858974, 0.867949, 0.836410, ]

16x R_aw_2rnn_preBM25f - 0.872479 (95% [0.865383, 0.879576]):

11165385.arien.ics.muni.cz.R_aw_2rnn_preBM25f etc.
[0.891026, 0.876154, 0.883333, 0.867949, 0.871538, 0.871795, 0.887692, 0.883333, 0.847436, 0.882051, 0.880403, 0.847179, 0.876923, 0.853846, 0.859890, 0.879121, ]

16x R_aw_2cnn_preBM25f - 0.867242 (95% [0.862184, 0.872300]):

11165386.arien.ics.muni.cz.R_aw_2cnn_preBM25f etc.
[0.856923, 0.872308, 0.854274, 0.869231, 0.875641, 0.852564, 0.880427, 0.865385, 0.856410, 0.870513, 0.878205, 0.863333, 0.877839, 0.865385, 0.856667, 0.880769, ]

16x R_aw_2rnncnn_preBM25f - 0.862151 (95% [0.856422, 0.867880]):

11165387.arien.ics.muni.cz.R_aw_2rnncnn_preBM25f etc.
[0.860256, 0.858718, 0.839744, 0.872308, 0.856410, 0.842308, 0.868462, 0.860403, 0.858974, 0.873590, 0.871429, 0.861538, 0.882564, 0.862051, 0.870385, 0.855275, ]

16x R_aw_2a51_preBM25f - 0.864038 (95% [0.859672, 0.868405]):

11165388.arien.ics.muni.cz.R_aw_2a51_preBM25f etc.
[0.855128, 0.871795, 0.872308, 0.875641, 0.855641, 0.859487, 0.847436, 0.862821, 0.864615, 0.866667, 0.871795, 0.859487, 0.852564, 0.865385, 0.874872, 0.868974, ]

BM25 prescoring is again awesome, though it doesn't help as much here, as the source dataset already comes from an IR system that probably uses BM25 itself.

BM25 Feature+Pruning Combo
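
The pN suffixes below presumably prune each s0's candidate list to its top N s1 candidates by BM25 prescore before training; a hypothetical sketch of that idea:

```python
from collections import defaultdict

def prune_by_prescore(pairs, prescores, top_n=20):
    # pairs: (s0_id, s1, label) triples; prescores: parallel BM25 scores.
    # Keep only the top_n highest-prescored s1 candidates for each s0.
    per_s0 = defaultdict(list)
    for (s0_id, s1, label), score in zip(pairs, prescores):
        per_s0[s0_id].append((score, s1, label))
    pruned = []
    for s0_id, cands in per_s0.items():
        cands.sort(key=lambda c: c[0], reverse=True)
        pruned.extend((s0_id, s1, label) for _, s1, label in cands[:top_n])
    return pruned
```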

4x R_ay_2dan_preBM25f - 0.521920 (95% [0.489241, 0.554598]):

4x R_ay_2dan_preBM25fp10 - 0.498505 (95% [0.459013, 0.537998]):

11164679.arien.ics.muni.cz.R_ay_2dan_preBM25fp10 etc.
[0.515782, 0.458379, 0.497800, 0.522061, ]

4x R_ay_2dan_preBM25fp20 - 0.472205 (95% [0.413860, 0.530549]):

11164672.arien.ics.muni.cz.R_ay_2dan_preBM25fp20 etc.
[0.529796, 0.427913, 0.467881, 0.463228, ]

4x R_ay_2dan_preBM25fp40 - 0.498207 (95% [0.471961, 0.524453]):

11164683.arien.ics.muni.cz.R_ay_2dan_preBM25fp40 etc.
[0.511627, 0.507601, 0.503527, 0.470072, ]

4x R_ay_2dan_preBM25fp80 - 0.498641 (95% [0.483196, 0.514086]):

11164704.arien.ics.muni.cz.R_ay_2dan_preBM25fp80 etc.
[0.511180, 0.499969, 0.483906, 0.499509, ]

4x R_ay_2rnn_preBM25f - 0.490450 (95% [0.463683, 0.517216]):

4x R_ay_2rnn_preBM25fp10 - 0.451735 (95% [0.431087, 0.472382]):

11164718.arien.ics.muni.cz.R_ay_2rnn_preBM25fp10 etc.
[0.429715, 0.462510, 0.459515, 0.455199, ]

4x R_ay_2rnn_preBM25fp20 - 0.498031 (95% [0.480156, 0.515906]):

11164710.arien.ics.muni.cz.R_ay_2rnn_preBM25fp20 etc.
[0.508158, 0.483139, 0.491218, 0.509608, ]

4x R_ay_2rnn_preBM25fp40 - 0.462664 (95% [0.435296, 0.490033]):

11164714.arien.ics.muni.cz.R_ay_2rnn_preBM25fp40 etc.
[0.460831, 0.472898, 0.481267, 0.435661, ]

Inconclusive; next step: a 32-way baseline vs. p20 comparison.

16x R_ay_2dan_preBM25f - 0.529542 (95% [0.518664, 0.540419]):

15x R_ay_2rnn_preBM25f - 0.495364 (95% [0.487221, 0.503508]):

32x R_ay_2dan_preBM25fp20 - 0.505769 (95% [0.486799, 0.524739]):

11164672.arien.ics.muni.cz.R_ay_2dan_preBM25fp20 etc.
[0.529796, 0.427913, 0.467881, 0.463228, 0.525758, 0.479401, 0.521772, 0.517636, 0.523367, 0.539304, 0.531952, 0.532263, 0.530724, 0.527800, 0.520635, 0.531143, 0.518818, 0.526477, 0.534160, 0.546105, 0.463003, 0.519570, 0.454251, 0.532180, 0.522453, 0.526706, 0.512854, 0.514134, 0.257657, 0.532079, 0.531198, 0.522390, ]

32x R_ay_2rnn_preBM25fp20 - 0.486556 (95% [0.479236, 0.493876]):

11164710.arien.ics.muni.cz.R_ay_2rnn_preBM25fp20 etc.
[0.508158, 0.483139, 0.491218, 0.509608, 0.469012, 0.494205, 0.483975, 0.480952, 0.461927, 0.504582, 0.461186, 0.460236, 0.481414, 0.480188, 0.457711, 0.461436, 0.491194, 0.483480, 0.478641, 0.484989, 0.494288, 0.466136, 0.491828, 0.526553, 0.473151, 0.493774, 0.466335, 0.492406, 0.509150, 0.465757, 0.525069, 0.538084, ]

32x R_aw_2dan_preBM25f - 0.850233 (95% [0.846307, 0.854159]):

32x R_aw_2dan_preBM25fp5 - 0.863100 (95% [0.860764, 0.865435]):

11171392.arien.ics.muni.cz.R_aw_2dan_preBM25fp5 etc.
[0.871513, 0.854846, 0.856128, 0.862539, 0.870744, 0.868949, 0.871513, 0.861257, 0.862539, 0.861770, 0.861257, 0.854846, 0.863052, 0.857411, 0.859205, 0.870231, 0.865103, 0.859205, 0.869462, 0.868949, 0.861257, 0.859205, 0.870744, 0.856128, 0.862539, 0.871513, 0.872795, 0.868949, 0.865103, 0.845103, 0.855359, 0.859975, ]

32x R_aw_2dan_preBM25fp10 - 0.856020 (95% [0.852684, 0.859356]):

11171394.arien.ics.muni.cz.R_aw_2dan_preBM25fp10 etc.
[0.842637, 0.856740, 0.835861, 0.844689, 0.861355, 0.851099, 0.860073, 0.847766, 0.854945, 0.854176, 0.847912, 0.850330, 0.843919, 0.851612, 0.849817, 0.854945, 0.856740, 0.855458, 0.856227, 0.854945, 0.849817, 0.869560, 0.860073, 0.866484, 0.866996, 0.881868, 0.867766, 0.854945, 0.858791, 0.851099, 0.863150, 0.870842, ]

4x R_aw_2dan_preBM25fp20 - 0.855614 (95% [0.840413, 0.870814]):

11171395.arien.ics.muni.cz.R_aw_2dan_preBM25fp20 etc.
[0.853077, 0.847436, 0.871795, 0.850147, ]

4x R_aw_2dan_preBM25fp40 - 0.876602 (95% [0.869336, 0.883869]):

11171397.arien.ics.muni.cz.R_aw_2dan_preBM25fp40 etc.
[0.873077, 0.871795, 0.883333, 0.878205, ]

4x R_aw_2dan_preBM25fp60 - 0.852212 (95% [0.826522, 0.877901]):

11171399.arien.ics.muni.cz.R_aw_2dan_preBM25fp60 etc.
[0.829359, 0.856410, 0.874359, 0.848718, ]

----

Conclusion: pruning is not detrimental (there may be a slight difference either way), but it brings a massive speedup.

----

General next step: apply this to other tasks, and run a transfer learning test.
