Reason using 2 sets of attention weights? #7

Open
valtheval opened this issue Jul 15, 2020 · 1 comment

@valtheval

Hello RETAIN team,

Great job! Thank you for sharing it.
Do you have an explanation for why you use two sets of attention weights (visits and variables) instead of only one, for the variables?
With that single set you could still recover a visit-level contribution through an aggregation method, for instance the average or sum of the variable weights within each visit.
Thanks in advance for your help.
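
P.S. For concreteness, here is a rough numpy sketch (purely illustrative, not taken from this repo) of what I mean by the two sets of weights, with a scalar visit-level weight `alpha_i` and a vector variable-level weight `beta_i` per visit embedding `v_i`, and the aggregation I had in mind:

```python
import numpy as np

n_visits, emb_dim = 5, 8
v = np.random.randn(n_visits, emb_dim)              # one embedding per visit
alpha = np.random.rand(n_visits)                     # visit-level weights
alpha = alpha / alpha.sum()                          # (softmax-normalized in RETAIN)
beta = np.tanh(np.random.randn(n_visits, emb_dim))   # variable-level weights

# RETAIN-style context vector: c = sum_i alpha_i * (beta_i ⊙ v_i)
context = (alpha[:, None] * beta * v).sum(axis=0)

# My suggestion: drop alpha and recover a per-visit contribution afterwards,
# e.g. by averaging the variable-level weights within each visit.
visit_contribution = np.abs(beta).mean(axis=1)
```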

@mp2893
Owner

mp2893 commented Jul 17, 2020

Hi Valtheval,

Thanks for taking an interest in our work.
It's an interesting question; a few other researchers have asked me the same thing.
You can totally do what you suggested (i.e., use only code-level attention, then aggregate the weights).
For example, you can just use a single LSTM to encode one flat sequence of codes (no visit structure, just a sequence of codes) and apply attention on top of the hidden states. But this way, you lose the visit-level information (i.e., which codes belong to the same visit).
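
Something along these lines would do it (a rough PyTorch-style sketch, not this repo's Theano code; all layer names and sizes are just placeholders):

```python
import torch
import torch.nn as nn

class FlatCodeAttention(nn.Module):
    """Single LSTM over one flat sequence of codes, with one scalar
    attention weight per code; visit boundaries are never seen."""
    def __init__(self, n_codes, emb_dim=128, hidden_dim=128):
        super().__init__()
        self.emb = nn.Embedding(n_codes, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)   # attention score per code
        self.out = nn.Linear(emb_dim, 1)       # binary prediction head

    def forward(self, codes):                  # codes: (batch, seq_len) code ids
        v = self.emb(codes)                    # (batch, seq_len, emb_dim)
        h, _ = self.rnn(v)                     # (batch, seq_len, hidden_dim)
        alpha = torch.softmax(self.attn(h), dim=1)   # normalize over the sequence
        context = (alpha * v).sum(dim=1)       # weighted sum of code embeddings
        return torch.sigmoid(self.out(context)), alpha
```

The visit-level contribution you described could then be recovered by summing `alpha` over the codes of each visit, but the model itself never knows where one visit ends and the next begins.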

The more interesting alternative would be to use the RETAIN architecture but remove the visit-level attention component. That way, you still tell the model which codes belong to the same visit. I am actually quite curious how this would turn out :)
If you happen to run this experiment, please share your results with everyone.
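
Roughly, I mean something like this (again a PyTorch-style sketch rather than the repo's Theano code; it ignores the reverse-time ordering and the masking of variable-length records, and all sizes are placeholders):

```python
import torch
import torch.nn as nn

class RetainNoVisitAttention(nn.Module):
    """Keep RETAIN's per-visit grouping and the variable-level (beta)
    attention, but drop the visit-level (alpha) attention entirely."""
    def __init__(self, n_codes, emb_dim=128, hidden_dim=128):
        super().__init__()
        self.emb = nn.Linear(n_codes, emb_dim, bias=False)  # W_emb on multi-hot visits
        self.rnn_beta = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.beta = nn.Linear(hidden_dim, emb_dim)           # variable-level weights
        self.out = nn.Linear(emb_dim, 1)

    def forward(self, visits):             # visits: (batch, n_visits, n_codes) multi-hot
        v = self.emb(visits)               # one embedding vector per visit
        g, _ = self.rnn_beta(v)            # RETAIN runs this RNN over visits (in reverse order)
        beta = torch.tanh(self.beta(g))    # (batch, n_visits, emb_dim)
        # Full RETAIN would compute context = sum_i alpha_i * (beta_i * v_i);
        # here the scalar alpha term is simply omitted.
        context = (beta * v).sum(dim=1)
        return torch.sigmoid(self.out(context))
```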

Best,
Ed
