Reason using 2 sets of attention weights? #7

Open
valtheval opened this issue Jul 15, 2020 · 1 comment

@valtheval

Hello RETAIN team,

Great job! Thank you for sharing it.
Do you have an explanation for why you use two sets of attention weights (visits and variables) instead of only one, for the variables?
With that single set you could still recover a visit-level contribution through an aggregation method, for instance the average or sum of the variable weights within each visit.
Thanks in advance for your help.
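
P.S. For concreteness, here is a rough numpy sketch (purely illustrative, not taken from this repo) of what I mean by the two sets of weights, with a scalar visit-level weight `alpha_i` and a vector variable-level weight `beta_i` per visit embedding `v_i`, and the aggregation I had in mind:

```python
import numpy as np

n_visits, emb_dim = 5, 8
v = np.random.randn(n_visits, emb_dim)              # one embedding per visit
alpha = np.random.rand(n_visits)                     # visit-level weights
alpha = alpha / alpha.sum()                          # (softmax-normalized in RETAIN)
beta = np.tanh(np.random.randn(n_visits, emb_dim))   # variable-level weights

# RETAIN-style context vector: c = sum_i alpha_i * (beta_i ⊙ v_i)
context = (alpha[:, None] * beta * v).sum(axis=0)

# My suggestion: drop alpha and recover a per-visit contribution afterwards,
# e.g. by averaging the variable-level weights within each visit.
visit_contribution = np.abs(beta).mean(axis=1)
```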

@mp2893
Owner

mp2893 commented Jul 17, 2020

Hi Valtheval,

Thanks for taking an interest in our work.
It's an interesting question; a few other researchers have asked me the same thing.
You can totally do what you suggested (i.e., use only code-level attention, then aggregate the weights).
For example, you can just use a single LSTM to encode one flat sequence of codes (no visit structure, just a sequence of codes) and apply attention on top of the hidden states. But this way, you lose the visit-level information (i.e., which codes belong to the same visit).
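
Something along these lines would do it (a rough PyTorch-style sketch, not this repo's Theano code; all layer names and sizes are just placeholders):

```python
import torch
import torch.nn as nn

class FlatCodeAttention(nn.Module):
    """Single LSTM over one flat sequence of codes, with one scalar
    attention weight per code; visit boundaries are never seen."""
    def __init__(self, n_codes, emb_dim=128, hidden_dim=128):
        super().__init__()
        self.emb = nn.Embedding(n_codes, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)   # attention score per code
        self.out = nn.Linear(emb_dim, 1)       # binary prediction head

    def forward(self, codes):                  # codes: (batch, seq_len) code ids
        v = self.emb(codes)                    # (batch, seq_len, emb_dim)
        h, _ = self.rnn(v)                     # (batch, seq_len, hidden_dim)
        alpha = torch.softmax(self.attn(h), dim=1)   # normalize over the sequence
        context = (alpha * v).sum(dim=1)       # weighted sum of code embeddings
        return torch.sigmoid(self.out(context)), alpha
```

The visit-level contribution you described could then be recovered by summing `alpha` over the codes of each visit, but the model itself never knows where one visit ends and the next begins.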

The more interesting alternative would be to use the RETAIN architecture but remove the visit-level attention component. That way, you still tell the model which codes belong to the same visit. I am actually quite curious how this would turn out :)
If you happen to run this experiment, please share your results with everyone.
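
Roughly, I mean something like this (again a PyTorch-style sketch rather than the repo's Theano code; it ignores the reverse-time ordering and the masking of variable-length records, and all sizes are placeholders):

```python
import torch
import torch.nn as nn

class RetainNoVisitAttention(nn.Module):
    """Keep RETAIN's per-visit grouping and the variable-level (beta)
    attention, but drop the visit-level (alpha) attention entirely."""
    def __init__(self, n_codes, emb_dim=128, hidden_dim=128):
        super().__init__()
        self.emb = nn.Linear(n_codes, emb_dim, bias=False)  # W_emb on multi-hot visits
        self.rnn_beta = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.beta = nn.Linear(hidden_dim, emb_dim)           # variable-level weights
        self.out = nn.Linear(emb_dim, 1)

    def forward(self, visits):             # visits: (batch, n_visits, n_codes) multi-hot
        v = self.emb(visits)               # one embedding vector per visit
        g, _ = self.rnn_beta(v)            # RETAIN runs this RNN over visits (in reverse order)
        beta = torch.tanh(self.beta(g))    # (batch, n_visits, emb_dim)
        # Full RETAIN would compute context = sum_i alpha_i * (beta_i * v_i);
        # here the scalar alpha term is simply omitted.
        context = (beta * v).sum(dim=1)
        return torch.sigmoid(self.out(context))
```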

Best,
Ed
