Dimensions not matching? #12

Open
nair-p opened this issue Feb 6, 2020 · 2 comments

Comments

nair-p commented Feb 6, 2020

Hi Edward,

I'm trying to reproduce GRAM results using MIMIC-III data.
If I understand correctly, there are 4894 medical codes used to represent patient visits. So the G matrix (from the paper) has to be of size 4894 x 128 (embedding dimension). However, there are no matrices of that size stored as a result of running gram.py.

Am I missing something or am I supposed to be deriving the G matrix with the help of other stored files? I tried to do this too but the dimensions just don't seem to be matching. Any help will be highly appreciated.

Thanks!

Owner

mp2893 commented Feb 7, 2020

Hi nair-p,

After you train the model, you should see W_emb, whose shape is some thousand rows by the embedding dimension. It holds the embeddings of all medical codes plus the ancestor codes. You apply attention over W_emb to derive the G matrix, which happens between line 126 and line 132 of gram.py.

Best,
Ed
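
The attention step described above can be sketched in NumPy roughly as follows. This is a minimal illustration under assumptions, not the code in gram.py: the attention parameters `W_att`, `b_att`, and `v_att`, the helper name `derive_leaf_embeddings`, and the toy shapes are all hypothetical. It assumes `leaves` and `ancestors` are integer index arrays of the same shape, where each row of `leaves` repeats the W_emb row of one leaf code and the matching row of `ancestors` lists the rows it attends over. The thread mentions leavesList and ancestorsList, so the real code presumably repeats this for several such pairs; the sketch handles one pair.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def derive_leaf_embeddings(W_emb, leaves, ancestors, W_att, b_att, v_att):
    """Attention-weighted sum of ancestor embeddings for each leaf code.

    W_emb:     (n_rows, emb_dim) embeddings of leaf codes plus ancestor codes
    leaves:    (n_codes, n_anc) int indices, row i repeats leaf i's own index
    ancestors: (n_codes, n_anc) int indices of the rows leaf i attends over
    W_att, b_att, v_att: hypothetical attention MLP parameters
    """
    # Pair every leaf embedding with each of its ancestor embeddings.
    attention_input = np.concatenate([W_emb[leaves], W_emb[ancestors]], axis=2)
    hidden = np.tanh(attention_input @ W_att + b_att)   # (n_codes, n_anc, att_dim)
    scores = hidden @ v_att                             # (n_codes, n_anc)
    weights = softmax(scores, axis=1)                   # each row sums to 1
    # Weighted sum over the ancestor axis gives one vector per leaf code.
    return (W_emb[ancestors] * weights[:, :, None]).sum(axis=1)

# Toy example: 10 rows in W_emb, 2 leaf codes, 3 attended rows each.
rng = np.random.default_rng(0)
emb_dim, att_dim = 4, 3
W_emb = rng.normal(size=(10, emb_dim))
leaves = np.array([[0, 0, 0], [1, 1, 1]])
ancestors = np.array([[0, 5, 9], [1, 6, 9]])
W_att = rng.normal(size=(2 * emb_dim, att_dim))
b_att = np.zeros(att_dim)
v_att = rng.normal(size=att_dim)

G_part = derive_leaf_embeddings(W_emb, leaves, ancestors, W_att, b_att, v_att)
print(G_part.shape)
```

Under these assumptions, stacking the outputs for all leaf/ancestor groups would give one row per leaf code, which is the shape the G matrix is expected to have.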

Author

nair-p commented Feb 7, 2020

Hi Edward,

Thank you for getting back.

I did try what you suggested. However, I get the following error when I try to generate embList, because W_emb has only 1671 rows:
----> 4 attentionInput = T.concatenate([tparams['W_emb'][leaves], tparams['W_emb'][ancestors]], axis=2)
IndexError: index 5622 is out of bounds for axis 0 with size 1671
I built leavesList and ancestorsList using your code.
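
A quick way to narrow down an error like this is to compare the largest index in each leaf/ancestor array with the number of rows in W_emb before indexing. The sketch below is a hypothetical diagnostic, not part of gram.py: `find_out_of_bounds` and the toy arrays are made up for illustration, with 1671 and 5622 taken from the error message above. If any array is flagged, the index arrays and the loaded W_emb were probably not built from the same code vocabulary, though that cause is an assumption and not something confirmed in this thread.

```python
import numpy as np

def find_out_of_bounds(n_rows, index_arrays):
    """Return (position, min_index, max_index) for every array in
    index_arrays that contains an index outside [0, n_rows)."""
    problems = []
    for pos, arr in enumerate(index_arrays):
        arr = np.asarray(arr)
        if arr.size == 0:
            continue  # nothing to check in an empty group
        lo, hi = int(arr.min()), int(arr.max())
        if lo < 0 or hi >= n_rows:
            problems.append((pos, lo, hi))
    return problems

# Toy stand-ins for leavesList / ancestorsList (values are illustrative only).
leaves_list = [np.array([[0, 0], [1, 1]]), np.array([[2, 2, 2]])]
ancestors_list = [np.array([[0, 1700], [1, 5622]]), np.array([[2, 900, 1670]])]
n_rows = 1671  # number of rows in the loaded W_emb, per the error message

bad_ancestors = find_out_of_bounds(n_rows, ancestors_list)
bad_leaves = find_out_of_bounds(n_rows, leaves_list)
print(bad_ancestors, bad_leaves)
```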

I tried modifying your code a little to save the predicted values for the test set at each epoch (the y_hat values) so that I could reproduce the accuracy@k results. However, the results do not seem to match. Of course this could be due to differences in Theano version, etc. (I'm using version 1.0.4), but I just wanted to make sure that this is a legitimate way to compare.

I used the frequency of medical codes in the label file to divide them into percentile bins, as mentioned in the paper. Then, for each bin, I take the patients whose true label lies in that bin and check the accuracy@20 of the predicted labels for these patients. Is this how you calculate the accuracy@20 for each bin?
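
The per-bin procedure described above could be sketched as follows. This is one reading of the description, not a confirmed reproduction of the paper's evaluation: the function name `accuracy_at_k_by_bin`, the equal-size split of frequency-ranked codes into `n_bins` groups, and the multi-hot layout of `y_true` are all assumptions. Within each bin, accuracy@k is taken here as the fraction of true labels in that bin that appear among the top k predicted codes for the same sample.

```python
import numpy as np

def accuracy_at_k_by_bin(y_true, y_score, code_counts, k=20, n_bins=5):
    """Accuracy@k per frequency bin.

    y_true:      (n_samples, n_codes) multi-hot array of true labels
    y_score:     (n_samples, n_codes) predicted scores (e.g. y_hat)
    code_counts: (n_codes,) frequency of each code in the label file
    Returns a list of n_bins values, rarest codes first.
    """
    n_codes = y_true.shape[1]
    # Rank codes from rarest to most frequent and split into equal-size bins.
    order = np.argsort(code_counts, kind="stable")
    bin_of_code = np.empty(n_codes, dtype=int)
    for b, chunk in enumerate(np.array_split(order, n_bins)):
        bin_of_code[chunk] = b
    # Mark the k highest-scoring codes for every sample.
    top_k = np.argsort(-y_score, axis=1, kind="stable")[:, :k]
    in_top_k = np.zeros(y_true.shape, dtype=bool)
    np.put_along_axis(in_top_k, top_k, True, axis=1)
    hits = y_true.astype(bool) & in_top_k
    results = []
    for b in range(n_bins):
        cols = bin_of_code == b
        total = y_true[:, cols].sum()
        # A bin with no true labels has no defined accuracy.
        results.append(hits[:, cols].sum() / total if total > 0 else float("nan"))
    return results

# Toy example: 4 codes in 2 bins, k=1 (real use would be k=20 on the test set).
code_counts = np.array([1, 2, 10, 20])
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
y_score = np.array([[0.1, 0.2, 0.9, 0.3],
                    [0.8, 0.1, 0.05, 0.6],
                    [0.2, 0.1, 0.3, 0.4]])
per_bin = accuracy_at_k_by_bin(y_true, y_score, code_counts, k=1, n_bins=2)
print(per_bin)
```

Note that this counts every true label separately, so a sample with labels in several bins contributes to each of them; whether that matches the original evaluation is exactly the question asked above.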

Thanks,
PN
