Dimensions not matching? #12

nair-p · 2020-02-06T02:24:01Z

Hi Edward,

I'm trying to reproduce GRAM results using MIMIC-III data.
If I understand correctly, there are 4894 medical codes used to represent patient visits. So the G matrix (from the paper) has to be of size 4894 x 128 (embedding dimension). However, there are no matrices of that size stored as a result of running gram.py.

Am I missing something or am I supposed to be deriving the G matrix with the help of other stored files? I tried to do this too but the dimensions just don't seem to be matching. Any help will be highly appreciated.

Thanks!

mp2893 · 2020-02-07T08:41:21Z

Hi nair-p,

After you train the model, you should be able to see W_emb, whose shape is several thousand rows by the embedding dimension. It holds the embeddings of all medical codes plus the ancestor codes. You apply attention to W_emb to derive the G matrix, which happens between line 126 and line 132 of gram.py.

Best,
Ed
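
For readers following along, here is a minimal NumPy sketch of how a G matrix could be derived from W_emb with attention. It assumes the attention is a one-hidden-layer tanh MLP followed by a softmax over each code's ancestors. The names `derive_G`, `W_att`, `b_att`, and `v_att`, and all the toy shapes and indices, are illustrative assumptions and not taken from gram.py. Only `W_emb`, `leavesList`/`ancestorsList`, and the concatenation along axis 2 come from the thread.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def derive_G(W_emb, W_att, b_att, v_att, leaves_list, ancestors_list):
    """Hypothetical helper: one output row per leaf code.

    Each entry of leaves_list / ancestors_list is an int array of shape
    (n_codes, n_ancestors); codes with the same ancestor count share a group.
    """
    blocks = []
    for leaves, ancestors in zip(leaves_list, ancestors_list):
        # Pair every leaf embedding with each of its ancestor embeddings.
        att_in = np.concatenate([W_emb[leaves], W_emb[ancestors]], axis=2)
        hidden = np.tanh(att_in @ W_att + b_att)   # (n_codes, n_anc, att_dim)
        scores = hidden @ v_att                    # (n_codes, n_anc)
        alpha = softmax(scores, axis=1)            # attention weights per code
        # Attention-weighted sum of the ancestor embeddings.
        blocks.append((W_emb[ancestors] * alpha[:, :, None]).sum(axis=1))
    return np.concatenate(blocks, axis=0)

# Toy setup (assumed): 3 leaf codes (rows 0-2) plus ancestor rows 3-6, embedding size 4.
rng = np.random.default_rng(0)
emb_dim, att_dim = 4, 3
W_emb = rng.normal(size=(7, emb_dim))
W_att = rng.normal(size=(2 * emb_dim, att_dim))
b_att = np.zeros(att_dim)
v_att = rng.normal(size=att_dim)

leaves_list = [np.array([[0, 0], [1, 1]]), np.array([[2, 2, 2]])]
ancestors_list = [np.array([[0, 5], [1, 5]]), np.array([[2, 5, 6]])]

G = derive_G(W_emb, W_att, b_att, v_att, leaves_list, ancestors_list)
print(G.shape)
```

Every index in the leaves and ancestors arrays must be smaller than `W_emb.shape[0]`, since the arrays are used to index rows of `W_emb` directly.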

nair-p · 2020-02-07T17:31:54Z

Hi Edward,

Thank you for getting back.

I actually did try what you suggested. However, I get the following error when I try to generate embList, because the first dimension of W_emb is only 1671:

    ----> 4 attentionInput = T.concatenate([tparams['W_emb'][leaves], tparams['W_emb'][ancestors]], axis=2)
    IndexError: index 5622 is out of bounds for axis 0 with size 1671

I built leavesList and ancestorsList using your code.

I also modified your code slightly to save the predicted values for the test set at each epoch (the y_hat values) so that I could try to reproduce the accuracy@k results. However, the results do not seem to match. Of course, this could be due to a difference in Theano version, etc. (I'm using version 1.0.4), but I just wanted to make sure that this is a legitimate way to compare.

I used the label file frequency of medical codes to divide them into bins of percentiles as mentioned in the paper. Then for each bin, I obtain the patients whose true label lies in that bin and check the accuracy@20 for the predicted labels for these patients. Is this how you calculate the accuracy@20 for each bin?

Thanks,
PN
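
For readers following along, the binning procedure described in the comment above could be sketched as follows. This is one possible reading of it and is not confirmed by the author: labels are ranked by training frequency, split into equal-sized rank bins, and each true (sample, label) pair counts as a hit if the label is among the sample's top-k predictions. The function name, the multi-hot `y_true` layout, and the toy numbers are assumptions.

```python
import numpy as np

def accuracy_at_k_by_bin(y_true, y_score, label_counts, k=20, n_bins=5):
    """Hypothetical helper: accuracy@k per label-frequency bin.

    y_true: (n_samples, n_labels) multi-hot array of true labels.
    y_score: (n_samples, n_labels) predicted scores (e.g. saved y_hat).
    label_counts: (n_labels,) frequency of each label.
    """
    n_labels = len(label_counts)
    # Rank labels from rarest (rank 0) to most frequent, then cut ranks into bins.
    order = np.argsort(label_counts, kind="stable")
    rank = np.empty(n_labels, dtype=int)
    rank[order] = np.arange(n_labels)
    label_bin = rank * n_bins // n_labels

    # Mark the k highest-scoring labels for each sample.
    top_k = np.argsort(-y_score, axis=1)[:, :k]
    in_top_k = np.zeros(y_true.shape, dtype=bool)
    np.put_along_axis(in_top_k, top_k, True, axis=1)

    hits = np.zeros(n_bins)
    totals = np.zeros(n_bins)
    for r, c in zip(*np.nonzero(y_true)):
        totals[label_bin[c]] += 1
        hits[label_bin[c]] += in_top_k[r, c]

    # Bins with no true labels are reported as NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(totals > 0, hits / totals, np.nan)

# Toy example (assumed data): 4 labels, 2 samples, k=1, 2 bins.
label_counts = np.array([10, 1, 5, 7])
y_true = np.array([[1, 1, 0, 0],
                   [0, 0, 1, 1]])
y_score = np.array([[0.9, 0.1, 0.2, 0.3],
                    [0.1, 0.2, 0.8, 0.7]])
acc = accuracy_at_k_by_bin(y_true, y_score, label_counts, k=1, n_bins=2)
print(acc)
```

Note that this counts hits per true label. Other definitions of accuracy@k (for example, normalizing per visit by the smaller of k and the number of true labels) would give different numbers, so the definition should be matched before comparing against published results.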
