synthetic_samples #3
Comments
@mp2893 Could you take a look at this?
Hi shibin2018, Thanks for taking an interest in our work. BTW, anybody who intends to do healthcare research can download eICU as per the official guideline. Best,
Thank you. I have downloaded visit_list.p and obtained the synthetic samples, but the graph-convolutional-transformer code only works with the eICU dataset.
Hi shibin2018, In order to use the synthetic samples, you need to make some changes to the GCT code.
The tricky part is label_key.
The one that requires minimal change is Diagnosis-Treatment Classification. When you read each seqex, look for the context feature 'label.medication.class'.
As for Graph Reconstruction and Masked Diagnosis Code Prediction, they are much more involved, as you need to generate proper labels from the beginning, which means you need to change both process_synthetic.py and graph_convolutional_transformer.py.
Best,
Thank you very much, I will try it.
----- Original Message -----
From: Edward Choi <[email protected]>
To: Google-Health/records-research <[email protected]>
Cc: shibin2018 <[email protected]>, Author <[email protected]>
Subject: Re: [Google-Health/records-research] synthetic_samples (#3)
Date: 2020-05-02 19:53
Hi shibin2018,
In order to use the synthetic samples, you need to make some changes to the GCT code.
First, please take a look at the arguments in line 626 in the source code, which can be changed via line 40 in train.py:
feature_keys: This should be changed to ['dx_ints', 'proc_ints', 'lab_ints'].
label_key: This will be explained below.
vocab_sizes: This should be changed to {'dx_ints':1000, 'proc_ints':1000, 'lab_ints':1000}.
feature_set: This should be changed to 'vdpl'.
num_classes: This will be explained below.
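As a rough sketch of the settings listed above (not code from the repository): the key names and values are taken from the list, while the dictionary name `SYNTHETIC_OVERRIDES`, the `None` placeholders, and the helper `check_overrides` are hypothetical. How these values actually reach the model constructor via train.py is an assumption left open here.

```python
# Sketch only: key names and values come from the list above; the dict name,
# the None placeholders, and the helper below are hypothetical.
SYNTHETIC_OVERRIDES = {
    "feature_keys": ["dx_ints", "proc_ints", "lab_ints"],
    "label_key": None,  # task-specific, depends on the prediction task
    "vocab_sizes": {"dx_ints": 1000, "proc_ints": 1000, "lab_ints": 1000},
    "feature_set": "vdpl",
    "num_classes": None,  # task-specific as well
}


def check_overrides(overrides):
    """Fail early if a feature key has no matching vocabulary size."""
    missing = [key for key in overrides["feature_keys"]
               if key not in overrides["vocab_sizes"]]
    if missing:
        raise ValueError("no vocab size for: %s" % ", ".join(missing))
    return True
```

Calling `check_overrides(SYNTHETIC_OVERRIDES)` before training is one cheap way to catch a feature key that was renamed in one place but not the other.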
The tricky part is label_key. In the paper, I conducted three prediction tasks (i.e. Graph Reconstruction, Diagnosis-Treatment Classification, Masked Diagnosis Code Prediction) using the synthetic samples. With the current GCT code, you can do none of them unless you modify the code.
The one that requires minimal change is Diagnosis-Treatment Classification. When you read each seqex, look for the context feature 'label.medication.class'. If there is no such feature, the true label is 0. If context_feature['label.medication.class'] is either '1' or '2', that value is the true label. Pass those labels as the labels argument of model_fn (line 774 in the source code). You also need to change num_classes to 3. Then change get_loss (line 730 of the source code) so that it uses a proper softmax loss instead of the sigmoid loss. And don't forget to change any segment that uses sigmoid for probability (e.g. line 793 of the source code) to softmax.
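The labeling rule and the switch from sigmoid to softmax can be illustrated with a small, framework-free sketch. It assumes the context features have already been read into a plain dict (in the real seqex they are protocol-buffer features), and the function names `medication_class_label`, `softmax`, and `softmax_cross_entropy` are hypothetical. In TensorFlow 1.x the built-in counterparts would be `tf.nn.softmax` and `tf.nn.sparse_softmax_cross_entropy_with_logits`; whether they drop into get_loss unchanged has not been verified here.

```python
import math


def medication_class_label(context_features):
    """Map a seqex's context features (assumed: a plain dict) to a label in {0, 1, 2}."""
    value = context_features.get("label.medication.class")
    if value is None:
        return 0  # feature absent: the true label is 0
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    label = int(value)
    if label not in (1, 2):
        raise ValueError("unexpected medication class: %r" % value)
    return label


def softmax(logits):
    """Numerically stable softmax over a list of num_classes logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example against an integer class label."""
    return -math.log(softmax(logits)[label])
```

With three classes, each example yields three logits and one integer label, in contrast to the single logit per example used with the sigmoid loss.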
As for Graph Reconstruction and Masked Diagnosis Code Prediction, they are much more involved, as you need to generate proper labels from the beginning, which means you need to change both process_synthetic.py and graph_convolutional_transformer.py. So I recommend you start with Diagnosis-Treatment Classification, which needs only light changes to the code, and take the extra step if you want to do the other tasks as well.
Best,
Ed
Hi, thank you for your reply. How should I change label_key for the synthetic samples? I can't find 'label.medication.class' in the source code, so I don't know how to change it. Also, what does "which can be changed via line 40 in train.py" mean? Line 40 in train.py is empty. Thank you.
Hi, same problem with the label key.
How can I train this model with the synthetic_samples? I see that you only provide the graph-convolutional-transformer code for the eICU dataset. Because I can't download the eICU dataset, I want to use the synthetic dataset.