synthetic_samples #3

shibin2018 · 2020-04-21T09:26:40Z

How can I train this model with the synthetic samples? I see that you only provide the graph-convolutional-transformer code for the eICU dataset. Since I can't download the eICU dataset, I would like to use the synthetic dataset instead.

a-dai · 2020-04-21T18:34:23Z

@mp2893 Could you take a look at this?

mp2893 · 2020-04-22T02:28:20Z

Hi shibin2018,

Thanks for taking an interest in our work.
Did you manage to follow the preprocessing instructions in the README (downloading visit_list.p, etc.)?

BTW, anybody with the intention of healthcare research can download eICU as per the official guideline.

Best,
Ed

shibin2018 · 2020-04-29T01:38:51Z

Thank you. I have downloaded visit_list.p and obtained the synthetic samples, but the graph-convolutional-transformer code only works with the eICU dataset.

mp2893 · 2020-05-02T11:53:19Z

Hi shibin2018,

In order to use the synthetic samples, you need to make some changes to the GCT code.
First, please take a look at the arguments in line 626 in the source code, which can be changed via line 40 in train.py:

feature_keys: This should be changed to ['dx_ints', 'proc_ints', 'lab_ints'].
label_key: This will be explained below.
vocab_sizes: This should be changed to {'dx_ints':1000, 'proc_ints':1000, 'lab_ints':1000}.
feature_set: This should be changed to 'vdpl'.
num_classes: This will be explained below.
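For readers following along, the settings listed above can be collected in one place. This is only a sketch: the dictionary name `synthetic_overrides` and the helper `check_overrides` are hypothetical, and how these values are actually passed to the model in train.py is an assumption, not something confirmed in this thread.

```python
# Hypothetical container for the argument values listed above.
# The key names mirror the arguments named in the comment; how they are
# passed to the model constructor is an assumption.
synthetic_overrides = {
    "feature_keys": ["dx_ints", "proc_ints", "lab_ints"],
    "vocab_sizes": {"dx_ints": 1000, "proc_ints": 1000, "lab_ints": 1000},
    "feature_set": "vdpl",
    # label_key and num_classes depend on the prediction task,
    # so they are deliberately left out here.
}


def check_overrides(overrides):
    """Return True if every feature key has a vocabulary size, else raise."""
    missing = [
        key for key in overrides["feature_keys"]
        if key not in overrides["vocab_sizes"]
    ]
    if missing:
        raise ValueError(f"missing vocab size for: {missing}")
    return True


check_overrides(synthetic_overrides)
```

A consistency check like this is cheap and catches a mismatch between feature_keys and vocab_sizes before training starts.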

The tricky part is label_key. In the paper, I conducted three prediction tasks (i.e. Graph Reconstruction, Diagnosis-Treatment Classification, Masked Diagnosis Code Prediction) using the synthetic samples. With the current GCT code, you can do none of them unless you modify the code.

The one that requires minimal change is the Diagnosis-Treatment Classification. When you read each seqex, look for the context feature 'label.medication.class'. If there is no such feature, the true label is 0. If context_feature['label.medication.class'] is either '1' or '2', that value is the true label. Pass those labels as the labels argument of model_fn (line 774 of the source code). You also need to change num_classes to 3, since there are now three classes (0, 1, and 2). Next, change get_loss (line 730 of the source code) so that it uses a softmax loss instead of the sigmoid loss. Finally, don't forget to change any segment that uses sigmoid to compute probabilities (e.g. line 793 of the source code) to softmax.
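The labeling rule and the sigmoid-to-softmax change described above can be sketched in plain Python. This is an illustration under stated assumptions, not the repository's code: the context features are modeled as an ordinary dict rather than a parsed SequenceExample, and the function names `medication_class_label`, `softmax`, and `softmax_cross_entropy` are hypothetical. In the actual model, the loss would be computed with TensorFlow ops rather than the `math` module.

```python
import math


def medication_class_label(context_features):
    """Map context features (modeled as a plain dict) to label 0, 1, or 2.

    A missing 'label.medication.class' entry means label 0; the values
    '1' and '2' map to labels 1 and 2.
    """
    raw = context_features.get("label.medication.class")
    if raw is None:
        return 0
    if isinstance(raw, bytes):  # string features are often stored as bytes
        raw = raw.decode("utf-8")
    label = int(raw)
    if label not in (1, 2):
        raise ValueError(f"unexpected medication class: {raw!r}")
    return label


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example with an integer class label."""
    return -math.log(softmax(logits)[label])


# Three-class example: one logit per class (num_classes = 3).
label = medication_class_label({"label.medication.class": "2"})
loss = softmax_cross_entropy([0.1, 0.2, 1.5], label)
```

Unlike a sigmoid, which scores each class independently, the softmax produces probabilities that sum to 1 across the three classes, which matches a task where each sample has exactly one true label.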

Graph Reconstruction and Masked Diagnosis Code Prediction are much more involved, because you need to generate proper labels from scratch, which means changing both process_synthetic.py and graph_convolutional_transformer.py. So I recommend you start with Diagnosis-Treatment Classification, which needs only light modification of the code, and take the extra step if you want to do the other tasks as well.

Best,
Ed

shibin2018 · 2020-05-02T23:41:10Z

Thank you very much, I will try it.

shibin2018 · 2020-05-06T05:06:21Z

Hi,

Thank you for your reply. How do I change label_key for the synthetic samples? I can't find 'label.medication.class' in the source code, so I don't know how to change it. Also, what do you mean by "which can be changed via line 40 in train.py"? Line 40 of train.py is empty. Thank you.

Livnatc · 2020-11-28T18:00:00Z

Hi,

I have the same problem with the label key.
What should the label key be for the published prediction task (for the synthetic data)?

jonasbkemp · 2022-07-12T20:45:39Z

Closing this issue as stale. From discussion in #6, DescEmb should be preferred to GCT for new experiments where possible.

a-dai self-assigned this Apr 21, 2020

jonasbkemp mentioned this issue May 4, 2020

Could you provide a toy example about the eICU dataset #5

Closed

jonasbkemp mentioned this issue Nov 4, 2020

GCT training error #8

Closed

mp2893 mentioned this issue Jan 10, 2021

Graph reconstruction #10

Closed

jonasbkemp closed this as not planned Jul 12, 2022
