This repository was archived by the owner on Nov 5, 2022. It is now read-only.

synthetic_samples #3

Closed
shibin2018 opened this issue Apr 21, 2020 · 8 comments

@shibin2018

I'd like to ask how to train this model with the synthetic_samples. It seems you only provide the graph-convolutional-transformer code for the eICU dataset. Because I can't download the eICU dataset, I want to use the synthetic dataset.

@a-dai a-dai self-assigned this Apr 21, 2020
@a-dai

a-dai commented Apr 21, 2020

@mp2893 Could you take a look at this?

@mp2893
Contributor

mp2893 commented Apr 22, 2020

Hi shibin2018,

Thanks for taking interest in our work.
Did you manage to follow the preprocessing instructions in the README (downloading visit_list.p, etc.)?

BTW, anyone doing healthcare research can download eICU by following the official guidelines.

Best,
Ed

@shibin2018
Author

Thank you. I have downloaded visit_list.p and generated the synthetic samples, but the graph-convolutional-transformer code only works with the eICU dataset.

@mp2893
Contributor

mp2893 commented May 2, 2020

Hi shibin2018,

In order to use the synthetic samples, you need to make some changes to the GCT code.
First, please take a look at the arguments at line 626 of the source code, which can be changed via line 40 of train.py:

  • feature_keys: This should be changed to ['dx_ints', 'proc_ints', 'lab_ints'].
  • label_key: This will be explained below.
  • vocab_sizes: This should be changed to {'dx_ints':1000, 'proc_ints':1000, 'lab_ints':1000}.
  • feature_set: This should be changed to 'vdpl'.
  • num_classes: This will be explained below.
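For reference, the argument values listed above can be collected in one place. This is a minimal sketch, assuming the arguments are passed as plain keyword values; the names `SYNTHETIC_ARGS` and `check_args` are hypothetical, and how train.py actually forwards them (around its line 40) should be checked against the real code. `num_classes` is set to 3 for the Diagnosis-Treatment Classification task described below, and `label_key` is left as a placeholder.

```python
# Hypothetical sketch: GCT argument values for the synthetic samples.
# How train.py forwards these to the model is an assumption; verify it
# against the real call site before copying.

SYNTHETIC_ARGS = {
    # Feature names produced for the synthetic samples.
    'feature_keys': ['dx_ints', 'proc_ints', 'lab_ints'],
    # One vocabulary size per feature key.
    'vocab_sizes': {'dx_ints': 1000, 'proc_ints': 1000, 'lab_ints': 1000},
    # Feature-set string given for the synthetic data.
    'feature_set': 'vdpl',
    # 3 classes for Diagnosis-Treatment Classification (labels 0, 1, 2).
    'num_classes': 3,
    # Placeholder: the label depends on the task (see the thread).
    'label_key': None,
}


def check_args(args):
    """Sanity-check that every feature key has a vocabulary size."""
    missing = [k for k in args['feature_keys'] if k not in args['vocab_sizes']]
    if missing:
        raise ValueError(f'vocab_sizes is missing entries for: {missing}')
    return True


if __name__ == '__main__':
    check_args(SYNTHETIC_ARGS)
    # Overlay the synthetic values on whatever defaults train.py uses.
    defaults = {'feature_set': 'vdp', 'num_classes': 1}  # hypothetical defaults
    params = {**defaults, **SYNTHETIC_ARGS}
    print(params['feature_set'], params['num_classes'])
```

Keeping `feature_keys` and `vocab_sizes` side by side makes it easy to spot a key that was renamed in one place but not the other, which is what `check_args` guards against.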

The tricky part is label_key. In the paper, I conducted three prediction tasks (i.e. Graph Reconstruction, Diagnosis-Treatment Classification, Masked Diagnosis Code Prediction) using the synthetic samples. With the current GCT code, none of them can be run without modifying the code.

The one that requires the fewest changes is Diagnosis-Treatment Classification. When you read each seqex, look for the context feature 'label.medication.class'. If there is no such feature, the true label is 0. If context_feature['label.medication.class'] is either '1' or '2', that value is the corresponding true label. Pass those labels as the labels argument of model_fn at line 774 of the source code. Obviously, you also need to change num_classes to 3. You also need to change get_loss at line 730 of the source code so that it uses the proper softmax loss instead of the sigmoid loss. And don't forget to change any segment that uses sigmoid for probabilities (e.g. line 793 of the source code) to softmax.
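The labeling rule and the sigmoid-to-softmax switch can be sketched in plain Python. This is a framework-free illustration, assuming the context features have already been parsed into a dict (the real code reads them from a TensorFlow SequenceExample, usually as bytes); the function names are hypothetical. In TensorFlow 1.x, the ops one would presumably reach for are `tf.nn.sparse_softmax_cross_entropy_with_logits` for the loss and `tf.nn.softmax` for the probabilities.

```python
import math
from typing import List, Mapping, Sequence, Union

LABEL_FEATURE = 'label.medication.class'


def medication_class_label(context: Mapping[str, Union[str, bytes]]) -> int:
    """Map a seqex context to a class id: missing -> 0, '1' -> 1, '2' -> 2.

    Assumes `context` is a plain dict of already-parsed context features.
    """
    value = context.get(LABEL_FEATURE)
    if value is None:
        return 0
    if isinstance(value, bytes):
        value = value.decode('utf-8')
    if value not in ('1', '2'):
        raise ValueError(f'unexpected {LABEL_FEATURE} value: {value!r}')
    return int(value)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: probabilities over the 3 classes."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Per-example loss -log p(label), replacing the sigmoid loss."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]
```

For example, `medication_class_label({'label.medication.class': b'2'})` returns 2, and an example with no such feature gets label 0. Unlike independent sigmoids, the softmax probabilities for one example always sum to 1, which is what a single-label, 3-class task needs.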

As for Graph Reconstruction and Masked Diagnosis Code Prediction, they are much more involved, as you need to generate proper labels from the beginning, which means you need to change both process_synthetic.py and graph_convolutional_transformer.py. So I recommend you start with Diagnosis-Treatment Classification, which only needs light modifications to the code, and take the extra step if you want to do the other tasks as well.
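As a starting point for the masked-diagnosis labels, one plausible preprocessing step is to hide one diagnosis code per sample and keep it as the target. This is only a sketch under assumptions: that the target is a single hidden code from `dx_ints`, that it would be predicted with a softmax over the diagnosis vocabulary (1000 codes here), and that the step would live in process_synthetic.py. It is not necessarily how the paper generated its labels, and `mask_one_diagnosis` is a hypothetical name.

```python
import random
from typing import List, Optional, Tuple


def mask_one_diagnosis(
    dx_ints: List[int], rng: random.Random
) -> Optional[Tuple[List[int], int]]:
    """Hide one diagnosis code; return (remaining codes, hidden code).

    Returns None when there is no diagnosis to mask, so the caller can
    skip that sample.
    """
    if not dx_ints:
        return None
    idx = rng.randrange(len(dx_ints))
    remaining = dx_ints[:idx] + dx_ints[idx + 1:]
    return remaining, dx_ints[idx]


if __name__ == '__main__':
    rng = random.Random(1234)  # fixed seed so the masking is reproducible
    result = mask_one_diagnosis([12, 57, 403], rng)
    if result is not None:
        remaining, hidden = result
        # `remaining` would be written back as dx_ints; `hidden` is the label.
        print(remaining, hidden)
```

Using a seeded `random.Random` instance rather than the module-level functions keeps the masking reproducible across preprocessing runs.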

Best,
Ed

@shibin2018
Author

shibin2018 commented May 2, 2020 via email

@shibin2018
Author

shibin2018 commented May 6, 2020 via email

@Livnatc

Livnatc commented Nov 28, 2020

Hi,

Same problem with the label key.
What should the label key be for the published prediction task (for the synthetic data)?

@jonasbkemp
Contributor

Closing this issue as stale. From discussion in #6, DescEmb should be preferred to GCT for new experiments where possible.

@jonasbkemp jonasbkemp closed this as not planned (stale) Jul 12, 2022

5 participants