Other fields in data generation #10
Comments
Hi Ankit267, To generate features other than Dx codes (age, gender, procedure codes), you need training data that includes them in the first place. As for medGAN.py: For the second question: medGAN will generate synthetic samples that closely follow the distribution of the real data samples, so the dependencies among all features will come into play. Since Patient ID is not being generated (unless you include it in the training data), no features will depend on Patient ID. Hope this helps. Best,
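To make the point about training data concrete, here is a minimal sketch of appending demographic columns to a binary Dx matrix before training. It is an illustration only, not code from this repository: the function name `build_training_matrix`, the age bin edges, and the "F"/other gender encoding are all assumptions, and procedure codes could be appended as extra binary columns in the same way.

```python
import numpy as np

def build_training_matrix(dx_matrix, ages, genders, age_bin_edges=(18, 40, 65)):
    """Append one-hot age-bin columns and one gender column to a binary Dx matrix.

    Hypothetical helper: the bin edges and the gender encoding are assumptions
    chosen for illustration, not something defined by medGAN.
    """
    dx_matrix = np.asarray(dx_matrix, dtype=np.float32)
    ages = np.asarray(ages)
    genders = np.asarray(genders)

    # np.digitize maps each age to a bin index in 0..len(age_bin_edges)
    age_idx = np.digitize(ages, age_bin_edges)
    age_onehot = np.eye(len(age_bin_edges) + 1, dtype=np.float32)[age_idx]

    # Single binary column: 1.0 for "F", 0.0 otherwise (assumed encoding)
    gender_col = (genders == "F").astype(np.float32).reshape(-1, 1)

    # One row per patient: Dx columns, then age bins, then gender
    return np.hstack([dx_matrix, age_onehot, gender_col])

dx = np.array([[1, 0, 1], [0, 1, 0]])
matrix = build_training_matrix(dx, ages=[30, 70], genders=["F", "M"])
print(matrix.shape)  # (2, 8): 3 Dx columns + 4 age bins + 1 gender column
```

A model trained on such a matrix would see the extra columns as ordinary features, so the generated rows would carry them too. The matrix would still need to be saved in whatever format the training script expects.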
Thanks Ed, that really helps.
Dear Ed
Hi Myshgithub, The synthetic records generated by medGAN will have no relationship whatsoever with the original Patient IDs. The generated records will be purely synthetic, and every time you generate a new batch, you get a fresh batch of synthetic records. Sometimes you might get duplicate records by chance, but that doesn't mean they are the same patient. medGAN does not understand the concept of Patient ID, unless you modify it somehow. Best,
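Since the generated rows carry no identity, any ID attached to them afterwards is just an arbitrary label. A minimal sketch of that idea follows; the helper `attach_synthetic_ids` and the `SYN` prefix are hypothetical and not part of medGAN.

```python
import numpy as np

def attach_synthetic_ids(synthetic_matrix, prefix="SYN"):
    """Pair each generated row with a fresh sequential label.

    Hypothetical helper: the labels are arbitrary and have no link to any
    real Patient ID. Duplicate rows still receive distinct labels.
    """
    synthetic_matrix = np.asarray(synthetic_matrix)
    # Zero-pad so all labels in the batch have the same length
    width = len(str(len(synthetic_matrix)))
    return [
        (f"{prefix}{i:0{width}d}", row)
        for i, row in enumerate(synthetic_matrix, start=1)
    ]

batch = np.array([[1, 0, 1], [1, 0, 1]])  # two identical synthetic rows
for label, row in attach_synthetic_ids(batch):
    print(label, row.tolist())
```

Labels assigned this way restart with every generated batch, which matches the point that each batch is a fresh set of synthetic records.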
Thank you so much, Ed! What changes or modifications do you think I would need to make in order to have a Patient ID alongside the synthetic generated records? I would appreciate your response.
Hi Edward,
Thanks for the code; it really helps in understanding the paper better.
Currently your Python code generates patient IDs and ICD-9 diagnosis codes.
What changes would I need to make to your process_mimic and medGAN code to generate synthetic data that includes fields such as age, gender, and procedures?
Do I only need to include the desired fields in the process_mimic file, or do I also need to change medGAN.py?
Secondly, will the generated data (i.e. pid, icd9, age, gender, procedures) take the influence of the other variables into account, or will it depend solely on the patient ID?
Thanks