Feat/test data for lab and health insurance #19
this way it's clear that we set it for all test datasets and not only medication data
replace = TRUE
)),
# Number of packages
apk = sample(1:3, 1000, replace = TRUE),
What is this for?
The number of packages purchased ("antal pakker"; the slightly cryptic variable name on DST is apk) is factored in when calculating the number of doses of insulin vs. non-insulin, which is used to distinguish type 1 from type 2 diabetes.
In general, the code that creates data used within the package should not be included as part of the package (since it might include dependencies that aren't actually needed by the package for its intended uses). That's why I moved the code over into
@lwjohnst86 Your comments seem to be mostly on the medication test data, which is actually not a part of this PR (test data for lab and health insurance).
Totally fine - I understand the confusion 👍
@Aastedet @lwjohnst86 Status on this? Anything I can do for this to become ready to merge?
added assign_drugname_from_atc() to med_a10_df
Fix to previous commit to assign drugnames to med_a10_df
forgot to actually assign drug names to med_a10_df
- Added offset to pnr number generation to have more control when generating data for false-positive diabetes cases (for medication: 1-200: non-cases, 201-250: true cases).
- Increased number of samples in health insurance/lab data and changed years covered by health insurance to match real-world setting.
Merge commit '61343b054d2a190eb1e20de6bcb1265c10d2ac34'
# Conflicts:
#	.Rbuildignore
#	DESCRIPTION
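For readers following the pnr offset described in the commit message above, here is a minimal sketch of how such an offset could work. The helper `sample_pnr()`, its arguments, the seed, and the sample sizes are all hypothetical and chosen for illustration; they are not taken from the code in this PR.

```r
# Hypothetical helper (not the PR's actual function): draw `n` pnr values
# from `n_ids` distinct IDs, shifted by `offset`, zero-padded to 3 digits.
sample_pnr <- function(n, n_ids, offset = 0L) {
  ids <- as.integer(seq_len(n_ids) + offset)
  sprintf("%03d", sample(ids, n, replace = TRUE))
}

set.seed(42) # arbitrary seed, only for reproducibility of the sketch

# IDs 001-200: intended as non-cases
non_cases <- sample_pnr(n = 500, n_ids = 200)

# IDs 201-250: intended as true cases (offset shifts 1-50 to 201-250)
true_cases <- sample_pnr(n = 500, n_ids = 50, offset = 200L)

range(as.integer(non_cases))
range(as.integer(true_cases))
```

Keeping the two ID ranges disjoint makes it possible to know in advance which individuals a classification step should and should not flag.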
This looks good! Next step is for me to add the hospital diagnosis (lpr) and population data (bef).
Are we waiting for me to add the fake diagnosis data (I will, I promise 😄 ) before merging or do I need to do more in terms of reviewing the PR?
@Aastedet there are still a lot of questions I have and the code needs a lot of work, that's why I haven't merged it in yet. I think it would be better to first do #32 before creating the example/test data, because it would make it easier to build those if we have the variable list well defined and set up first.
@lwjohnst86 Cool - I'll get #32 done ASAP.
This PR includes the following:
- create_test_lab_df() in create_test_data.R
- lab_df in create_test_data.R using the create_test_lab_df(). lab_df has 100 rows (1 row per individual)
- create_test_health_insurance_df() function in create_test_data.R
- health_insurance_df in create_test_data.R using the create_test_health_insurance_df(). health_insurance_df has 100 rows (1 row per individual)

Can you check whether these dfs follow your descriptions in #4, @Aastedet?
From the description, I'm not sure whether there should be multiple rows per individual? Currently, both functions sample from 001-100 with replacement, meaning that with 100 samples some IDs will appear multiple times and some IDs in the range will not appear at all. Is that what you imagined?
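To illustrate the question, here is a small sketch of the sampling behaviour. The seed and variable names are made up for illustration and are not copied from create_test_data.R.

```r
set.seed(1) # arbitrary seed, only so the sketch is reproducible

ids <- sprintf("%03d", 1:100)

# Sampling 100 IDs with replacement: duplicates and gaps are expected
pnr <- sample(ids, 100, replace = TRUE)
n_unique <- length(unique(pnr))        # distinct individuals drawn
n_missing <- length(setdiff(ids, pnr)) # IDs in 001-100 never drawn
has_duplicates <- anyDuplicated(pnr) > 0

# Alternative if exactly one row per individual is wanted
pnr_unique <- sample(ids, 100, replace = FALSE)

c(unique = n_unique, missing = n_missing)
```

With `replace = FALSE`, every ID in 001-100 appears exactly once, which would match the "1 row per individual" description.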
Do we maybe want to keep the test data functions and creation of the test data in separate scripts?