Feat/test data for lab and health insurance #19

Merged: 20 commits, Apr 27, 2024

Contributor

@signekb signekb commented Dec 8, 2023

This PR includes the following:

  • The addition of a create_test_lab_df() function in create_test_data.R
  • The creation of lab_df in create_test_data.R using create_test_lab_df(). lab_df has 100 rows (1 row per individual)
  • The addition of a create_test_health_insurance_df() function in create_test_data.R
  • The creation of health_insurance_df in create_test_data.R using create_test_health_insurance_df(). health_insurance_df has 100 rows (1 row per individual)

Can you check whether these dfs follow your descriptions in #4, @Aastedet?
From the description, I'm not sure whether there should be multiple rows per individual. Currently, both functions sample from 001-100 with replacement, meaning that with 100 samples some IDs will appear multiple times and some IDs in the range will not appear. Is that what you imagined?

Do we maybe want to keep the test data functions and creation of the test data in separate scripts?
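The sampling question above can be checked with a small sketch. This is not the code from create_test_data.R; the variable names are hypothetical, and it only assumes that IDs are the zero-padded strings 001-100 and that 100 draws are made.

```r
set.seed(123)  # fixed seed so the sketch is reproducible

# Candidate IDs 001-100, zero-padded (assumed format).
ids <- sprintf("%03d", 1:100)

# Draw 100 IDs with replacement: some IDs repeat, others never appear.
with_replacement <- sample(ids, size = 100, replace = TRUE)
n_unique <- length(unique(with_replacement))
n_missing <- length(setdiff(ids, with_replacement))

# Draw 100 IDs without replacement: a permutation, exactly one row per ID.
without_replacement <- sample(ids, size = 100, replace = FALSE)

cat("Unique IDs with replacement:", n_unique, "\n")
cat("IDs never drawn:", n_missing, "\n")
cat("Unique IDs without replacement:", length(unique(without_replacement)), "\n")
```

With replacement, about 63 of the 100 IDs are expected to appear at least once on average (100 × (1 − 0.99^100)), so "1 row per individual" only holds if the draw is made without replacement.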

@signekb signekb linked an issue Dec 8, 2023 that may be closed by this pull request
@signekb signekb marked this pull request as draft December 8, 2023 11:56
@signekb signekb requested a review from Aastedet December 8, 2023 13:04
@signekb signekb self-assigned this Dec 8, 2023
@signekb signekb marked this pull request as ready for review December 8, 2023 13:05
data-raw/testdata.R
replace = TRUE
)),
# Number of packages
apk = sample(1:3, 1000, replace = TRUE),
Member

What is this for?

Collaborator

Number of packages ("antal pakker"; the slightly cryptic variable name on DST is apk). The number of packages purchased is factored in when calculating the number of doses of insulin vs. non-insulin drugs, which is used to distinguish type 1 from type 2 diabetes.

@lwjohnst86
Member

In general, when creating data used within the package, the code that creates it should not be included as part of the package (since it might bring in dependencies that the package doesn't actually need for its intended uses). That's why I moved the code into data-raw/ (set up with usethis::use_data_raw()).
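As a rough illustration of the data-raw/ convention described above, a script in that folder typically builds an object and then saves it into data/. The sketch below is not the contents of data-raw/testdata.R; `example_df` and its columns are hypothetical placeholders, and the save step is guarded so the script only writes when run from a package root with usethis installed.

```r
# Hypothetical data-raw/ script; run from the package root.
set.seed(2023)

# Placeholder data: `example_df`, `id`, and `value` are made-up names.
example_df <- data.frame(
  id = sprintf("%03d", 1:5),
  value = round(runif(5), 2)
)

# Save to data/example_df.rda only when inside a package with usethis available.
if (requireNamespace("usethis", quietly = TRUE) && file.exists("DESCRIPTION")) {
  usethis::use_data(example_df, overwrite = TRUE)
}
```

Because data-raw/ is listed in .Rbuildignore by usethis::use_data_raw(), anything the script loads stays out of the built package and does not need to be listed as a package dependency.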

@signekb
Contributor Author

signekb commented Dec 13, 2023

@lwjohnst86 Your comments seem to be mostly on the medication test data, which is actually not a part of this PR (test data for lab and health insurance).
But I guess @Aastedet can take a look at the comments, since I don't really have an overview of what's happening in that part of the script either :)
I will add @Aastedet as assignee for this PR (assignee = actively working on the PR and being responsible for getting it into a merge-ready state).

@lwjohnst86
Member

@signekb I didn't realize until later that it was added by Anders (?) earlier, since the PR showed it all coming from you ☺️ I did know that you hadn't added that code when I made the comments, though, from reading issue #4 ☺️

@signekb
Contributor Author

signekb commented Dec 13, 2023

Totally fine - I understand the confusion 👍

@signekb
Contributor Author

signekb commented Jan 31, 2024

@Aastedet @lwjohnst86 Status on this? Anything I can do for this to become ready to merge?

@lwjohnst86
Member

@signekb We'll get to this when we start the focus period next week. More likely it will be me and you figuring things out and getting feedback from @Aastedet during meetings.

added assign_drugname_from_atc() to med_a10_df
Fix to previous commit to assign drugnames to med_a10_df
forgot to actually assign drug names to med_a10_df
Anders Aasted Isaksen added 2 commits February 17, 2024 23:30
- Added offset to pnr number generation to have more control when generating data for false-positive diabetes cases (for medication: 1-200: non-cases, 201-250: true cases).
- Increased number of samples in health insurance/lab data and changed years covered by health insurance to match real world setting.
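The ID offset described in the commit notes above could look roughly like the following. This is a sketch under assumptions, not the code in data-raw/testdata.R: the helper `sample_pnr()`, its arguments, and the sample sizes are hypothetical; only the ranges (1-200 for non-cases, 201-250 for true cases) come from the commit message.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical helper: draw n zero-padded IDs from (offset + 1):(offset + range_size).
sample_pnr <- function(n, offset, range_size) {
  drawn <- offset + sample(seq_len(range_size), size = n, replace = TRUE)
  sprintf("%03d", as.integer(drawn))
}

# Non-cases use IDs 1-200; true cases are shifted by an offset of 200 (IDs 201-250).
non_case_pnr <- sample_pnr(n = 500, offset = 0, range_size = 200)
true_case_pnr <- sample_pnr(n = 100, offset = 200, range_size = 50)

cat("Non-case ID range:", range(as.integer(non_case_pnr)), "\n")
cat("True-case ID range:", range(as.integer(true_case_pnr)), "\n")
```

Keeping the two groups in disjoint ID ranges makes it possible to tell from the ID alone whether a simulated individual is meant to be a case, which helps when checking for false positives.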
Merge commit '61343b054d2a190eb1e20de6bcb1265c10d2ac34'

#Conflicts:
#	.Rbuildignore
#	DESCRIPTION
Collaborator

@Aastedet Aastedet left a comment

This looks good! Next step is for me to add the hospital diagnosis (lpr) and population data (bef).

@Aastedet
Collaborator

Are we waiting for me to add the fake diagnosis data (I will, I promise 😄 ) before merging or do I need to do more in terms of reviewing the PR?

@lwjohnst86
Member

@Aastedet there are still a lot of questions I have and the code needs a lot of work, that's why I haven't merged it in yet. I think it would be better to first do #32 before creating the example/test data, because it would make it easier to build those if we have the variable list well defined and set up first.

@Aastedet
Collaborator

@lwjohnst86 Cool - I'll get #32 done ASAP.

@lwjohnst86 lwjohnst86 merged commit b645de4 into main Apr 27, 2024
0 of 2 checks passed
@lwjohnst86 lwjohnst86 deleted the feat/test-data-for-lab-and-health-insurance branch April 27, 2024 17:33
Successfully merging this pull request may close these issues.

Create a fake dataset to test that the functions work
3 participants