diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 21f4e17..2360270 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -97,98 +97,246 @@ library(dplyr) library(osdc) ``` -#### HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%) - -The function `include_hba1c()` uses `lab_forsker` as the input data to -extract all events of HbA1c tests above the diagnosis cut-off value. - -Since the HbA1c diagnosis cut-off value depends on the kind of test that is -used, the inclusion event is defined as follows: - -- For HbA1c IFCC (NPU03835), we include values \>= 6.5 %. -- For HbA1c DCCT (NPU27300), we include values \>= 48 mmol/mol. - -```{r, echo=FALSE} -algorithm |> - filter(name=="hba1c") |> - knitr::kable(caption = "Algorithm used in the implementation for including HbA1c.") -``` - -#### Hospital diagnosis of diabetes +#### Hospital diagnoses + +**Joining LPR2 and LPR3 data** + +The helper functions `join_lpr2()` and `join_lpr3()` join records of +diagnoses to administrative information in LPR2-formatted and +LPR3-formatted data, respectively. 
+ +`join_lpr2()` takes `lpr_diag` and `lpr_adm` as inputs, filters to the +necessary diagnoses (`c_diag` starting with "DO0[0-6]", "DO8[0-4]", +"DZ3[37]", "DE1[0-4]", "249", or "250"), joins the required information +by record number (`recnum`), and outputs a `data.frame` with the +following variables: + +- `pnr`: identifier variable +- `date`: date of the recorded diagnosis (renamed from `d_inddto`) +- `specialty`: department specialty (renamed from `c_spec`) +- `diagnosis`: diagnosis code (renamed from `c_diag`) +- `diagnosis_type`: diagnosis type (renamed from `c_diagtype`) + +`join_lpr3()` takes `diagnoser` and `kontakter` as inputs, filters to +the necessary diagnoses (`diagnosekode` starting with "DO0[0-6]", +"DO8[0-4]", "DZ3[37]" or "DE1[0-4]"), joins the required information by +record number (`dw_ek_kontakt`), and outputs a `data.frame` with the +following variables: + +- `pnr`: identifier variable (renamed from `cpr`) +- `date`: date of the recorded diagnosis (renamed from `dato_start`) +- `specialty`: department specialty (renamed from `hovedspeciale_ans`) +- `diagnosis`: diagnosis code (renamed from `diagnosekode`) +- `diagnosis_type`: diagnosis type (renamed from `diagnosetype`) +- `diagnosis_retracted`: if the diagnosis was later retracted (renamed + from `senere_afkraeftet`) + +These outputs are passed to `include_diabetes_diagnoses()` (and to +`get_pregnancy_dates()`, see exclusion events) for further processing +below. + +**Processing of diabetes diagnoses** The function `include_diabetes_diagnoses()` uses the hospital contacts -from LPR2 and 3 to include all dates of diabetes diagnoses. Diabetes -diagnoses from both ICD 8 and ICD 10 are included. - -This function contains two helper functions: - -- `keep_diabetes_icd10()` -- `keep_diabetes_icd8()` - - - - +from LPR2 and LPR3 to include all dates of diabetes diagnoses to use for +inclusion, as well as additional information needed to classify diabetes +type. 
Diabetes diagnoses from both ICD-8 and ICD-10 are included. + +The function takes the outputs of `join_lpr2()` and `join_lpr3()` as +inputs and processes each input separately to generate the following +internal variables: + +- From `join_lpr2()`: + - `pnr`: identifier variable + - `date`: dates of all included diabetes diagnoses: + - registered as primary (A) or secondary (B) diagnoses, regardless + of type or department: + - `diagnosis` starts with "DE1[0-4]", "249" or "250", and + `diagnosis_type` is either "A" or "B" + - `is_primary`: Define whether the diagnosis was a primary + diagnosis (`diagnosis_type` == "A") + - `is_t1d`: Define whether the diagnosis was T1D-specific + (`diagnosis` starts with "DE10" or "249") + - `is_t2d`: Define whether the diagnosis was T2D-specific + (`diagnosis` starts with "DE11" or "250") + - `department`: Define whether the diagnosis was made by an + endocrinological department (`specialty` == 8 -\> `department` == + "endocrinology") or other medical department (`specialty` \< 8 + or 9-30 -\> `department` == "other medical") +- From `join_lpr3()`: + - `pnr`: identifier variable + - `date`: dates of all included diabetes diagnoses: + - registered as primary (A) or secondary (B) diagnoses, regardless + of type or department, but exclude retracted diagnoses: + - `diagnosis` starts with "DE1[0-4]", `diagnosis_type` is + either "A" or "B" and `diagnosis_retracted` == "Nej" + - `is_primary`: Define whether the diagnosis was a primary + diagnosis (`diagnosis_type` == "A") + - `is_t1d`: Define whether the diagnosis was T1D-specific + (`diagnosis` starts with "DE10") + - `is_t2d`: Define whether the diagnosis was T2D-specific + (`diagnosis` starts with "DE11") + - `department`: Define whether the diagnosis was made by an + endocrinological department (`specialty` == "medicinsk + endokrinologi" -\> `department` == "endocrinology") or other + medical department (`specialty` either "Blandet medicin og + kirurgi", "Intern medicin", "Geriatri", 
"Hepatologi", + "Hæmatologi", "Infektionsmedicin", "Kardiologi", "Medicinsk + allergologi", "Medicinsk gastroenterologi", "Medicinsk + lungesygdomme", "Nefrologi", "Reumatologi", "Palliativ medicin", + "Akut medicin", "Dermato-venerologi", "Neurologi", "Onkologi", + "Fysiurgi", or "Tropemedicin" -\> `department` == "other + medical") + +Internally, these intermediate results are combined and processed +together. And ultimately, `include_diabetes_diagnoses()` outputs a +single `data.frame` with the following variables (up to two rows per +individual): + +- `pnr`: identifier variable +- `dates`: dates of the first and second hospital diabetes diagnosis +- `n_t1d_endocrinology`: number of type 1 diabetes-specific primary + diagnosis codes from endocrinological departments +- `n_t2d_endocrinology`: number of type 2 diabetes-specific primary + diagnosis codes from endocrinological departments +- `n_t1d_medical`: number of type 1 diabetes-specific primary + diagnosis codes from medical departments +- `n_t2d_medical`: number of type 2 diabetes-specific primary + diagnosis codes from medical departments + +This output is passed to the `join_inclusions()` function, where the +`dates` variable is used for the final step of the inclusion process. +The variables of counts of diabetes type-specific primary diagnoses are +carried over for the subsequent classification of diabetes type, +initially as inputs to the `get_t1d_primary_diagnosis()` and +`get_majority_of_t1d_diagnoses()` functions. #### Diabetes-specific podiatrist services The function `include_podiatrist_services()` uses `sysi` or `sssy` as input to extract the dates of all diabetes-specific podiatrist services. - +These dates are extracted by filtering values beginning with "54" in the +`speciale` variable of the `sssy` and `sysi` registers by default +(alternatively, the function can take the `spec2` variable as input +instead, if that is the data available to the user). 
In addition, +services provided to a child of the individual (`barnmak` != 0) are +excluded using the `barnmak` variable. An internal helper function +`get_unique_honuge_dates()` is applied to generate a proper date +variable based on the year-week (wwyy-formatted) variable (`honuge`) +found in the raw data, and to de-duplicate multiple services registered on +the same date. -#### GLD purchases +`include_podiatrist_services()` outputs a 2-column data frame with up to +two rows for each individual, containing the following variables: -The function `include_gld_purchases()` uses `lmdb` to extract the dates -of all GLD purchases (from 1997 onwards). +- `pnr`: identifier variable +- `dates`: the dates of the first and second diabetes-specific + podiatrist record - +The output is passed to the `join_inclusions()` function for the final +step of the inclusion process. - +#### HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%) -### Exclusion events +The function `include_hba1c()` uses `lab_forsker` as the input data to +extract the dates of all elevated HbA1c test results, using the +appropriate cut-offs: -#### HbA1c tests and GLD purchases during pregnancy +- IFCC units: `analysiscode` NPU27300, any `value` $\geq$ 48 mmol/mol +- DCCT units: `analysiscode` NPU03835, any `value` $\geq$ 6.5% -The function `exclude_pregnancy()` uses diagnoses from LPR2 or LPR3 as -input and is used to exclude both HbA1c tests and GLD purchases during -pregnancy. +```{r, echo=FALSE} +algorithm |> + filter(name=="hba1c") |> + knitr::kable(caption = "Algorithm used in the implementation for including HbA1c.") +``` -Internally, this relies on the function `get_pregnancy_dates()` that -contains the following three helper functions: +Multiple elevated results on the same day within each individual are +deduplicated, to account for the same test result often being reported +twice (once in IFCC units, once in DCCT units). 
-- `calculate_pregnancy_index_date_for_mc_visits_wo_end_date()` (this - might be removed with the inclusion of the birth register) -- `get_pregnancy_end_dates()`: Keep maternal care visits with an end - date and drop visits between 40 weeks before end date and 12 weeks - after end date. -- `get_maternal_care_visit_dates_without_end_date()`: Uses the output - from `get_pregnancy_end_dates()` which identifies maternal care - visits *with* end dates to derive maternal care visits *without* end - dates. below. +`include_hba1c()` outputs a 2-column data frame containing the following +variables: - +- `pnr`: identifier variable +- `dates`: the dates of all elevated HbA1c test results - +The output is passed to the `exclude_pregnancy()` function for censoring +of elevated results due to potential gestational diabetes (see below). + +#### GLD purchases -#### Glucose-lowering brand drugs for weight loss +The function `include_gld_purchases()` uses `lmdb` to extract the dates +of all GLD purchases. + +These dates are extracted by including all values beginning with "A10" +in the `atc` variable of the `lmdb` register. Since the diagnosis code +data on pregnancies (see below) is insufficient to perform censoring +prior to 1997, `include_gld_purchases()` only extracts dates from 1997 +onward by default (if Medical Birth Register data is available to use +for censoring, the extraction window can be extended). + +This function outputs a `data.frame` with the following variables needed +later in the classification part of the function flow: + +- `pnr`: identifier variable +- `date`: dates of all purchases of GLD (renamed from `eksd`) +- `atc`: type of drug +- `contained_doses`: amount purchased, in number of defined daily + doses (DDD). 
Calculated as `volume` (doses contained in the + purchased package) times `apk` (number of packages purchased) +- `indication_code`: indication code of the prescription (renamed from + `indo`) + +These events are then passed to a chain of exclusion functions: +`exclude_wld_purchases()`, `exclude_potential_pcos()`, and +`exclude_pregnancy()`, described in the sections below. -The function `exclude_wld_purchases()` uses lmdb as input and excludes -the brand drugs Saxenda and Wegovy. +### Exclusion events - +#### Metformin purchases potentially for the treatment of polycystic ovary syndrome -#### Metformin purchases for women below age 40 +The function `exclude_potential_pcos()` takes the output from +`include_gld_purchases()` and `bef` (information on sex and date of +birth) as inputs and censors (filters out) all purchases of metformin in +women below age 40 at the date of purchase (`atc` = "A10BA02" & `sex` = +"woman" & age at purchase (`date`-`date_of_birth`) \< 40 years) or with an +indication code suggesting treatment of polycystic ovary syndrome (`atc` += "A10BA02" & `sex` = "woman" & `indication_code` either "0000092", +"0000276" or "0000781"). -The function `exclude_potential_pcos()` as input to exclude all -purchases of metformin by women below age 40 (i.e., \<= 39 years old) at -the date of purchase. It relies on `bef` as input. +After these exclusions are made, the output is passed to +`exclude_pregnancy()` for further censoring, described below. -This function contains two helper functions: +#### HbA1c tests and GLD purchases during pregnancy -- `keep_women()` -- `drop_age_40_below()` +The function `exclude_pregnancy()` takes the combined outputs from +`join_lpr2()`, `join_lpr3()`, `include_hba1c()`, and +`exclude_potential_pcos()` and uses diagnoses from LPR2 or LPR3 to +exclude both elevated HbA1c tests and GLD purchases during pregnancy, as +these may be due to gestational diabetes, rather than type 1 or type 2 +diabetes. 
- +Internally, this relies on the function `get_pregnancy_dates()` that +uses diagnoses registered in the National Patient Register to extract +the dates of all pregnancy endings (live births or miscarriages). These +are identified by `diagnosis` values beginning with "DO0[0-6]", +"DO8[0-4]" or "DZ3[37]". The dates output by `get_pregnancy_dates()` are +used to exclude all inclusion events registered between 40 weeks before +and 12 weeks after a pregnancy ending. + +After these exclusion functions have been applied, the output serves as +input to two sets of functions: + +1. the censored HbA1c and GLD data are passed to the + `join_inclusions()` function for the final step of the inclusion + process. +2. the censored GLD data is passed to the + `get_only_insulin_purchases()`, + `get_insulin_purchases_within_180_days()`, and + `get_insulin_is_two_thirds_of_gld_doses()` helper functions for the + classification of diabetes type. ### Get diagnosis date @@ -233,8 +381,8 @@ OSDC algorithm includes the following criteria: diagnoses extracted from `lpr_diag` (LPR2) and `diagnoser` (LPR3) in the previous steps. 2. `get_only_insulin_purchases()` which relies on the GLD purchases - from Lægemiddelsdatabasen to get patients where all GLD purchases - are insulin only. + from Lægemiddeldatabasen to get patients where all GLD purchases are + insulin only. 3. `get_majority_of_t1d_diagnoses()` (as compared to T2D diagnoses) which again relies on primary hospital diagnoses from LPR. 4. `get_insulin_purchase_within_180_days()` which relies on both