diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 76acd71..bbc6139 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -51,38 +51,55 @@ rules. ## Function flow -The OSDC algorithm - and thereby, the osdc package - contains one main -function that will classify individuals into those with either type 1 or -type 2 diabetes using the Danish registers: -`classify_diabetes_status()`. This function classifies those with +The osdc package contains one main function that classifies individuals +into those with either type 1 or type 2 diabetes using the Danish +registers: `classify_diabetes()`. This function classifies those with diabetes (type 1 or 2) based on the Danish registers described in the -`vignette("design")`. All data sources are used as input for this -function. The specific inclusion and exclusion details are also -described in the `vignette("design")`. +`vignette("design")` and `vignette("data-sources")`. All data sources +are used as input for this function. The specific inclusion and +exclusion details are also described in the `vignette("design")`. This results in the functionality flow for classifying diabetes status -seen below. All functions take a `data.frame` type object as input and -outputs the same type of object as the input object (a `data.frame` -type). For instance, if the input is a `data.table` object, the output -will also be a `data.table`. +seen below. This flow can be divided into two sections: extracting the +diabetes population and classifying diabetes type which we will detail +in the following sections. + +All functions take a `data.frame` type object as input and outputs the +same type of object as the input object (a `data.frame` type). For +instance, if the input is a `data.table` object, the output will also be +a `data.table`. ![Flow of functions, as well as their required input registers, for -classifying diabetes status using the `osdc` package. 
Light blue and +classifying diabetes status using the osdc package. Light blue and orange boxes represent filtering functions (inclusion and exclusion events, respectively). Uncoloured boxes are helper functions that get or extract a condition or joins data or function outputs.](images/function-flow.png) -## Inclusion events +## Population extraction + +In the following sections, we describe the functions used to extract the +diabetes population from the Danish registers. The functions are divided +into inclusion and exclusion events, and the final diagnosis date is +calculated based on these events. + +![Flow of functions, as well as their required input registers, for +extracting the population with diabetes using the osdc package. Light +blue and orange boxes represent filtering functions (inclusion and +exclusion events, respectively). Uncoloured boxes are helper functions +that get or extract a condition or joins data or function +outputs.](images/function-flow-population.png) + +### Inclusion events -### HbA1c tests above 48 mmol/mol +#### HbA1c tests above 48 mmol/mol The function `include_hba1c()` uses `lab_forsker` as the input data to extract all events of tests above 48 mmol/mol. -### Hospital diagnosis of diabetes +#### Hospital diagnosis of diabetes The function `include_diabetes_diagnoses()` uses the hospital contacts from LPR2 and 3 to include all dates of diabetes diagnoses. Diabetes @@ -97,14 +114,14 @@ This function contains two helper functions: -### Diabetes-specific podiatrist services +#### Diabetes-specific podiatrist services The function `include_podiatrist_services()` uses `sysi` or `sssy` as input to extract the dates of all diabetes-specific podiatrist services. -### GLD purchases +#### GLD purchases The function `include_gld_purchases()` uses `lmdb` to extract the dates of all GLD purchases (from 1997 onwards). @@ -113,9 +130,9 @@ of all GLD purchases (from 1997 onwards). 
-## Exclusion events +### Exclusion events -### HbA1c tests and GLD purchases during pregnancy +#### HbA1c tests and GLD purchases during pregnancy The function `exclude_pregnancy()` uses diagnoses from LPR2 or LPR3 as input and is used to exclude both HbA1c tests and GLD purchases during @@ -138,16 +155,14 @@ contains the following three helper functions: -### Glucose-lowering brand drugs for weight loss +#### Glucose-lowering brand drugs for weight loss -The function `exclude_purchases_of_weight_loss_drugs()` uses REGISTER as -input and excludes BRANDS. +The function `exclude_wld_purchases()` uses lmdb as input and excludes +the brand drugs Saxenda and Wegovy. - - -### Metformin purchases for women below age 40 +#### Metformin purchases for women below age 40 The function `exclude_potential_pcos()` as input to exclude all purchases of metformin by women below age 40 (i.e., \<= 39 years old) at @@ -160,7 +175,7 @@ This function contains two helper functions: -## Get diagnosis date +### Get diagnosis date The function `get_diagnosis_date()` combines the outputs from the inclusion and exclusion functions to get the final diagnosis date. @@ -173,4 +188,99 @@ together with `join_diagnosis_dates()`. Finally, the dates outside of the data coverage period are dropped with `drop_diagnosis_dates_outside_coverage()` to end with a final diagnosis date. For details on this censoring based on periods with insufficient -data coverage, see the `vignette("algorithm-logic")`. +data coverage, see the `vignette("design")`. + +### Classifying the diabetes type + +The next step of the OSDC algorithm classifies individuals from the +extracted diabetes population as having either T1D or T2D. As described +in the `vignette("design")`, individuals not classified as T1D cases are +classified as T2D cases. 
+
+The output is a `data.frame` that includes one row per individual in the
+diabetes population: one column with their PNR, two columns with
+inclusion dates (one "stable" date and one "raw" date - see the
+`vignette("design")` for an elaboration on what that entails), and one
+column with the diabetes type.
+
+
+
+![Flow of functions for classifying diabetes status using the `osdc`
+package.](images/function-flow-classification.png)
+
+#### Type 1 classification
+
+The details for the classification of type 1 diabetes are described in
+`vignette("design")`. To classify whether an individual has T1D, the
+OSDC algorithm includes the following criteria:
+
+1. `get_t1d_primary_diagnosis()`, which relies on the hospital
+   diagnoses extracted from `lpr_diag` (LPR2) and `diagnoser` (LPR3) in
+   the previous steps.
+2. `get_only_insulin_purchases()`, which relies on the GLD purchases
+   from Lægemiddelsdatabasen to get patients where all GLD purchases
+   are insulin only.
+3. `get_majority_of_t1d_diagnoses()` (as compared to T2D diagnoses),
+   which again relies on primary hospital diagnoses from LPR.
+4. `get_insulin_purchase_within_180_days()`, which relies on both
+   diagnoses from LPR and GLD purchases from Lægemiddelsdatabasen.
+5. `get_insulin_is_two_thirds_of_gld_doses()`, which relies on the GLD
+   purchases from Lægemiddelsdatabasen.
+
+Note the following hierarchy in the third function above
+(`get_majority_of_t1d_diagnoses()`): first, the function checks whether
+the individual has primary diagnoses from an endocrinological
+specialty. If that's the case for a given person, the check of whether
+they have a majority of T1D primary diagnoses is based on data from the
+endocrinological specialty. If that's not the case, the check is based
+on primary diagnoses from medical specialties.
+
+#### Type 2 classification
+
+As described in the `vignette("design")`, individuals not classified as
+type 1 cases are classified as type 2 cases.
+
+## Output
+
+The output of the OSDC algorithm is a `data.frame` which includes four
+columns:
+
+1. **PNR**: The pseudonymised social security number of individuals in
+   the diabetes population (one row per individual)
+2. **stable_inclusion_date**: The *stable* inclusion date (i.e., the
+   raw date, kept only for individuals included in a time period
+   where data coverage is sufficient to make incident cases
+   reliable)[^1]
+
+3. **raw_inclusion_date**: The *raw* inclusion date (i.e., the date of
+   the second inclusion event as described in the [Population
+   extraction](#population-extraction) section above)
+4. **diabetes_type**: The classified diabetes type
+
+[^1]: For more information on the "raw" versus "stable" inclusion date,
+    see `vignette("design")`.
+
+
+
+For an example, see below.
+
+| PNR        | stable_inclusion_date | raw_inclusion_date | diabetes_type |
+|------------|-----------------------|--------------------|---------------|
+| 0000000001 | 2020-01-01            | 2020-01-01         | T1D           |
+| 0000000004 | NULL                  | 1995-04-19         | T2D           |
+
+: Example rows of the `data.frame` output of the osdc package.
+
+The individuals `0000000001` and `0000000004` have been classified as
+having diabetes (`T1D` and `T2D`, respectively). `0000000001` is
+classified as having type 1 diabetes (`T1D`) with an inclusion date of
+`2020-01-01`. Since this date is within a time period of sufficient data
+coverage, the column `stable_inclusion_date` is populated with the same
+date as `raw_inclusion_date`.
+
+The individual in the second row, `0000000004`, is classified as having
+type 2 diabetes (`T2D`) with an inclusion date of `1995-04-19`. Since 1995
+is within a time period of insufficient data coverage,
+`stable_inclusion_date` is `NULL`. However, `raw_inclusion_date` still
+contains the inclusion date of this individual.
+ + diff --git a/vignettes/images/function-flow-classification.png b/vignettes/images/function-flow-classification.png new file mode 100644 index 0000000..887bcd5 Binary files /dev/null and b/vignettes/images/function-flow-classification.png differ diff --git a/vignettes/images/function-flow-classification.puml b/vignettes/images/function-flow-classification.puml new file mode 100644 index 0000000..9851a8f --- /dev/null +++ b/vignettes/images/function-flow-classification.puml @@ -0,0 +1,34 @@ +@startuml function-flow-classification +!theme cerulean-outline + + +'Diabetes type classification + + action "get_diagnosis_date()" as diagnosis_date + + rectangle Classification { + action "get_has_t1d_primary_diagnosis()" as t1d_diagnosis + action "get_only_insulin_purchases()" as only_insulins + action "get_majority_of_t1d_primary_diagnoses()" as t1d_diagnosis_majority + action "get_insulin_purchase_within_180_days()" as insulin_within_180_days + action "get_insulin_is_two_thirds_of_gld_doses()" as insulin_is_two_thirds + } + + diagnosis_date --> t1d_diagnosis + t1d_diagnosis -l-> only_insulins + only_insulins -d-> t1d_diagnosis_majority + t1d_diagnosis_majority -r-> insulin_within_180_days + insulin_within_180_days -r-> insulin_is_two_thirds + + +@enduml diff --git a/vignettes/images/function-flow-population.png b/vignettes/images/function-flow-population.png new file mode 100644 index 0000000..42932d3 Binary files /dev/null and b/vignettes/images/function-flow-population.png differ diff --git a/vignettes/images/function-flow-population.puml b/vignettes/images/function-flow-population.puml new file mode 100644 index 0000000..885081e --- /dev/null +++ b/vignettes/images/function-flow-population.puml @@ -0,0 +1,92 @@ +@startuml function-flow-population +!theme cerulean-outline + + +hide <> stereotype +hide <> stereotype + +'Diabetes population extraction + together { + database sssy + database sysi + database lpr_diag + database lpr_adm + database lmdb + database 
lab_forsker + database kontakter + database diagnoser + database bef + } + together { + action "join_lpr2()" as lpr2 + action "join_lpr3()" as lpr3 + } + + action "get_potential_pcos()" as pcos + action "get_wld_purchases()" as wld + + together { + action "exclude_pregnancy()" as ex_pregnancy <> + action "exclude_wld_purchases()" as ex_wld <> + action "exclude_potential_pcos()" as ex_pcos <> + } + together { + action "include_diabetes_diagnoses()" as in_diagnoses <> + action "include_hba1c()" as in_hba1c <> + action "include_podiatrist_services()" as in_podiatrist <> + action "include_gld_purchases()" as in_gld <> + action "get_pregnancy_dates()" as pregnancy + } + + action "join_inclusion()" as join_inclusion + action "get_diagnosis_date()" as diagnosis_date + +'join lpr + lpr_diag --> lpr2 + lpr_adm --> lpr2 + kontakter --> lpr3 + diagnoser --> lpr3 + +'inclusion: podiatrist services + sssy --> in_podiatrist + sysi --> in_podiatrist + in_podiatrist --> join_inclusion + +'inclusion: hba1c + lab_forsker --> in_hba1c + in_hba1c --> ex_pregnancy + ex_pregnancy --> join_inclusion + +'inclusion: gld purchases + lmdb --> in_gld + in_gld --> ex_pcos + ex_pcos --> ex_wld + ex_wld --> ex_pregnancy + +'inclusion: diabetes diagnoses + lpr2 --> in_diagnoses + lpr3 --> in_diagnoses + in_diagnoses --> join_inclusion + +'helper functions + lpr2 --> pregnancy + lpr3 --> pregnancy + pregnancy --> ex_pregnancy + lmdb --> wld + wld --> ex_wld + bef --> pcos + in_gld --> ex_pcos + pcos --> ex_pcos + join_inclusion --> diagnosis_date + +@enduml \ No newline at end of file diff --git a/vignettes/images/function-flow.png b/vignettes/images/function-flow.png index 22cc065..3f30c2c 100644 Binary files a/vignettes/images/function-flow.png and b/vignettes/images/function-flow.png differ diff --git a/vignettes/images/function-flow.puml b/vignettes/images/function-flow.puml index b3b3c51..e46d901 100644 --- a/vignettes/images/function-flow.puml +++ b/vignettes/images/function-flow.puml @@ 
-1,86 +1,8 @@ @startuml function-flow -!theme cerulean-outline - - -hide <> stereotype -hide <> stereotype - -card classify_diabetes_status() as cd { - together { - database sssy - database sysi - database lpr_diag - database lpr_adm - database lmdb - database lab_forsker - database kontakter - database diagnoser - database bef - } - - action "get_pregnancy_dates()" as pregnancy - action "get_potential_pcos()" as pcos - action "get_diagnosis_date()" as diagnosis_date - action "join_lpr2()" as lpr2 - action "join_lpr3()" as lpr3 - - together { - action "exclude_pregnancy()" as ex_pregnancy <> - action "exclude_purchases_of_weight_loss_drugs()" as ex_wld <> - action "exclude_potential_pcos()" as ex_pcos <> - } - - together { - action "include_hba1c()" as in_hba1c <> - action "include_diabetes_diagnosis()" as in_diagnosis <> - action "include_podiatrist_services()" as in_podiatrist <> - action "include_purchases_gld()" as in_gld <> - } - - lpr_diag --> lpr2 - lpr_adm --> lpr2 - kontakter --> lpr3 - diagnoser --> lpr3 - - lab_forsker --> in_hba1c - in_hba1c --> ex_pregnancy - - lpr2 --> pregnancy - lpr3 --> pregnancy - pregnancy -> ex_pregnancy - - lpr2 --> in_diagnosis - lpr3 --> in_diagnosis - - sssy --> in_podiatrist - sysi --> in_podiatrist - - lmdb --> in_gld - in_gld --> ex_pregnancy - in_gld --> ex_wld - - bef --> pcos - in_gld --> ex_pcos - pcos --> ex_pcos - - ex_wld --> diagnosis_date - ex_pregnancy --> diagnosis_date - ex_pcos --> diagnosis_date - in_podiatrist --> diagnosis_date - in_diagnosis --> diagnosis_date +card classify_diabetes() as cd { + !include function-flow-population.puml + !include function-flow-classification.puml } -@enduml + +@enduml \ No newline at end of file
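
The output shape described in the vignette's Output section can be illustrated with a minimal R sketch. It is not part of the patch and does not call the osdc package. The `is_t1d` vector is a hypothetical stand-in for the combined result of the type 1 criteria, and representing the table's `NULL` as `NA` is an assumption about how a missing stable date would appear in a `data.frame`.

```r
# Hypothetical illustration only: nothing here calls the osdc package.
# `is_t1d` stands in for the combined result of the type 1 criteria.
is_t1d <- c(TRUE, FALSE)

# Individuals not classified as T1D are classified as T2D.
diabetes_type <- ifelse(is_t1d, "T1D", "T2D")

# Rebuild the two example rows from the Output table by hand.
# Assumption: the table's NULL is represented as NA in the data.frame.
example_output <- data.frame(
  PNR = c("0000000001", "0000000004"),
  stable_inclusion_date = as.Date(c("2020-01-01", NA)),
  raw_inclusion_date = as.Date(c("2020-01-01", "1995-04-19")),
  diabetes_type = diabetes_type,
  stringsAsFactors = FALSE
)

# The stable date is present only where data coverage was sufficient.
has_stable_date <- !is.na(example_output$stable_inclusion_date)

print(example_output)
```

In this sketch, the second row keeps its raw inclusion date while its stable date is missing, which mirrors the example individual `0000000004` in the table.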