steno-aarhus · Aastedet · Sep 18, 2024 · Sep 18, 2024 · Sep 18, 2024 · Sep 19, 2024
@@ -97,98 +97,246 @@ library(dplyr)
 library(osdc)
 ```
 
-#### HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%)
-
-The function `include_hba1c()` uses `lab_forsker` as the input data to
-extract all events of HbA1c tests above the diagnosis cut-off value.
-
-Since the HbA1c diagnosis cut-off value depends on the kind of test that is
-used, the inclusion event is defined as follows:
-
--   For HbA1c IFCC (NPU03835), we include values \>= 6.5 %.
--   For HbA1c DCCT (NPU27300), we include values \>= 48 mmol/mol.
-
-```{r, echo=FALSE}
-algorithm |> 
-	filter(name=="hba1c") |>
-	knitr::kable(caption = "Algorithm used in the implementation for including HbA1c.")
-```
-
-#### Hospital diagnosis of diabetes
+#### Hospital diagnoses
+
+**Joining LPR2 and LPR3 data**
+
+The helper functions `join_lpr2()` and `join_lpr3()` join records of
+diagnoses to administrative information in LPR2-formatted and
+LPR3-formatted data, respectively.
+
+`join_lpr2()` takes `lpr_diag` and `lpr_adm` as inputs, filters to the
+necessary diagnoses (`c_diag` starting with "DO0[0-6]", "DO8[0-4]",
+"DZ3[37]", "DE1[0-4]", "249", or "250"), joins the required information
+by record number (`recnum`), and outputs a `data.frame` with the
+following variables:
+
+-   `pnr`: identifier variable
+-   `date`: date of the recorded diagnosis (renamed from `d_inddto`)
+-   `specialty`: department specialty (renamed from `c_spec`)
+-   `diagnosis`: diagnosis code (renamed from `c_diag`)
+-   `diagnosis_type`: diagnosis type (renamed from `c_diagtype`)
+
+`join_lpr3()` takes `diagnoser` and `kontakter` as inputs, filters to
+the necessary diagnoses (`diagnosekode` starting with "DO0[0-6]",
+"DO8[0-4]", "DZ3[37]" or "DE1[0-4]"), joins the required information by
+record number (`dw_ek_kontakt`), and outputs a `data.frame` with the
+following variables:
+
+-   `pnr`: identifier variable (renamed from `cpr`)
+-   `date`: date of the recorded diagnosis (renamed from `dato_start`)
+-   `specialty`: department specialty (renamed from `hovedspeciale_ans`)
+-   `diagnosis`: diagnosis code (renamed from `diagnosekode`)
+-   `diagnosis_type`: diagnosis type (renamed from `diagnosetype`)
+-   `diagnosis_retracted`: if the diagnosis was later retracted (renamed
+    from `senere_afkraeftet`)
+
+These outputs are passed to `include_diabetes_diagnoses()` (and to
+`get_pregnancy_dates()`, see exclusion events) for further processing
+below.
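
As a rough illustration of the filter, join, and rename steps described for `join_lpr2()`, the following base R sketch works on toy data. The function name `join_lpr2_sketch()` and all data values are made up for this example, and the package's actual implementation may differ (for instance, by using dplyr).

```r
# Toy stand-ins for the LPR2 registers (all values are made up).
lpr_diag <- data.frame(
  recnum = c(1, 2, 3, 4),
  c_diag = c("DE110", "DJ189", "25000", "DO800"),
  c_diagtype = c("A", "A", "B", "A")
)
lpr_adm <- data.frame(
  recnum = c(1, 2, 3, 4),
  pnr = c("001", "001", "002", "003"),
  d_inddto = as.Date(c("2010-01-05", "2011-03-02", "1992-06-10", "2015-09-01")),
  c_spec = c(8, 1, 1, 38)
)

# Hypothetical sketch, not the package function itself.
join_lpr2_sketch <- function(lpr_diag, lpr_adm) {
  # Keep only diagnosis codes starting with one of the listed prefixes.
  pattern <- "^(DO0[0-6]|DO8[0-4]|DZ3[37]|DE1[0-4]|249|250)"
  keep <- lpr_diag[grepl(pattern, lpr_diag$c_diag), ]
  # Join administrative information by record number.
  joined <- merge(keep, lpr_adm, by = "recnum")
  # Rename to the output variables listed above.
  data.frame(
    pnr = joined$pnr,
    date = joined$d_inddto,
    specialty = joined$c_spec,
    diagnosis = joined$c_diag,
    diagnosis_type = joined$c_diagtype
  )
}

lpr2 <- join_lpr2_sketch(lpr_diag, lpr_adm)
```

In this toy data, the record with `c_diag` "DJ189" is dropped because it matches none of the prefixes.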
+
+**Processing of diabetes diagnoses**
 
 The function `include_diabetes_diagnoses()` uses the hospital contacts
-from LPR2 and 3 to include all dates of diabetes diagnoses. Diabetes
-diagnoses from both ICD 8 and ICD 10 are included.
-
-This function contains two helper functions:
-
--   `keep_diabetes_icd10()`
--   `keep_diabetes_icd8()`
-
-<!-- TODO: Add details on how this filtering should be done, e.g., diagnosis codes -->
-
-<!-- TODO: Which specific ICD 8 and 10 codes are included? -->
+from LPR2 and LPR3 to include all dates of diabetes diagnoses to use for
+inclusion, as well as additional information needed to classify diabetes
+type. Diabetes diagnoses from both ICD-8 and ICD-10 are included.
+
+The function takes the outputs of `join_lpr2()` and `join_lpr3()` as
+inputs and processes each input separately to generate the following
+internal variables:
+
+-   From `join_lpr2()`:
+    -   `pnr`: identifier variable
+    -   `date`: dates of all included diabetes diagnoses, i.e., those
+        registered as primary (A) or secondary (B) diagnoses, regardless
+        of type or department:
+        -   `diagnosis` starts with "DE1[0-4]", "249" or "250", and
+            `diagnosis_type` is either "A" or "B"
+    -   `is_primary`: Define whether the diagnosis was a primary
+        diagnosis (`diagnosis_type` == "A")
+    -   `is_t1d`: Define whether the diagnosis was T1D-specific
+        (`diagnosis` starts with "DE10" or "249")
+    -   `is_t2d`: Define whether the diagnosis was T2D-specific
+        (`diagnosis` starts with "DE11" or "250")
+    -   `department`: Define whether the diagnosis was made by an
+        endocrinological (`specialty` == 8 -\> `department` ==
+        "endocrinology") or other medical department (`specialty` \< 8
+        or 9-30 -\> `department` == "other medical")
+-   From `join_lpr3()`:
+    -   `pnr`: identifier variable
+    -   `date`: dates of all included diabetes diagnoses, i.e., those
+        registered as primary (A) or secondary (B) diagnoses, regardless
+        of type or department, but excluding retracted diagnoses:
+        -   `diagnosis` starts with "DE1[0-4]", `diagnosis_type` is
+            either "A" or "B" and `diagnosis_retracted` == "Nej"
+    -   `is_primary`: Define whether the diagnosis was a primary
+        diagnosis (`diagnosis_type` == "A")
+    -   `is_t1d`: Define whether the diagnosis was T1D-specific
+        (`diagnosis` starts with "DE10")
+    -   `is_t2d`: Define whether the diagnosis was T2D-specific
+        (`diagnosis` starts with "DE11")
+    -   `department`: Define whether the diagnosis was made by an
+        endocrinological department (`specialty` == "medicinsk
+        endokrinologi" -\> `department` == "endocrinology") or other
+        medical department (`specialty` either "Blandet medicin og
+        kirurgi", "Intern medicin", "Geriatri", "Hepatologi",
+        "Hæmatologi", "Infektionsmedicin", "Kardiologi", "Medicinsk
+        allergologi", "Medicinsk gastroenterologi", "Medicinsk
+        lungesygdomme", "Nefrologi", "Reumatologi", "Palliativ medicin",
+        "Akut medicin", "Dermato-venerologi", "Neurologi", "Onkologi",
+        "Fysiurgi", or "Tropemedicin" -\> `department` == "other
+        medical")
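
The internal variables listed for the LPR2 branch could be derived roughly as follows. This is a base R sketch on made-up data in the shape of the `join_lpr2()` output. Leaving `department` as `NA` for specialties outside the listed ranges is an assumption of this sketch, not something the text specifies.

```r
# Toy data in the shape of the join_lpr2() output (values are made up).
lpr2 <- data.frame(
  pnr = c("001", "001", "002", "003"),
  date = as.Date(c("2010-01-05", "2012-02-01", "1992-06-10", "2015-09-01")),
  specialty = c(8, 1, 40, 8),
  diagnosis = c("DE101", "DE119", "25000", "DO800"),
  diagnosis_type = c("A", "B", "A", "A")
)

# Keep diabetes diagnoses registered as primary (A) or secondary (B).
is_diabetes <- grepl("^(DE1[0-4]|249|250)", lpr2$diagnosis) &
  lpr2$diagnosis_type %in% c("A", "B")
dm <- lpr2[is_diabetes, ]

# Flags for primary diagnoses and type-specific codes.
dm$is_primary <- dm$diagnosis_type == "A"
dm$is_t1d <- grepl("^(DE10|249)", dm$diagnosis)
dm$is_t2d <- grepl("^(DE11|250)", dm$diagnosis)

# Specialty 8 is endocrinology; other values up to 30 are other medical.
# Anything else is set to NA here (an assumption of this sketch).
dm$department <- ifelse(
  dm$specialty == 8, "endocrinology",
  ifelse(dm$specialty <= 30, "other medical", NA_character_)
)
```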
+
+Internally, these intermediate results are combined and processed
+together. Ultimately, `include_diabetes_diagnoses()` outputs a single
+`data.frame` with the following variables (up to two rows per
+individual):
+
+-   `pnr`: identifier variable
+-   `dates`: dates of the first and second hospital diabetes diagnosis
+-   `n_t1d_endocrinology`: number of type 1 diabetes-specific primary
+    diagnosis codes from endocrinological departments
+-   `n_t2d_endocrinology`: number of type 2 diabetes-specific primary
+    diagnosis codes from endocrinological departments
+-   `n_t1d_medical`: number of type 1 diabetes-specific primary
+    diagnosis codes from medical departments
+-   `n_t2d_medical`: number of type 2 diabetes-specific primary
+    diagnosis codes from medical departments
+
+This output is passed to the `join_inclusions()` function, where the
+`dates` variable is used for the final step of the inclusion process.
+The variables of counts of diabetes type-specific primary diagnoses are
+carried over for the subsequent classification of diabetes type,
+initially as inputs to the `get_t1d_primary_diagnosis()` and
+`get_majority_of_t1d_diagnoses()` functions.
 
 #### Diabetes-specific podiatrist services
 
 The function `include_podiatrist_services()` uses `sysi` or `sssy` as
 input to extract the dates of all diabetes-specific podiatrist services.
 
-<!-- TODO: Add details on how this filtering should be done -->
+These dates are extracted by filtering values beginning with "54" in the
+`speciale` variable of the `sssy` and `sysi` registers by default
+(alternatively, the function can take the `spec2` variable as input
+instead, if that is the data available to the user). In addition,
+services provided to a child of the individual (`barnmak` != 0) are
+excluded. An internal helper function `get_unique_honuge_dates()`
+generates a proper date variable from the year-week (wwyy-formatted)
+variable (`honuge`) found in the raw data and de-duplicates multiple
+services registered on the same date.
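
A minimal base R sketch of the filtering and de-duplication steps might look like this. The `speciale` and `honuge` values are invented for illustration, and the conversion of `honuge` to a proper date (the job of `get_unique_honuge_dates()`) is deliberately left out.

```r
# Toy stand-in for the sssy register (all values are made up).
sssy <- data.frame(
  pnr = c("001", "001", "002", "003"),
  speciale = c("54002", "54002", "80101", "54005"),
  barnmak = c(0, 0, 0, 1),
  honuge = c("0110", "0110", "0210", "0310")
)

# Keep podiatrist services (speciale beginning with "54") that were
# provided to the individual rather than to a child (barnmak == 0).
keep <- startsWith(sssy$speciale, "54") & sssy$barnmak == 0

# Drop repeated services registered in the same week for the same person.
podiatrist <- unique(sssy[keep, c("pnr", "honuge")])
```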
 
-#### GLD purchases
+`include_podiatrist_services()` outputs a 2-column data frame with up to
+two rows for each individual, containing the following variables:
 
-The function `include_gld_purchases()` uses `lmdb` to extract the dates
-of all GLD purchases (from 1997 onwards).
+-   `pnr`: identifier variable
+-   `dates`: the dates of the first and second diabetes-specific
+    podiatrist record
 
-<!-- TODO: Add details on how this filtering should be done -->
+The output is passed to the `join_inclusions()` function for the final
+step of the inclusion process.
 
-<!-- TODO: Add this + link to resource "For details about this, see [link]." -->
+#### HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%)
 
-### Exclusion events
+The function `include_hba1c()` uses `lab_forsker` as the input data to
+extract the dates of all elevated HbA1c test results, using the
+appropriate cut-offs:
 
-#### HbA1c tests and GLD purchases during pregnancy
+-   IFCC units: `analysiscode` NPU27300, any `value` $\geq$ 48 mmol/mol
+-   DCCT units: `analysiscode` NPU03835, any `value` $\geq$ 6.5%
 
-The function `exclude_pregnancy()` uses diagnoses from LPR2 or LPR3 as
-input and is used to exclude both HbA1c tests and GLD purchases during
-pregnancy.
+```{r, echo=FALSE}
+algorithm |> 
+	filter(name=="hba1c") |>
+	knitr::kable(caption = "Algorithm used in the implementation for including HbA1c.")
+```
 
-Internally, this relies on the function `get_pregnancy_dates()` that
-contains the following three helper functions:
+Multiple elevated results on the same day within each individual are
+de-duplicated, to account for the same test result often being reported
+twice (once in IFCC units and once in DCCT units).
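
The unit-specific cut-offs and the same-day de-duplication can be sketched in base R as below. The column names `pnr` and `samplingdate` in the toy input are assumptions made for this example, as are all values; only `analysiscode` and `value` are named in the description above.

```r
# Toy stand-in for lab_forsker (values are made up; `pnr` and
# `samplingdate` are assumed column names).
lab_forsker <- data.frame(
  pnr = c("001", "001", "002", "003", "003"),
  samplingdate = as.Date(c("2020-01-10", "2020-01-10", "2020-02-01",
                           "2021-05-01", "2021-06-01")),
  analysiscode = c("NPU27300", "NPU03835", "NPU27300", "NPU03835", "NPU03835"),
  value = c(52, 6.9, 41, 6.5, 5.9)
)

# Apply the cut-off that matches the unit of each test.
elevated <-
  (lab_forsker$analysiscode == "NPU27300" & lab_forsker$value >= 48) |
  (lab_forsker$analysiscode == "NPU03835" & lab_forsker$value >= 6.5)

# Keep one row per individual and date, so a result reported in both
# units on the same day counts once.
hba1c <- unique(data.frame(
  pnr = lab_forsker$pnr[elevated],
  dates = lab_forsker$samplingdate[elevated]
))
```

Here individual "001" has the same elevated result reported in both units on one day, which collapses to a single row.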
 
--   `calculate_pregnancy_index_date_for_mc_visits_wo_end_date()` (this
-    might be removed with the inclusion of the birth register)
--   `get_pregnancy_end_dates()`: Keep maternal care visits with an end
-    date and drop visits between 40 weeks before end date and 12 weeks
-    after end date.
--   `get_maternal_care_visit_dates_without_end_date()`: Uses the output
-    from `get_pregnancy_end_dates()` which identifies maternal care
-    visits *with* end dates to derive maternal care visits *without* end
-    dates. below.
+`include_hba1c()` outputs a 2-column data frame containing the following
+variables:
 
-<!-- TODO: What is done with the mc visits without end dates then? -->
+-   `pnr`: identifier variable
+-   `dates`: the dates of all elevated HbA1c test results
 
-<!-- TODO: Add details on how this filtering should be done -->
+The output is passed to the `exclude_pregnancy()` function for censoring
+of elevated results due to potential gestational diabetes (see below).
+
+#### GLD purchases
 
-#### Glucose-lowering brand drugs for weight loss
+The function `include_gld_purchases()` uses `lmdb` to extract the dates
+of all GLD purchases.
+
+These dates are extracted by including all values beginning with "A10"
+in the `atc` variable of the `lmdb` register. Since the diagnosis code
+data on pregnancies (see below) is insufficient to perform censoring
+prior to 1997, `include_gld_purchases()` only extracts dates from 1997
+onward by default (if Medical Birth Register data is available to use
+for censoring, the extraction window can be extended).
+
+This function outputs a `data.frame` with the following variables needed
+later in the classification part of the function flow:
+
+-   `pnr`: identifier variable
+-   `date`: dates of all purchases of GLD (renamed from `eksd`)
+-   `atc`: type of drug
+-   `contained_doses`: amount purchased, in number of defined daily
+    doses (DDD). Calculated as `volume` (doses contained in the
+    purchased package) times `apk` (number of packages purchased)
+-   `indication_code`: indication code of the prescription (renamed from
+    `indo`)
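
For illustration, the extraction and the `contained_doses` calculation could look roughly like this in base R. The data values are made up, and treating "from 1997 onward" as `eksd` on or after 1997-01-01 is an assumption of this sketch.

```r
# Toy stand-in for lmdb (all values are made up).
lmdb <- data.frame(
  pnr = c("001", "002", "003"),
  eksd = as.Date(c("2005-03-01", "1996-11-20", "2010-07-15")),
  atc = c("A10BA02", "A10AB01", "C10AA01"),
  volume = c(100, 15, 28),
  apk = c(2, 1, 1),
  indo = c("0000092", NA, NA)
)

# Keep GLD purchases (ATC codes beginning with "A10") from 1997 onward.
is_gld <- startsWith(lmdb$atc, "A10") & lmdb$eksd >= as.Date("1997-01-01")
purchases <- lmdb[is_gld, ]

# Rename variables and compute the purchased amount in DDD as
# doses per package times number of packages.
gld <- data.frame(
  pnr = purchases$pnr,
  date = purchases$eksd,
  atc = purchases$atc,
  contained_doses = purchases$volume * purchases$apk,
  indication_code = purchases$indo
)
```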
+
+These events are then passed to a chain of exclusion functions,
+`exclude_wld_purchases()`, `exclude_potential_pcos()`, and
+`exclude_pregnancy()`, described in the sections below.
 
-The function `exclude_wld_purchases()` uses lmdb as input and excludes
-the brand drugs Saxenda and Wegovy.
+### Exclusion events
 
-<!-- TODO: Add details on how this filtering should be done -->
+#### Metformin purchases potentially for the treatment of polycystic ovary syndrome
 
-#### Metformin purchases for women below age 40
+The function `exclude_potential_pcos()` takes the output from
+`include_gld_purchases()` and `bef` (information on sex and date of
+birth) as inputs and censors (filters out) all purchases of metformin
+by women that were either made below age 40 (`atc` == "A10BA02" &
+`sex` == "woman" & age at purchase (`date` - `date_of_birth`) \< 40
+years) or registered with an indication code suggesting treatment of
+polycystic ovary syndrome (`atc` == "A10BA02" & `sex` == "woman" &
+`indication_code` either "0000092", "0000276" or "0000781").
 
-The function `exclude_potential_pcos()` as input to exclude all
-purchases of metformin by women below age 40 (i.e., \<= 39 years old) at
-the date of purchase. It relies on `bef` as input.
+After these exclusions are made, the output is passed to
+`exclude_pregnancy()` for further censoring, described below.
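
The censoring rule in `exclude_potential_pcos()` can be sketched in base R as follows. All data values are made up, and age is approximated as days divided by 365.25, which is a simplification chosen for this example rather than the package's method.

```r
# Toy GLD purchases and population data (all values are made up).
gld <- data.frame(
  pnr = c("001", "002", "003", "004"),
  date = as.Date(rep("2010-05-01", 4)),
  atc = c("A10BA02", "A10BA02", "A10BA02", "A10AB01"),
  indication_code = c(NA, "0000276", NA, NA)
)
bef <- data.frame(
  pnr = c("001", "002", "003", "004"),
  sex = c("woman", "woman", "man", "woman"),
  date_of_birth = as.Date(c("1980-01-01", "1950-01-01",
                            "1980-01-01", "1985-01-01"))
)

d <- merge(gld, bef, by = "pnr")

# Approximate age at purchase in years (an assumption of this sketch).
age <- as.numeric(d$date - d$date_of_birth) / 365.25

is_metformin_woman <- d$atc == "A10BA02" & d$sex == "woman"
has_pcos_code <- d$indication_code %in% c("0000092", "0000276", "0000781")

# Censor metformin purchases by women below age 40 or with a PCOS code.
censor <- is_metformin_woman & (age < 40 | has_pcos_code)
kept <- d[!censor, names(gld)]
```

In this toy data, "001" is censored on age and "002" on indication code, while the man ("003") and the non-metformin purchase ("004") are kept.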
 
-This function contains two helper functions:
+#### HbA1c tests and GLD purchases during pregnancy
 
--   `keep_women()`
--   `drop_age_40_below()`
+The function `exclude_pregnancy()` takes the combined outputs from
+`join_lpr2()`, `join_lpr3()`, `include_hba1c()`, and
+`exclude_potential_pcos()` and uses diagnoses from LPR2 or LPR3 to
+exclude both elevated HbA1c tests and GLD purchases during pregnancy, as
+these may be due to gestational diabetes, rather than type 1 or type 2
+diabetes.
 
-<!-- TODO: Add details on how this filtering should be done -->
+Internally, this relies on the function `get_pregnancy_dates()` that
+uses diagnoses registered in the National Patient Register to extract
+the dates of all pregnancy endings (live births or miscarriages). These
+are identified by `diagnosis` values beginning with "DO0[0-6]",
+"DO8[0-4]" or "DZ3[37]". The dates output by `get_pregnancy_dates()`
+are used to exclude all inclusion events registered between 40 weeks
+before and 12 weeks after a pregnancy ending.
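
The exclusion window can be illustrated with a small base R sketch. The data frames `pregnancy_dates` and `events`, the column name `pregnancy_end`, and all values are hypothetical, and treating the window boundaries as inclusive is an assumption.

```r
# Hypothetical pregnancy end dates and inclusion events (made-up values).
pregnancy_dates <- data.frame(
  pnr = "001",
  pregnancy_end = as.Date("2012-06-01")
)
events <- data.frame(
  pnr = c("001", "001", "001", "001", "002"),
  date = as.Date(c("2011-01-15", "2011-10-01", "2012-08-01",
                   "2013-01-01", "2012-06-01"))
)

# Flag events between 40 weeks before and 12 weeks after any pregnancy
# ending registered for the same individual (boundaries inclusive).
in_window <- vapply(seq_len(nrow(events)), function(i) {
  ends <- pregnancy_dates$pregnancy_end[pregnancy_dates$pnr == events$pnr[i]]
  any(events$date[i] >= ends - 40 * 7 & events$date[i] <= ends + 12 * 7)
}, logical(1))

censored <- events[!in_window, ]
```

Individuals without any registered pregnancy ending, such as "002" here, keep all of their events.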
+
+After these exclusion functions have been applied, the output serves as
+input to two sets of functions:
+
+1.  the censored HbA1c and GLD data are passed to the
+    `join_inclusions()` function for the final step of the inclusion
+    process.
+2.  the censored GLD data is passed to the
+    `get_only_insulin_purchases()`,
+    `get_insulin_purchases_within_180_days()`, and
+    `get_insulin_is_two_thirds_of_gld_doses()` helper functions for the
+    classification of diabetes type.
 
 ### Get diagnosis date
 
@@ -233,8 +381,8 @@ OSDC algorithm includes the following criteria:
     diagnoses extracted from `lpr_diag` (LPR2) and `diagnoser` (LPR3) in
     the previous steps.
 2.  `get_only_insulin_purchases()` which relies on the GLD purchases
-    from Lægemiddelsdatabasen to get patients where all GLD purchases
-    are insulin only.
+    from Lægemiddeldatabasen to get patients where all GLD purchases are
+    insulin only.
 3.  `get_majority_of_t1d_diagnoses()` (as compared to T2D diagnoses)
     which again relies on primary hospital diagnoses from LPR.
 4.  `get_insulin_purchase_within_180_days()` which relies on both