docs: 📝 expand on inclusions and exclusions #133

Open
wants to merge 24 commits into
base: main
90c7492
Fleshed out and updated include_gld_purchases() flow documentation
Sep 18, 2024
0504532
Added description of podiatrist services function flow
Sep 18, 2024
7777f36
Reformated some GLD text, added HbA1c and started on pregnancy dates
Sep 18, 2024
6382624
Reworded include_hba1c section
Sep 19, 2024
4fd5903
Added lpr-joins, started on describing lpr processing
Sep 19, 2024
29fea86
Finished LPR/diagnosis part of function flow
Sep 19, 2024
f9d7661
fixed a new things to describe LPR3 processing
Sep 19, 2024
9a05d81
specified that only primary diagnoses go into type classification
Sep 19, 2024
f03a4da
Update vignettes/function-flow.Rmd
Aastedet Sep 19, 2024
bc889d4
switched the order of inclusion sections and mentioned that some of t…
Sep 19, 2024
8d60bd0
Merge branch 'update-function-flow' of https://github.com/steno-aarhu…
Sep 19, 2024
9a74ea0
Merge branch 'main' into update-function-flow
Aastedet Sep 20, 2024
7525b60
fixed spec to speciale variable name
Sep 20, 2024
25db86d
Merge branch 'update-function-flow' of https://github.com/steno-aarhu…
Sep 20, 2024
092824e
Removed "name" or "vnr" variables from GLD function flow.
Sep 20, 2024
20f5886
Updates join_lpr function description to filter to necessary diagnoses.
Sep 20, 2024
61b5d27
Removed section on weightloss drugs, since we're no longer including …
Sep 20, 2024
7b9738d
Update vignettes/function-flow.Rmd
Aastedet Sep 20, 2024
3a95d4f
Added description of exclude_potential_pcos()
Sep 20, 2024
f663844
Merge branch 'update-function-flow' of https://github.com/steno-aarhu…
Sep 20, 2024
fe257a6
Renamed some variables.
Sep 20, 2024
4bba18e
Added censoring/exclusion function description
Sep 20, 2024
7cca920
Added correct diagnoses to filter to in lpr_join() functions.
Sep 27, 2024
b40412c
changed specialty values to align with the PR with a refactored creat…
Sep 27, 2024
282 changes: 215 additions & 67 deletions vignettes/function-flow.Rmd
@@ -97,98 +97,246 @@ library(dplyr)
library(osdc)
```

#### HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%)

The function `include_hba1c()` uses `lab_forsker` as the input data to
extract all events of HbA1c tests above the diagnosis cut-off value.

Since the HbA1c diagnosis cut-off value depends on the kind of test that is
used, the inclusion event is defined as follows:

- For HbA1c IFCC (NPU27300), we include values \>= 48 mmol/mol.
- For HbA1c DCCT (NPU03835), we include values \>= 6.5 %.

```{r, echo=FALSE}
algorithm |>
filter(name=="hba1c") |>
knitr::kable(caption = "Algorithm used in the implementation for including HbA1c.")
```

#### Hospital diagnosis of diabetes
#### Hospital diagnoses

**Joining LPR2 and LPR3 data**

The helper functions `join_lpr2()` and `join_lpr3()` join records of
diagnoses to administrative information in LPR2-formatted and
LPR3-formatted data, respectively.

`join_lpr2()` takes `lpr_diag` and `lpr_adm` as inputs, filters to the
necessary diagnoses (`c_diag` starting with "DO0[0-6]", "DO8[0-4]",
"DZ3[37]", "DE1[0-4]", "249", or "250"), joins the required information
by record number (`recnum`), and outputs a `data.frame` with the
following variables:

- `pnr`: identifier variable
- `date`: date of the recorded diagnosis (renamed from `d_inddto`)
- `specialty`: department specialty (renamed from `c_spec`)
- `diagnosis`: diagnosis code (renamed from `c_diag`)
- `diagnosis_type`: diagnosis type (renamed from `c_diagtype`)

`join_lpr3()` takes `diagnoser` and `kontakter` as inputs, filters to
the necessary diagnoses (`diagnosekode` starting with "DO0[0-6]",
"DO8[0-4]", "DZ3[37]" or "DE1[0-4]"), joins the required information by
record number (`dw_ek_kontakt`), and outputs a `data.frame` with the
following variables:

- `pnr`: identifier variable (renamed from `cpr`)
- `date`: date of the recorded diagnosis (renamed from `dato_start`)
- `specialty`: department specialty (renamed from `hovedspeciale_ans`)
- `diagnosis`: diagnosis code (renamed from `diagnosekode`)
- `diagnosis_type`: diagnosis type (renamed from `diagnosetype`)
- `diagnosis_retracted`: if the diagnosis was later retracted (renamed
from `senere_afkraeftet`)

These outputs are passed to `include_diabetes_diagnoses()` (and to
`get_pregnancy_dates()`, see exclusion events) for further processing
below.
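
A minimal sketch of the LPR2 join described above, using small made-up tables. The toy data, the regular expression, and the choice of an inner join are assumptions for illustration; the actual `join_lpr2()` implementation may differ.

```r
library(dplyr)

# Hypothetical toy inputs mimicking the LPR2 registers (real data has more columns).
lpr_diag <- tibble(
  recnum = c(1, 2, 3),
  c_diag = c("DE110", "DJ459", "250"),
  c_diagtype = c("A", "A", "B")
)
lpr_adm <- tibble(
  recnum = c(1, 2, 3),
  pnr = c("001", "001", "002"),
  d_inddto = as.Date(c("2001-05-01", "2002-01-15", "1990-03-20")),
  c_spec = c(8, 1, 2)
)

# Regular expression covering the diagnosis prefixes listed in the text.
lpr2_pattern <- "^(DO0[0-6]|DO8[0-4]|DZ3[37]|DE1[0-4]|249|250)"

joined_lpr2 <- lpr_diag |>
  # Keep only the diagnoses needed downstream.
  filter(grepl(lpr2_pattern, c_diag)) |>
  # Attach administrative information by record number (assumed inner join).
  inner_join(lpr_adm, by = "recnum") |>
  # Rename to the output variable names.
  select(
    pnr,
    date = d_inddto,
    specialty = c_spec,
    diagnosis = c_diag,
    diagnosis_type = c_diagtype
  )
```

Filtering before joining keeps the join small, since most diagnosis records are irrelevant to the algorithm.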

**Processing of diabetes diagnoses**

The function `include_diabetes_diagnoses()` uses the hospital contacts
from LPR2 and 3 to include all dates of diabetes diagnoses. Diabetes
diagnoses from both ICD 8 and ICD 10 are included.

This function contains two helper functions:

- `keep_diabetes_icd10()`
- `keep_diabetes_icd8()`

<!-- TODO: Add details on how this filtering should be done, e.g., diagnosis codes -->

<!-- TODO: Which specific ICD 8 and 10 codes are included? -->
from LPR2 and LPR3 to include all dates of diabetes diagnoses to use for
inclusion, as well as additional information needed to classify diabetes
type. Diabetes diagnoses from both ICD-8 and ICD-10 are included.

The function takes the outputs of `join_lpr2()` and `join_lpr3()` as
inputs and processes each input separately to generate the following
internal variables:

- From `join_lpr2()`:
    - `pnr`: identifier variable
    - `date`: dates of all included diabetes diagnoses:
        - registered as primary (A) or secondary (B) diagnoses,
          regardless of type or department:
            - `diagnosis` starts with "DE1[0-4]", "249" or "250", and
              `diagnosis_type` is either "A" or "B"
    - `is_primary`: Define whether the diagnosis was a primary
      diagnosis (`diagnosis_type` == "A")
    - `is_t1d`: Define whether the diagnosis was T1D-specific
      (`diagnosis` starts with "DE10" or "249")
    - `is_t2d`: Define whether the diagnosis was T2D-specific
      (`diagnosis` starts with "DE11" or "250")
    - `department`: Define whether the diagnosis was made by an
      endocrinological (`specialty` == 8 -\> `department` ==
      "endocrinology") or other medical department (`specialty` \< 8
      or 9-30 -\> `department` == "other medical")
- From `join_lpr3()`:
    - `pnr`: identifier variable
    - `date`: dates of all included diabetes diagnoses:
        - registered as primary (A) or secondary (B) diagnoses,
          regardless of type or department, but exclude retracted
          diagnoses:
            - `diagnosis` starts with "DE1[0-4]", `diagnosis_type` is
              either "A" or "B" and `diagnosis_retracted` == "Nej"
    - `is_primary`: Define whether the diagnosis was a primary
      diagnosis (`diagnosis_type` == "A")
    - `is_t1d`: Define whether the diagnosis was T1D-specific
      (`diagnosis` starts with "DE10")
    - `is_t2d`: Define whether the diagnosis was T2D-specific
      (`diagnosis` starts with "DE11")
    - `department`: Define whether the diagnosis was made by an
      endocrinological department (`specialty` == "medicinsk
      endokrinologi" -\> `department` == "endocrinology") or other
      medical department (`specialty` either "Blandet medicin og
      kirurgi", "Intern medicin", "Geriatri", "Hepatologi",
      "Hæmatologi", "Infektionsmedicin", "Kardiologi", "Medicinsk
      allergologi", "Medicinsk gastroenterologi", "Medicinsk
      lungesygdomme", "Nefrologi", "Reumatologi", "Palliativ medicin",
      "Akut medicin", "Dermato-venerologi", "Neurologi", "Onkologi",
      "Fysiurgi", or "Tropemedicin" -\> `department` == "other
      medical")

Internally, these intermediate results are combined and processed
together. Ultimately, `include_diabetes_diagnoses()` outputs a single
`data.frame` with the following variables (up to two rows per
individual):

- `pnr`: identifier variable
- `dates`: dates of the first and second hospital diabetes diagnosis
- `n_t1d_endocrinology`: number of type 1 diabetes-specific primary
diagnosis codes from endocrinological departments
- `n_t2d_endocrinology`: number of type 2 diabetes-specific primary
diagnosis codes from endocrinological departments
- `n_t1d_medical`: number of type 1 diabetes-specific primary
diagnosis codes from medical departments
- `n_t2d_medical`: number of type 2 diabetes-specific primary
diagnosis codes from medical departments

This output is passed to the `join_inclusions()` function, where the
`dates` variable is used for the final step of the inclusion process.
The variables of counts of diabetes type-specific primary diagnoses are
carried over for the subsequent classification of diabetes type,
initially as inputs to the `get_t1d_primary_diagnosis()` and
`get_majority_of_t1d_diagnoses()` functions.
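
One way the combined output might be assembled is sketched below: count the type-specific primary diagnoses per individual, keep the two earliest diagnosis dates, and join the two. The toy data is hypothetical, and "medical departments" is assumed here to mean `department == "other medical"`; the package may define it differently.

```r
library(dplyr)

# Hypothetical combined LPR2/LPR3 intermediate result (toy data).
diagnoses <- tibble(
  pnr = c("001", "001", "001", "002"),
  date = as.Date(c("2001-05-01", "2003-02-10", "2004-09-30", "1990-03-20")),
  is_primary = c(TRUE, TRUE, FALSE, TRUE),
  is_t1d = c(TRUE, FALSE, TRUE, FALSE),
  is_t2d = c(FALSE, TRUE, FALSE, TRUE),
  department = c("endocrinology", "other medical", "endocrinology", "other medical")
)

# Counts of type-specific primary diagnoses per individual and department.
counts <- diagnoses |>
  group_by(pnr) |>
  summarise(
    n_t1d_endocrinology = sum(is_primary & is_t1d & department == "endocrinology", na.rm = TRUE),
    n_t2d_endocrinology = sum(is_primary & is_t2d & department == "endocrinology", na.rm = TRUE),
    n_t1d_medical = sum(is_primary & is_t1d & department == "other medical", na.rm = TRUE),
    n_t2d_medical = sum(is_primary & is_t2d & department == "other medical", na.rm = TRUE)
  )

# The first and second diagnosis date per individual.
first_two <- diagnoses |>
  distinct(pnr, date) |>
  group_by(pnr) |>
  slice_min(date, n = 2, with_ties = FALSE) |>
  ungroup() |>
  rename(dates = date)

diabetes_diagnoses <- left_join(first_two, counts, by = "pnr")
```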

#### Diabetes-specific podiatrist services

The function `include_podiatrist_services()` uses `sysi` or `sssy` as
input to extract the dates of all diabetes-specific podiatrist services.

<!-- TODO: Add details on how this filtering should be done -->
These dates are extracted by filtering values beginning with "54" in the
`speciale` variable of the `sssy` and `sysi` registers by default
(alternatively, the function can take the `spec2` variable as input
instead, if that is the data available to the user). In addition,
services provided to a child of the individual (`barnmak` != 0) are
excluded using the `barnmak` variable. An internal helper function
`get_unique_honuge_dates()` is applied to generate a proper date
variable based on the year-week (wwyy-formatted) variable (`honuge`)
found in the raw data, and de-duplicates multiple services registered on
the same date.
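
The filtering steps above can be sketched as follows. The sketch assumes that `speciale` is stored as a character string and that a `date` column has already been derived from `honuge`; both are illustrative assumptions, and the toy data is made up.

```r
library(dplyr)

# Hypothetical toy register data with a date already derived from `honuge`.
sssy <- tibble(
  pnr = c("001", "001", "001", "001", "002"),
  speciale = c("540005", "540005", "540005", "100001", "540005"),
  barnmak = c(0, 0, 0, 0, 1),
  date = as.Date(c("2010-02-01", "2010-02-01", "2011-05-09", "2009-01-05", "2012-01-02"))
)

podiatrist_dates <- sssy |>
  # Diabetes-specific podiatrist services, excluding services to a child.
  filter(startsWith(speciale, "54"), barnmak == 0) |>
  select(pnr, dates = date) |>
  # De-duplicate services registered on the same date.
  distinct() |>
  # Keep the first and second date per individual.
  group_by(pnr) |>
  slice_min(dates, n = 2, with_ties = FALSE) |>
  ungroup()
```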

#### GLD purchases
`include_podiatrist_services()` outputs a 2-column data frame with up to
two rows for each individual, containing the following variables:

The function `include_gld_purchases()` uses `lmdb` to extract the dates
of all GLD purchases (from 1997 onwards).
- `pnr`: identifier variable
- `dates`: the dates of the first and second diabetes-specific
podiatrist record

<!-- TODO: Add details on how this filtering should be done -->
The output is passed to the `join_inclusions()` function for the final
step of the inclusion process.

<!-- TODO: Add this + link to resource "For details about this, see [link]." -->
#### HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%)

### Exclusion events
The function `include_hba1c()` uses `lab_forsker` as the input data to
extract the dates of all elevated HbA1c test results, using the
appropriate cut-offs:

#### HbA1c tests and GLD purchases during pregnancy
- IFCC units: `analysiscode` NPU27300, any `value` $\geq$ 48 mmol/mol
- DCCT units: `analysiscode` NPU03835, any `value` $\geq$ 6.5%

The function `exclude_pregnancy()` uses diagnoses from LPR2 or LPR3 as
input and is used to exclude both HbA1c tests and GLD purchases during
pregnancy.
```{r, echo=FALSE}
algorithm |>
filter(name=="hba1c") |>
knitr::kable(caption = "Algorithm used in the implementation for including HbA1c.")
```

Internally, this relies on the function `get_pregnancy_dates()` that
contains the following three helper functions:
Multiple elevated results on the same day within each individual are
deduplicated, to account for the same test result often being reported
twice (once in IFCC units and once in DCCT units).

- `calculate_pregnancy_index_date_for_mc_visits_wo_end_date()` (this
might be removed with the inclusion of the birth register)
- `get_pregnancy_end_dates()`: Keep maternal care visits with an end
date and drop visits between 40 weeks before end date and 12 weeks
after end date.
- `get_maternal_care_visit_dates_without_end_date()`: Uses the output
from `get_pregnancy_end_dates()` which identifies maternal care
visits *with* end dates to derive maternal care visits *without* end
dates. below.
`include_hba1c()` outputs a 2-column data frame containing the following
variables:

<!-- TODO: What is done with the mc visits without end dates then? -->
- `pnr`: identifier variable
- `dates`: the dates of all elevated HbA1c test results

<!-- TODO: Add details on how this filtering should be done -->
The output is passed to the `exclude_pregnancy()` function for censoring
of elevated results due to potential gestational diabetes (see below).
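
As a rough illustration of the HbA1c step, the sketch below applies the two cut-offs and removes same-day duplicates. The column names `pnr` and `sample_date` are placeholders chosen for this example (only `analysiscode` and `value` are named in the text), and the toy data is made up.

```r
library(dplyr)

# Hypothetical toy lab data; `pnr` and `sample_date` are placeholder names.
lab <- tibble(
  pnr = c("001", "001", "001", "002"),
  sample_date = as.Date(c("2015-01-10", "2015-01-10", "2016-03-01", "2015-06-01")),
  analysiscode = c("NPU27300", "NPU03835", "NPU27300", "NPU03835"),
  value = c(52, 6.9, 40, 6.5)
)

elevated_hba1c <- lab |>
  filter(
    # IFCC units (mmol/mol) or DCCT units (%).
    (analysiscode == "NPU27300" & value >= 48) |
      (analysiscode == "NPU03835" & value >= 6.5)
  ) |>
  select(pnr, dates = sample_date) |>
  # One row per individual and day, even if reported in both units.
  distinct()
```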

#### GLD purchases

#### Glucose-lowering brand drugs for weight loss
The function `include_gld_purchases()` uses `lmdb` to extract the dates
of all GLD purchases.

These dates are extracted by including all values beginning with "A10"
in the `atc` variable of the `lmdb` register. Since the diagnosis code
data on pregnancies (see below) is insufficient to perform censoring
prior to 1997, `include_gld_purchases()` only extracts dates from 1997
onward by default (if Medical Birth Register data is available to use
for censoring, the extraction window can be extended).

This function outputs a `data.frame` with the following variables needed
later in the classification part of the function flow:

- `pnr`: identifier variable
- `date`: dates of all purchases of GLD (renamed from `eksd`)
- `atc`: type of drug
- `contained_doses`: amount purchased, in number of defined daily
doses (DDD). Calculated as `volume` (doses contained in the
purchased package) times `apk` (number of packages purchased)
- `indication_code`: indication code of the prescription (renamed from
`indo`)

These events are then passed to a chain of exclusion functions:
`exclude_wld_purchases()`, `exclude_potential_pcos()`,
`exclude_pregnancy()` described in the sections below.
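
The extraction described in this section might look roughly like the sketch below. The toy `lmdb` rows are made up and the 1997 cut-off is hard-coded, whereas the text describes it as an adjustable default; treat it as an illustration rather than the package implementation.

```r
library(dplyr)

# Hypothetical toy `lmdb` data.
lmdb <- tibble(
  pnr = c("001", "001", "002"),
  eksd = as.Date(c("1996-12-01", "2005-04-02", "2010-08-15")),
  atc = c("A10BA02", "A10AB01", "C10AA01"),
  volume = c(100, 15, 30),
  apk = c(1, 2, 1),
  indo = c("0000092", NA, NA)
)

gld_purchases <- lmdb |>
  # GLD purchases (ATC codes beginning with "A10") from 1997 onward.
  filter(startsWith(atc, "A10"), eksd >= as.Date("1997-01-01")) |>
  # Doses per package times number of packages.
  mutate(contained_doses = volume * apk) |>
  select(pnr, date = eksd, atc, contained_doses, indication_code = indo)
```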

The function `exclude_wld_purchases()` uses lmdb as input and excludes
the brand drugs Saxenda and Wegovy.
### Exclusion events

<!-- TODO: Add details on how this filtering should be done -->
#### Metformin purchases potentially for the treatment of polycystic ovary syndrome

#### Metformin purchases for women below age 40
The function `exclude_potential_pcos()` takes the output from
`include_gld_purchases()` and `bef` (information on sex and date of
birth) as inputs and censors (filters out) purchases of metformin by
women that meet either of the following criteria:

- the woman was below age 40 at the date of purchase (`atc` =
  "A10BA02" & `sex` = "woman" & age at purchase (`date` -
  `date_of_birth`) \< 40 years)
- the indication code suggests treatment of polycystic ovary syndrome
  (`atc` = "A10BA02" & `sex` = "woman" & `indication_code` either
  "0000092", "0000276" or "0000781")

The function `exclude_potential_pcos()` as input to exclude all
purchases of metformin by women below age 40 (i.e., \<= 39 years old) at
the date of purchase. It relies on `bef` as input.
After these exclusions are made, the output is passed to
`exclude_pregnancy()` for further censoring, described below:

This function contains two helper functions:
#### HbA1c tests and GLD purchases during pregnancy

- `keep_women()`
- `drop_age_40_below()`
The function `exclude_pregnancy()` takes the combined outputs from
`join_lpr2()`, `join_lpr3()`, `include_hba1c()`, and
`exclude_potential_pcos()` and uses diagnoses from LPR2 or LPR3 to
exclude both elevated HbA1c tests and GLD purchases during pregnancy, as
these may be due to gestational diabetes, rather than type 1 or type 2
diabetes.

<!-- TODO: Add details on how this filtering should be done -->
Internally, this relies on the function `get_pregnancy_dates()` that
uses diagnoses registered in the National Patient Register to extract
the dates of all pregnancy endings (live births or miscarriages). These
are identified by `diagnosis` values beginning with "DO0[0-6]",
"DO8[0-4]" or "DZ3[37]". The dates output by `get_pregnancy_dates()` are
used to exclude all inclusion events registered between 40 weeks before
and 12 weeks after a pregnancy ending.
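
A minimal sketch of the pregnancy window, assuming inclusive boundaries and a hypothetical `pregnancy_end` column holding the dates from `get_pregnancy_dates()`. The toy data is made up and the real implementation may handle the window differently.

```r
library(dplyr)

# Hypothetical inclusion events (HbA1c results or GLD purchases).
events <- tibble(
  pnr = c("001", "001", "002"),
  date = as.Date(c("2012-03-01", "2015-01-01", "2012-03-01"))
)
# Hypothetical pregnancy end dates; `pregnancy_end` is a placeholder name.
pregnancy_dates <- tibble(
  pnr = "001",
  pregnancy_end = as.Date("2012-06-01")
)

# Events falling within 40 weeks before to 12 weeks after a pregnancy ending.
in_pregnancy <- events |>
  inner_join(pregnancy_dates, by = "pnr") |>
  filter(
    date >= pregnancy_end - 40 * 7,
    date <= pregnancy_end + 12 * 7
  ) |>
  distinct(pnr, date)

# Drop those events from the inclusion events.
censored_events <- anti_join(events, in_pregnancy, by = c("pnr", "date"))
```

With several pregnancies per individual, the join matches each event against every pregnancy ending, and dplyr 1.1.0 or later warns about the many-to-many relationship unless it is declared explicitly.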

After these exclusion functions have been applied, the output serves as
input to two sets of functions:

1. the censored HbA1c and GLD data are passed to the
`join_inclusions()` function for the final step of the inclusion
process.
2. the censored GLD data is passed to the
`get_only_insulin_purchases()`,
`get_insulin_purchases_within_180_days()`, and
`get_insulin_is_two_thirds_of_gld_doses()` helper functions for the
classification of diabetes type.

### Get diagnosis date

@@ -233,8 +381,8 @@ OSDC algorithm includes the following criteria:
diagnoses extracted from `lpr_diag` (LPR2) and `diagnoser` (LPR3) in
the previous steps.
2. `get_only_insulin_purchases()` which relies on the GLD purchases
from Lægemiddelsdatabasen to get patients where all GLD purchases
are insulin only.
from Lægemiddeldatabasen to get patients where all GLD purchases are
insulin only.
3. `get_majority_of_t1d_diagnoses()` (as compared to T2D diagnoses)
which again relies on primary hospital diagnoses from LPR.
4. `get_insulin_purchase_within_180_days()` which relies on both