docs: ✨ initial draft of functions to classify diabetes type #75

Merged on Jun 14, 2024 (48 commits)
Commits
b3d5f6d
docs: :sparkles: initial draft of diabetes type functionality flow
signekb Apr 12, 2024
328516b
fix: :fire: remove old figure experiments
signekb Apr 12, 2024
6f7b6c2
style: remove unnecessary "and" and commas from figure
signekb Apr 12, 2024
8dc6ae1
docs: update output of OSDC to include two inclusion dates (stable an…
signekb Apr 17, 2024
af93e94
fix: add oxford comma to header
signekb Apr 17, 2024
ef92306
fix: describe classifying steps as "filters" with "criteria"
signekb Apr 17, 2024
18951d7
docs: minor text fixes (add link and "the")
signekb Apr 17, 2024
9f45a1f
fix: update classify diabetes type flow chart based on feedback from …
signekb Apr 17, 2024
332f421
fix: specify that we use the primary diagnosis for classification
signekb Apr 17, 2024
b157b5a
fix: minor text edit to make sentence clearer
signekb Apr 17, 2024
6e9ddaa
docs: add description of example output table
signekb Apr 17, 2024
3084d2c
fix: add missing header in output example table
signekb Apr 17, 2024
175fa3d
style: :lipstick: format
signekb Apr 17, 2024
a3655b1
Merge branch 'docs/functionality-flow-diabetes-population' into docs/…
signekb Apr 17, 2024
f6bebfe
style:
signekb Apr 17, 2024
80fc885
Merge branch 'docs/functionality-flow-diabetes-population' into docs/…
signekb Apr 17, 2024
12fd1d1
docs: elaborate on hierarchy of diagnoses from endocrinological and m…
signekb Apr 17, 2024
33d764e
Merge branch 'docs/functionality-flow-diabetes-population' into docs/…
signekb Apr 25, 2024
6078cc7
Apply suggestions from code review
signekb Apr 26, 2024
57115bf
Merged origin/main into docs/functionality-flow-classify-diabetes-type
lwjohnst86 Apr 27, 2024
dadf568
Merge branch 'docs/functionality-flow-diabetes-population' of https:/…
lwjohnst86 Apr 27, 2024
f2f29fa
Merge branch 'docs/functionality-flow-diabetes-population' of https:/…
lwjohnst86 Apr 27, 2024
4295251
Apply suggestions from code review
signekb May 16, 2024
48c69bf
Merge branch 'main' into docs/functionality-flow-classify-diabetes-type
signekb May 16, 2024
b978ce6
docs: :fire: remove `_status` from `classify_diabetes_status()`
signekb May 16, 2024
688fd64
docs: :fire: remove mentions of "components"
signekb May 16, 2024
b6205d9
Update vignettes/function-flow.Rmd
signekb May 16, 2024
56dda95
Merge branch 'main' into docs/functionality-flow-classify-diabetes-type
signekb May 16, 2024
81b2561
docs: :memo: add register abbreviations based on `variable_description`
signekb May 16, 2024
80d3eab
docs: :fire: remove details about filter 1
signekb May 16, 2024
a0f7151
docs: :memo: align links to other vignettes
signekb May 17, 2024
616e105
feat: add classification to function-flow
signekb May 21, 2024
1ed60d1
feat: add function that join inclusion events
signekb May 21, 2024
ff87125
style: :art: refactor arrows and add comments to create a clearer str…
signekb May 21, 2024
7c77923
feat: update structure (arrows and together) to correct arrows and la…
signekb May 21, 2024
0266cee
style: :art: add black font to card and rectangle
signekb May 21, 2024
3dc29b7
feat: regenerate png from puml
signekb May 21, 2024
89dfe0b
docs: add name og brand drugs Saxenda and Wegovy
signekb May 30, 2024
cced272
fix: diagnosis -> diagnoses in inclusion function
signekb May 30, 2024
8662ce4
Merge branch 'main' into docs/functionality-flow-classify-diabetes-type
signekb May 30, 2024
950641b
docs: rename wld function; add data source and brand drug names
signekb May 30, 2024
9dacf9a
docs: rewrite of classification to omit the filter distinction
signekb May 30, 2024
214f0e5
docs: remove backticks from osdc package
signekb May 30, 2024
a6e7287
docs: :sparkles: create partial function flows while keeping the enti…
signekb May 30, 2024
f465a63
docs: add separate section for population extraction and fix header l…
signekb May 30, 2024
8c45dbc
docs: :fire: remove old classification puml
signekb May 30, 2024
2cf719c
Merge branch 'main' into docs/functionality-flow-classify-diabetes-type
signekb Jun 14, 2024
9c5759b
docs: :pencil2: very minor edits and formatting fixes
lwjohnst86 Jun 14, 2024
164 changes: 137 additions & 27 deletions vignettes/function-flow.Rmd
@@ -51,38 +51,55 @@ rules.

## Function flow

The OSDC algorithm - and thereby, the osdc package - contains one main
function that will classify individuals into those with either type 1 or
type 2 diabetes using the Danish registers:
`classify_diabetes_status()`. This function classifies those with
The osdc package contains one main function that classifies individuals
into those with either type 1 or type 2 diabetes using the Danish
registers: `classify_diabetes()`. This function classifies those with
diabetes (type 1 or 2) based on the Danish registers described in the
`vignette("design")`. All data sources are used as input for this
function. The specific inclusion and exclusion details are also
described in the `vignette("design")`.
`vignette("design")` and `vignette("data-sources")`. All data sources
are used as input for this function. The specific inclusion and
exclusion details are also described in the `vignette("design")`.

This results in the functionality flow for classifying diabetes status
seen below. All functions take a `data.frame` type object as input and
outputs the same type of object as the input object (a `data.frame`
type). For instance, if the input is a `data.table` object, the output
will also be a `data.table`.
seen below. This flow can be divided into two sections, extracting the
diabetes population and classifying the diabetes type, which we detail
in the following sections.

All functions take a `data.frame` type object as input and output the
same type of object as the input object (a `data.frame` type). For
instance, if the input is a `data.table` object, the output will also be
a `data.table`.

![Flow of functions, as well as their required input registers, for
classifying diabetes status using the `osdc` package. Light blue and
classifying diabetes status using the osdc package. Light blue and
orange boxes represent filtering functions (inclusion and exclusion
events, respectively). Uncoloured boxes are helper functions that get or
extract a condition or join data or function
outputs.](images/function-flow.png)

## Inclusion events
## Population extraction

In the following sections, we describe the functions used to extract the
diabetes population from the Danish registers. The functions are divided
into inclusion and exclusion events, and the final diagnosis date is
calculated based on these events.

![Flow of functions, as well as their required input registers, for
extracting the population with diabetes using the osdc package. Light
blue and orange boxes represent filtering functions (inclusion and
exclusion events, respectively). Uncoloured boxes are helper functions
that get or extract a condition or join data or function
outputs.](images/function-flow-population.png)

### Inclusion events

### HbA1c tests above 48 mmol/mol
#### HbA1c tests above 48 mmol/mol

The function `include_hba1c()` uses `lab_forsker` as the input data to
extract all events of tests above 48 mmol/mol.
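
As a rough illustration of this filtering step, here is a minimal sketch in base R. The data, the column names (`pnr`, `test_date`, `hba1c_mmol_mol`), and the function name `include_hba1c_sketch()` are all hypothetical and not taken from `lab_forsker` or the osdc package. The strict comparison is an assumption based on the word "above"; the design vignette remains the authority on the exact threshold rule.

```r
# Hypothetical stand-in for `lab_forsker`; column names are illustrative only.
lab_results <- data.frame(
  pnr = c("0000000001", "0000000001", "0000000002", "0000000003"),
  test_date = as.Date(c("2019-03-01", "2020-01-01", "2018-06-15", "2021-09-30")),
  hba1c_mmol_mol = c(45, 52, 48, 61),
  stringsAsFactors = FALSE
)

# Sketch of the filter: keep only tests strictly above the threshold
# (assumption: "above 48" means > 48, not >= 48).
include_hba1c_sketch <- function(data, threshold = 48) {
  data[data$hba1c_mmol_mol > threshold, , drop = FALSE]
}

hba1c_events <- include_hba1c_sketch(lab_results)
hba1c_events
```

Because the sketch only uses `[` subsetting, the class of the input object is carried through to the output.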

<!-- TODO: Add details on how this filtering should be done -->

### Hospital diagnosis of diabetes
#### Hospital diagnosis of diabetes

The function `include_diabetes_diagnoses()` uses the hospital contacts
from LPR2 and 3 to include all dates of diabetes diagnoses. Diabetes
@@ -97,14 +114,14 @@ This function contains two helper functions:

<!-- TODO: Which specific ICD 8 and 10 codes are included? -->

### Diabetes-specific podiatrist services
#### Diabetes-specific podiatrist services

The function `include_podiatrist_services()` uses `sysi` or `sssy` as
input to extract the dates of all diabetes-specific podiatrist services.

<!-- TODO: Add details on how this filtering should be done -->

### GLD purchases
#### GLD purchases

The function `include_gld_purchases()` uses `lmdb` to extract the dates
of all GLD purchases (from 1997 onwards).
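
One way this extraction could look is sketched below in base R. It assumes that GLD purchases can be identified by ATC codes in group A10 (drugs used in diabetes) and that the start of the period is 1 January 1997. The column names and the function name `include_gld_purchases_sketch()` are hypothetical, and the exact set of ATC codes used by the package is not specified in this section.

```r
# Hypothetical stand-in for `lmdb`; column names are illustrative only.
purchases <- data.frame(
  pnr = c("0000000001", "0000000002", "0000000003", "0000000004"),
  purchase_date = as.Date(c("1996-11-20", "1997-01-05", "2010-07-14", "2015-02-02")),
  atc_code = c("A10BA02", "A10AB01", "C10AA01", "A10BJ02"),
  stringsAsFactors = FALSE
)

# Sketch: keep purchases with an A10 ATC code made on or after the start date.
# Both the A10 prefix rule and the exact start date are assumptions.
include_gld_purchases_sketch <- function(data, start = as.Date("1997-01-01")) {
  is_gld <- startsWith(data$atc_code, "A10")
  in_period <- data$purchase_date >= start
  data[is_gld & in_period, , drop = FALSE]
}

gld_events <- include_gld_purchases_sketch(purchases)
gld_events
```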
@@ -113,9 +130,9 @@ of all GLD purchases (from 1997 onwards).

<!-- TODO: Add this + link to resource "For details about this, see [link]." -->

## Exclusion events
### Exclusion events

### HbA1c tests and GLD purchases during pregnancy
#### HbA1c tests and GLD purchases during pregnancy

The function `exclude_pregnancy()` uses diagnoses from LPR2 or LPR3 as
input and is used to exclude both HbA1c tests and GLD purchases during
@@ -138,16 +155,14 @@ contains the following three helper functions:

<!-- TODO: Add details on how this filtering should be done -->

### Glucose-lowering brand drugs for weight loss
#### Glucose-lowering brand drugs for weight loss

The function `exclude_purchases_of_weight_loss_drugs()` uses REGISTER as
input and excludes BRANDS.
The function `exclude_wld_purchases()` uses lmdb as input and excludes
the brand drugs Saxenda and Wegovy.
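
A minimal sketch of this exclusion follows, assuming the purchase data carries a brand name column. The column `brand_name` and the function name `exclude_wld_purchases_sketch()` are hypothetical; how brands are actually identified in lmdb is not described here. The example shows why a brand-level filter is needed: Saxenda and Wegovy share ATC codes with glucose-lowering brands that contain the same active ingredient.

```r
# Hypothetical purchases; `brand_name` is an illustrative column, not a
# documented lmdb variable.
purchases <- data.frame(
  pnr = c("0000000001", "0000000002", "0000000003", "0000000004"),
  atc_code = c("A10BJ02", "A10BJ02", "A10BJ06", "A10BJ06"),
  brand_name = c("Victoza", "Saxenda", "Ozempic", "Wegovy"),
  stringsAsFactors = FALSE
)

# Sketch: drop purchases of the weight-loss brands named in the text.
exclude_wld_purchases_sketch <- function(data, wld_brands = c("Saxenda", "Wegovy")) {
  data[!(data$brand_name %in% wld_brands), , drop = FALSE]
}

kept_purchases <- exclude_wld_purchases_sketch(purchases)
kept_purchases
```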

<!-- TODO: Add details on how this filtering should be done -->

<!-- TODO: Add data source and which brands are excluded -->

### Metformin purchases for women below age 40
#### Metformin purchases for women below age 40

The function `exclude_potential_pcos()` as input to exclude all
purchases of metformin by women below age 40 (i.e., \<= 39 years old) at
@@ -160,7 +175,7 @@ This function contains two helper functions:

<!-- TODO: Add details on how this filtering should be done -->
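
Until those details are written, the following base R sketch shows one possible reading of this exclusion. It assumes that age is measured at the purchase date, that metformin is identified by ATC code A10BA02, and that sex and birth date come from a population table. All column names and both function names are hypothetical.

```r
# Hypothetical GLD purchases and population data; column names are illustrative.
purchases <- data.frame(
  pnr = c("0000000001", "0000000002", "0000000003", "0000000001"),
  purchase_date = as.Date(rep("2015-05-01", 4)),
  atc_code = c("A10BA02", "A10BA02", "A10BA02", "A10AB01"),
  stringsAsFactors = FALSE
)
population <- data.frame(
  pnr = c("0000000001", "0000000002", "0000000003"),
  sex = c("F", "F", "M"),
  birth_date = as.Date(c("1990-01-01", "1960-01-01", "1990-01-01")),
  stringsAsFactors = FALSE
)

# Age in completed years at a given date.
age_in_years <- function(birth_date, at_date) {
  birth <- as.POSIXlt(birth_date)
  at <- as.POSIXlt(at_date)
  had_birthday <- (at$mon > birth$mon) |
    (at$mon == birth$mon & at$mday >= birth$mday)
  (at$year - birth$year) - ifelse(had_birthday, 0, 1)
}

# Sketch: drop metformin purchases made by women below the age limit
# (assumption: age is evaluated at the purchase date).
exclude_potential_pcos_sketch <- function(purchases, population, age_limit = 40) {
  merged <- merge(purchases, population, by = "pnr")
  age <- age_in_years(merged$birth_date, merged$purchase_date)
  potential_pcos <- merged$atc_code == "A10BA02" &
    merged$sex == "F" & age < age_limit
  merged[!potential_pcos, names(purchases), drop = FALSE]
}

kept <- exclude_potential_pcos_sketch(purchases, population)
kept
```

In this example only the metformin purchase by the 25-year-old woman is dropped; her insulin purchase and the purchases by the older woman and the man are kept.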

## Get diagnosis date
### Get diagnosis date

The function `get_diagnosis_date()` combines the outputs from the
inclusion and exclusion functions to get the final diagnosis date.
@@ -173,4 +188,99 @@ together with `join_diagnosis_dates()`.
Finally, the dates outside of the data coverage period are dropped with
`drop_diagnosis_dates_outside_coverage()` to end with a final diagnosis
date. For details on this censoring based on periods with insufficient
data coverage, see the `vignette("algorithm-logic")`.
data coverage, see the `vignette("design")`.
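
The Output section of this vignette describes the raw inclusion date as the date of the second inclusion event. Under that reading, the core of this step could be sketched in base R as below. The data, column names, and the function name `get_second_event_date()` are hypothetical, and the sketch leaves out the coverage-based dropping of dates.

```r
# Hypothetical joined inclusion events (after exclusions); illustrative columns.
inclusion_events <- data.frame(
  pnr = c("0000000001", "0000000001", "0000000001",
          "0000000002", "0000000004", "0000000004"),
  event_date = as.Date(c("2019-11-05", "2020-01-01", "2021-03-10",
                         "2018-02-02", "1995-04-19", "1994-12-01")),
  stringsAsFactors = FALSE
)

# Sketch: for each individual, take the second earliest event date.
# Individuals with fewer than two events get NA here; how the package
# treats them is not specified in this section.
get_second_event_date <- function(data) {
  dates_by_pnr <- split(data$event_date, data$pnr)
  second_dates <- vapply(
    dates_by_pnr,
    function(dates) as.numeric(sort(dates)[2]),
    numeric(1)
  )
  data.frame(
    pnr = names(second_dates),
    diagnosis_date = as.Date(unname(second_dates), origin = "1970-01-01"),
    stringsAsFactors = FALSE
  )
}

diagnosis_dates <- get_second_event_date(inclusion_events)
diagnosis_dates
```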

### Classifying the diabetes type

The next step of the OSDC algorithm classifies individuals from the
extracted diabetes population as having either T1D or T2D. As described
in the `vignette("design")`, individuals not classified as T1D cases are
classified as T2D cases.

The output is a `data.frame` that includes one row per individual in the
diabetes population: one column with their PNR, two columns with
inclusion dates (one "stable" date and one "raw" date - see the
`vignette("design")` for an elaboration on what that entails), and one
column with the diabetes type.

<!-- TODO: add a link to the specific section where this is described -->

![Flow of functions for classifying diabetes status using the osdc
package.](images/function-flow-classification.png)

#### Type 1 classification

The details for the classification of type 1 diabetes are described in
`vignette("design")`. To classify whether an individual has T1D, the
OSDC algorithm includes the following criteria:

1. `get_t1d_primary_diagnosis()`, which relies on the hospital
diagnoses extracted from `lpr_diag` (LPR2) and `diagnoser` (LPR3) in
the previous steps.
2. `get_only_insulin_purchases()`, which relies on the GLD purchases
from Lægemiddeldatabasen to get patients whose GLD purchases are
all insulin.
3. `get_majority_of_t1d_diagnoses()` (as compared to T2D diagnoses),
which again relies on primary hospital diagnoses from LPR.
4. `get_insulin_purchase_within_180_days()`, which relies on both
diagnoses from LPR and GLD purchases from Lægemiddeldatabasen.
5. `get_insulin_is_two_thirds_of_gld_doses()`, which relies on the GLD
purchases from Lægemiddeldatabasen.

Note the following hierarchy in the first function above: First, the
function checks whether the individual has primary diagnoses from an
endocrinological specialty. If so, the check of whether they have a
majority of T1D primary diagnoses is based on data from the
endocrinological specialty. If not, the check is based on primary
diagnoses from medical specialties.
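
The hierarchy described above can be sketched in base R as follows. The data, the column names (`department`, `diabetes_type`), and the function name `has_majority_t1d_sketch()` are hypothetical. Treating "majority" as strictly more T1D than T2D primary diagnoses is an assumption, since ties are not addressed in this section.

```r
# Hypothetical primary diabetes diagnoses; column names are illustrative only.
diagnoses <- data.frame(
  pnr = c(rep("0000000001", 6), rep("0000000002", 3)),
  department = c("endocrinology", "endocrinology", "endocrinology",
                 "medical", "medical", "medical",
                 "medical", "medical", "medical"),
  diabetes_type = c("T1D", "T1D", "T2D",
                    "T2D", "T2D", "T2D",
                    "T1D", "T2D", "T2D"),
  stringsAsFactors = FALSE
)

# Sketch: use endocrinological diagnoses when a person has any; otherwise
# fall back to diagnoses from medical departments.
has_majority_t1d_sketch <- function(data) {
  by_person <- split(data, data$pnr)
  vapply(by_person, function(person) {
    from_endo <- person$department == "endocrinology"
    used <- if (any(from_endo)) {
      person[from_endo, , drop = FALSE]
    } else {
      person[person$department == "medical", , drop = FALSE]
    }
    # Assumption: "majority" means strictly more T1D than T2D diagnoses.
    sum(used$diabetes_type == "T1D") > sum(used$diabetes_type == "T2D")
  }, logical(1))
}

majority_t1d <- has_majority_t1d_sketch(diagnoses)
majority_t1d
```

The first individual has more T2D than T1D diagnoses overall, but the check only looks at the endocrinological ones, where T1D is in the majority. The second individual has no endocrinological diagnoses, so the medical ones decide.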

#### Type 2 classification

As described in the `vignette("design")`, individuals not classified as
type 1 cases are classified as type 2 cases.

## Output

The output of the OSDC algorithm is a `data.frame` which includes four
columns:

1. **PNR**: The pseudonymised social security number of individuals in
the diabetes population (one row per individual)
2. **stable_inclusion_date**: The *stable* inclusion date (i.e., the
raw date, kept only for individuals included in the time period
where data coverage is sufficient to make incident cases
reliable)[^1]
<!-- TODO: Specify this time-period: e.g., later than 1997 -->
3. **raw_inclusion_date**: The *raw* inclusion date (i.e., the date of
the second inclusion event as described in the [Population
extraction](#population-extraction) section above)
Comment on lines +254 to +256
Member

I'm not sure why we should include this in the final output. If the data coverage isn't sufficient, we shouldn't give that to users so that they make their own choice without understanding the context. We give them the context with the given assumptions and limitations of this algorithm.

Contributor Author

@Aastedet ? :)

Collaborator

@Aastedet Aastedet May 16, 2024

@lwjohnst86 At the Steering Group meeting we decided to provide both, but the "stable_inclusion" being the clear default, while we are explicit to the user that the "raw_inclusion" is experimental/use-at-own-risk.

I'm not sure if that guides us towards an answer here though. Depending on the study design, the user might have a clear need for the "raw_inclusion" date.

I lean towards including both, but naming them to "inclusion_date" and something like "unstable_date" or "_date" to make it clear which is the default (and also obscuring the non-default variable name so users will have to read the documentation to know what the variable is).

Member

Hmmm, what about initial_inclusion_date and inclusion_date? Just wondering how easily a user might interpret "unstable".

Contributor Author

I'm not sure about "initial". It sounds like it's something that's before something else? Like e.g., the first inclusion event.

Contributor Author

Maybe inclusion_date and inclusion_date_with_insufficient_data?

4. **diabetes_type**: The classified diabetes type

[^1]: For more information on the "raw" versus "stable" inclusion date,
see `vignette("design")`.

<!-- TODO: Make sure this is the correct link - and add a link specific to the specific section where this is described -->

For an example, see below.

| PNR | stable_inclusion_date | raw_inclusion_date | diabetes_type |
|------------|-----------------------|--------------------|---------------|
| 0000000001 | 2020-01-01 | 2020-01-01 | T1D |
| 0000000004 | NULL | 1995-04-19 | T2D |

: Example rows of the `data.frame` output of the osdc package.

The individuals `0000000001` and `0000000004` have been classified as
having diabetes (`T1D` and `T2D`, respectively). `0000000001` is
classified as having type 1 diabetes (T1D) with an inclusion date of
`2020-01-01`. Since this date is within a time-period of sufficient data
coverage, the column `stable_inclusion_date` is populated with the same
date as `raw_inclusion_date`.

The individual in the second row, `0000000004`, is classified as having
type 2 diabetes (T2D) with an inclusion date of `1995-04-19`. Since 1995
is within a time-period of insufficient data coverage,
`stable_inclusion_date` is `NULL`. However, `raw_inclusion_date` still
contains the inclusion date of this individual.
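
The relationship between the two date columns in the example table can be reproduced with a short base R sketch. The cut-off `coverage_start` is a placeholder, since the stable period is not yet specified in this vignette. In an R `data.frame`, the missing value shown as `NULL` in the table would be stored as `NA`.

```r
# Placeholder cut-off; the actual start of sufficient data coverage is an
# assumption and is not specified in this vignette.
coverage_start <- as.Date("1998-01-01")

output <- data.frame(
  PNR = c("0000000001", "0000000004"),
  raw_inclusion_date = as.Date(c("2020-01-01", "1995-04-19")),
  diabetes_type = c("T1D", "T2D"),
  stringsAsFactors = FALSE
)

# The stable date equals the raw date when it falls within the covered period
# and is missing otherwise.
output$stable_inclusion_date <- output$raw_inclusion_date
output$stable_inclusion_date[output$raw_inclusion_date < coverage_start] <- NA

# Reorder columns to match the example table.
output <- output[, c("PNR", "stable_inclusion_date",
                     "raw_inclusion_date", "diabetes_type")]
output
```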

<!-- TODO: Specify the "stable" time-period: e.g., later than 1997 -->
34 changes: 34 additions & 0 deletions vignettes/images/function-flow-classification.puml
@@ -0,0 +1,34 @@
@startuml function-flow-classification
!theme cerulean-outline
<style>
action, card, database, rectangle {
FontColor black
}
.inclusion {
BackgroundColor lightblue
}
.exclusion {
BackgroundColor orange
}
</style>

'Diabetes type classification

action "get_diagnosis_date()" as diagnosis_date

rectangle Classification {
action "get_has_t1d_primary_diagnosis()" as t1d_diagnosis
action "get_only_insulin_purchases()" as only_insulins
action "get_majority_of_t1d_primary_diagnoses()" as t1d_diagnosis_majority
action "get_insulin_purchase_within_180_days()" as insulin_within_180_days
action "get_insulin_is_two_thirds_of_gld_doses()" as insulin_is_two_thirds
}

diagnosis_date --> t1d_diagnosis
t1d_diagnosis -l-> only_insulins
only_insulins -d-> t1d_diagnosis_majority
t1d_diagnosis_majority -r-> insulin_within_180_days
insulin_within_180_days -r-> insulin_is_two_thirds


@enduml
Binary file added vignettes/images/function-flow-population.png
92 changes: 92 additions & 0 deletions vignettes/images/function-flow-population.puml
@@ -0,0 +1,92 @@
@startuml function-flow-population
!theme cerulean-outline
<style>
action, card, database, rectangle {
FontColor black
}
.inclusion {
BackgroundColor lightblue
}
.exclusion {
BackgroundColor orange
}
</style>

hide <<inclusion>> stereotype
hide <<exclusion>> stereotype

'Diabetes population extraction
together {
database sssy
database sysi
database lpr_diag
database lpr_adm
database lmdb
database lab_forsker
database kontakter
database diagnoser
database bef
}
together {
action "join_lpr2()" as lpr2
action "join_lpr3()" as lpr3
}

action "get_potential_pcos()" as pcos
action "get_wld_purchases()" as wld

together {
action "exclude_pregnancy()" as ex_pregnancy <<exclusion>>
action "exclude_wld_purchases()" as ex_wld <<exclusion>>
action "exclude_potential_pcos()" as ex_pcos <<exclusion>>
}
together {
action "include_diabetes_diagnoses()" as in_diagnoses <<inclusion>>
action "include_hba1c()" as in_hba1c <<inclusion>>
action "include_podiatrist_services()" as in_podiatrist <<inclusion>>
action "include_gld_purchases()" as in_gld <<inclusion>>
action "get_pregnancy_dates()" as pregnancy
}

action "join_inclusion()" as join_inclusion
action "get_diagnosis_date()" as diagnosis_date

'join lpr
lpr_diag --> lpr2
lpr_adm --> lpr2
kontakter --> lpr3
diagnoser --> lpr3

'inclusion: podiatrist services
sssy --> in_podiatrist
sysi --> in_podiatrist
in_podiatrist --> join_inclusion

'inclusion: hba1c
lab_forsker --> in_hba1c
in_hba1c --> ex_pregnancy
ex_pregnancy --> join_inclusion

'inclusion: gld purchases
lmdb --> in_gld
in_gld --> ex_pcos
ex_pcos --> ex_wld
ex_wld --> ex_pregnancy

'inclusion: diabetes diagnoses
lpr2 --> in_diagnoses
lpr3 --> in_diagnoses
in_diagnoses --> join_inclusion

'helper functions
lpr2 --> pregnancy
lpr3 --> pregnancy
pregnancy --> ex_pregnancy
lmdb --> wld
wld --> ex_wld
bef --> pcos
in_gld --> ex_pcos
pcos --> ex_pcos
join_inclusion --> diagnosis_date

@enduml
Binary file modified vignettes/images/function-flow.png