docs: ✨ initial draft of functions to classify diabetes type #75

Merged
merged 48 commits into from
Jun 14, 2024
Changes from 22 commits
b3d5f6d
docs: :sparkles: initial draft of diabetes type functionality flow
signekb Apr 12, 2024
328516b
fix: :fire: remove old figure experiments
signekb Apr 12, 2024
6f7b6c2
style: remove unnecessary "and" and commas from figure
signekb Apr 12, 2024
8dc6ae1
docs: update output of OSDC to include two inclusion dates (stable an…
signekb Apr 17, 2024
af93e94
fix: add oxford comma to header
signekb Apr 17, 2024
ef92306
fix: describe classifying steps as "filters" with "criteria"
signekb Apr 17, 2024
18951d7
docs: minor text fixes (add link and "the")
signekb Apr 17, 2024
9f45a1f
fix: update classify diabetes type flow chart based on feedback from …
signekb Apr 17, 2024
332f421
fix: specify that we use the primary diagnosis for classification
signekb Apr 17, 2024
b157b5a
fix: minor text edit to make sentence clearer
signekb Apr 17, 2024
6e9ddaa
docs: add description of example output table
signekb Apr 17, 2024
3084d2c
fix: add missing header in output example table
signekb Apr 17, 2024
175fa3d
style: :lipstick: format
signekb Apr 17, 2024
a3655b1
Merge branch 'docs/functionality-flow-diabetes-population' into docs/…
signekb Apr 17, 2024
f6bebfe
style:
signekb Apr 17, 2024
80fc885
Merge branch 'docs/functionality-flow-diabetes-population' into docs/…
signekb Apr 17, 2024
12fd1d1
docs: elaborate on hierarchy of diagnoses from endocrinological and m…
signekb Apr 17, 2024
33d764e
Merge branch 'docs/functionality-flow-diabetes-population' into docs/…
signekb Apr 25, 2024
6078cc7
Apply suggestions from code review
signekb Apr 26, 2024
57115bf
Merged origin/main into docs/functionality-flow-classify-diabetes-type
lwjohnst86 Apr 27, 2024
dadf568
Merge branch 'docs/functionality-flow-diabetes-population' of https:/…
lwjohnst86 Apr 27, 2024
f2f29fa
Merge branch 'docs/functionality-flow-diabetes-population' of https:/…
lwjohnst86 Apr 27, 2024
4295251
Apply suggestions from code review
signekb May 16, 2024
48c69bf
Merge branch 'main' into docs/functionality-flow-classify-diabetes-type
signekb May 16, 2024
b978ce6
docs: :fire: remove `_status` from `classify_diabetes_status()`
signekb May 16, 2024
688fd64
docs: :fire: remove mentions of "components"
signekb May 16, 2024
b6205d9
Update vignettes/function-flow.Rmd
signekb May 16, 2024
56dda95
Merge branch 'main' into docs/functionality-flow-classify-diabetes-type
signekb May 16, 2024
81b2561
docs: :memo: add register abbreviations based on `variable_description`
signekb May 16, 2024
80d3eab
docs: :fire: remove details about filter 1
signekb May 16, 2024
a0f7151
docs: :memo: align links to other vignettes
signekb May 17, 2024
616e105
feat: add classification to function-flow
signekb May 21, 2024
1ed60d1
feat: add function that join inclusion events
signekb May 21, 2024
ff87125
style: :art: refactor arrows and add comments to create a clearer str…
signekb May 21, 2024
7c77923
feat: update structure (arrows and together) to correct arrows and la…
signekb May 21, 2024
0266cee
style: :art: add black font to card and rectangle
signekb May 21, 2024
3dc29b7
feat: regenerate png from puml
signekb May 21, 2024
89dfe0b
docs: add name og brand drugs Saxenda and Wegovy
signekb May 30, 2024
cced272
fix: diagnosis -> diagnoses in inclusion function
signekb May 30, 2024
8662ce4
Merge branch 'main' into docs/functionality-flow-classify-diabetes-type
signekb May 30, 2024
950641b
docs: rename wld function; add data source and brand drug names
signekb May 30, 2024
9dacf9a
docs: rewrite of classification to omit the filter distinction
signekb May 30, 2024
214f0e5
docs: remove backticks from osdc package
signekb May 30, 2024
a6e7287
docs: :sparkles: create partial function flows while keeping the enti…
signekb May 30, 2024
f465a63
docs: add separate section for population extraction and fix header l…
signekb May 30, 2024
8c45dbc
docs: :fire: remove old classification puml
signekb May 30, 2024
2cf719c
Merge branch 'main' into docs/functionality-flow-classify-diabetes-type
signekb Jun 14, 2024
9c5759b
docs: :pencil2: very minor edits and formatting fixes
lwjohnst86 Jun 14, 2024
126 changes: 126 additions & 0 deletions vignettes/function-flow.Rmd
@@ -159,3 +159,129 @@ the data coverage period are dropped to end with a final diagnosis date
using `get_final_diagnosis_date()`. For details on this censoring based
on the periods with insufficient data coverage, see the
`vignette("algorithm-logic")`.

## Classifying the diabetes type {#classifying-diabetes-type}

The second component of the OSDC algorithm classifies individuals from
the extracted diabetes population as having either T1D or T2D. The
output of this component is a `data.frame` that includes one row per
individual in the diabetes population: one column with their PNR, two
columns with inclusion dates (one "stable" date and one "raw" date - see
the [Description of algorithm contents & logic](algorithm_logic.Rmd)
vignette for an elaboration on what that entails), and one column with
the diabetes type.

<!-- TODO: Make sure this is the correct link - and add a link specific to the specific section where this is described -->

This component will also have one user-facing function:
`classify_diabetes_type`. This function takes the output of
`extract_diabetes_population` as input.

### Type 1 classification

Type 1 diabetes is classified using two filters, and passing either one
is sufficient. To be classified as having type 1 diabetes, individuals
must have:

1. At least one T1D primary diagnosis and GLD purchases consisting only
of insulins (i.e., no purchases of any other type of GLD), or
2. A majority of T1D primary diagnoses, an insulin purchase within 180
days of diagnosis, and insulin constituting 2/3 of their GLD doses.

Functions for the above filters are described in the following sections.
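
As a rough illustration of how the two filters combine, the sketch
below starts from one logical value per criterion and individual. The
`criteria` table and all of its column names are hypothetical and only
serve this example; they are not the internal names used by osdc.

```r
# Hypothetical per-individual criterion results (one row per PNR).
# Column names are illustrative assumptions, not osdc's internal names.
criteria <- data.frame(
  pnr = c("0000000001", "0000000002", "0000000003"),
  any_t1d_primary_diagnosis = c(TRUE, TRUE, FALSE),
  only_insulin_purchases = c(TRUE, FALSE, FALSE),
  majority_t1d_primary_diagnoses = c(TRUE, TRUE, FALSE),
  insulin_within_180_days = c(TRUE, TRUE, FALSE),
  two_thirds_doses_insulin = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Filter 1: both criteria must hold.
filter_1 <- criteria$any_t1d_primary_diagnosis &
  criteria$only_insulin_purchases

# Filter 2: all three criteria must hold.
filter_2 <- criteria$majority_t1d_primary_diagnoses &
  criteria$insulin_within_180_days &
  criteria$two_thirds_doses_insulin

# Passing either filter is enough to be classified as type 1.
criteria$is_t1d <- filter_1 | filter_2
```

In this made-up data, the first individual passes filter 1, the second
passes only filter 2, and the third passes neither.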

![Flow of functions for classifying the diabetes type using the `osdc`
package. The dark grey box denotes the user-facing function, while the
remaining boxes are internal
functions.](images/classify-diabetes-type-functions.png)

#### Filter 1: Any T1D diagnoses and all GLD purchases are insulins

The first filter includes two criteria: 1) whether individuals have any
T1D primary diagnoses and 2) whether all GLD purchases are insulin.

Thus, this filter contains two functions:

1. `t1d_include_at_least_one_t1d_primary_diagnosis`, which relies on
the hospital diagnoses from DNPR extracted in component 1.
2. `t1d_include_only_purchased_insulins` which relies on the GLD
purchases from Lægemiddelsdatabasen.
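
A minimal base R sketch of the two criteria in this filter is shown
below. It assumes long-format inputs with hypothetical columns
(`is_primary`, `is_t1d`, `is_insulin`); the real register variables and
the actual function bodies in osdc may look quite different.

```r
# Hypothetical long-format inputs; the column names and the `is_t1d`
# and `is_insulin` flags are assumptions made for this sketch.
diagnoses <- data.frame(
  pnr = c("1", "1", "2", "3"),
  is_primary = c(TRUE, FALSE, TRUE, TRUE),
  is_t1d = c(TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
purchases <- data.frame(
  pnr = c("1", "1", "2", "3", "3"),
  is_insulin = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Criterion 1: at least one primary diagnosis of T1D per individual.
any_t1d_primary <- tapply(
  diagnoses$is_primary & diagnoses$is_t1d, diagnoses$pnr, any
)

# Criterion 2: every GLD purchase is an insulin purchase.
only_insulin <- tapply(purchases$is_insulin, purchases$pnr, all)

# Align both results on the same individuals before combining them.
ids <- intersect(names(any_t1d_primary), names(only_insulin))
passes_filter_1 <- any_t1d_primary[ids] & only_insulin[ids]
```

Here only individual `1` passes: individual `2` has no T1D primary
diagnosis, and individual `3` has purchased a GLD other than insulin.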

<!-- TODO: Add English translations for Lægemiddelsdatabasen -->

#### Filter 2: Majority of T1D primary diagnoses, insulin within 180 days of diagnosis, and insulin constitutes 2/3 of all GLD doses

The second filter includes three criteria: 1) Whether individuals have a
majority of T1D primary diagnoses, 2) whether they purchased insulin
within 180 days of diagnosis, and 3) whether insulin constitutes 2/3 of
all their GLD doses.

This results in three functions:

1. `t1d_include_majority_of_t1d_primary_diagnoses` (as compared to T2D
diagnoses) which again relies on primary hospital diagnoses from
DNPR.
2. `t1d_include_insulin_purchase_within_180_days_of_diagnosis` which
relies on both diagnoses from DNPR and GLD purchases from
Lægemiddelsdatabasen.
3. `t1d_include_two_thirds_of_purchased_gld_doses_are_insulin` which
relies on the GLD purchases from Lægemiddelsdatabasen.
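
The two purchase-based criteria could be sketched in base R as follows.
Several details are assumptions made only for this example: the column
names, the use of a single diagnosis date per individual, counting only
purchases made on or after that date, and reading "2/3" as "at least
two-thirds".

```r
# Hypothetical inputs: one diagnosis date per individual and
# long-format GLD purchases. All names are assumptions for this sketch.
first_diagnosis <- data.frame(
  pnr = c("1", "2"),
  diagnosis_date = as.Date(c("2015-03-01", "2016-06-15")),
  stringsAsFactors = FALSE
)
purchases <- data.frame(
  pnr = c("1", "1", "2", "2"),
  purchase_date = as.Date(
    c("2015-04-10", "2016-01-05", "2017-09-01", "2017-10-01")
  ),
  is_insulin = c(TRUE, FALSE, TRUE, FALSE),
  doses = c(300, 100, 100, 200),
  stringsAsFactors = FALSE
)

# Attach each individual's diagnosis date to their purchases.
joined <- merge(purchases, first_diagnosis, by = "pnr")

# Days from diagnosis to each purchase.
joined$days_since_diagnosis <- as.numeric(
  joined$purchase_date - joined$diagnosis_date
)

# Criterion: any insulin purchase 0 to 180 days after the diagnosis.
# Whether purchases before the diagnosis should count is left open.
within_180 <- tapply(
  joined$is_insulin &
    joined$days_since_diagnosis >= 0 &
    joined$days_since_diagnosis <= 180,
  joined$pnr, any
)

# Criterion: insulin makes up at least two-thirds of all GLD doses.
insulin_doses <- tapply(joined$doses * joined$is_insulin, joined$pnr, sum)
total_doses <- tapply(joined$doses, joined$pnr, sum)
two_thirds_insulin <- insulin_doses / total_doses >= 2 / 3
```

In this made-up data, individual `1` meets both criteria, while
individual `2` meets neither.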

Note the following hierarchy in the first function,
`t1d_include_majority_of_t1d_primary_diagnoses`: First, the function
checks whether the individual has any primary diagnoses from an
endocrinological specialty. If yes, the check of whether they have a
majority of T1D primary diagnoses is based only on data from
endocrinological specialties. If no, the check is based on primary
diagnoses from medical specialties.
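
The hierarchy described above could look roughly like the sketch below
for a single individual. The helper `has_majority_t1d()`, the
`specialty` and `type` columns, and the choice to treat a tie as "no
majority" are all assumptions for illustration, not the osdc
implementation.

```r
# Hypothetical helper operating on one individual's primary diagnoses.
# The `specialty` values and column names are assumptions.
has_majority_t1d <- function(diagnoses) {
  endo <- diagnoses[diagnoses$specialty == "endocrinological", ]
  # Prefer diagnoses from endocrinological specialties when any exist;
  # otherwise fall back to diagnoses from medical specialties.
  basis <- if (nrow(endo) > 0) {
    endo
  } else {
    diagnoses[diagnoses$specialty == "medical", ]
  }
  # Majority: strictly more T1D than T2D primary diagnoses.
  sum(basis$type == "T1D") > sum(basis$type == "T2D")
}

person_a <- data.frame(
  specialty = c("endocrinological", "medical", "medical", "medical"),
  type = c("T1D", "T2D", "T2D", "T2D"),
  stringsAsFactors = FALSE
)
person_b <- data.frame(
  specialty = c("medical", "medical", "medical"),
  type = c("T1D", "T2D", "T2D"),
  stringsAsFactors = FALSE
)

has_majority_t1d(person_a) # only the endocrinological diagnosis counts
has_majority_t1d(person_b) # falls back to medical specialties
```

For `person_a`, the single endocrinological T1D diagnosis decides the
outcome even though the medical specialties recorded more T2D
diagnoses. `person_b` has no endocrinological diagnoses, so the medical
ones are used instead.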

### Type 2 classification

As described in the [design](design.Rmd) vignette, individuals not
classified as type 1 cases are classified as type 2 cases.

## Output

The output of the second component, and therefore of the OSDC
algorithm, is a `data.frame` with four columns:

1. **PNR**: The pseudonymised social security number of individuals in
the diabetes population (one row per individual)
2. **stable_inclusion_date**: The *stable* inclusion date (i.e., the
raw date, kept only for individuals included in the time period
where data coverage is sufficient to make incident cases reliable)\*
<!-- TODO: Specify this time-period: e.g., later than 1997 -->
3. **raw_inclusion_date**: The *raw* inclusion date (i.e., the date of
the second inclusion event as described in the [Extracting the
diabetes population](#extracting-diabetes-population) section above)
Comment on lines +254 to +256
Member:

I'm not sure why we should include this in the final output. If the data coverage isn't sufficient, we shouldn't give that to users so that they make their own choice without understanding the context. We give them the context with the given assumptions and limitations of this algorithm.

Contributor Author:

@Aastedet ? :)

Collaborator (@Aastedet, May 16, 2024):

@lwjohnst86 At the Steering Group meeting we decided to provide both, but the "stable_inclusion" being the clear default, while we are explicit to the user that the "raw_inclusion" is experimental/use-at-own-risk.

I'm not sure if that guides us towards an answer here though. Depending on the study design, the user might have a clear need for the "raw_inclusion" date.

I lean towards including both, but naming them to "inclusion_date" and something like "unstable_date" or "_date" to make it clear which is the default (and also obscuring the non-default variable name so users will have to read the documentation to know what the variable is).

Member:

Hmmm, what about initial_inclusion_date and inclusion_date? Just wondering how easily a user might interpret "unstable".

Contributor Author:

I'm not sure about "initial". It sounds like it's something that's before something else? Like e.g., the first inclusion event.

Contributor Author:

Maybe inclusion_date and inclusion_date_with_insufficient_data?

4. **diabetes_type**: The classified diabetes type

\*For more information on the "raw" versus "stable" inclusion date, see
the [Description of algorithm contents & logic](algorithm_logic.Rmd)
vignette.

<!-- TODO: Make sure this is the correct link - and add a link specific to the specific section where this is described -->

For an example, see below.

| PNR | stable_inclusion_date | raw_inclusion_date | diabetes_type |
|------------|-----------------------|--------------------|---------------|
| 0000000001 | 2020-01-01 | 2020-01-01 | T1D |
| 0000000004 | NULL | 1995-04-19 | T2D |

: Example rows of the `data.frame` output of the `osdc` package.

The individuals `0000000001` and `0000000004` have been classified as
having diabetes (`T1D` and `T2D`, respectively). `0000000001` is
classified as having type 1 diabetes (`T1D`) with an inclusion date of
`2020-01-01`. Since this date is within a time period of sufficient data
coverage, the column `stable_inclusion_date` is populated with the same
date as `raw_inclusion_date`.

The individual in the second row, `0000000004`, is classified as having
type 2 diabetes (`T2D`) with an inclusion date of `1995-04-19`. Since 1995
is within a time-period of insufficient data coverage,
`stable_inclusion_date` is `NULL`. However, `raw_inclusion_date` still
contains the inclusion date of this individual.
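
To make the example concrete, the table could be reconstructed in R
roughly as follows. This is only an illustration: the missing stable
date is written as `NA`, which is how a missing value is normally
stored in a `data.frame` column, and the filtering step is just one
possible way a user might restrict an analysis to stable inclusion
dates.

```r
# Illustrative reconstruction of the example output. The missing date
# is written as NA here; the table above shows it as NULL.
osdc_output <- data.frame(
  PNR = c("0000000001", "0000000004"),
  stable_inclusion_date = as.Date(c("2020-01-01", NA)),
  raw_inclusion_date = as.Date(c("2020-01-01", "1995-04-19")),
  diabetes_type = c("T1D", "T2D"),
  stringsAsFactors = FALSE
)

# Keep only individuals with a stable inclusion date.
stable_cases <- osdc_output[!is.na(osdc_output$stable_inclusion_date), ]
```

With these two rows, `stable_cases` contains only individual
`0000000001`.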

<!-- TODO: Specify the "stable" time-period: e.g., later than 1997 -->
18 changes: 18 additions & 0 deletions vignettes/images/classify-diabetes-type-functions.puml
@@ -0,0 +1,18 @@
@startuml classify-diabetes-type-functions

skinparam defaultTextAlignment center

#darkgrey: classify_diabetes_type;
!pragma useVerticalIf on
if (**Criteria 1**\nt1d_include_at_least_one_t1d_primary_diagnosis\n AND\nt1d_include_only_purchased_insulins) then (no)
if (**Criteria 2**\nt1d_include_majority_of_t1d_primary_diagnoses\nAND\nt1d_include_insulin_purchase_within_180_days_of_diagnosis\nAND\nt1d_include_two_thirds_of_purchased_gld_doses_are_insulin) then (no)
:Type 2;
detach
else (\nyes)
endif
else (\nyes)
endif
:Type 1;
detach

@enduml