diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 76acd71..bbc6139 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -51,38 +51,55 @@ rules.
## Function flow
-The OSDC algorithm - and thereby, the osdc package - contains one main
-function that will classify individuals into those with either type 1 or
-type 2 diabetes using the Danish registers:
-`classify_diabetes_status()`. This function classifies those with
+The osdc package contains one main function that classifies individuals
+into those with either type 1 or type 2 diabetes using the Danish
+registers: `classify_diabetes()`. This function classifies those with
diabetes (type 1 or 2) based on the Danish registers described in the
-`vignette("design")`. All data sources are used as input for this
-function. The specific inclusion and exclusion details are also
-described in the `vignette("design")`.
+`vignette("design")` and `vignette("data-sources")`. All data sources
+are used as input for this function. The specific inclusion and
+exclusion details are also described in the `vignette("design")`.
This results in the functionality flow for classifying diabetes status
-seen below. All functions take a `data.frame` type object as input and
-outputs the same type of object as the input object (a `data.frame`
-type). For instance, if the input is a `data.table` object, the output
-will also be a `data.table`.
+seen below. This flow can be divided into two parts: extracting the
+diabetes population and classifying the diabetes type. Each part is
+described in the following sections.
+
+All functions take a `data.frame`-type object as input and output an
+object of the same type. For instance, if the input is a `data.table`
+object, the output will also be a `data.table`.
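+
+To illustrate this input/output contract, the sketch below uses a
+hypothetical helper, `keep_rows()`, which is not an osdc function. It
+filters rows with base R and then restores the input's class, shown with
+a made-up `"my_df"` subclass. This naive class restoration is only for
+illustration; a real `data.table` input would need a class-aware
+approach.
+
+```r
+# Hypothetical helper (not part of osdc): keep the rows where `keep` is
+# TRUE and give the result the same class as the input.
+keep_rows <- function(data, keep) {
+  out <- as.data.frame(data)[keep, , drop = FALSE]
+  class(out) <- class(data)
+  out
+}
+
+# A data.frame with a made-up subclass, standing in for e.g. a data.table
+input <- data.frame(pnr = c("1", "2", "3"), value = c(40, 52, 61))
+class(input) <- c("my_df", "data.frame")
+
+output <- keep_rows(input, input$value > 48)
+class(output) # still c("my_df", "data.frame")
+```
+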
![Flow of functions, as well as their required input registers, for
-classifying diabetes status using the `osdc` package. Light blue and
+classifying diabetes status using the osdc package. Light blue and
orange boxes represent filtering functions (inclusion and exclusion
events, respectively). Uncoloured boxes are helper functions that get or
extract a condition or joins data or function
outputs.](images/function-flow.png)
-## Inclusion events
+## Population extraction
+
+In the following sections, we describe the functions used to extract the
+diabetes population from the Danish registers. The functions are divided
+into inclusion and exclusion events, and the final diagnosis date is
+calculated based on these events.
+
+![Flow of functions, as well as their required input registers, for
+extracting the population with diabetes using the osdc package. Light
+blue and orange boxes represent filtering functions (inclusion and
+exclusion events, respectively). Uncoloured boxes are helper functions
+that get or extract a condition, or join data or function
+outputs.](images/function-flow-population.png)
+
+### Inclusion events
-### HbA1c tests above 48 mmol/mol
+#### HbA1c tests above 48 mmol/mol
The function `include_hba1c()` uses `lab_forsker` as the input data to
extract all events of tests above 48 mmol/mol.
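+
+As a minimal base R sketch of this filtering step (not the osdc
+implementation), assume a simplified `lab_forsker` with the hypothetical
+columns `pnr`, `date`, `analysis` and `value`; the register's actual
+variables and analysis codes are not reproduced here.
+
+```r
+# Hypothetical, simplified lab_forsker: the column names and the
+# "hba1c" label are assumptions, not the register's real variables
+lab_forsker <- data.frame(
+  pnr = c("1", "1", "2", "3"),
+  date = as.Date(c("2015-03-01", "2016-05-10", "2017-01-20", "2018-08-08")),
+  analysis = c("hba1c", "hba1c", "glucose", "hba1c"),
+  value = c(52, 45, 60, 49)
+)
+
+# Keep HbA1c tests above 48 mmol/mol and return the event dates
+hba1c_events <- subset(
+  lab_forsker,
+  analysis == "hba1c" & value > 48,
+  select = c(pnr, date)
+)
+hba1c_events
+```
+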
-### Hospital diagnosis of diabetes
+#### Hospital diagnosis of diabetes
The function `include_diabetes_diagnoses()` uses the hospital contacts
from LPR2 and 3 to include all dates of diabetes diagnoses. Diabetes
@@ -97,14 +114,14 @@ This function contains two helper functions:
-### Diabetes-specific podiatrist services
+#### Diabetes-specific podiatrist services
The function `include_podiatrist_services()` uses `sysi` or `sssy` as
input to extract the dates of all diabetes-specific podiatrist services.
-### GLD purchases
+#### GLD purchases
The function `include_gld_purchases()` uses `lmdb` to extract the dates
of all GLD purchases (from 1997 onwards).
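+
+A minimal sketch of this step, assuming a simplified `lmdb` with the
+hypothetical columns `pnr`, `atc` and `eksd` (dispensing date), and
+assuming that GLDs are identified by ATC group A10 (drugs used in
+diabetes):
+
+```r
+# Hypothetical, simplified lmdb extract; column names are assumptions
+lmdb <- data.frame(
+  pnr = c("1", "1", "2", "3"),
+  atc = c("A10BA02", "A10AB05", "C09AA05", "A10BA02"),
+  eksd = as.Date(c("1996-11-02", "2001-06-15", "2003-02-01", "2010-09-30"))
+)
+
+# GLDs are assumed to be ATC group A10; keep purchases from 1997 onwards
+gld_purchases <- subset(
+  lmdb,
+  startsWith(atc, "A10") & eksd >= as.Date("1997-01-01")
+)
+gld_purchases
+```
+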
@@ -113,9 +130,9 @@ of all GLD purchases (from 1997 onwards).
-## Exclusion events
+### Exclusion events
-### HbA1c tests and GLD purchases during pregnancy
+#### HbA1c tests and GLD purchases during pregnancy
The function `exclude_pregnancy()` uses diagnoses from LPR2 or LPR3 as
input and is used to exclude both HbA1c tests and GLD purchases during
@@ -138,16 +155,14 @@ contains the following three helper functions:
-### Glucose-lowering brand drugs for weight loss
+#### Glucose-lowering brand drugs for weight loss
-The function `exclude_purchases_of_weight_loss_drugs()` uses REGISTER as
-input and excludes BRANDS.
+The function `exclude_wld_purchases()` uses `lmdb` as input and excludes
+purchases of the brand drugs Saxenda and Wegovy.
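+
+The exclusion amounts to dropping purchases of specific brands. The
+sketch below assumes a hypothetical `brand` column in a simplified
+purchase table; how osdc actually identifies these products in `lmdb`
+may differ.
+
+```r
+# Hypothetical GLD purchases with an assumed `brand` column
+gld_purchases <- data.frame(
+  pnr = c("1", "2", "2", "3"),
+  brand = c("Victoza", "Saxenda", "Ozempic", "Wegovy"),
+  eksd = as.Date(c("2019-01-10", "2020-02-20", "2021-03-30", "2022-04-15"))
+)
+
+# Brand drugs used for weight loss rather than diabetes treatment
+weight_loss_brands <- c("Saxenda", "Wegovy")
+
+# Drop every purchase of these brands; other GLD purchases remain
+without_wld <- subset(gld_purchases, !brand %in% weight_loss_brands)
+without_wld
+```
+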
-
-
-### Metformin purchases for women below age 40
+#### Metformin purchases for women below age 40
The function `exclude_potential_pcos()` as input to exclude all
purchases of metformin by women below age 40 (i.e., \<= 39 years old) at
@@ -160,7 +175,7 @@ This function contains two helper functions:
-## Get diagnosis date
+### Get diagnosis date
The function `get_diagnosis_date()` combines the outputs from the
inclusion and exclusion functions to get the final diagnosis date.
@@ -173,4 +188,99 @@ together with `join_diagnosis_dates()`.
Finally, the dates outside of the data coverage period are dropped with
`drop_diagnosis_dates_outside_coverage()` to end with a final diagnosis
date. For details on this censoring based on periods with insufficient
-data coverage, see the `vignette("algorithm-logic")`.
+data coverage, see the `vignette("design")`.
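+
+The combination and censoring steps can be sketched in base R. This
+sketch assumes a hypothetical long table of inclusion events with
+columns `pnr` and `date`, takes the raw inclusion date as the date of
+the second inclusion event (as described in the Output section), and
+uses a placeholder `coverage_start` date rather than the actual coverage
+period.
+
+```r
+# Hypothetical long table of inclusion events (one row per event)
+inclusion_events <- data.frame(
+  pnr = c("1", "1", "1", "2", "2", "3"),
+  date = as.Date(c("2018-05-01", "2016-02-10", "2019-07-07",
+                   "1995-04-19", "1996-01-03", "2012-12-12"))
+)
+# Placeholder start of sufficient data coverage (an assumption)
+coverage_start <- as.Date("1997-01-01")
+
+# Sort events by person and date, then number them within each person
+events <- inclusion_events[order(inclusion_events$pnr, inclusion_events$date), ]
+events$n <- ave(seq_along(events$pnr), events$pnr, FUN = seq_along)
+
+# The raw inclusion date is the date of the second event; people with
+# fewer than two events are not included
+diagnosis <- events[events$n == 2, c("pnr", "date")]
+names(diagnosis)[2] <- "raw_inclusion_date"
+
+# The stable date is missing when the raw date falls before coverage
+diagnosis$stable_inclusion_date <- diagnosis$raw_inclusion_date
+diagnosis$stable_inclusion_date[diagnosis$raw_inclusion_date < coverage_start] <- NA
+diagnosis
+```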
+
+## Classifying the diabetes type
+
+The next step of the OSDC algorithm classifies individuals from the
+extracted diabetes population as having either type 1 diabetes (T1D) or
+type 2 diabetes (T2D). As described in `vignette("design")`,
+individuals not classified as T1D cases are classified as T2D cases.
+
+The output is a `data.frame` that includes one row per individual in the
+diabetes population: one column with their PNR, two columns with
+inclusion dates (a "stable" and a "raw" date; see `vignette("design")`
+for the difference between them), and one column with the diabetes
+type.
+
+![Flow of functions for classifying the diabetes type using the osdc
+package.](images/function-flow-classification.png)
+
+#### Type 1 classification
+
+The details for the classification of type 1 diabetes are described in
+`vignette("design")`. To classify whether an individual has T1D, the
+OSDC algorithm uses the following criteria:
+
+1. `get_t1d_primary_diagnosis()`, which relies on the hospital
+   diagnoses extracted from `lpr_diag` (LPR2) and `diagnoser` (LPR3) in
+   the previous steps.
+2. `get_only_insulin_purchases()`, which relies on the GLD purchases
+   from Lægemiddelsdatabasen to identify individuals whose GLD
+   purchases are all insulin.
+3. `get_majority_of_t1d_diagnoses()` (as compared to T2D diagnoses),
+   which again relies on primary hospital diagnoses from LPR.
+4. `get_insulin_purchase_within_180_days()`, which relies on both
+   diagnoses from LPR and GLD purchases from Lægemiddelsdatabasen.
+5. `get_insulin_is_two_thirds_of_gld_doses()`, which relies on the GLD
+   purchases from Lægemiddelsdatabasen.
+
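+Two of these criteria can be sketched per individual in base R. This is
+not the osdc implementation: it assumes a simplified GLD purchase table
+with hypothetical `atc` and `doses` columns, that insulins are ATC group
+A10A, and that "two thirds" means at least two thirds of the GLD doses.
+
+```r
+# Hypothetical GLD purchases: `doses` is an assumed column of doses per
+# purchase; insulins are assumed to be ATC group A10A
+gld_purchases <- data.frame(
+  pnr = c("1", "1", "2", "2", "2"),
+  atc = c("A10AB05", "A10AE04", "A10AB05", "A10BA02", "A10BA02"),
+  doses = c(100, 200, 150, 50, 50)
+)
+is_insulin <- startsWith(gld_purchases$atc, "A10A")
+
+# Criterion 2: all of a person's GLD purchases are insulin
+only_insulin <- tapply(is_insulin, gld_purchases$pnr, all)
+
+# Criterion 5: insulin makes up at least two thirds of the GLD doses
+insulin_doses <- tapply(gld_purchases$doses * is_insulin, gld_purchases$pnr, sum)
+all_doses <- tapply(gld_purchases$doses, gld_purchases$pnr, sum)
+insulin_two_thirds <- insulin_doses / all_doses >= 2 / 3
+
+only_insulin
+insulin_two_thirds
+```
+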
+Note the following hierarchy in how primary diagnoses are evaluated:
+first, the function checks whether the individual has primary diagnoses
+from an endocrinological specialty. If so, the check of whether they
+have a majority of T1D primary diagnoses is based on data from the
+endocrinological specialty. If not, the check is based on primary
+diagnoses from medical specialties.
+
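+This hierarchy can be sketched as a fallback rule. The columns
+`specialty` and `type` below are hypothetical, and "majority" is assumed
+to mean strictly more T1D than T2D primary diagnoses.
+
+```r
+# Hypothetical primary diagnoses with assumed `specialty` and `type` columns
+primary_diagnoses <- data.frame(
+  pnr = c("1", "1", "1", "2", "2", "2"),
+  specialty = c("endocrinology", "endocrinology", "medical",
+                "medical", "medical", "medical"),
+  type = c("T1D", "T1D", "T2D", "T2D", "T2D", "T1D")
+)
+
+has_t1d_majority <- function(diagnoses) {
+  # Use endocrinological diagnoses if there are any; otherwise fall
+  # back to diagnoses from medical specialties
+  endo <- diagnoses[diagnoses$specialty == "endocrinology", ]
+  used <- if (nrow(endo) > 0) {
+    endo
+  } else {
+    diagnoses[diagnoses$specialty == "medical", ]
+  }
+  sum(used$type == "T1D") > sum(used$type == "T2D")
+}
+
+t1d_majority <- sapply(split(primary_diagnoses, primary_diagnoses$pnr), has_t1d_majority)
+t1d_majority
+```
+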
+#### Type 2 classification
+
+As described in the `vignette("design")`, individuals not classified as
+type 1 cases are classified as type 2 cases.
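+
+In code, this rule is a simple complement. A minimal sketch, assuming a
+hypothetical vector `t1d_pnrs` of individuals who met the T1D criteria:
+
+```r
+# Hypothetical extracted diabetes population and T1D classifications
+population <- data.frame(pnr = c("1", "2", "3"))
+t1d_pnrs <- c("1")
+
+# Everyone not classified as T1D is classified as T2D
+population$diabetes_type <- ifelse(population$pnr %in% t1d_pnrs, "T1D", "T2D")
+population
+```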
+
+## Output
+
+The output of the OSDC algorithm is a `data.frame` that includes four
+columns:
+
+1. **PNR**: The pseudonymised social security number of individuals in
+   the diabetes population (one row per individual).
+2. **stable_inclusion_date**: The *stable* inclusion date, i.e., the raw
+   inclusion date if it falls within a period where data coverage is
+   sufficient for incident cases to be reliable, and missing
+   otherwise[^1].
+3. **raw_inclusion_date**: The *raw* inclusion date, i.e., the date of
+   the second inclusion event as described in the [Population
+   extraction](#population-extraction) section above.
+4. **diabetes_type**: The classified diabetes type (`T1D` or `T2D`).
+
+[^1]: For more information on the "raw" versus "stable" inclusion date,
+ see `vignette("design")`.
+
+For an example, see below.
+
+| PNR        | stable_inclusion_date | raw_inclusion_date | diabetes_type |
+|------------|-----------------------|--------------------|---------------|
+| 0000000001 | 2020-01-01            | 2020-01-01         | T1D           |
+| 0000000004 | NA                    | 1995-04-19         | T2D           |
+
+: Example rows of the `data.frame` output of the osdc package.
+
+The individuals `0000000001` and `0000000004` have been classified as
+having diabetes (`T1D` and `T2D`, respectively). `0000000001` is
+classified as having type 1 diabetes (`T1D`) with an inclusion date of
+`2020-01-01`. Since this date is within a time-period of sufficient data
+coverage, the column `stable_inclusion_date` is populated with the same
+date as `raw_inclusion_date`.
+
+The individual in the second row, `0000000004`, is classified as having
+type 2 diabetes (`T2D`) with an inclusion date of `1995-04-19`. Since
+1995 is within a time-period of insufficient data coverage,
+`stable_inclusion_date` is missing (`NA`). However, `raw_inclusion_date`
+still contains the inclusion date of this individual.
+
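+For reference, the two example rows could be represented in R as
+follows. This is an illustrative sketch rather than actual osdc output;
+note that a missing date in a `data.frame` column is stored as `NA`.
+
+```r
+# Illustrative only: the example rows as a data.frame with Date columns
+output <- data.frame(
+  PNR = c("0000000001", "0000000004"),
+  stable_inclusion_date = as.Date(c("2020-01-01", NA)),
+  raw_inclusion_date = as.Date(c("2020-01-01", "1995-04-19")),
+  diabetes_type = c("T1D", "T2D")
+)
+str(output)
+```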
+
diff --git a/vignettes/images/function-flow-classification.png b/vignettes/images/function-flow-classification.png
new file mode 100644
index 0000000..887bcd5
Binary files /dev/null and b/vignettes/images/function-flow-classification.png differ
diff --git a/vignettes/images/function-flow-classification.puml b/vignettes/images/function-flow-classification.puml
new file mode 100644
index 0000000..9851a8f
--- /dev/null
+++ b/vignettes/images/function-flow-classification.puml
@@ -0,0 +1,34 @@
+@startuml function-flow-classification
+!theme cerulean-outline
+
+
+'Diabetes type classification
+
+ action "get_diagnosis_date()" as diagnosis_date
+
+ rectangle Classification {
+ action "get_has_t1d_primary_diagnosis()" as t1d_diagnosis
+ action "get_only_insulin_purchases()" as only_insulins
+ action "get_majority_of_t1d_primary_diagnoses()" as t1d_diagnosis_majority
+ action "get_insulin_purchase_within_180_days()" as insulin_within_180_days
+ action "get_insulin_is_two_thirds_of_gld_doses()" as insulin_is_two_thirds
+ }
+
+ diagnosis_date --> t1d_diagnosis
+ t1d_diagnosis -l-> only_insulins
+ only_insulins -d-> t1d_diagnosis_majority
+ t1d_diagnosis_majority -r-> insulin_within_180_days
+ insulin_within_180_days -r-> insulin_is_two_thirds
+
+
+@enduml
diff --git a/vignettes/images/function-flow-population.png b/vignettes/images/function-flow-population.png
new file mode 100644
index 0000000..42932d3
Binary files /dev/null and b/vignettes/images/function-flow-population.png differ
diff --git a/vignettes/images/function-flow-population.puml b/vignettes/images/function-flow-population.puml
new file mode 100644
index 0000000..885081e
--- /dev/null
+++ b/vignettes/images/function-flow-population.puml
@@ -0,0 +1,92 @@
+@startuml function-flow-population
+!theme cerulean-outline
+
+
+hide <> stereotype
+hide <> stereotype
+
+'Diabetes population extraction
+ together {
+ database sssy
+ database sysi
+ database lpr_diag
+ database lpr_adm
+ database lmdb
+ database lab_forsker
+ database kontakter
+ database diagnoser
+ database bef
+ }
+ together {
+ action "join_lpr2()" as lpr2
+ action "join_lpr3()" as lpr3
+ }
+
+ action "get_potential_pcos()" as pcos
+ action "get_wld_purchases()" as wld
+
+ together {
+ action "exclude_pregnancy()" as ex_pregnancy <>
+ action "exclude_wld_purchases()" as ex_wld <>
+ action "exclude_potential_pcos()" as ex_pcos <>
+ }
+ together {
+ action "include_diabetes_diagnoses()" as in_diagnoses <>
+ action "include_hba1c()" as in_hba1c <>
+ action "include_podiatrist_services()" as in_podiatrist <>
+ action "include_gld_purchases()" as in_gld <>
+ action "get_pregnancy_dates()" as pregnancy
+ }
+
+ action "join_inclusion()" as join_inclusion
+ action "get_diagnosis_date()" as diagnosis_date
+
+'join lpr
+ lpr_diag --> lpr2
+ lpr_adm --> lpr2
+ kontakter --> lpr3
+ diagnoser --> lpr3
+
+'inclusion: podiatrist services
+ sssy --> in_podiatrist
+ sysi --> in_podiatrist
+ in_podiatrist --> join_inclusion
+
+'inclusion: hba1c
+ lab_forsker --> in_hba1c
+ in_hba1c --> ex_pregnancy
+ ex_pregnancy --> join_inclusion
+
+'inclusion: gld purchases
+ lmdb --> in_gld
+ in_gld --> ex_pcos
+ ex_pcos --> ex_wld
+ ex_wld --> ex_pregnancy
+
+'inclusion: diabetes diagnoses
+ lpr2 --> in_diagnoses
+ lpr3 --> in_diagnoses
+ in_diagnoses --> join_inclusion
+
+'helper functions
+ lpr2 --> pregnancy
+ lpr3 --> pregnancy
+ pregnancy --> ex_pregnancy
+ lmdb --> wld
+ wld --> ex_wld
+ bef --> pcos
+ pcos --> ex_pcos
+ join_inclusion --> diagnosis_date
+
+@enduml
\ No newline at end of file
diff --git a/vignettes/images/function-flow.png b/vignettes/images/function-flow.png
index 22cc065..3f30c2c 100644
Binary files a/vignettes/images/function-flow.png and b/vignettes/images/function-flow.png differ
diff --git a/vignettes/images/function-flow.puml b/vignettes/images/function-flow.puml
index b3b3c51..e46d901 100644
--- a/vignettes/images/function-flow.puml
+++ b/vignettes/images/function-flow.puml
@@ -1,86 +1,8 @@
@startuml function-flow
-!theme cerulean-outline
-
-
-hide <> stereotype
-hide <> stereotype
-
-card classify_diabetes_status() as cd {
- together {
- database sssy
- database sysi
- database lpr_diag
- database lpr_adm
- database lmdb
- database lab_forsker
- database kontakter
- database diagnoser
- database bef
- }
-
- action "get_pregnancy_dates()" as pregnancy
- action "get_potential_pcos()" as pcos
- action "get_diagnosis_date()" as diagnosis_date
- action "join_lpr2()" as lpr2
- action "join_lpr3()" as lpr3
-
- together {
- action "exclude_pregnancy()" as ex_pregnancy <>
- action "exclude_purchases_of_weight_loss_drugs()" as ex_wld <>
- action "exclude_potential_pcos()" as ex_pcos <>
- }
-
- together {
- action "include_hba1c()" as in_hba1c <>
- action "include_diabetes_diagnosis()" as in_diagnosis <>
- action "include_podiatrist_services()" as in_podiatrist <>
- action "include_purchases_gld()" as in_gld <>
- }
-
- lpr_diag --> lpr2
- lpr_adm --> lpr2
- kontakter --> lpr3
- diagnoser --> lpr3
-
- lab_forsker --> in_hba1c
- in_hba1c --> ex_pregnancy
-
- lpr2 --> pregnancy
- lpr3 --> pregnancy
- pregnancy -> ex_pregnancy
-
- lpr2 --> in_diagnosis
- lpr3 --> in_diagnosis
-
- sssy --> in_podiatrist
- sysi --> in_podiatrist
-
- lmdb --> in_gld
- in_gld --> ex_pregnancy
- in_gld --> ex_wld
-
- bef --> pcos
- in_gld --> ex_pcos
- pcos --> ex_pcos
-
- ex_wld --> diagnosis_date
- ex_pregnancy --> diagnosis_date
- ex_pcos --> diagnosis_date
- in_podiatrist --> diagnosis_date
- in_diagnosis --> diagnosis_date
+card classify_diabetes() as cd {
+ !include function-flow-population.puml
+ !include function-flow-classification.puml
}
-@enduml
+
+@enduml
\ No newline at end of file