---
title: "Wilson's Burden of Disease (WBD) Codes"
output: rmarkdown::github_document
---
Expands Wilson's Burden of Disease (WBD) classification groups from the raw Excel file using the [cghrCodes](https://github.com/cghr-toronto/cghrCodes) package at CGHR.
## Libraries
```{r}
library(tidyverse)
library(readxl)
library(cghrCodes)
```
## Read Data
Read data from the `../data` folder:

* `Version 10 - Final CGHR-GBD list 18 April 2013.xls`: original WBD file mappings provided by Wilson Suraweera
* `Version 10 - Final CGHR-GBD list 18 April 2013 RW.xls`: WBD file manually cleaned by Richard Wen into a tabular structure
```{r}
raw_file <- "../data/Version 10 - Final CGHR-GBD list 18 April 2013_Updated 12Jan2023 RW.xls"
# Read sheets 4-7 for ddict, adult, neo, child sheets in order
raw_wbd <- list(
  ddict = read_excel(raw_file, sheet = 4),
  adult = read_excel(raw_file, sheet = 5),
  neo = read_excel(raw_file, sheet = 6),
  child = read_excel(raw_file, sheet = 7)
)
```
### Data Dictionary Preview
This sheet contains the column descriptions for each of the adult, neonate, and child sheets that contain the WBD codes.
```{r}
raw_wbd[["ddict"]]
```
### Adult WBD Codes Preview
This sheet contains the adult WBD codes.
```{r}
raw_wbd[["adult"]]
```
### Child WBD Codes Preview
This sheet contains the child WBD codes.
```{r}
raw_wbd[["child"]]
```
### Neonate WBD Codes Preview
This sheet contains the neonate WBD codes.
```{r}
raw_wbd[["neo"]]
```
## Combine WBD Codes
Combines the WBD code sheets for all age groups into a single data frame, lowercasing the column names first so that they match across sheets.
```{r}
wbd <- list()
# Lower case col names
wbd[["adult"]] <- raw_wbd[["adult"]] %>% rename_all(tolower)
wbd[["child"]] <- raw_wbd[["child"]] %>% rename_all(tolower)
wbd[["neo"]] <- raw_wbd[["neo"]] %>% rename_all(tolower)
# Combine into common dataframe
wbd <- bind_rows(wbd)
wbd
```
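The combined data frame only records which sheet a row came from if the sheets already carry such a column. As a minimal sketch, `bind_rows()` can label each row with its list name through the `.id` argument. The column names (`wbd_code`, `title`), the `age_group` label, and the toy values below are hypothetical and not taken from the WBD file:

```r
library(dplyr)

# Toy stand-ins for two sheets whose column names differ only in case
toy_sheets <- list(
  adult = tibble(WBD_CODE = "1A01", Title = "Example adult group"),
  child = tibble(wbd_code = "2A01", title = "Example child group")
)

# Lowercase the column names of every sheet, then stack the sheets,
# storing each list name in a new `age_group` column
toy_combined <- toy_sheets %>%
  lapply(rename_with, .fn = tolower) %>%
  bind_rows(.id = "age_group")

toy_combined
```

Without the lowercasing step, `WBD_CODE` and `wbd_code` would be kept as separate columns filled with `NA` for the other sheet.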
## Expand ICD-10 Ranges
Expands the ICD-10 ranges in the `icd10_range` column into individual ICD-10 codes, stored in a new `icd10` column. Only rows of kind `codex` are expanded.
```{r}
# Filter for codex only
icd10towbda <- raw_wbd[["adult"]] %>% rename_all(tolower) %>% filter(kind == "codex")
icd10towbdc <- raw_wbd[["child"]] %>% rename_all(tolower) %>% filter(kind == "codex")
icd10towbdn <- raw_wbd[["neo"]] %>% rename_all(tolower) %>% filter(kind == "codex")
# Expand ICD 10 ranges
icd10towbda <- expandICD10Data(icd10towbda, icd10Column="icd10_range", expandColumn="icd10")
icd10towbdc <- expandICD10Data(icd10towbdc, icd10Column="icd10_range", expandColumn="icd10")
icd10towbdn <- expandICD10Data(icd10towbdn, icd10Column="icd10_range", expandColumn="icd10")
# Combine dataframes
icd10towbd <- bind_rows(icd10towbda, icd10towbdc, icd10towbdn)
as_tibble(icd10towbd)
```
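To illustrate what expanding a range means, the following is a minimal base R sketch, not the `cghrCodes` implementation. It assumes a range is written as two three-character codes joined by a hyphen and sharing the same letter (for example `A00-A03`). The actual format of `icd10_range` and the behaviour of `expandICD10Data()` may differ, and the helper name `expand_range_sketch` is hypothetical:

```r
# Hypothetical helper: expand "A00-A03" into individual three-character codes.
# Assumes both ends share the same leading letter; not the cghrCodes algorithm.
expand_range_sketch <- function(range) {
  ends <- strsplit(range, "-", fixed = TRUE)[[1]]
  if (length(ends) == 1) {
    return(ends) # a single code needs no expansion
  }
  letter <- substr(ends[1], 1, 1)
  from <- as.integer(substr(ends[1], 2, 3))
  to <- as.integer(substr(ends[2], 2, 3))
  stopifnot(length(ends) == 2, letter == substr(ends[2], 1, 1), from <= to)
  # Rebuild each code as the letter plus a zero-padded two-digit number
  sprintf("%s%02d", letter, from:to)
}

expand_range_sketch("A00-A03")
expand_range_sketch("B20")
```

Ranges that cross letters or include decimal subcodes would need extra handling that this sketch leaves out.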
## Data Dictionary
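Preprocesses the data dictionary by lowercasing its column names and the values in its `column` column, then adds an entry describing the new `icd10` column created by the expansion.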
```{r}
ddict <- raw_wbd[["ddict"]] %>%
  rename_all(tolower) %>%
  mutate(column = tolower(column)) %>%
  add_row(column = "icd10", description = "ICD-10 code for the entity.")
ddict
```
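One way to confirm that the dictionary covers every column of the expanded data is to compare the two sets of names. The sketch below uses toy stand-ins with hypothetical column names and values rather than the real `ddict` and `icd10towbd` objects:

```r
# Toy stand-ins for the data dictionary and the expanded data (hypothetical columns)
toy_ddict <- data.frame(
  column = c("wbd_code", "icd10_range", "icd10"),
  description = c("WBD code.", "ICD-10 range.", "ICD-10 code for the entity.")
)
toy_expanded <- data.frame(
  wbd_code = "1A01",
  icd10_range = "A00-A01",
  icd10 = c("A00", "A01")
)

# Columns present in the data but missing from the dictionary
toy_undocumented <- setdiff(names(toy_expanded), toy_ddict$column)
toy_undocumented
```

An empty result means every column is documented. Dropping the `icd10` row from the dictionary would make `icd10` appear in the result.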
## Save Data
Writes the original WBD data, expanded WBD data, and data dictionary to the `../data` folder.
```{r}
write_csv(ddict, "../data/wbd10_raw_ddict.csv", na="")
write_csv(wbd, "../data/wbd10_raw.csv", na="")
write_csv(icd10towbd, "../data/icd10towbd10_raw.csv", na="")
```
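The `na=""` argument writes missing values as empty fields instead of the literal text `NA`. A minimal round-trip sketch, using a temporary file and toy values that are not taken from the WBD data:

```r
library(readr)

# Toy data with one missing value (hypothetical columns and values)
toy_out <- data.frame(icd10 = c("A00", "A01"), wbd_code = c("1A01", NA))
toy_path <- tempfile(fileext = ".csv")

# Missing values become empty fields in the written file
write_csv(toy_out, toy_path, na = "")
toy_lines <- readLines(toy_path)
toy_lines

# read_csv() treats empty fields as NA by default, so the value is restored
toy_back <- read_csv(toy_path, show_col_types = FALSE)
is.na(toy_back$wbd_code)
```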