src/R/wbd10.rmd

---
title: "Wilson's Burden of Disease (WBD) Codes"
output: rmarkdown::github_document
---

Expands Wilson's Burden of Disease (WBD) classification groups from the raw Excel file using the [cghrCodes](https://github.com/cghr-toronto/cghrCodes) package at CGHR.

## Libraries

```{r}
library(tidyverse)
library(readxl)
library(cghrCodes)
```

## Read Data

Read data from `../data` folder:

* `Version 10 - Final CGHR-GBD list 18 April 2013.xls`: original WBD file mappings provided by Wilson Suraweera
* `Version 10 - Final CGHR-GBD list 18 April 2013 RW.xls` manually cleaned WBD file by Richard Wen to be in a tabular structure

```{r}
raw_file <- "../data/Version 10 - Final CGHR-GBD list 18 April 2013_Updated 12Jan2023 RW.xls"

# Read sheets 4-7 for ddict, adult, neo, child sheets in order
raw_wbd <- list(
  ddict=read_excel(raw_file, sheet=4),
  adult=read_excel(raw_file, sheet=5),
  neo=read_excel(raw_file, sheet=6),
  child=read_excel(raw_file, sheet=7)
)
```

### Data Dictionary Preview

This sheet contains the column descriptions for each of the adult, neonate, and child sheets that contain the WBD codes.

```{r}
raw_wbd[["ddict"]]
```

### Adult WBD Codes Preview

This sheet contains the adult WBD codes.

```{r}
raw_wbd[["adult"]]
```

### Child WBD Codes Preview

This sheet contains the child WBD codes.

```{r}
raw_wbd[["child"]]
```

### Neonate WBD Codes Preview

This sheet contains the neonate WBD codes.

```{r}
raw_wbd[["neo"]]
```

## Combine WBD Codes

Combine WBD code sheets data for all age groups into a common dataframe by lower casing all column names to match.


```{r}
wbd <- list()

# Lower case col names
wbd[["adult"]] <- raw_wbd[["adult"]] %>% rename_all(tolower)
wbd[["child"]] <- raw_wbd[["child"]] %>% rename_all(tolower)
wbd[["neo"]] <- raw_wbd[["neo"]] %>% rename_all(tolower)

# Combine into common dataframe
wbd <- bind_rows(wbd)
wbd
```

## Expand ICD-10 Ranges

Expands the ICD-10 codes from the `icd10_range` column to create an ICD-10 code for each WBD code.

```{r}

# Filter for codex only
icd10towbda <- raw_wbd[["adult"]] %>% rename_all(tolower) %>% filter(kind == "codex")
icd10towbdc <- raw_wbd[["child"]] %>% rename_all(tolower) %>% filter(kind == "codex")
icd10towbdn <- raw_wbd[["neo"]] %>% rename_all(tolower) %>% filter(kind == "codex")

# Expand ICD 10 ranges
icd10towbda <- expandICD10Data(icd10towbda, icd10Column="icd10_range", expandColumn="icd10")
icd10towbdc <- expandICD10Data(icd10towbdc, icd10Column="icd10_range", expandColumn="icd10")
icd10towbdn <- expandICD10Data(icd10towbdn, icd10Column="icd10_range", expandColumn="icd10")

# Combine dataframes
icd10towbd <- bind_rows(icd10towbda, icd10towbdc, icd10towbdn)
as_tibble(icd10towbd)
```

## Data Dictionary

Preprocesses the data dictionary by lower casing the column names and values.

```{r}
ddict <- raw_wbd[["ddict"]] %>%
  rename_all(tolower) %>%
  mutate(column=tolower(column)) %>%
  add_row(column="icd10", description="ICD-10 code for the entity.")
ddict
```

## Save Data

Writes the original WBD data, expanded WBD data, and data dictionary to the `../data` folder.

```{r}
write_csv(ddict, "../data/wbd10_raw_ddict.csv", na="")
write_csv(wbd, "../data/wbd10_raw.csv", na="")
write_csv(icd10towbd, "../data/icd10towbd10_raw.csv", na="")
```