Create individual file (#715)
* Until L594

* Converted until L677

* Until L731

* Update documentation

* Remove test ref

* Style code

* WIP writing functions to fill postcode in line with previous DOB functions

* Update documentation

* implement quick fix for running 22/23

* Style code

* Fix missed comma

* Exclude DD code for now - TEMP fix

* Correct/rename variables

* Style code

* Include NSU in `check_year_valid`

* Update `check_year_valid_tests`

* Update documentation

* Update `add_nsu_cohort` to pick up years valid

* Style code

* remove extra `!`

* Exclude `cij_delay`

* Style code

* improve `max_no_inf()`

* Use pmin/max instead of `rowwise`

* improve `min_no_inf()`

* Use n_distinct(cij_marker)

* deal with distinct(ch_chi_cis)

* use n_distinct(ooh_case_id)

* remove `find_non_duplicates`

* Use dplyr::if_else()

Co-authored-by: James McMahon <[email protected]>

* Fix typo in `ooh_covid_assessment`

* Move `ooh_case_id` to aggregate

* Use `slfhelper::ltc_vars`

* Remove `clean_up_dob`
Already done in `correct_demographics`

* Update documentation

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/4981058958/attempts/1
Accepted in #654 (comment)

Signed-off-by: check-spelling-bot <[email protected]>

* Use `start_next_fy_quarter` in place of rowwise

* Style code

* Use `compute_mid_year_age`

* convert code into data.table for improving speed

* Update `get_fy_dates` function

* remove `date_from_fy`, use `get_fy_dates`

* Update documentation

* Remove `clean_up_postcode` function
Not needed anymore

* Remove non duplicates function/move to aggregate

* Style code

* Update documentation

* Add time stamps to `create_individual_file`

* Style code

* remove `clean_up_postcode`

* Deal with ch cis episodes

* Style code

* add .data$

* Turn ch aggregate into a data table

* Style code

* use ch_chi_cis

* remove `preventable_admissions` from aggregate

* exclude `hh_in_fy` for now

* Style code

* Test - exclude `sc_` vars from aggregate

* Style code

* Exclude for now

* exclude for now

* Style code

* automate `check_year_valid`

* Return dummy file path for NSU not valid

* Style code

* Fix brackets in aggregate

* TEMP - exclude variables

* Use `phsmethods::sex_from_chi`

* Style code

* Add ungroup()

* lowercase dob

* Remove as.data.table

* rewrite aggregate_by_chi with data.table

* Style code

* minor changes

* Use the updated function

* to properly import data.table

* remove redundant columns dob postcode and gpprac

* minor changes to remove redundant postcode gpprac columns

* Style code

* rename columns with small letters

* Style code

* New `aggregate_ch_episodes`

* Update documentation

* add functions to replace regular expressions to select column/variables

* Update documentation

* Style code

* minor changes

* add a missing variable, cij_delay

* Style code

* add variables cij_delay, preventable_beddays

* add missing variables health_net_cost, health_net_costincdnas, and cmh, dd sds columns

* Style code

* add more variables needed

* Style code

* Update R/link_delayed_discharge_eps.R

* Style code

* amend costs

* Style code

* Revert "amend costs"

This reverts commit 8048e68.

* Add DN and cij_delay back in

* fix the issue

* Style code

* remove running in chunks

* Style code

* Update tests to include missing variables

* Remove unnecessary comma

* fix the bug of preventable_beddays

* Update documentation

* fix total ae_attendances

* fix the bug of preventable_admissions

* fix the bug of hbrescode etc

* minor fix

* minor fix

* Style code

* Fix some warnings being produced by the tests

* Fix failing test

* remove running in chunks

* Style code

* Update the targets config to use `timestamp_positives` as the default reporter

* fix the bug of preventable_beddays

* Update documentation

* fix total ae_attendances

* fix the bug of preventable_admissions

* fix the bug of hbrescode etc

* minor fix

* minor fix

* Style code

* fix home care cost

* add ipdc to fix maternity

* fix preventable admission and care home cost

* fix preventable_admissions and calculate preventable_beddays here

* add monthly_beddays and yearstay to dd

* Style code

* fix preventable_admissions and preventable_beddays

* Style code

* include parameter for write to disk/year

* Add lookups to indiv file creation pipeline

* include parameter for write to disk/year

* fix delay discharge beddays and yearstay

* Style code

* fix preventable issues

* Style code

* fix the issue of preventable stuff

* Style code

* Update R/aggregate_by_chi_zihao.R

* Update documentation

* Fix minor typos

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/5443581387/attempts/1
Accepted in #709 (comment)

Signed-off-by: check-spelling-bot <[email protected]>

* Remove some obsolete comments

* Remove some unnecessary brackets

* Reformat some code

* Use some `dplyr` functions for readability

* Style code

* Update R/link_delayed_discharge_eps.R

* Style code

* Remove some code which is no longer needed

We now match on these variables after

* Work out preventable admissions with similar indicators

* Lowercase variable names

* Restore `cij_delay`

* Restore DN variables

* Tidy the code and use integers where possible

* Supply `year` as a parameter to `clean_up_ch`

* Supply `year` as a parameter to `clean_individual_file`

* Only keep required variables to save memory

* Rename the parameter so the documentation works

* Use `setnames` to change names to lower

* Remove unneeded code

* Update file path name

* Trim the return code

* Some fixes

* Correctly compute `ooh_cases`

* Update documentation

* Style code

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/5466392495/attempts/1
Accepted in #719 (comment)

Signed-off-by: check-spelling-bot <[email protected]>

* Add targets for the individual file

* Fix missed pipe

* Style code

* Update some targets to only run once a week

* Make the deaths lookup unique

* Add `year` back to the individual file

* Remove `cost_total_net_inc_dnas` from the indiv file (#737)

* Drop `cost_total_net_inc_dnas`

* Rename `health_net_costincdnas` to `health_net_cost_inc_dnas`

* Join slf lookups onto individual file (#724)

* Create function for matching on slf lookups

* fix some build warnings

* Add `hbrescode` to select list

* Pass lookups as parameters/deal with hbrescode

* Update R/create_individual_file.R

---------

Co-authored-by: James McMahon <[email protected]>

* Join sc client variables onto individual file (#740)

* New function for matching sc client to indiv file

* Style code

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/5555048903/attempts/1
Accepted in #740 (comment)

Signed-off-by: check-spelling-bot <[email protected]>

* Code layout

* Style code

* Remove redundant sc variables

Co-authored-by: James McMahon <[email protected]>

* Update comments

Co-authored-by: James McMahon <[email protected]>

* Update comments

Co-authored-by: James McMahon <[email protected]>

* Sort order of parameters to pass `data` first

* Update documentation

* Style code

* Update R/create_individual_file.R

* Update R/create_individual_file.R

* Update R/create_individual_file.R

* Style code

---------

Signed-off-by: check-spelling-bot <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: James McMahon <[email protected]>
Co-authored-by: Moohan <[email protected]>

* Update documentation

* Output the individual file with `anon_chi` (#748)

* Make episode file output with `anon_chi`

I've added this as a parameter so you can output CHI if desired, but the default is for anon_chi.

For the tests, it swaps back to CHI as there are some tests which specifically use the CHI number.

* Output `anon_chi` in the individual file

* Style code

* Sort variables with issues `hbrescode` (HB2018), `datazone` and `hscp` (#746)

* rename `hscp` to `hscp2018`

* rename `spd` as `slf_pc_lookup`

* Add `datazone2011` to coalesce code

* Rename `datazone` to `datazone2011`

* include `datazone2011_old` in selections

* Update R/fill_geographies.R

---------

Co-authored-by: James McMahon <[email protected]>

* Fix for anon_chi being NA

---------

Co-authored-by: Moohan <[email protected]>
Co-authored-by: Jennit07 <[email protected]>

---------

Signed-off-by: check-spelling-bot <[email protected]>
Co-authored-by: Mandy Norrbo <[email protected]>
Co-authored-by: jr-mandy <[email protected]>
Co-authored-by: shintoLamp <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Jennifer Thom <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Zihao Li <[email protected]>
Co-authored-by: lizihao-anu <[email protected]>
Co-authored-by: Moohan <[email protected]>
Co-authored-by: Zihao Li <[email protected]>
12 people authored Jul 19, 2023
1 parent 74109bf commit 8db3769
Showing 45 changed files with 1,818 additions and 16 deletions.
13 changes: 13 additions & 0 deletions .github/actions/spelling/expect.txt
@@ -28,6 +28,7 @@ cmh
CNWs
commhosp
congen
costincdnas
costmonthnum
costsfy
covr
@@ -45,6 +46,7 @@ dbconnect
dbplyr
deathdiag
demog
dfc
disch
dischloc
dischto
@@ -70,6 +72,7 @@ fyyear
geogs
ggplot
GLS
gls
gms
GPOo
gpprac
@@ -86,6 +89,7 @@ hhg
hjust
hms
homecare
homev
hscp
hscpnames
IDPC
@@ -102,6 +106,8 @@ keyring
keytime
keytimex
kis
lgl
kis
los
ltc
ltcs
@@ -116,6 +122,7 @@ multiday
multisession
multistaff
NAs
newcons
nhs
nhshosp
NRS
@@ -147,7 +154,9 @@ purrr
quickstart
Rbuildignore
rcmdcheck
rdd
rds
reabl
reablement
readcode
readr
@@ -164,8 +173,12 @@ rspm
RStudio
rstudioapi
Rtype
SDcols
seealso
selfharm
setkeyv
setnafill
setnames
Siar
sigfac
simd
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -55,7 +55,8 @@ Imports:
stringr (>= 1.5.0),
tibble (>= 3.2.1),
tidyr (>= 1.3.0),
tidyselect (>= 1.2.0)
tidyselect (>= 1.2.0),
zoo (>= 1.8.0)
Suggests:
covr (>= 3.6.1),
roxygen2 (>= 7.2.3),
3 changes: 3 additions & 0 deletions NAMESPACE
@@ -13,6 +13,7 @@ export(convert_hscp_to_hscpnames)
export(convert_numeric_to_date)
export(convert_sending_location_to_lca)
export(convert_year_to_fyyear)
export(create_individual_file)
export(create_service_use_cohorts)
export(end_fy)
export(end_fy_quarter)
@@ -160,6 +161,8 @@ export(start_fy)
export(start_fy_quarter)
export(start_next_fy_quarter)
export(write_file)
importFrom(data.table,.N)
importFrom(data.table,.SD)
importFrom(magrittr,"%>%")
importFrom(readr,col_character)
importFrom(readr,col_date)
215 changes: 215 additions & 0 deletions R/aggregate_by_chi_zihao.R
@@ -0,0 +1,215 @@
#' Aggregate by CHI
#'
#' @description Aggregate episode file by CHI to convert into
#' individual file.
#'
#' @importFrom data.table .N
#' @importFrom data.table .SD
#'
#' @inheritParams create_individual_file
aggregate_by_chi_zihao <- function(episode_file) {
  cli::cli_alert_info("Aggregate by CHI function started at {Sys.time()}")

  # Convert to data.table
  data.table::setDT(episode_file)

  # Ensure all variable names are lowercase
  data.table::setnames(episode_file, stringr::str_to_lower)

  # Sort the data
  data.table::setkeyv(
    episode_file,
    c(
      "chi",
      "record_keydate1",
      "keytime1",
      "record_keydate2",
      "keytime2"
    )
  )

  data.table::setnames(
    episode_file,
    c(
      "ch_chi_cis", "cij_marker", "ooh_case_id"
      # ,"hh_in_fy"
    ),
    c(
      "ch_cis_episodes", "cij_total", "ooh_cases"
      # ,"hl1_in_fy"
    )
  )

  # column specification, grouped by chi
  # columns to select last
  cols2 <- c(
    "postcode",
    "dob",
    "gpprac",
    vars_start_with(episode_file, "sc_")
  )
  # columns to count unique rows
  cols3 <- c(
    "ch_cis_episodes",
    "cij_total",
    "cij_el",
    "cij_non_el",
    "cij_mat",
    "cij_delay",
    "ooh_cases",
    "preventable_admissions"
  )
  # columns to sum up
  cols4 <- c(
    vars_end_with(
      episode_file,
      c(
        "episodes",
        "beddays",
        "cost",
        "attendances",
        "attend",
        "contacts",
        "hours",
        "alarms",
        "telecare",
        "paid_items",
        "advice",
        "homev",
        "time",
        "assessment",
        "other",
        "dn",
        "nhs24",
        "pcc",
        "_dnas"
      )
    ),
    vars_start_with(
      episode_file,
      "sds_option"
    ),
    "health_net_cost_inc_dnas"
  )
  cols4 <- cols4[!(cols4 %in% c("ch_cis_episodes"))]
  # columns to select maximum
  cols5 <- c("nsu", vars_contain(episode_file, c("hl1_in_fy")))
  data.table::setnafill(episode_file, fill = 0L, cols = cols5)
  # compute
  individual_file_cols1 <- episode_file[,
    .(gender = mean(gender)),
    by = "chi"
  ]
  individual_file_cols2 <- episode_file[,
    .SD[.N],
    .SDcols = cols2,
    by = "chi"
  ]
  individual_file_cols3 <- episode_file[,
    lapply(.SD, function(x) {
      data.table::uniqueN(x, na.rm = TRUE)
    }),
    .SDcols = cols3,
    by = "chi"
  ]
  individual_file_cols4 <- episode_file[,
    lapply(.SD, function(x) {
      sum(x, na.rm = TRUE)
    }),
    .SDcols = cols4,
    by = "chi"
  ]
  individual_file_cols5 <- episode_file[,
    lapply(.SD, function(x) max(x, na.rm = TRUE)),
    .SDcols = cols5,
    by = "chi"
  ]
  individual_file_cols6 <- episode_file[,
    .(
      preventable_beddays = ifelse(
        max(cij_ppa, na.rm = TRUE),
        max(cij_end_date) - min(cij_start_date),
        NA_real_
      )
    ),
    # cij_marker has been renamed as cij_total
    by = c("chi", "cij_total")
  ]
  individual_file_cols6 <- individual_file_cols6[,
    .(
      preventable_beddays = sum(preventable_beddays, na.rm = TRUE)
    ),
    by = "chi"
  ]

  individual_file <- dplyr::bind_cols(
    individual_file_cols1,
    individual_file_cols2[, chi := NULL],
    individual_file_cols3[, chi := NULL],
    individual_file_cols4[, chi := NULL],
    individual_file_cols5[, chi := NULL],
    individual_file_cols6[, chi := NULL]
  )

  # convert back to tibble
  return(dplyr::as_tibble(individual_file))
}


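#' Select columns ending with some patterns
#'
#' @param data A data frame whose column names are searched.
#' @param vars A character vector of patterns; a name matching any one is kept.
#' @param ignore_case Whether to ignore case when matching.
#'
#' @return A character vector of the matching column names.
vars_end_with <- function(data, vars, ignore_case = FALSE) {
  names(data)[stringr::str_ends(
    names(data),
    stringr::regex(paste(vars, collapse = "|"),
      ignore_case = ignore_case
    )
  )]
}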

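#' Select columns starting with some patterns
#'
#' @param data A data frame whose column names are searched.
#' @param vars A character vector of patterns; a name matching any one is kept.
#' @param ignore_case Whether to ignore case when matching.
#'
#' @return A character vector of the matching column names.
vars_start_with <- function(data, vars, ignore_case = FALSE) {
  names(data)[stringr::str_starts(
    names(data),
    stringr::regex(paste(vars, collapse = "|"),
      ignore_case = ignore_case
    )
  )]
}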

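#' Select columns containing some patterns
#'
#' @param data A data frame whose column names are searched.
#' @param vars A character vector of patterns; a name matching any one is kept.
#' @param ignore_case Whether to ignore case when matching.
#'
#' @return A character vector of the matching column names.
vars_contain <- function(data, vars, ignore_case = FALSE) {
  names(data)[stringr::str_detect(
    names(data),
    stringr::regex(paste(vars, collapse = "|"),
      ignore_case = ignore_case
    )
  )]
}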

#' Aggregate CIS episodes
#'
#' @description Aggregate CH variables by CHI and CIS.
#'
#' @inheritParams create_individual_file
aggregate_ch_episodes_zihao <- function(episode_file) {
  cli::cli_alert_info("Aggregate ch episodes function started at {Sys.time()}")

  # Convert to data.table
  data.table::setDT(episode_file)

  # Perform grouping and aggregation
  episode_file <- episode_file[, `:=`(
    ch_no_cost = max(ch_no_cost),
    ch_ep_start = min(record_keydate1),
    ch_ep_end = max(ch_ep_end),
    ch_cost_per_day = mean(ch_cost_per_day)
  ), by = c("chi", "ch_chi_cis")]

  # Convert back to tibble if needed
  episode_file <- tibble::as_tibble(episode_file)

  return(episode_file)
}
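
The per-CHI aggregation pattern in `aggregate_by_chi_zihao` (take the last value, count distinct values, sum) can be tried on a toy table. The following is a minimal sketch, not the package's implementation: the data, the `mh_beddays` and `acute_beddays` columns and the object names are hypothetical, base `grep()` stands in for `vars_end_with()`, and the pieces are combined with a keyed `merge()` on `chi` rather than `dplyr::bind_cols()`, so the result does not rely on every piece sharing the same row order.

```r
library(data.table)

# Hypothetical episode-level data: two episodes for person "A", one for "B"
episodes <- data.table(
  chi = c("A", "A", "B"),
  postcode = c("EH1", "EH2", "G1"),
  cij_marker = c(1L, 1L, 2L),
  acute_beddays = c(3, 2, 5),
  mh_beddays = c(0, 1, 0)
)

# Sort by person so that "last row" is well defined within each group
setkeyv(episodes, "chi")

# Columns to sum: every name ending in "beddays" (stand-in for vars_end_with)
sum_cols <- grep("beddays$", names(episodes), value = TRUE)

# Last recorded value per person
last_vals <- episodes[, .SD[.N], .SDcols = "postcode", by = "chi"]

# Number of distinct non-missing values per person
counts <- episodes[,
  lapply(.SD, uniqueN, na.rm = TRUE),
  .SDcols = "cij_marker",
  by = "chi"
]

# Totals per person
sums <- episodes[,
  lapply(.SD, sum, na.rm = TRUE),
  .SDcols = sum_cols,
  by = "chi"
]

# Combine the pieces by joining on chi
individual <- Reduce(
  function(x, y) merge(x, y, by = "chi"),
  list(last_vals, counts, sums)
)
print(individual)
```

Joining on `chi` costs a little more than binding columns, but it fails loudly if one of the grouped results is missing a person instead of silently misaligning rows.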