SC Demographics and SDS #900

Merged
Jennit07 merged 21 commits into mar-23-update from social-care-investigation
Feb 7, 2024

SwiftySalmon
Collaborator

Complete rewrite of SC Demographics.

Changes include:

  • only using CHI variables from CHI seeding rather than the submitted values,
  • filling in missing CHI where possible,
  • the latest submission is no longer accurate, so added code to choose the latest social care ID and changed the replace_sc_id_with_latest function to match,
  • now selects the latest social care ID across all social care sections, not just the section being run, so it will be consistent across SDS, CH, HC, and AT (hopefully),
  • cases where CHI is missing were previously all assigned the same social care ID. This is fixed so that each keeps a unique social care ID, allowing analysis of these cases if needed.
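
As a rough illustration of the changes listed above, the sketch below fills in a missing CHI from other records of the same social care ID and then picks one latest social care ID per CHI, while leaving records with no CHI on their own ID. It is not the code from this PR: the data, the column names (`sending_location`, `social_care_id`, `chi`, `period`) and the use of a quarter string to decide what counts as latest are all assumptions made for the example.

```r
library(dplyr)

# Toy data. All column names and values are illustrative assumptions.
sc_demog <- data.frame(
  sending_location = c("100", "100", "100", "100", "200", "200"),
  social_care_id   = c("A1", "A1", "A2", "B7", "C3", "C4"),
  chi              = c("0101011234", NA, "0101011234", NA, NA, NA),
  period           = c("2022Q1", "2022Q2", "2023Q1", "2022Q4", "2022Q3", "2022Q3"),
  stringsAsFactors = FALSE
)

# Step 1: fill a missing CHI from another record with the same
# sending location and social care ID, where one exists.
filled <- sc_demog %>%
  group_by(sending_location, social_care_id) %>%
  mutate(chi = coalesce(chi, chi[!is.na(chi)][1])) %>%
  ungroup()

# Step 2: within each sending location and CHI, take the social care ID
# from the most recent period as the latest one. Records that still have
# no CHI keep their own social care ID instead of sharing one.
latest <- filled %>%
  group_by(sending_location, chi) %>%
  mutate(
    latest_sc_id = if_else(
      is.na(chi),
      social_care_id,
      social_care_id[order(period, decreasing = TRUE)][1]
    )
  ) %>%
  ungroup()

print(latest)
```

In this toy data the person with two social care IDs (A1 and A2) ends up with A2 as the single latest ID, and the three records with no CHI keep B7, C3 and C4 respectively rather than being collapsed together.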

Things not fixed:

  • multiple CHIs per social care ID. Data Management are aware and working on it.
  • cases where the start date is after the period end date are removed in SDS (n = 4). It is up to DM to contact the local authority and ask them to resubmit these.
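
For anyone wanting to look at the two outstanding issues above, a minimal sketch of how they could be surfaced is shown here. The data and column names (`social_care_id`, `chi`, `start_date`, `period_end_date`) are assumptions for the example and are not taken from the PR code.

```r
library(dplyr)

# Toy data. All column names and values are illustrative assumptions.
sds <- data.frame(
  social_care_id  = c("A1", "A1", "B7", "C3"),
  chi             = c("0101011234", "0202025678", "0303039012", "0404043456"),
  start_date      = as.Date(c("2022-04-01", "2022-05-01", "2023-07-15", "2022-06-01")),
  period_end_date = as.Date(c("2022-06-30", "2022-06-30", "2023-06-30", "2022-06-30")),
  stringsAsFactors = FALSE
)

# Social care IDs linked to more than one distinct CHI.
multi_chi <- sds %>%
  group_by(social_care_id) %>%
  summarise(n_chi = n_distinct(chi, na.rm = TRUE), .groups = "drop") %>%
  filter(n_chi > 1)

# Records whose start date falls after the period end date.
bad_dates <- sds %>%
  filter(start_date > period_end_date)

# Keep only records with a valid date order. Note that filter() also
# drops rows where either date is missing.
sds_clean <- sds %>%
  filter(start_date <= period_end_date)

print(multi_chi)
print(bad_dates)
```

Keeping `bad_dates` as its own table, rather than silently dropping the rows, gives a list that could be passed on when asking for a resubmission.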

I checked the SDS output against the social care publication and I am happy with it. Obviously the demographics code might change again once I look at the other sections.

SwiftySalmon and others added 13 commits January 9, 2024 14:17
different variables - removed extract date as not accurate, using chi over upi after discussion with social care data management. Added in date of death just for fun.
removed a lot of the submitted variables and instead using chi variables from chi seeding. Other changes:
- Fill in missing values,
- create flag for latest social care id (one from database is not accurate), this makes sure that each chi only has ONE sc id as the latest to stop it creating duplicates
- change postcode to choose chi over submitted
…Scotland/source-linkage-files into social-care-investigation
No major changes - only how demographics is matched on and how latest social care id is selected
Merge branch 'social-care-investigation' of github.com:Public-Health-Scotland/source-linkage-files into social-care-investigation

# Conflicts:
#	R/process_lookup_sc_demographics.R
…Scotland/source-linkage-files into social-care-investigation

SwiftySalmon changed the base branch from development to mar-23-update on January 24, 2024 13:07

@Jennit07
Collaborator

I was confused at some points in the code, so I made a few comments as I went along. Posit was really slow for me today, but I managed to run through the code and it all worked.

R/replace_sc_id_with_latest.R (outdated, resolved)
R/replace_sc_id_with_latest.R (outdated, resolved)

github-actions bot commented Feb 7, 2024

@check-spelling-bot Report

🔴 Please review

See the 📂 files view, the 📜action log, or 📝 job summary for details.

Unrecognized words (3)

commiting
stylr
submissing

To accept these unrecognized words as correct, you could run the following commands

... in a clone of the git@github.com:Public-Health-Scotland/source-linkage-files.git repository
on the social-care-investigation branch (ℹ️ how do I use this?):

curl -s -S -L 'https://raw.githubusercontent.com/check-spelling/check-spelling/main/apply.pl' |
perl - 'https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/7816301070/attempts/1'

OR

To have the bot accept them for you, reply quoting the following line:
@check-spelling-bot apply updates.

Available 📚 dictionaries could cover words (expected and unrecognized) not in the 📘 dictionary

This includes both expected items (235) from .github/actions/spelling/expect.txt and unrecognized words (3)

Dictionary Entries Covers Uniquely
cspell:k8s/dict/k8s.txt 153 1
cspell:npm/dict/npm.txt 302 1

Consider adding them (in .github/workflows/spelling.yml) for uses: check-spelling/check-spelling@main in its with:

      with:
        extra_dictionaries:
          cspell:k8s/dict/k8s.txt
          cspell:npm/dict/npm.txt

To stop checking additional dictionaries, add (in .github/workflows/spelling.yml) for uses: check-spelling/check-spelling@main in its with:

check_extra_dictionaries: ''
Pattern suggestions ✂️ (1)

You could add these patterns to .github/actions/spelling/patterns.txt:

# Automatically suggested patterns
# hit-count: 6 file-count: 2
# Compiler flags
(?:^|[\t ,"'`=(])-[DPWXYLlf](?=[A-Z]{2,}|[A-Z][a-z]|[a-z]{2,})

Errors (3)

See the 📂 files view, the 📜action log, or 📝 job summary for details.

❌ Errors Count
ℹ️ candidate-pattern 2
❌ ignored-expect-variant 13
ℹ️ no-newline-at-eof 1

See ❌ Event descriptions for more information.

If the flagged items are 🤯 false positives

If items relate to a ...

  • binary file (or some other file you wouldn't want to check at all).

    Please add a file path to the excludes.txt file matching the containing file.

    File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

    ^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

  • well-formed pattern.

    If you can write a pattern that would match it,
    try adding it to the patterns.txt file.

    Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

    Note that patterns can't match multiline strings.

Collaborator

Jennit07 left a comment

Ran through the code and it works fine. Logic looks good. Happy to approve/merge!

Jennit07 merged commit 744bbc0 into mar-23-update on Feb 7, 2024
1 check passed
Jennit07 deleted the social-care-investigation branch on February 7, 2024 14:28
Jennit07 added a commit that referenced this pull request Mar 26, 2024
* Remove redundant code

* Update documentation

* Style code

* Reorder when we match on client variables
This was causing NSUs to show a social care id. This now resolves this.

* Update documentation

* Style code

* Revert "Update logic to use end of Quarter"

This reverts commit 004e831.

* Style code

* Update documentation

* add check comment (TO DO for this PR)

* Remove `check_quarter_format` function

* Remove `check_quarter_format`

* Add chi parameter to `create_demog_test_flags`

* Style code

* Use CHI parameter for ep/indiv tests

* Use CHI parameter for extract tests (chi)

* Change test sheet names to lowercase

* Change date to lowercase

* Update documentation

* Update documentation

* Update documentation

* Style code

* Fix pick variables
This was not taking the correct variables, leading to NSUs being assigned psychiatry

* SC Demographics and SDS (#900)

* Style code

* # read in sc demographics

different variables - removed extract date as not accurate, using chi over upi after discussion with social care data management. Added in date of death just for fun.

* social care demographics first draft

removed a lot of the submitted variables and instead using chi variables from chi seeding. Other changes:
- Fill in missing values,
- create flag for latest social care id (one from database is not accurate), this makes sure that each chi only has ONE sc id as the latest to stop it creating duplicates
- change postcode to choose chi over submitted

* Style code

* had a github error? Not sure what happened but committing first draft of sc demographics

* Style code

* first draft sds.
No major changes - only how demographics is matched on and how latest social care id is selected

* Update documentation

* demographics - add sending location to group by

* Style code

* Update documentation

* Added ungroup()

* Remove comments

* Remove comments

* Style code

---------

Co-authored-by: SwiftySalmon <[email protected]>
Co-authored-by: marjom02 <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Zihao Li <[email protected]>

* Sc all at speedup (#904)

* speed up process_sc_all_alarms_telecare function with data.table package

* Update documentation

---------

Co-authored-by: lizihao-anu <[email protected]>
Co-authored-by: Megan McNicol <[email protected]>
Co-authored-by: Jennit07 <[email protected]>

* Add case_when statement for `high_cc` cohort

* Bug - `high_cc` in demographic cohort showing `NAs` instead of `TRUE/FALSE` (#911)

Add case_when statement for `high_cc` cohort

* added a casewhen to update property type description for homelessness

* Update documentation

* Style code

* Bug - deal with missing variables (#914)

* Add missing sc variables for no sc data

* Fix code for including `_inc_dna` variables

* Remove commented line

* Bug - Fix get pop path failing and preventing the indiv file from running.  (#913)

Fix bug - pop file paths breaking indiv file

* correct file hscp file path

* Update process_sc_all_home_care.R

A small issue was identified when running targets. Linked with changes to the function `fix_sc_end_dates()`

* Update process_sc_all_alarms_telecare.R

* remove duplicate columns

* Fix targets (#892)

* fix sc_client_lookup sc_send_lca

* fix an issue of get_pop_path

* Style code

* fix the rest of get_pop_path from get_datazone_pop_path

* Update documentation

* fix sc_send_lca

* add missing year column

* explicitly specify the argument year to avoid corruption of targets

* Update documentation

* new data pipeline with targets
remove create_individual_files from targets and append it to run_targets script

* minor changes

* Style code

* undo sc_send_lca bit

* Update targets scripts

* Remove top level targets scripts

---------

Co-authored-by: lizihao-anu <[email protected]>
Co-authored-by: Megan McNicol <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Jennifer Thom <[email protected]>

* remove cases where the start date is later than the end date

* Update Refs for March24 SLF update

* 758 investigate extracts to identify areas of code which can be cut down for processing times (#899)

* re-writing process_sc_all sds and alarm_telecare with data.table to improve the speed

* Update documentation

* Style code

* changes in line with new process_sc_all_sds dplyr version

* Style code

* remove duplicate columns

* remove duplicated columns

---------

Co-authored-by: lizihao-anu <[email protected]>
Co-authored-by: Megan McNicol <[email protected]>

* Update homelessness completeness path

* Update check_year_valid function

* 920 issues with file permissions need constant monitoring (#921)

* set a correct file permission

* update descriptions in process_tests function

* Update documentation

---------

Co-authored-by: lizihao-anu <[email protected]>

* change joining with sc_demog_lookup to right_join and move person_id down

* Archive social care extracts (#927)

* Set up `get_sandpit_extract_path`

* Update documentation

* Update sc `all` data paths

* Write sandpit extract if file does not exist

* Style code

---------

Co-authored-by: Jennit07 <[email protected]>

* Update excel sg completeness tabs

---------

Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Megan McNicol <[email protected]>
Co-authored-by: SwiftySalmon <[email protected]>
Co-authored-by: marjom02 <[email protected]>
Co-authored-by: Zihao Li <[email protected]>
Co-authored-by: lizihao-anu <[email protected]>
Co-authored-by: rchlv <[email protected]>
Co-authored-by: Zihao Li <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Jun 11, 2024
* update documentation

* Update sc connection name

* Update documentation

* 936 - Update parameters with file paths  (#939)

Specify file paths in sc function parameters

* Add test for `n_records` in ep file tests

* remove and merge overlapping records in GP OoHs

* Style code

* update spelling to lowercases

* update spelling

* Add function for reading Dev SLF file
Uses SLFhelper for easy access to Source_Linkage_Files

* Add cross year tests using SLFhelper WIP
WIP - still need to add write to disk and possibly develop visuals

* Create tests for social care sandpit extracts (#943)

* Update `write_tests_xlsx`

* Update documentation

* Add in sandpit tests where the extract is saved

* Setup tests for sandpit
Further checks needed for writing to disk

* Update documentation

* Amend case_when statement

* rename function to include 'sc'

* Update documentation

* Use `is.null` instead of `missing`

* Update documentation

* Add `year` as a parameter

* Update documentation

* Setup for writing sandpit tests to disk

* Update parameters for sandpit tests

* Update documentation

* Use `process_tests_sc_sandpit`

* Apply styling

* Style code

* update documentation

Co-authored-by: Zihao Li <[email protected]>

* Rename variable sc_id

Co-authored-by: Zihao Li <[email protected]>

* Rename variable

Co-authored-by: Zihao Li <[email protected]>

* Rename variable

Co-authored-by: Zihao Li <[email protected]>

* Update documentation

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/8689503990/attempts/1
Accepted in #943 (comment)

Signed-off-by: check-spelling-bot <[email protected]>
on-behalf-of: @check-spelling <[email protected]>

* update spelling

* update spelling expect variant

---------

Signed-off-by: check-spelling-bot <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Zihao Li <[email protected]>
Co-authored-by: Zihao Li <[email protected]>

* Remove filtering between 90-105% completeness

* Keep percentage comparison

* Add new variable pre/post hl1 application

* re-write the logic of fill_ch_names

* Update documentation

* Style code

* minor typo fix

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/8881311681/attempts/1
Accepted in #946 (comment)

Signed-off-by: check-spelling-bot <[email protected]>
on-behalf-of: @check-spelling <[email protected]>

* update spelling expect

* update spelling expect

* fix R CMD warning of no visible binding

* Style code

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/8893412405/attempts/1
Accepted in #946 (comment)

Signed-off-by: check-spelling-bot <[email protected]>
on-behalf-of: @check-spelling <[email protected]>

* spelling check does not seem to recognize variants

* only select columns we want in ltc raw data

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/8897746003/attempts/1
Accepted in #947 (comment)

Signed-off-by: check-spelling-bot <[email protected]>
on-behalf-of: @check-spelling <[email protected]>

* fix care home cancelled dates might be 1900-01-01

* for some reason the latest scid code was overwritten after the march update?? anyway, now it is fixed.

* Style code

* add checking ch_postcode in England, quality 15

* Update documentation

* Style code

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/8971882687/attempts/1
Accepted in #946 (comment)

Signed-off-by: check-spelling-bot <[email protected]>
on-behalf-of: @check-spelling <[email protected]>

* spelling metadata

* Merge May24 NI update into June update branch (#949)

Collect data before manipulations

* update metadata for fill_ch_names

* Update documentation

* add rounding to one decimal place on percentage

* Add write to disk

* update `write_tests_xlsx`

* Style code

* Update documentation

* Add to targets pipeline

* Update NEWS.md

* Update NEWS.md

* Added function for get_all_slf_deaths_lookup_path

* Update documentation

* Style code

* Add vars for activity after death flag

* Add activity after death flag

* Join data back to episode file

* Style code

* Update documentation

* fix a bug for quality 21

* Update `00_sort_bi_extracts` to write anon_chi (#952)

* Update `00_sort_BI_extracts`
Save a new file with `anon-` prefix and use slfhelper to get the anon_chi

* remove file copy

* Update `00_sort_bi_extracts` note

* Style code

* Update chi when this is different e.g UPI number or PAT_UPI

* remove storing as a dataframe

* Add condition if CHI exists in data file

* update 00_Sort_BI_Extracts
replace for loop by function to enable parallel computing with lapply

* Style code

* merge similar code

* simplify sort_bi_extracts

---------

Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Zihao Li <[email protected]>
Co-authored-by: lizihao-anu <[email protected]>

* Update refs

* changes to activity after death flag

* Update documentation

* Update R/add_activity_after_death_flag.R

Co-authored-by: Jennit07 <[email protected]>

* Update R/add_activity_after_death_flag.R

Co-authored-by: Jennit07 <[email protected]>

* added .data$ to variables

* Update documentation

* Style code

* comment out cross_year_tests for now

* Update anon_chi for dn and cmh

* Update boxi filepath ("anon-")

* remove file copy

* Update `00_sort_bi_extracts` note

* Style code

* Update `get_source_extract_path` (anon- prefix)

* Update chi when this is different e.g UPI number or PAT_UPI

* Change `read` functions to read anon_chi

* change `process` functions to read `anon_chi`

* remove storing as a dataframe

* Add condition if CHI exists in data file

* Update dd path

* switch between chi - ooh and dd

* Update chi when this is different e.g UPI number or PAT_UPI

* remove storing as a dataframe

* Add condition if CHI exists in data file

* update 00_Sort_BI_Extracts
replace for loop by function to enable parallel computing with lapply

* Style code

* merge similar code

* simplify sort_bi_extracts

* update sparra/hhg paths (anon_chi)

* use anon_chi for sc demogs

* Update documentation

* Update `create_episode_file`

* update NSU path

* Use `get_chi` before phs methods check - ooh

* Update LTCs

* Style code

* Update sc paths to `anon-` prefix

* update cohorts paths

* Update deaths paths with `anon-` prefix

* sc client anon_chi

* match files with chi

* Update `create_episode_file` joins

* Update documentation

* update get sandpit extracts

* update tests to use `chi`

* Style code

* Update IT extracts to maintain chi

* Update sort_bi_extracts

* Update bracket

* update parameter

* Update documentation

* bugs fix

* fix reading data from platform and homelessness chi

* update sc demog path

* update homelessness lookup

* Update documentation

* supply get_chi() where needed in targets

* Style code

* Update documentation

* Update targets with get_chi()

* Update targets with get_chi()

* Update client script

* Update documentation

* fix fill_ch_names

* add anon- and update targets

* fix add_activity_after_death in create_episode_file

* Style code

* process_tests_sc_client_lookup fix

* fix anon-chi issues in create_episode_file

* Update documentation

* fix typo

* Update documentation

* fix write_tests_xlsx path

* minor fix

* fix R package build warnings

* Style code

* aligning

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/9419296266/attempts/1
Accepted in #962 (comment)

Signed-off-by: check-spelling-bot <[email protected]>
on-behalf-of: @check-spelling <[email protected]>

* remove version 3.6
arrow package requires 4.0 or newer

* spelling checking fix trial

* Revert "spelling checking fix trial"

This reverts commit 1df8bc4.

* new github spell check workflows

* Revert "new github spell check workflows"

This reverts commit a35dc65.

* trial spell checking

* update expected word list

* update word list

* Update metadata

check-spelling run (push) for 966-github-action-spell-checking-issues-cannot-properly-recognize-variants

Signed-off-by: check-spelling-bot <[email protected]>
on-behalf-of: @check-spelling <[email protected]>

* spell checking update

* Update metadata

check-spelling run (pull_request_target) for June-24-update

Signed-off-by: check-spelling-bot <[email protected]>
on-behalf-of: @check-spelling <[email protected]>

---------

Signed-off-by: check-spelling-bot <[email protected]>
Co-authored-by: Jennifer Thom <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Megan McNicol <[email protected]>
Co-authored-by: SwiftySalmon <[email protected]>
Co-authored-by: marjom02 <[email protected]>
Co-authored-by: lizihao-anu <[email protected]>
Co-authored-by: rchlv <[email protected]>
Co-authored-by: rachev04 <[email protected]>
Co-authored-by: rchlv <[email protected]>
Jennit07 added a commit that referenced this pull request Sep 17, 2024
…s. (#988)

* fix sc_client_lookup sc_send_lca

* fix an issue of get_pop_path

* Style code

* fix the rest of get_pop_path from get_datazone_pop_path

* Update documentation

* fix sc_send_lca

* add missing year column

* Remove redundant code

* Update documentation

* Style code

* explicitly specify the argument year to avoid corruption of targets

* Update documentation

* Reorder when we match on client variables
This was causing NSUs to show a social care id. This now resolves this.

* Update documentation

* Style code

* Add chi parameter to `create_demog_test_flags`

* Style code

* Use CHI parameter for ep/indiv tests

* Use CHI parameter for extract tests (chi)

* Change test sheet names to lowercase

* Change date to lowercase

* Update documentation

* new data pipeline with targets
remove create_individual_files from targets and append it to run_targets script

* minor changes

* Style code

* Update documentation

* Update documentation

* Style code

* undo sc_send_lca bit

* Add code for running years available

* Update `_targets.R` script for running old years

* Style code

* Update `check_year_valid` for running old years

* Use `check_year_valid` where no data for old yrs

* Style code

* Fix pick variables
This was not taking the correct variables, leading to NSUs being assigned psychiatry

* Declare missing variables for older years

* setup targets scripts for old years

* Style code

* Include `check_year_valid` for sc client path

* Add check year valid to join sc client

* Add if else statement

* WIP - TO DO - fix dummy path for `get_chi()`

* Style code

* update dummy data file to read empty tibble

* Update `check_year_valid`

* Update declared `NA` variables

* Update documentation

* declare `count_not_known` as NA

* supply year as default in `aggregate_by_chi`

* Declare unused variables

* Style code

* Update sc client with sept update new code

* Specify code for running older years

* Style code

* Add Running SLF files manually scripts

* Style code

* update write_tests_xlsx

* update process_refined_death

* fix tests by removing get_chi

* add 2425

* Style code

* fix NA matches in refined_death

* move latest_cost_year() to cost_uplift()

* improve automation

* Update documentation

* fix `cij_ppa` in DD data

* fix bugs of dd and populate cij_delay back to episodes

* Style code

* keep all variable for delayed discharge episodes

* remove dummy variable names from dd_date

* Style code

* remove `deceased_boxi` variable - bug

* remove `create_person_id`. It's matched in client

* remove `create_person_id`

* Update `run_slf_manually` scripts

* further remove person_id

* fix duplicate row introduced by adding death

* remove duplicated chi when joining death data

* TODO: check distinct death data by chi while keeping chi==NA records

* add parameter for year

* fix duplicate in add_activity_after_death_flag

* Update `check_year_valid`

* Declare DN variables

* Style code

* Declare client variables

* remove extra dd variables

* remove redundant variables

* remove fy variable

* Remove redundant variable `count_not_known`

* Remove duplicate code

* revert commit - remove fy

* update manual run

* declare missing sc variables indiv file

* Style code

---------

Co-authored-by: Zihao Li <[email protected]>
Co-authored-by: lizihao-anu <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Megan McNicol <[email protected]>
Co-authored-by: SwiftySalmon <[email protected]>
Co-authored-by: Zihao Li <[email protected]>
Co-authored-by: marjom02 <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Sep 17, 2024
* update documentation

* Update sc connection name

* Update documentation

* 936 - Update parameters with file paths  (#939)

Specify file paths in sc function parameters

* remove and merge overlapping records in GP OoHs

* Style code

* update spelling to lowercases

* update spelling

* only select columns we want in ltc raw data

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/8897746003/attempts/1
Accepted in #947 (comment)

Signed-off-by: check-spelling-bot <[email protected]>
on-behalf-of: @check-spelling <[email protected]>

* for some reason the latest sc id code was overwritten after the March update. It is now fixed.

* Style code

* Merge May24 NI update into June update branch (#949)

Collect data before manipulations

* Update NEWS.md

* link GP-OoH with CUP markers

* Style code

* update gp ooh cup

* link cup to acute

* Update documentation

* adding the death dates to activity after death cases (#972)

adding the death dates to the cases where there is activity after death

Co-authored-by: marjom02 <[email protected]>

* Add sys time to functions (#971)

* adding in sys time alerts for all functions in create episode and create individual, so that when it runs manually I can see where it is and where it's getting stuck

* Style code

---------

Co-authored-by: marjom02 <[email protected]>
Co-authored-by: SwiftySalmon <[email protected]>
Co-authored-by: Jennit07 <[email protected]>

* Update slf deaths lookup function name

* automate combined deaths lookup

* Update documentation

* Update targets script

* Update years

* Update running process manually

* re-arrange brackets

* Style code

* Update run targets scripts

* Set up all-targets script

* Style code

* Update documentation

* Update targets script

* Update years

* Update running process manually

* re-arrange brackets

* Style code

* Update run targets scripts

* Set up all-targets script

* Style code

* Style code

* Automate the combined slf deaths lookup (#973)

Closes #957 and #968

* Update documentation

* Update Run_SLF_Files_targets/run_all_targets.R

Co-authored-by: Zihao Li <[email protected]>

* Style code

* remove combined_deaths_lookup from targets

* Style code

* fix acute_cup and gp_ooh_cup paths

* Update documentation

* fix typo

* adapt acute_cup for anon_chi

* Style code

* minor changes

* Style code

* Update documentation

* Person id sds (#981)

* added back in missing person_id for SDS.
also added latest_flag back in to client lookup in targets

* Update documentation

* added back in missing person_id for SDS.
also added latest_flag back in to client lookup in targets

* Update documentation

* change as suggested by Jen

* Update documentation

---------

Co-authored-by: marjom02 <[email protected]>
Co-authored-by: SwiftySalmon <[email protected]>
Co-authored-by: Jennifer Thom <[email protected]>
Co-authored-by: Jennit07 <[email protected]>

* Update NEWS.md

* unify file names for cup files

* Add client flags (#979)

* New methodology for social care client data.

- removed code that wasn't needed.
- updated housing codes
- latest social care ID
- changed "mental health problems" to "mental health disorders" in line with PHS style guide

* Update documentation

* Style code

* add person ID to client so it carries through to match on to all cases

* New methodology for social care client data.

- removed code that wasn't needed.
- updated housing codes
- latest social care ID
- changed "mental health problems" to "mental health disorders" in line with PHS style guide

* Update documentation

* Style code

* add person ID to client so it carries through to match on to all cases

---------

Co-authored-by: marjom02 <[email protected]>
Co-authored-by: SwiftySalmon <[email protected]>
Co-authored-by: Jennifer Thom <[email protected]>
Co-authored-by: Jennit07 <[email protected]>

* Update lookup to use anon-chi

* Remove redundant code
The removed code used the NRS weekly dates and, if these were blank, the chi death date. This methodology is wrong: we want to use the monthly NRS BOXI date by default and the chi date if there is an issue.
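
A minimal sketch of the intended fallback, in base R. The column names (`death_date_boxi_nrs`, `death_date_chi`, `death_date`) are hypothetical, and "an issue" is assumed here to mean a missing BOXI date; the package code may well differ.

```r
# Toy data: column names are illustrative assumptions, not the package's own.
deaths <- data.frame(
  chi = c("A", "B", "C"),
  death_date_boxi_nrs = as.Date(c("2024-01-05", NA, NA)),
  death_date_chi = as.Date(c("2024-01-07", "2024-02-10", NA)),
  stringsAsFactors = FALSE
)

# Use the monthly NRS BOXI date by default.
deaths$death_date <- deaths$death_date_boxi_nrs

# Fall back to the CHI death date only where the BOXI date is missing.
use_chi <- is.na(deaths$death_date)
deaths$death_date[use_chi] <- deaths$death_date_chi[use_chi]
```

Records with neither date keep `NA`, so they can still be picked up by a later `NA` check.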

* Update documentation

* remove weekly nrs date variable

* Use boxi nrs date or chi death date

* Use `get_combined_slf_deaths_path`

* add catch for NAs

* add notes

* Fix typo

* remove redundant code

* Style code

* add a function to combine nrs and it_chi death

* Update documentation

* Style code

* minor changes

* remove process_slf_deaths_lookup

* Update documentation

* Major update of Care Home script (#945)

* # major changes to care home script

see document on sharepoint for description

also:
- added in type of admission description
- updated care home contact in fill_ch_name script

* minor note updates

* Style code

* Update documentation

* couple of note updates

* Update R/process_sc_all_care_home.R

Co-authored-by: Jennit07 <[email protected]>

* Update R/process_sc_all_care_home.R

Co-authored-by: Jennit07 <[email protected]>

* Style code

* Update R/process_sc_all_care_home.R

* change to ch name lookup

* Update documentation

* remove fill ch provider fill line

* update fill ch names so it works with new ch methodology

* Style code

* Update documentation

* Style code

* Update documentation

* Remove redundant variable `latest_sc_id`

* use slfhelper::get_chi

* new section for sc_ch_id_markers

* Style code

* Update documentation

* Remove extra text and white space

* add rename to use death_date_chi

* use `read_excel` function

* Update documentation

* Return the paths only for SPD and ch name

* Update documentation

* Remove rename - no longer needed

* fix typo

* remove variables that don't exist

---------

Co-authored-by: marjom02 <[email protected]>
Co-authored-by: SwiftySalmon <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Jennifer Thom <[email protected]>

* update `ch_chi_cis` methodology

* update `ch_sc_id_cis` methodology

* Update notes

* Use `right_join`

* Update process_sc_all_care_home.R

added in missing variable at the end

* Add new nsu (#991)

* Add NSU code to github
Includes extracting the service user cohort to send to the chili team and then NSU extraction.

* Style code

* Add compression and package library

* Style code

* pick up latest geography file, and save out with compression (#983)

* pick up latest geography file, and save out with compression

* Style code

* use `get_spd_path`

Co-authored-by: James McMahon <[email protected]>

---------

Co-authored-by: marjom02 <[email protected]>
Co-authored-by: SwiftySalmon <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: James McMahon <[email protected]>

---------

Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Megan McNicol <[email protected]>
Co-authored-by: marjom02 <[email protected]>
Co-authored-by: SwiftySalmon <[email protected]>
Co-authored-by: James McMahon <[email protected]>

* update reference

* Reduce dependencies (#984)

* removing packages that I don't think get used anywhere, and removing references to fst and spss files

* Update documentation

* Update authors in description

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/10469297653/attempts/1
Accepted in #984 (comment)

Signed-off-by: check-spelling-bot <[email protected]>
on-behalf-of: @check-spelling <[email protected]>

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/10469690723/attempts/1
Accepted in #984 (comment)

Signed-off-by: check-spelling-bot <[email protected]>
on-behalf-of: @check-spelling <[email protected]>

---------

Signed-off-by: check-spelling-bot <[email protected]>
Co-authored-by: marjom02 <[email protected]>
Co-authored-by: SwiftySalmon <[email protected]>
Co-authored-by: Jennifer Thom <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Jennit07 <[email protected]>

* minor changes to social care code

* Style code

* Update process_sc_all_care_home.R

* Update NEWS.md

* merge Sep2024 fix into sep24 branch (#1003)

* update write_tests_xlsx

* update process_refined_death

* fix tests by removing get_chi

* add 2425

* Style code

* fix NA matches in refined_death

* move latest_cost_year() to cost_uplift()

* improve automation

* Update documentation

* fix `cij_ppa` in DD data

* fix bugs of dd and populate cij_delay back to episodes

* Style code

* keep all variable for delayed discharge episodes

* remove dummy variable names from dd_date

* Style code

* remove `deceased_boxi` variable - bug

* remove `create_person_id`. It's matched in client

* remove `create_person_id`

* Update `run_slf_manually` scripts

* further remove person_id

* fix duplicate row introduced by adding death

* remove duplicated chi when joining death data

* TODO: check distinct death data by chi while keeping chi==NA records

* add parameter for year

* fix duplicate in add_activity_after_death_flag

* Update `check_year_valid`

* Declare DN variables

* Style code

* remove redundant variables

---------

Co-authored-by: Zihao Li <[email protected]>
Co-authored-by: lizihao-anu <[email protected]>
Co-authored-by: Jennit07 <[email protected]>

* update copy_to_hscdiip.R

* Update older years to bring the data in line with our newest processes.  (#988)

* fix sc_client_lookup sc_send_lca

* fix an issue of get_pop_path

* Style code

* fix the rest of get_pop_path from get_datazone_pop_path

* Update documentation

* fix sc_send_lca

* add missing year column

* Remove redundant code

* Update documentation

* Style code

* explicitly specify the argument year to avoid corruption of targets

* Update documentation

* Reorder when we match on client variables
The previous order was causing NSUs to show a social care id. Reordering resolves this.

* Update documentation

* Style code

* Add chi parameter to `create_demog_test_flags`

* Style code

* Use CHI parameter for ep/indiv tests

* Use CHI parameter for extract tests (chi)

* Change test sheet names to lowercase

* Change date to lowercase

* Update documentation

* new data pipeline with targets
remove create_individual_files from targets and append it to run_targets script

* minor changes

* Style code

* Update documentation

* Update documentation

* Style code

* undo sc_send_lca bit

* Add code for running years available

* Update `_targets.R` script for running old years

* Style code

* Update `check_year_valid` for running old years

* Use `check_year_valid` where no data for old yrs

* Style code

* Fix pick variables
This was not taking the correct variables, leading to NSUs being assigned psychiatry

* SC Demographics and SDS (#900)

* Style code

* # read in sc demographics

different variables - removed extract date as not accurate, using chi over upi after discussion with social care data management. Added in date of death just for fun.

* social care demographics first draft

removed a lot of the submitted variables and instead using chi variables from chi seeding. Other changes:
- Fill in missing values,
- create flag for latest social care id (one from database is not accurate), this makes sure that each chi only has ONE sc id as the latest to stop it creating duplicates
- change postcode to choose chi over submitted
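
A rough sketch of the "one latest sc id per chi" idea from the list above, in base R. The column names (`period`, `latest_flag`), the use of the most recent `period` as the definition of "latest", and the tie-break on `social_care_id` are all assumptions for illustration, not the package's actual rule.

```r
# Toy demographics extract: column names and values are illustrative only.
demog <- data.frame(
  chi = c("A", "A", "B", "B"),
  social_care_id = c("s1", "s2", "s3", "s4"),
  period = c("2022Q4", "2023Q1", "2023Q1", "2023Q1"),
  stringsAsFactors = FALSE
)

# Sort by chi, then most recent period first, then social_care_id to break ties.
ord <- order(
  demog$chi, demog$period, demog$social_care_id,
  decreasing = c(FALSE, TRUE, FALSE),
  method = "radix"
)
demog <- demog[ord, ]

# The first row within each chi is flagged as the latest; all others are FALSE,
# so each chi ends up with exactly one latest social care id.
demog$latest_flag <- !duplicated(demog$chi)
```

Because the flag is derived from the first row per chi after sorting, ties on `period` cannot produce two "latest" ids for the same chi.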

* Style code

* had a github error? Not sure what happened but committing first draft of sc demographics

* Style code

* first draft sds.
No major changes - only how demographics is matched on and how latest social care id is selected

* Update documentation

* demographics - add sending location to group by

* Style code

* Update documentation

* Added ungroup()

* Remove comments

* Remove comments

* Style code

---------

Co-authored-by: SwiftySalmon <[email protected]>
Co-authored-by: marjom02 <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Zihao Li <[email protected]>

* Sc all at speedup (#904)

* speed up process_sc_all_alarms_telecare function with data.table package

* Update documentation

---------

Co-authored-by: lizihao-anu <[email protected]>
Co-authored-by: Megan McNicol <[email protected]>
Co-authored-by: Jennit07 <[email protected]>

* Add case_when statement for `high_cc` cohort

* Bug - `high_cc` in demographic cohort showing `NAs` instead of `TRUE/FALSE` (#911)

Add case_when statement for `high_cc` cohort

* added a `case_when` to update property type description for homelessness

* Update documentation

* Style code

* Bug - deal with missing variables (#914)

* Add missing sc variables for no sc data

* Fix code for including `_inc_dna` variables

* Remove commented line

* Bug - Fix get pop path failing and preventing the indiv file from running.  (#913)

Fix bug - pop file paths breaking indiv file

* correct hscp file path

* Declare missing variables for older years

* setup targets scripts for old years

* Style code

* Include `check_year_valid` for sc client path

* Add check year valid to join sc client

* Add if else statement

* WIP - TO DO - fix dummy path for `get_chi()`

* Style code

* update dummy data file to read empty tibble

* Update `check_year_valid`

* Update declared `NA` variables

* Update documentation

* declare `count_not_known` as NA

* supply year as default in `aggregate_by_chi`

* Declare unused variables

* Style code

* Update sc client with sept update new code

* Specify code for running older years

* Style code

* Add Running SLF files manually scripts

* Style code

* update write_tests_xlsx

* update process_refined_death

* fix tests by removing get_chi

* add 2425

* Style code

* fix NA matches in refined_death

* move latest_cost_year() to cost_uplift()

* improve automation

* Update documentation

* fix `cij_ppa` in DD data

* fix bugs of dd and populate cij_delay back to episodes

* Style code

* keep all variable for delayed discharge episodes

* remove dummy variable names from dd_date

* Style code

* remove `deceased_boxi` variable - bug

* remove `create_person_id`. It's matched in client

* remove `create_person_id`

* Update `run_slf_manually` scripts

* further remove person_id

* fix duplicate row introduced by adding death

* remove duplicated chi when joining death data

* TODO: check distinct death data by chi while keeping chi==NA records

* add parameter for year

* fix duplicate in add_activity_after_death_flag

* Update `check_year_valid`

* Declare DN variables

* Style code

* Declare client variables

* remove extra dd variables

* remove redundant variables

* remove fy variable

* Remove redundant variable `count_not_known`

* Remove duplicate code

* revert commit - remove fy

* update manual run

* declare missing sc variables indiv file

* Style code

---------

Co-authored-by: Zihao Li <[email protected]>
Co-authored-by: lizihao-anu <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Megan McNicol <[email protected]>
Co-authored-by: SwiftySalmon <[email protected]>
Co-authored-by: Zihao Li <[email protected]>
Co-authored-by: marjom02 <[email protected]>

* Update `replace_sc_id_with_latest` function

* Style code

---------

Signed-off-by: check-spelling-bot <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Zihao Li <[email protected]>
Co-authored-by: Zihao Li <[email protected]>
Co-authored-by: lizihao-anu <[email protected]>
Co-authored-by: marjom02 <[email protected]>
Co-authored-by: Megan McNicol <[email protected]>
Co-authored-by: SwiftySalmon <[email protected]>
Co-authored-by: rchlv <[email protected]>
Co-authored-by: James McMahon <[email protected]>

3 participants