Analyzing Emergency Room triage notes to support better treatment and improve health outcomes in Pakistan
To view the code produced for this project by the 2022 Data Science for Social Good (DSSG) fellows, click here.
The Indus Hospital and Health Network (IHHN) is a non-profit healthcare provider in Pakistan serving more than 5.4 million patients a year. Our partner is a team of doctors and researchers with the Emergency Department (ED) of IHHN's flagship hospital in Korangi, Karachi. IHHN's hospitals are the preferred destination for many patients because its services are provided free of charge. However, patients regularly face long wait times because demand outstrips the available staff.
Due to high patient volume and limited hospital resources, the hospital is also constrained in its ability to triage and diagnose patients efficiently and effectively. Additionally, IHHN lacks structured workflows and follow-up planning because the intake process relies on unstructured text data (nurse and physician notes) and is subjective (varies across physicians), variable (varies for each physician over time), inefficient (takes too long to do manually), and error-prone (due to human involvement). Our work with IHHN focuses on building a system to convert nurse and physician notes into structured International Classification of Diseases (ICD-10) codes that IHHN can use to create workflows and tools for physicians, nurses, and hospital staff as they work to improve patient care and outcomes.
Predictions are made at the patient-visit level (e.g., as of January 1, 2019, what are the top 10 predicted ICD-10 categories associated with a given diagnosis?). We used data from 2019-2021 for this analysis and filtered for non-pregnant adults. We used a temporal cross-validation approach to train the models to ensure that our final model will generalize effectively to new data. As illustrated by Figure 1 below, this temporal cross-validation approach splits the data by time, training each individual model on increasing amounts of data.
We have two types of cohorts: training and validation. A training cohort contains all data points before the start of its validation cohort, and each validation cohort covers three months of data. The first validation cohort sits at the tail end of our data set (2021-09-01 to 2021-12-30); its training cohort begins 2019-01-01 and ends 2021-08-31. The validation window is then sequentially moved back to cover the three months preceding the current validation cohort's start date, and the training cohort becomes every data point preceding that start date.
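The cohort scheme above can be sketched as follows, using half-open `[start, end)` intervals; the 2019-01-01 data start and the backward-moving three-month validation window come from the text, but the function and variable names are illustrative, and the pipeline's exact end-date conventions may differ:

```python
from datetime import date

def add_months(d, n):
    """Shift a first-of-month date by n calendar months (n may be negative)."""
    m = d.month - 1 + n
    return date(d.year + m // 12, m % 12 + 1, 1)

def temporal_cohorts(data_start, last_valid_start, n_splits):
    """Yield (train_start, train_end, valid_start, valid_end) tuples,
    walking the 3-month validation window backwards in time.
    Each training cohort is everything before its validation cohort."""
    valid_start = last_valid_start
    for _ in range(n_splits):
        yield (data_start, valid_start, valid_start, add_months(valid_start, 3))
        valid_start = add_months(valid_start, -3)

# First split: train on 2019-01-01 .. 2021-09-01, validate on the 3 months after
splits = list(temporal_cohorts(date(2019, 1, 1), date(2021, 9, 1), 3))
```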
We generated several features:
Feature Name | Data Source(s) | Description
---|---|---
Text Features | |
Triage_complaint | HMIS data | Categorical, short text notes assigned to patients at the first point of contact
HOPI | HMIS data | Clinician note (history of present illness)
ED_DX | HMIS data | Provisional diagnosis
Categorical Features | |
Acuity | HMIS data | Severity of a patient's medical condition
Gender | HMIS data | Sex assigned at birth
Investigations completed during past visits | Investigations data | How many medical investigations have been carried out on the patient prior to the current visit
Continuous Features | |
Age | HMIS data | The age of the patient
Number of previous visits | HMIS data | The number of times a patient has been to the hospital in the past
Number of notes completed during current visit | HMIS data | How many notes the physician completed during the current visit
Length of doctor's note in words or characters | HMIS data | Possible proxy for case complexity
Pulse | |
Temperature | | Temperature in Fahrenheit
Breathing rate | |
Temporal Features | |
Season | | Season of the year (e.g., fall, winter, spring, summer) corresponding to triage_datetime
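Before modeling, the free-text features (triage_complaint, HOPI, ED_DX) must be converted to numeric vectors. Below is a minimal bag-of-words sketch using only the standard library; the actual pipeline may use a library vectorizer (e.g., TF-IDF), and all function names here are illustrative:

```python
from collections import Counter

def build_vocabulary(notes):
    """Map each token seen in the note corpus to a column index."""
    vocab = {}
    for note in notes:
        for tok in note.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(note, vocab):
    """Bag-of-words count vector for one clinician note."""
    counts = Counter(note.lower().split())
    return [counts.get(tok, 0) for tok in vocab]

# Toy corpus of triage complaints (invented for illustration)
notes = ["chest pain and fever", "fever with cough"]
vocab = build_vocabulary(notes)
vec = vectorize("fever fever cough", vocab)
```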
The ETL and machine learning pipeline can be configured using the config file `config/model_settings.py`.
Please reference the `readme_config.yaml` file for the set of configuration parameters that build all of the final models discussed in this document.
We take advantage of some secondary data sources:
- Original files: https://www.cms.gov/medicare/icd-10/2022-icd-10-cm
- Google Drive: https://drive.google.com/drive/folders/1Y1rAYzsHBddZIdinMYMqTcIYPJs0Fp8f?usp=sharing
These files must be uploaded to the same directory as the other raw files.
To run the full ETL and machine learning pipeline, complete the following steps:

```bash
bash run_full_pipeline.sh /path/to/raw/files
```
This will complete the following tasks:

- Converts all raw `xls`, `xlsx`, and `text` files to `csv`
- Converts "raw" `csv` files into a "processed" csv directory
- From the "processed" directory, writes the csvs to a `raw` schema
- From the `raw` schema, cleans the data and returns it to the `processed` schema
- Builds our training set from this and returns it to the `model_output` schema
- Runs the full machine learning pipeline on the `dev` schema (limiting training data to 1000 rows for testing purposes)
To run only the ETL, comment out the final line so that it reads:

```bash
# pakistan-ihhn run-pipeline --schema_type dev
```
With the settings in the config, the machine learning pipeline can be run from the command line using the CLI we set up:

```bash
pakistan-ihhn run-pipeline --schema_type prod
```

For testing changes to the pipeline using small amounts of data:

```bash
pakistan-ihhn run-pipeline --schema_type dev
```
- This will create a new schema called `{SCHEMA_NAME}_dev` if it does not already exist and will save all output there. It will also append `dev` to the model objects written to the server.
- Set `--schema_type` to `prod` to run on the full dataset.
For our analysis, we have configured our pipeline to use random forest classifiers, decision tree classifiers, multinomial logistic regression, multinomial naive Bayes, and XGBoost models. However, for our current model output, we prioritized running random forest classifiers, decision tree classifiers, and multinomial naive Bayes. We evaluated our models based on the following metrics at various constraints:
- Average recall at 5, 10, 15, and 20 ICD-10 categories
- Average precision at 5, 10, 15, and 20 ICD-10 categories
- Average accuracy at 5, 10, 15, and 20 ICD-10 categories
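For a single visit with possibly multiple true ICD-10 categories, the at-k metrics above can be computed as follows: recall@k is the fraction of the visit's true categories recovered in the model's top k predictions, and precision@k is the fraction of the top k that are correct. This is a sketch with invented example codes; the pipeline's averaging across visits is not shown:

```python
def recall_at_k(true_codes, ranked_predictions, k):
    """Fraction of the visit's true ICD-10 categories found in the top-k list."""
    top_k = set(ranked_predictions[:k])
    return len(top_k & set(true_codes)) / len(true_codes)

def precision_at_k(true_codes, ranked_predictions, k):
    """Fraction of the top-k predicted categories that are correct."""
    top_k = ranked_predictions[:k]
    return sum(c in set(true_codes) for c in top_k) / k

# Toy example: two true categories, five ranked predictions
true_codes = ["J18", "R50"]
ranked = ["R50", "I10", "J18", "K52", "E11"]
r5 = recall_at_k(true_codes, ranked, 5)     # both true codes appear in the top 5
p5 = precision_at_k(true_codes, ranked, 5)  # 2 of the top 5 are correct
```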
We compared the results of our three models against two baselines: 1) comparing a given patient diagnosis to similar cases that have passed through the ED in the past, and 2) treating the provisional diagnosis as "assumed to be correct" and predicting the code with maximum similarity to the provisional diagnosis.
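The second baseline, matching the provisional diagnosis to the most similar ICD-10 code, can be sketched with a simple token-overlap (Jaccard) similarity; the actual pipeline may use a different similarity measure, and the code descriptions below are illustrative snippets rather than official ICD-10-CM text:

```python
def jaccard(a, b):
    """Token-set similarity between two short clinical texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def best_matching_code(provisional_dx, code_descriptions):
    """Return the ICD-10 code whose description is most similar
    to the provisional diagnosis text."""
    return max(code_descriptions,
               key=lambda c: jaccard(provisional_dx, code_descriptions[c]))

# Toy lookup table of ICD-10 codes and shortened descriptions
codes = {
    "J18.9": "pneumonia unspecified organism",
    "I10": "essential primary hypertension",
    "R50.9": "fever unspecified",
}
match = best_matching_code("patient with fever", codes)
```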