Commit 6008371

Merge pull request #320 from cmu-delphi/changehc
CHC Signal

2 parents 12aac02 + a2fc87b

26 files changed: +563,757 −0 lines
changehc/README.md

Lines changed: 77 additions & 0 deletions
# Change Healthcare Indicator

COVID-19 indicator using outpatient visits from Change Healthcare claims data.
Reads claims data into a pandas dataframe.
Makes appropriate date shifts, adjusts for backfilling, and smooths estimates.
Writes results to CSVs.


## Running the Indicator

The indicator is run by directly executing the Python module contained in this
directory. The safest way to do this is to create a virtual environment,
install the common DELPHI tools, and then install the module and its
dependencies. To do this, run the following code from this directory:

```
python -m venv env
source env/bin/activate
pip install ../_delphi_utils_python/.
pip install .
```

*Note*: you may need to install BLAS; on Ubuntu, run
```
sudo apt-get install libatlas-base-dev gfortran
```

All of the user-changeable parameters are stored in `params.json`. To execute
the module and produce the output datasets (by default, in `receiving`), run
the following:

```
env/bin/python -m delphi_changehc
```

Once you are finished with the code, you can deactivate the virtual environment
and (optionally) remove the environment itself.

```
deactivate
rm -r env
```

## Testing the code

To do a static test of the code style, it is recommended to run **pylint** on
the module. To do this, run the following from the main module directory:

```
env/bin/pylint delphi_changehc
```

The most aggressive checks are turned off; only relatively important issues
should be raised, and they should be manually checked (or better, fixed).

Unit tests are also included in the module. To execute these, run the following
command from this directory:

```
(cd tests && ../env/bin/pytest --cov=delphi_changehc --cov-report=term-missing)
```

The output will show the number of unit tests that passed and failed, along
with the percentage of code covered by the tests. None of the tests should
fail, and the code lines that are not covered by unit tests should be few and
should not include critical sub-routines.

## Code tour

- update_sensor.py: CHCSensorUpdator: reads the data, makes transformations, and writes results to file
- sensor.py: CHCSensor: methods for transforming data, including backfill and smoothing
- smooth.py: implements a local linear left Gaussian filter
- load_data.py: methods for loading denominator and covid data
- config.py: Config: constants for reading data and transformations; Constants: constants for sanity checks
- constants.py: constants for signal names
- weekday.py: Weekday: adjusts for the weekday effect
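The `smooth.py` entry above refers to a local linear left Gaussian filter, and `config.py` sets `SMOOTHER_BANDWIDTH = 100` for it. The following is a minimal numpy sketch of that idea, written for illustration only; it is not the repository's implementation, and the function name is made up:

```python
import numpy as np

def left_gaussian_linear_smooth(y, bandwidth=100.0):
    """Smooth a series with a left-sided Gaussian-weighted linear fit.

    For each time t, fit a weighted linear regression on the points
    s = 0..t with weights exp(-(t - s)^2 / bandwidth), and report the
    fitted value at t. Only past and present points enter the fit, so
    the smoother can be applied in real time without future data.
    """
    y = np.asarray(y, dtype=float)
    out = np.empty(len(y))
    for t in range(len(y)):
        s = np.arange(t + 1)                      # left window: points up to t
        w = np.sqrt(np.exp(-((t - s) ** 2) / bandwidth))
        X = np.column_stack([np.ones(t + 1), s])  # intercept + slope design
        # weighted least squares via row-scaling of the design and target
        beta, *_ = np.linalg.lstsq(X * w[:, None], y[: t + 1] * w, rcond=None)
        out[t] = beta[0] + beta[1] * t            # evaluate the fit at time t
    return out
```

One consequence of the local linear form: applied to a perfectly linear series, the smoother reproduces it exactly, since a weighted linear fit recovers any line.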

changehc/REVIEW.md

Lines changed: 39 additions & 0 deletions
## Code Review (Python)

A code review of this module should include a careful look at the code and the
output. To assist in the process, but certainly not in place of it, please
check the following items.

**Documentation**

- [ ] the README.md file template is filled out and currently accurate; it is
possible to load and test the code using only the instructions given
- [ ] minimal docstrings (one line describing what the function does) are
included for all functions; full docstrings describing the inputs and expected
outputs should be given for non-trivial functions

**Structure**

- [ ] code should use 4 spaces for indentation; other style decisions are
flexible, but be consistent within a module
- [ ] any required metadata files are checked into the repository and placed
within the directory `static`
- [ ] any intermediate files that are created and stored by the module should
be placed in the directory `cache`
- [ ] final expected output files to be uploaded to the API are placed in the
`receiving` directory; output files should not be committed to the repository
- [ ] all options and API keys are passed through the file `params.json`
- [ ] template parameter file (`params.json.template`) is checked into the
code; no personal (i.e., usernames) or private (i.e., API keys) information is
included in this template file

**Testing**

- [ ] module can be installed in a new virtual environment
- [ ] pylint with the default `.pylint` settings run over the module produces
minimal warnings; warnings that do exist have been confirmed as false positives
- [ ] reasonably high level of unit test coverage covering all of the main logic
of the code (e.g., missing coverage for raised errors that do not currently seem
possible to reach is okay; missing coverage for options that will be needed is
not)
- [ ] all unit tests run without errors
changehc/cache/.gitignore

Whitespace-only changes.

changehc/delphi_changehc/__init__.py

Lines changed: 19 additions & 0 deletions
# -*- coding: utf-8 -*-
"""Module to pull and clean indicators from the CHC source.

This file defines the functions that are made public by the module. As the
module is intended to be executed through the main method, these are primarily
for testing.
"""

from __future__ import absolute_import

from . import config
from . import load_data
from . import run
from . import sensor
from . import smooth
from . import update_sensor
from . import weekday

__version__ = "0.0.0"

changehc/delphi_changehc/__main__.py

Lines changed: 11 additions & 0 deletions
# -*- coding: utf-8 -*-
"""Call the function run_module when executed.

This file indicates that calling the module (`python -m MODULE_NAME`) will
call the function `run_module` found within the run.py file. There should be
no need to change this template.
"""

from .run import run_module  # pragma: no cover

run_module()  # pragma: no cover

changehc/delphi_changehc/config.py

Lines changed: 61 additions & 0 deletions
"""
This file contains configuration variables used to generate the CHC signal.

Author: Aaron Rumack
Created: 2020-10-14
"""

from datetime import datetime, timedelta
import numpy as np


class Config:
    """Static configuration variables."""

    ## dates
    FIRST_DATA_DATE = datetime(2020, 1, 1)

    # number of days of training data needed to produce an estimate
    # (one day needed for the smoother to produce values)
    BURN_IN_PERIOD = timedelta(days=1)

    # shift dates forward for labeling purposes
    DAY_SHIFT = timedelta(days=1)

    ## data columns
    COVID_COL = "COVID"
    DENOM_COL = "Denominator"
    COUNT_COLS = ["COVID"] + ["Denominator"]
    DATE_COL = "date"
    GEO_COL = "fips"
    ID_COLS = [DATE_COL] + [GEO_COL]
    FILT_COLS = ID_COLS + COUNT_COLS
    DENOM_COLS = [GEO_COL, DATE_COL, DENOM_COL]
    COVID_COLS = [GEO_COL, DATE_COL, COVID_COL]
    DENOM_DTYPES = {"date": str, "Denominator": str, "fips": str}
    COVID_DTYPES = {"date": str, "COVID": str, "fips": str}

    SMOOTHER_BANDWIDTH = 100  # bandwidth for the linear left Gaussian filter
    MIN_DEN = 100  # number of total visits needed to produce a sensor
    MAX_BACKFILL_WINDOW = 7  # maximum number of days used to average a backfill correction
    MIN_CUM_VISITS = 500  # need to observe at least 500 counts before averaging


class Constants:
    """
    Contains the maximum number of geo units for each geo type.
    Used for sanity checks.
    """
    # number of counties in the USA, including megacounties
    NUM_COUNTIES = 3141 + 52
    NUM_HRRS = 308
    NUM_MSAS = 392 + 52  # MSAs + states
    NUM_STATES = 52  # including DC and PR

    MAX_GEO = {"county": NUM_COUNTIES,
               "hrr": NUM_HRRS,
               "msa": NUM_MSAS,
               "state": NUM_STATES}
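As a sketch of how the `Constants.MAX_GEO` table might back a sanity check, consider the hypothetical helper below. The numeric values mirror `config.py`; the `check_geo_count` function itself is invented for illustration and is not part of the module:

```python
# Values copied from the Constants class in config.py.
class Constants:
    NUM_COUNTIES = 3141 + 52  # counties plus megacounties
    NUM_HRRS = 308
    NUM_MSAS = 392 + 52
    NUM_STATES = 52           # including DC and PR
    MAX_GEO = {"county": NUM_COUNTIES,
               "hrr": NUM_HRRS,
               "msa": NUM_MSAS,
               "state": NUM_STATES}

def check_geo_count(geo_type, n_units):
    """Hypothetical sanity check: fail if more geo units appear than can exist."""
    if n_units > Constants.MAX_GEO[geo_type]:
        raise ValueError(
            f"{n_units} {geo_type} units exceeds maximum "
            f"{Constants.MAX_GEO[geo_type]}")

check_geo_count("state", 51)  # 51 states is within the limit of 52
```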

changehc/delphi_changehc/constants.py

Lines changed: 7 additions & 0 deletions
"""Registry for signal names and geo types."""
SMOOTHED = "smoothed_chc"
SMOOTHED_ADJ = "smoothed_adj_chc"
SIGNALS = [SMOOTHED, SMOOTHED_ADJ]
NA = "NA"
HRR = "hrr"
FIPS = "fips"

changehc/delphi_changehc/load_data.py

Lines changed: 147 additions & 0 deletions
"""
Load CHC data.

Author: Aaron Rumack
Created: 2020-10-14
"""

# third party
import pandas as pd

# first party
from .config import Config


def load_denom_data(denom_filepath, dropdate, base_geo):
    """Load in and set up denominator data.

    Args:
        denom_filepath: path to the aggregated denominator data
        dropdate: data drop date (datetime object)
        base_geo: base geographic unit before aggregation ('fips')

    Returns:
        cleaned denominator dataframe
    """
    assert base_geo == "fips", "base unit must be 'fips'"

    denom_suffix = denom_filepath.split("/")[-1].split(".")[0][9:]
    assert denom_suffix == "All_Outpatients_By_County"
    denom_filetype = denom_filepath.split("/")[-1].split(".")[1]
    assert denom_filetype == "dat"

    denom_data = pd.read_csv(
        denom_filepath,
        sep="|",
        header=None,
        names=Config.DENOM_COLS,
        dtype=Config.DENOM_DTYPES,
    )

    denom_data[Config.DATE_COL] = \
        pd.to_datetime(denom_data[Config.DATE_COL], errors="coerce")

    # restrict to start and end date
    denom_data = denom_data[
        (denom_data[Config.DATE_COL] >= Config.FIRST_DATA_DATE) &
        (denom_data[Config.DATE_COL] < dropdate)
    ]

    # counts between 1 and 3 are coded as "3 or less"; we convert to 1
    # (use .loc to avoid pandas chained-assignment pitfalls)
    denom_data.loc[
        denom_data[Config.DENOM_COL] == "3 or less", Config.DENOM_COL
    ] = "1"
    denom_data[Config.DENOM_COL] = denom_data[Config.DENOM_COL].astype(int)

    assert (
        (denom_data[Config.DENOM_COL] >= 0).all().all()
    ), "Denominator counts must be nonnegative"

    # aggregate age groups (so data is unique by date and base geography)
    denom_data = denom_data.groupby([base_geo, Config.DATE_COL]).sum()
    denom_data.dropna(inplace=True)  # drop rows with any missing entries

    return denom_data


def load_covid_data(covid_filepath, dropdate, base_geo):
    """Load in and set up COVID data.

    Args:
        covid_filepath: path to the aggregated covid data
        dropdate: data drop date (datetime object)
        base_geo: base geographic unit before aggregation ('fips')

    Returns:
        cleaned covid dataframe
    """
    assert base_geo == "fips", "base unit must be 'fips'"

    covid_suffix = covid_filepath.split("/")[-1].split(".")[0][9:]
    assert covid_suffix == "Covid_Outpatients_By_County"
    covid_filetype = covid_filepath.split("/")[-1].split(".")[1]
    assert covid_filetype == "dat"

    covid_data = pd.read_csv(
        covid_filepath,
        sep="|",
        header=None,
        names=Config.COVID_COLS,
        dtype=Config.COVID_DTYPES,
        parse_dates=[Config.DATE_COL]
    )

    covid_data[Config.DATE_COL] = \
        pd.to_datetime(covid_data[Config.DATE_COL], errors="coerce")

    # restrict to start and end date
    covid_data = covid_data[
        (covid_data[Config.DATE_COL] >= Config.FIRST_DATA_DATE) &
        (covid_data[Config.DATE_COL] < dropdate)
    ]

    # counts between 1 and 3 are coded as "3 or less"; we convert to 1
    # (use .loc to avoid pandas chained-assignment pitfalls)
    covid_data.loc[
        covid_data[Config.COVID_COL] == "3 or less", Config.COVID_COL
    ] = "1"
    covid_data[Config.COVID_COL] = covid_data[Config.COVID_COL].astype(int)

    assert (
        (covid_data[Config.COVID_COL] >= 0).all().all()
    ), "COVID counts must be nonnegative"

    # aggregate age groups (so data is unique by date and base geography)
    covid_data = covid_data.groupby([base_geo, Config.DATE_COL]).sum()
    covid_data.dropna(inplace=True)  # drop rows with any missing entries

    return covid_data


def load_combined_data(denom_filepath, covid_filepath, dropdate, base_geo):
    """Load in denominator and covid data, and combine them.

    Args:
        denom_filepath: path to the aggregated denominator data
        covid_filepath: path to the aggregated covid data
        dropdate: data drop date (datetime object)
        base_geo: base geographic unit before aggregation ('fips')

    Returns:
        combined multiindexed dataframe; index 0 is geo_base, index 1 is date
    """
    assert base_geo == "fips", "base unit must be 'fips'"

    # load each data stream
    denom_data = load_denom_data(denom_filepath, dropdate, base_geo)
    covid_data = load_covid_data(covid_filepath, dropdate, base_geo)

    # merge data
    data = denom_data.merge(covid_data, how="outer", left_index=True, right_index=True)
    assert data.isna().all(axis=1).sum() == 0, "entire row is NA after merge"

    # calculate combined numerator and denominator
    data.fillna(0, inplace=True)
    data["num"] = data[Config.COVID_COL]
    data["den"] = data[Config.DENOM_COL]
    data = data[["num", "den"]]

    return data
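The combination step in `load_combined_data` (outer merge of the two streams, with missing counts filled as zero) can be exercised on tiny synthetic frames. The data below is made up for illustration; only the column names and merge/fill logic follow the module:

```python
import pandas as pd

# Synthetic stand-ins for the cleaned denominator and covid frames,
# indexed by (fips, date) as load_denom_data/load_covid_data produce.
idx = ["fips", "date"]
denom = pd.DataFrame(
    {"fips": ["01001", "01001"], "date": ["2020-05-01", "2020-05-02"],
     "Denominator": [120, 95]}).set_index(idx)
covid = pd.DataFrame(
    {"fips": ["01001"], "date": ["2020-05-01"],
     "COVID": [4]}).set_index(idx)

# Outer merge keeps (fips, date) pairs seen in either stream; a pair
# present in only one stream gets NaN on the other side, which fillna(0)
# then treats as an observed count of zero.
data = denom.merge(covid, how="outer", left_index=True, right_index=True)
data = data.fillna(0)
data["num"] = data["COVID"]
data["den"] = data["Denominator"]
data = data[["num", "den"]]
```

Here 2020-05-02 has a denominator but no COVID count, so it survives the merge with `num` equal to 0 rather than being dropped.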
