Convert Google Symptoms pipeline to pull data from BigQuery #699

Merged on Feb 10, 2021 (48 commits)
Changes from 37 commits
Commits
563e9bb
modify pull_gs_data to pull data from bigquery
nmdefries Jan 14, 2021
e63920b
loop by year to fetch data from separate yearly tables
nmdefries Jan 15, 2021
4284199
create generic data-pull function with geolevel as arg
nmdefries Jan 15, 2021
fb39b72
hand args from params to pull_gs_data. remove base_url
nmdefries Jan 15, 2021
1d10dde
update error message
nmdefries Jan 15, 2021
46fa1f7
move column rename from preprocess to initial datapull
nmdefries Jan 15, 2021
908a6e2
add BigQuery credential fields to params
nmdefries Jan 15, 2021
b589c68
get tests running
nmdefries Jan 15, 2021
a14ae12
add tests for getting and formatting dates
nmdefries Jan 16, 2021
14ded5f
update data for preprocess tests
nmdefries Jan 19, 2021
b27fbbe
update pull tests. mocking wip
nmdefries Jan 19, 2021
c97ab97
add BigQuery credentials support. add exception catching for missing …
nmdefries Jan 25, 2021
7c57ab7
add path-to-json field in params
nmdefries Jan 25, 2021
f440431
move query formatting to separate function
nmdefries Jan 25, 2021
0e948cf
update README
nmdefries Jan 25, 2021
a62c8ce
test updates. move credentials to separate function
nmdefries Jan 26, 2021
692f3a8
get pull tests working
nmdefries Jan 26, 2021
da0491b
switch expected_date calculation to use canonical pandas func
nmdefries Jan 26, 2021
e7a9cd8
have run_as_module create receiving dir if does not exist
nmdefries Jan 26, 2021
796308c
add comments in pull test
nmdefries Jan 26, 2021
2d0e18c
add mock to conftest::run_as_module to bypass API credentials
nmdefries Jan 26, 2021
d2ffe50
add test data for pull and smooth tests
nmdefries Jan 26, 2021
5b6d664
update test params
nmdefries Jan 26, 2021
b80ae78
read test data date column in as date
nmdefries Jan 26, 2021
9143855
add description of how to recreate test data
nmdefries Jan 26, 2021
a8d947e
remove unused test data
nmdefries Jan 26, 2021
ae82eaf
create receiving dir if doesn't exist when finding existing output files
nmdefries Jan 27, 2021
c3148b4
handle empty dataframes. print message about date range new data was …
nmdefries Jan 27, 2021
5246882
add message reporting dates retrieved
nmdefries Jan 27, 2021
38cf6bb
lint improvements
nmdefries Jan 27, 2021
bddc705
add tests for preprocess and pull empty df
nmdefries Jan 27, 2021
6d490e7
update tests to use reflect get_all_dates
nmdefries Jan 27, 2021
03d134d
add comment
nmdefries Jan 27, 2021
88be903
remove empty df check
nmdefries Jan 27, 2021
7376b2c
Revert "remove empty df check"
nmdefries Jan 27, 2021
1a8348b
Merge branch 'main' into gs-pull-from-bigquery
nmdefries Jan 27, 2021
6b17640
reduce number of data files
nmdefries Jan 28, 2021
1692a5c
switch to pull from new BigQuery tables
nmdefries Jan 29, 2021
fdd67ae
update tests to reflect func changes from new BQ tables
nmdefries Jan 29, 2021
c77c989
lint updates
nmdefries Jan 29, 2021
1e56761
mock empty test files
nmdefries Jan 30, 2021
c830b13
Merge branch 'main' into gs-pull-from-bigquery
nmdefries Jan 30, 2021
a2401b6
lint updates
nmdefries Jan 30, 2021
1629e47
suppress invalid unused-import lint error
nmdefries Feb 1, 2021
68fedd2
add ArchiveDiffer to makefile
nmdefries Feb 1, 2021
635fbbe
remove funcs that fetch missing dates from local files. update docstr…
nmdefries Feb 1, 2021
7d2542e
update tests to reflect new func structure
nmdefries Feb 1, 2021
8124f2d
set default num days to export to all since export start date
nmdefries Feb 1, 2021
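
The early commits above mention looping by year over separate yearly tables and moving query formatting into its own function. A minimal sketch of that idea follows. The function names, the `table_template` placeholder, and the shape of the SQL string are assumptions made for illustration; none of them are taken from the PR's code, and the sketch only builds query strings without contacting BigQuery.

```python
from collections import defaultdict
from datetime import date


def group_dates_by_year(dates):
    """Group dates by calendar year so each yearly table is queried once."""
    by_year = defaultdict(list)
    for d in sorted(dates):
        by_year[d.year].append(d)
    return dict(by_year)


def format_query(table_template, year, dates):
    """Build one SQL string for one yearly table.

    `table_template` is a hypothetical placeholder, not a real table name.
    """
    table = table_template.format(year=year)
    date_list = ", ".join(f"'{d.isoformat()}'" for d in dates)
    return f"SELECT * FROM `{table}` WHERE date IN ({date_list})"


def build_yearly_queries(table_template, dates):
    """Return one query string per year present in `dates`, oldest year first."""
    return [
        format_query(table_template, year, year_dates)
        for year, year_dates in group_dates_by_year(dates).items()
    ]


# Dates that straddle a year boundary produce two queries.
wanted = [date(2021, 1, 2), date(2020, 12, 31), date(2021, 1, 1)]
queries = build_yearly_queries(
    "example_project.example_dataset.symptoms_{year}", wanted
)
for query in queries:
    print(query)
```

Keeping the string formatting separate from the fetch step means the formatting can be unit-tested without credentials, which fits the commit history's emphasis on mocking the data pull.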
21 changes: 11 additions & 10 deletions google_symptoms/README.md
@@ -1,8 +1,8 @@
 # Google Symptoms

-We import the normalized symptom search term popularity data from the Google
-Research's Open COVID-19 Data project and export the county-level and state-level
-data as-is. We also aggregate the data to the MSA and HRR levels. For detailed
+We import the normalized symptom search term popularity data from the Google
+Research's Open COVID-19 Data project via BigQuery and export the county-level and state-level
+data as-is. We also aggregate the data to the MSA and HRR levels. For detailed
 information see the files `DETAILS.md` contained in this directory.

 ## Running the Indicator
@@ -17,19 +17,20 @@ make install
 ```

 This command will install the package in editable mode, so you can make changes that
-will automatically propagate to the installed package.
+will automatically propagate to the installed package.

-All of the user-changable parameters are stored in `params.json`. To execute the module
+All of the user-changable parameters are stored in `params.json`. You will need to
+acquire a BigQuery API key with affiliated billing to fetch data. To execute the module
 and produce the output datasets (by default, in `receiving`), run the following.

 ```
 env/bin/python -m delphi_google_symptoms
 ```

-If you want to enter the virtual environment in your shell,
-you can run `source env/bin/activate`. Run `deactivate` to leave the virtual environment.
+If you want to enter the virtual environment in your shell,
+you can run `source env/bin/activate`. Run `deactivate` to leave the virtual environment.

-Once you are finished, you can remove the virtual environment and
+Once you are finished, you can remove the virtual environment and
 params file with the following:

 ```
@@ -58,7 +59,7 @@ To run individual tests, run the following:
 ```

 The output will show the number of unit tests that passed and failed, along
-with the percentage of code covered by the tests.
+with the percentage of code covered by the tests.

 None of the linting or unit tests should fail, and the code lines that are not covered by unit tests should be small and
-should not include critical sub-routines.
+should not include critical sub-routines.
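
The README change says a BigQuery API key is required, and the commit list mentions a path-to-JSON field in params plus exception handling for missing credentials. One way such a check could look is sketched below. The field name `bigquery_credentials_path` and the helper `get_credentials_path` are hypothetical and are not the names used in this PR.

```python
import json
import os


def get_credentials_path(params_path, field="bigquery_credentials_path"):
    """Return the credentials file path stored in a params file.

    The default `field` name is a hypothetical example, not the PR's field.
    Raises KeyError if the field is absent and FileNotFoundError if the
    path it names does not point to an existing file.
    """
    with open(params_path, encoding="utf-8") as f:
        params = json.load(f)
    if field not in params:
        raise KeyError(f"{params_path} has no '{field}' field")
    key_path = params[field]
    if not os.path.isfile(key_path):
        raise FileNotFoundError(f"credentials file not found: {key_path}")
    return key_path
```

Failing early with a specific message, before any query is attempted, makes a missing or misnamed key file easier to diagnose than an authentication error raised later in the run.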
28 changes: 14 additions & 14 deletions google_symptoms/delphi_google_symptoms/constants.py
@@ -12,12 +12,12 @@
 COMBINED_METRIC = "sum_anosmia_ageusia"
 SMOOTHERS = ["raw", "smoothed"]
 GEO_RESOLUTIONS = [
-"state",
-"county",
-"msa",
-"hrr",
-"hhs",
-"nation"
+"state",
+"county",
+"msa",
+"hrr",
+"hhs",
+"nation"
 ]

 seven_day_moving_average = partial(kday_moving_average, k=7)
@@ -26,19 +26,19 @@
 "smoothed": (seven_day_moving_average, lambda d: d),
 }

-STATE_TO_ABBREV = {'Alabama':'al',
+STATE_TO_ABBREV = {'Alabama': 'al',
 'Alaska': 'ak',
-# 'American Samoa': 'as',
+# 'American Samoa': 'as',
 'Arizona': 'az',
 'Arkansas': 'ar',
 'California': 'ca',
 'Colorado': 'co',
 'Connecticut': 'ct',
 'Delaware': 'de',
-# 'District of Columbia': 'dc',
+# 'District of Columbia': 'dc',
 'Florida': 'fl',
 'Georgia': 'ga',
-# 'Guam': 'gu',
+# 'Guam': 'gu',
 'Hawaii': 'hi',
 'Idaho': 'id',
 'Illinois': 'il',
@@ -59,24 +59,24 @@
 'Nevada': 'nv',
 'New_Hampshire': 'nh',
 'New_Jersey': 'nj',
-'New_Mexico':'nm',
+'New_Mexico': 'nm',
 'New_York': 'ny',
 'North_Carolina': 'nc',
 'North_Dakota': 'nd',
-# 'Northern Mariana Islands': 'mp',
+# 'Northern Mariana Islands': 'mp',
 'Ohio': 'oh',
 'Oklahoma': 'ok',
 'Oregon': 'or',
 'Pennsylvania': 'pa',
-# 'Puerto Rico': 'pr',
+# 'Puerto Rico': 'pr',
 'Rhode_Island': 'ri',
 'South_Carolina': 'sc',
 'South_Dakota': 'sd',
 'Tennessee': 'tn',
 'Texas': 'tx',
 'Utah': 'ut',
 'Vermont': 'vt',
-# 'Virgin Islands': 'vi',
+# 'Virgin Islands': 'vi',
 'Virginia': 'va',
 'Washington': 'wa',
 'West_Virginia': 'wv',
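
In `constants.py`, `seven_day_moving_average` is built by fixing `k=7` on `kday_moving_average` with `functools.partial`. The diff does not show `kday_moving_average` itself, so the version below is a hypothetical stand-in (a trailing mean that yields `None` until a full window is available). It is meant only to illustrate the `partial` pattern and may differ from the indicator's actual smoother.

```python
from functools import partial


def kday_moving_average(values, k):
    """Trailing k-day mean (hypothetical stand-in for the real function).

    Position i averages values[i-k+1 .. i]; positions without a full
    window of k values are reported as None.
    """
    smoothed = []
    for i in range(len(values)):
        if i + 1 < k:
            smoothed.append(None)
        else:
            window = values[i + 1 - k : i + 1]
            smoothed.append(sum(window) / k)
    return smoothed


# Same construction as in the diff: fix k=7 once, then call with data only.
seven_day_moving_average = partial(kday_moving_average, k=7)

daily_values = [1, 2, 3, 4, 5, 6, 7, 8]
print(seven_day_moving_average(daily_values))
```

Using `partial` gives every smoother the same one-argument call shape, so the raw and smoothed variants can sit side by side in a lookup table keyed by smoother name.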