This scraper gathers daily new case totals for each county in the state and calculates a set of fourteen 7-day rolling averages used to visualize how each county is doing.
This scraper hits two DSHS files each day. The first is DSHS's daily feed of cases by county, found here. The second is a general DSHS configuration file that contains the last update date. If that date is later than the last update recorded in our trend file, the first file is used to update our file of 7-day averages.
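The rough shape of that flow is sketched below. The URLs, field names, and `{county: [daily counts]}` data shape are placeholders, not the project's actual values, since the DSHS endpoints and layout aren't documented here.

```python
# Sketch of the daily update: check DSHS's update date, then recompute 7-day averages.
# URLs, field names, and the {county: [daily counts]} shape are assumptions.
from datetime import date

import requests

CASES_BY_COUNTY_URL = "https://example.com/dshs-cases-by-county.json"  # placeholder
CONFIG_URL = "https://example.com/dshs-config.json"                    # placeholder


def rolling_7_day_average(daily_counts):
    """Return the 7-day rolling average for each day in a list of daily counts."""
    averages = []
    for i in range(len(daily_counts)):
        window = daily_counts[max(0, i - 6): i + 1]
        averages.append(round(sum(window) / len(window), 2))
    return averages


def run_daily_update(trend_last_updated: date) -> bool:
    """Recompute averages only if DSHS reports a newer update date than our trend file."""
    config = requests.get(CONFIG_URL, timeout=30).json()
    dshs_updated = date.fromisoformat(config["last_update"])  # assumed field name

    if dshs_updated <= trend_last_updated:
        return False  # nothing new from DSHS today

    cases = requests.get(CASES_BY_COUNTY_URL, timeout=30).json()
    trend = {county: rolling_7_day_average(counts) for county, counts in cases.items()}
    # ...write `trend` back into the trend file and upload it to S3 (see utils.py)...
    return True
```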
- `scraper.py`: Set of functions to complete the daily update and to repair files should a daily update be missed.
- `service.py`: File run on AWS Lambda. Runs the daily update and uploads the resulting data file back to AWS S3.
- `utils.py`: Simple function that handles the uploading of files to S3 (a rough sketch of such a helper follows this list).
- `zappa_settings.json`: Zappa configuration file containing the project name, description, runtime environment, and, most importantly, the schedule for the scraper to run.
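`utils.py` itself isn't reproduced here; a minimal sketch of that kind of upload helper, built on boto3 with hypothetical bucket and key names, looks like this:

```python
# Minimal sketch of an S3 upload helper (assumed names, not the project's exact code).
import boto3


def upload_to_s3(local_path: str, bucket: str, key: str) -> None:
    """Upload a local file to s3://{bucket}/{key} with a JSON content type."""
    s3 = boto3.client("s3")
    s3.upload_file(
        Filename=local_path,
        Bucket=bucket,
        Key=key,
        ExtraArgs={"ContentType": "application/json"},
    )


# Hypothetical usage:
# upload_to_s3("trend_data.json", "my-root-bucket", "covid/trends/trend_data.json")
```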
Download the repository and run `$ pipenv install --dev`.
Copy the `.env-example` file and rename it `.env`. Add your own `AWS_ACCESS_KEY` and `AWS_SECRET_ACCESS_KEY`. For messaging to Slack, you'll also need our `SLACK_TOKEN`.
Other environment variables:
- `TREND_DATA_FILE`: URL path of your current JSON data file. Results of the scraper are output to this URL as well.
- `REPAIR_FILE`: URL path of a copy of the current JSON file. This file is edited to backfill missing data and is used to repair averages for missing periods of time.
- `TARGET_BUCKET`: Subdirectory path to the bucket where the JSON file lives. Combined with `ROOT_BUCKET` to form the complete file path of the JSON data file.
- `ROOT_BUCKET`: Root AWS bucket where data is stored. Combined with `TARGET_BUCKET` to construct the complete AWS file path to the JSON file (see the sketch below for how the two are combined).
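A rough sketch of how these variables might be read and combined at runtime; the exact joining logic and file name are assumptions:

```python
# Sketch only: reads the environment variables above and builds the full S3 path.
import os

TREND_DATA_FILE = os.getenv("TREND_DATA_FILE")  # e.g. "https://.../trend_data.json"
REPAIR_FILE = os.getenv("REPAIR_FILE")
ROOT_BUCKET = os.getenv("ROOT_BUCKET")          # e.g. "my-data-bucket"
TARGET_BUCKET = os.getenv("TARGET_BUCKET")      # e.g. "covid/trends"

# ROOT_BUCKET and TARGET_BUCKET combine into the complete S3 path of the JSON file.
# The file name here is a placeholder.
full_s3_key = f"{TARGET_BUCKET}/trend_data.json"
full_s3_path = f"{ROOT_BUCKET}/{full_s3_key}"
```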
This project uses Zappa to upload and schedule the scraper on our AWS Lambda. After making changes to the scraper, run `pipenv run zappa update` to push those changes to Lambda. Scheduling is handled via the `zappa_settings.json` file: the `events` key is an array of event objects, and the event object pointing at `service.handler` has an `expression` key that takes either a cron-format schedule or a rate (e.g. `rate(12 hours)`), as in the example below.
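A minimal example of that structure, using Zappa's documented events schema; the project name, runtime, and deployment bucket shown here are placeholders:

```json
{
    "production": {
        "project_name": "county-trends-scraper",
        "runtime": "python3.8",
        "s3_bucket": "my-zappa-deployments",
        "events": [
            {
                "function": "service.handler",
                "expression": "rate(12 hours)"
            }
        ]
    }
}
```

A cron-style schedule, such as `"expression": "cron(0 12 * * ? *)"`, can be used in place of the rate.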
To run the scraper locally, run the following from the command line:
$ pipenv run python service.py
Note that this simulates a scheduled scraper run, so any files it generates will be uploaded to S3. To run or test just the scraper itself locally, run:
$ pipenv run python scraper.py