backfill data 2022-08-08 through 2022-10-08 into public bucket #27

lauriemerrell · 2022-10-10T22:37:15Z

Description

This PR:

Uploads the notebook I used to copy raw data from 5/19 - 10/9 (copies fully raw data from private bucket to public)
- TODO: I need to do 10/10, I turned on the actual copying in prod today but there are like 10 hours where the data is still only in the private bucket before I turned on writing to the public bucket
Refactors rt_daily_aggregation to be a typer command line app that is more oriented towards backfilling data within the buckets (with the deployment of Update scraper script and automate aggregation #11, I kind of feel like the original purpose of rt_daily_aggregation is no longer needed? And at the very least I think it's appropriate to have it call combine_daily_files from scrape_data so that we have only one source of truth for the aggregation code?)

Resolves #17

Type of change

Bug fix
New functionality
Documentation

How has this been tested?

I have already run everything in here to do the backfills today, hope it's ok that I did that unilaterally (did not change any existing data in prod!)

dcjohnson24

Looks good overall. rt_daily_aggregation is probably less relevant now, but we could keep it for access to a sort of local lambda. In that case though, the put operations on lines 84 and 100 would throw errors for the public bucket. This could be avoided by adding an optional save argument.

data_analysis/rt_daily_aggregations.py

dcjohnson24 · 2022-10-11T23:38:30Z

data_analysis/rt_daily_aggregations.py

-    end_date = "2022-08-07"
-
-    data, errors, hourly_summary = compute_hourly_totals(start_date, end_date)
+    app()


Running python data_analysis/rt_daily_aggregations.py from the project root will throw a ModuleNotFoundError: No module named 'scrape_data'. For this to work, you'll have to create a root level module that imports rt_daily_aggregations. You can create a module called application.py, for example, and then add

from data_analysis import rt_daily_aggregations rt_daily_aggregations.app()

to it. Then running python application.py from the project root will work.

Therefore, I would remove the if __name__ == '__main__' block because this isn't meant to be run from the command line.

I was running it as python3 -m data_analysis.rt_daily_aggregations arg arg arg from the root directory and it did work... Open to changes but this was convenient for me when I was actually using it to backfill. Does that make sense? Can also try to document this somewhere

I see. I didn't have the -m flag so the relative imports didn't work. Leaving it as is is fine, but it would be helpful to document the -m flag.

scrape_data/combine_daily_files.py

data_analysis/rt_daily_aggregations.py

lauriemerrell · 2022-10-12T03:12:47Z

Addressed first round of comments, will update to take a "save" parameter next time I have a chance to work on this. I didn't do that originally because I was trying to limit changes to the actual combine_daily_files.py, but I guess I can make more changes that don't have to be deployed to Lambda.

dcjohnson24

Thanks for these changes.

…aggregations backfill data 2022-08-08 through 2022-10-08 into public bucket

backfill data 2022-08-08 through 2022-10-08 into public bucket

505d622

lauriemerrell requested a review from dcjohnson24 October 10, 2022 22:37

lauriemerrell self-assigned this Oct 10, 2022

remove output cells and comment a little

f620a65

dcjohnson24 requested changes Oct 12, 2022

View reviewed changes

lauriemerrell added 2 commits October 11, 2022 21:46

remove unused imports

d543719

PR review comments: add help text

046a30e

dcjohnson24 approved these changes Oct 13, 2022

View reviewed changes

lauriemerrell added 2 commits October 16, 2022 21:24

add save parameter to combine_daily_files

a1627e4

actually add save parameter, make combine daily files return results

71eb59c

lauriemerrell merged commit 474d9ee into main Oct 17, 2022

lauriemerrell deleted the update-backfill-aggregations branch October 17, 2022 02:53

haileyplusplus pushed a commit to haileyplusplus/chn-ghost-buses that referenced this pull request Apr 1, 2024

Merge pull request chihacknight#27 from chihacknight/update-backfill-…

4ff77a1

…aggregations backfill data 2022-08-08 through 2022-10-08 into public bucket

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

backfill data 2022-08-08 through 2022-10-08 into public bucket #27

backfill data 2022-08-08 through 2022-10-08 into public bucket #27

lauriemerrell commented Oct 10, 2022

dcjohnson24 left a comment

dcjohnson24 Oct 11, 2022

lauriemerrell Oct 12, 2022 •

edited

Loading

dcjohnson24 Oct 12, 2022

lauriemerrell commented Oct 12, 2022

dcjohnson24 left a comment

backfill data 2022-08-08 through 2022-10-08 into public bucket #27

backfill data 2022-08-08 through 2022-10-08 into public bucket #27

Conversation

lauriemerrell commented Oct 10, 2022

Description

Type of change

How has this been tested?

dcjohnson24 left a comment

Choose a reason for hiding this comment

dcjohnson24 Oct 11, 2022

Choose a reason for hiding this comment

lauriemerrell Oct 12, 2022 • edited Loading

Choose a reason for hiding this comment

dcjohnson24 Oct 12, 2022

Choose a reason for hiding this comment

lauriemerrell commented Oct 12, 2022

dcjohnson24 left a comment

Choose a reason for hiding this comment

lauriemerrell Oct 12, 2022 •

edited

Loading