-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backfill data 2022-08-08 through 2022-10-08 into public bucket #27
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. rt_daily_aggregation
is probably less relevant now, but we could keep it for access to a sort of local lambda. In that case though, the put
operations on lines 84 and 100 would throw errors for the public bucket. This could be avoided by adding an optional save
argument.
end_date = "2022-08-07" | ||
|
||
data, errors, hourly_summary = compute_hourly_totals(start_date, end_date) | ||
app() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running python data_analysis/rt_daily_aggregations.py
from the project root will throw a ModuleNotFoundError: No module named 'scrape_data'
. For this to work, you'll have to create a root level module that imports rt_daily_aggregations
. You can create a module called application.py
, for example, and then add
from data_analysis import rt_daily_aggregations
rt_daily_aggregations.app()
to it. Then running python application.py
from the project root will work.
Therefore, I would remove the if __name__ == '__main__'
block because this isn't meant to be run from the command line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was running it as python3 -m data_analysis.rt_daily_aggregations arg arg arg
from the root directory and it did work... Open to changes but this was convenient for me when I was actually using it to backfill. Does that make sense? Can also try to document this somewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I didn't have the -m
flag so the relative imports didn't work. Leaving it as is is fine, but it would be helpful to document the -m
flag.
Addressed first round of comments, will update to take a "save" parameter next time I have a chance to work on this. I didn't do that originally because I was trying to limit changes to the actual |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for these changes.
…aggregations backfill data 2022-08-08 through 2022-10-08 into public bucket
Description
This PR:
rt_daily_aggregation
to be a typer command line app that is more oriented towards backfilling data within the buckets (with the deployment of Update scraper script and automate aggregation #11, I kind of feel like the original purpose ofrt_daily_aggregation
is no longer needed? And at the very least I think it's appropriate to have it callcombine_daily_files
fromscrape_data
so that we have only one source of truth for the aggregation code?)Resolves #17
Type of change
How has this been tested?
I have already run everything in here to do the backfills today, hope it's ok that I did that unilaterally (did not change any existing data in prod!)