Update routes for specific dates #638
base: master
Conversation
… scrape archiving routes
…gle archived route
… need to supply own api key
Thanks for taking this on! I think you're trying to do too much in this PR. My goal for #626 was for save_routes.py to take in a date argument (right now it always uses today's date) and, based on that date, to look up the appropriate GTFS file (which is already there; a different script will get them from transitfeeds; for now you'd test by manually downloading older GTFS files and renaming them).
More specifically, you would:
- Allow save_routes to take in a date argument (optional; otherwise it uses today's date and behaves the same as it currently does, which is to fetch the latest GTFS and use that).
- If a date is provided, save_routes passes the filepath of the proper GTFS (i.e. the latest GTFS that is at or before the date provided) to GtfsScraper (let's use `gtfs_path` instead of `archiving_url`). If `gtfs_path` is None (it's an optional argument) then it downloads the latest GTFS as it already does. (A rough sketch of this lookup follows the list.)
- Right now the GTFS is stored as `gtfs-{agency}.zip`. With the date, a possible format could be `gtfs-{agency}_YYYY-MM-DD.zip`. For now you would create those files manually in order to test your PR. The date of the GTFS will be its start date (which is the version_date in TransitFeeds).
- When the date is passed in, the output will be stored in a folder named the same way you've already implemented.
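To illustrate the idea (this is a minimal sketch, not the actual implementation in this repo), here is roughly what the date-based lookup could look like, assuming the `gtfs-{agency}_YYYY-MM-DD.zip` naming above; `find_gtfs_path` and the `data` directory argument are hypothetical names:

```python
import argparse
import glob
import re
from datetime import date, datetime

def find_gtfs_path(data_dir, agency_id, target_date):
    """Return the newest gtfs-{agency}_YYYY-MM-DD.zip at or before target_date, or None."""
    candidates = []
    for path in glob.glob(f'{data_dir}/gtfs-{agency_id}_*.zip'):
        match = re.search(r'_(\d{4}-\d{2}-\d{2})\.zip$', path)
        if match:
            gtfs_date = datetime.strptime(match.group(1), '%Y-%m-%d').date()
            if gtfs_date <= target_date:
                candidates.append((gtfs_date, path))
    return max(candidates)[1] if candidates else None

parser = argparse.ArgumentParser()
parser.add_argument('--agency', required=True)
parser.add_argument('--date', help='YYYY-MM-DD; defaults to today')
args = parser.parse_args()

target_date = datetime.strptime(args.date, '%Y-%m-%d').date() if args.date else date.today()
# None means "behave as before: download the latest GTFS and use that".
gtfs_path = find_gtfs_path('data', args.agency, target_date) if args.date else None
```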
Each of the points below will be its own issue and PR (I'll create separate issues for these), which you're free to take on (they'll be marked as Good First Issue).
- Archived versions of GTFS files will be stored in S3 so that we don't have to rely on TransitFeeds each time we might want to re-make the route configs. `gtfs_path` will turn from a local path into an S3 path that you'll have to fetch.
- Write a script that, given a date and agency, downloads the appropriate GTFS from TransitFeeds and uploads it to S3. It doesn't download anything from TransitFeeds if the GTFS is already in S3. This script would be deployed to run before save_routes does. (A sketch of this check-then-upload flow follows the list.)
- Have save_routes get the appropriate GTFS from S3 instead of downloading the latest.
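A minimal sketch of that check-then-upload flow, assuming boto3 and requests; the bucket name, key layout, and helper names are illustrative assumptions, not the project's actual setup:

```python
import boto3
import requests
from botocore.exceptions import ClientError

S3_BUCKET = 'opentransit-gtfs'  # assumed bucket name, not the project's real one

def gtfs_key(agency_id, version_date):
    # Mirrors the assumed gtfs-{agency}_YYYY-MM-DD.zip naming from the comment above.
    return f'gtfs-{agency_id}_{version_date}.zip'

def ensure_gtfs_in_s3(agency_id, version_date, download_url):
    """Upload the GTFS for (agency, date) to S3 unless it is already archived there."""
    s3 = boto3.client('s3')
    key = gtfs_key(agency_id, version_date)
    try:
        s3.head_object(Bucket=S3_BUCKET, Key=key)
        return key  # already in S3; don't hit TransitFeeds at all
    except ClientError:
        pass
    resp = requests.get(download_url)  # download_url would come from getFeedVersions
    resp.raise_for_status()
    s3.put_object(Bucket=S3_BUCKET, Key=key, Body=resp.content)
    return key
```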
Hey @Brian-Lee I'm really sorry for the delay in reviewing this, I'll be faster in the follow-ups. Getting closer, I've added some comments. Request a review from me in the PR whenever you're ready. Thanks!
No worries! Thanks for helping me!
Just wondering, could you write out, in the PR description, a list of what you did to test your PR? I think a Tests section is part of the PR template (not completely sure), and it's missing here.
It should look something like this:
Tests:
Command: python3 save_routes.py --agency trimet
...
Command: python3 save_routes.py --agency trimet --date 2000-02-02
(should give error as we don't have a GTFS from then)
...
Command: python3 save_routes.py --agency trimet --date 2020-02-22
... (saves route_v3a_trimet_2020-02-22...)
self.agency = agency
self.agency_id = agency_id = agency.id
gtfs_cache_dir = f'{util.get_data_dir()}/gtfs-{agency_id}'

-    download_gtfs_data(agency, gtfs_cache_dir)
+    get_gtfs_data(agency, gtfs_cache_dir, gtfs_path=gtfs_path)
For, say, trimet, the GTFS directory ends up being /data/gtfs-trimet/gtfs-trimet-2020-02-22 instead of /data/gtfs-trimet_2020-02-22 (for a sample date).
Then, on line 127, the GTFS that's passed in ends up being the one in gtfs_cache_dir, so gtfs_path ends up not being used.
I added an if-else which should solve that. I think there are cleaner ways, and I haven't yet done a good job of testing this fix.
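For what it's worth, the fix might look roughly like the sketch below; this is a guess based on the diff above rather than the actual code in this PR, and `extract_gtfs` is a hypothetical helper name:

```python
def get_gtfs_data(agency, gtfs_cache_dir, gtfs_path=None):
    if gtfs_path is not None:
        # A specific archived GTFS was requested: unpack it into the cache
        # directory instead of downloading the latest feed.
        extract_gtfs(gtfs_path, gtfs_cache_dir)  # hypothetical helper
    else:
        # No date given: keep the existing behavior of fetching the latest GTFS.
        download_gtfs_data(agency, gtfs_cache_dir)
```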
I am going to re-request a review at this point. Thank you so much for looking through this over and over again!
Great suggestion. I put in a few hypothetical test cases. The actual results depend on the date and the contents of your directory.
Eddy, does this look good at this point?
Fixes #626
Proposed changes
Get the feed from transitfeeds (https://api.transitfeeds.com/v1/getFeedVersions?key={your-key}3&feed=sfmta%2F60&page=1&limit=10&err=1&warn=1); a sketch of this call is included below.
Get the last 10 (configurable) previous versions of the GTFS files.
Use those files in save_routes.py to save routes to subfolders in /backend/data.
Subfolders and files have date suffixes.
These older versions of route JSON files can then be used for historical queries.
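As a rough sketch of the getFeedVersions call referenced above (the response fields 'results', 'versions', 'ts', and 'url' are assumptions about the API's JSON shape and should be checked against the TransitFeeds docs):

```python
import requests

API_URL = 'https://api.transitfeeds.com/v1/getFeedVersions'

def list_feed_versions(api_key, feed_id='sfmta/60', limit=10):
    # Ask TransitFeeds for the most recent feed versions for this feed id.
    resp = requests.get(API_URL, params={
        'key': api_key, 'feed': feed_id, 'page': 1, 'limit': limit,
    })
    resp.raise_for_status()
    # Assumed response shape: results.versions[*] with 'ts' and 'url' fields.
    versions = resp.json().get('results', {}).get('versions', [])
    return [(v.get('ts'), v.get('url')) for v in versions]
```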
...
Screenshot
...
Tests:
Command: python3 save_routes.py --agency trimet
(saves route_v3a_trimet_2020-MM-DD, where MM-DD is today's date)
...
Command: python3 save_routes.py --agency trimet --date 2000-02-02
(gives error "an active GTFS for this date was not found
" as we don't have a GTFS file qualifying for 2002-02-02)
...
Command: python3 save_routes.py --agency muni --date 2020-02-22
(saves routes_v3a_muni-2020-02-18 as gtfs-muni-2020-02-18.zip is the most recent qualifying GTFS)
...
...