Feature Request: Published data products should patch in earlier data if it's missing #1225
Open
2 of 5 tasks
Labels: feature request (issues to request new features), open-data (work related to publishing, ingesting open data)
Where does your feature apply?
Select from the below, and be sure to affix the appropriate label to this issue (e.g. dataset, jupyterhub, metabase, analysis.calitp.org)
Is your feature request related to a problem? Please describe.
Our single-day snapshots that support our analytics pipeline can be missing operators. This is expected: from day to day, a feed can go missing for a short period and come back soon thereafter. For users, this can be frustrating, as operators appear and disappear.
Describe the solution you'd like
We'll keep our analytics pipeline as is, pulling the single day and running it through, except that we add 2 things to help us fill in the blanks:
- schedule_gtfs_dataset_name and (last available) analysis_date. Use this to check whether we're missing anyone, and if we are, we can pull from an earlier cached date of the processed results.
- dataset_name_date, and now we'd have a version that is dataset_name_date (patched).
Describe alternatives you've considered
We want to consider the following points:
- shared_utils/rt_dates as the list of all dates we support with all the intermediate outputs in gtfs_analytics_data.yaml saved.
- gtfs_analytics_data.yml data catalog is to know which dates are fully supported across all the analytics work, and that we can combine all those sources easily for a given day.
Additional context
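To make the proposed solution concrete, here is a minimal sketch of the patching step, not an existing implementation. It assumes the processed results for each date are available as rows keyed by schedule_gtfs_dataset_name, and that we keep a lookup of each operator's last available analysis_date. The function name patch_missing_operators, the n_routes column, the patched flag, and the operator names and dates are all hypothetical, used only for illustration.

```python
NAME_COL = "schedule_gtfs_dataset_name"


def patch_missing_operators(current_rows, last_available, cached_results):
    """Fill in operators missing from a single-day snapshot.

    current_rows: list of dict rows for the analysis date being published.
    last_available: dict of operator name -> last analysis_date with results.
    cached_results: dict of analysis_date -> list of dict rows for that date.
    (All three structures are assumptions made for this sketch.)
    """
    present = {row[NAME_COL] for row in current_rows}

    # Rows from the current date are kept as is and flagged as not patched.
    patched = [{**row, "patched": False} for row in current_rows]

    for name, last_date in sorted(last_available.items()):
        if name in present:
            continue
        # Operator is missing today: pull its rows from the earlier cached date.
        for row in cached_results.get(last_date, []):
            if row[NAME_COL] == name:
                patched.append({**row, "patched": True})

    return patched


# Hypothetical example: Operator B has no feed on the current date.
current = [
    {NAME_COL: "Operator A", "analysis_date": "2024-09-18", "n_routes": 12},
]
last_available = {"Operator A": "2024-09-18", "Operator B": "2024-08-14"}
cached = {
    "2024-08-14": [
        {NAME_COL: "Operator A", "analysis_date": "2024-08-14", "n_routes": 11},
        {NAME_COL: "Operator B", "analysis_date": "2024-08-14", "n_routes": 5},
    ],
}

result = patch_missing_operators(current, last_available, cached)
for row in result:
    print(row)
```

Each patched row keeps its original analysis_date, so a user can see which operators came from an earlier date. Whether the published product should expose that as a flag, a separate column, or a "(patched)" dataset name is left open here.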