Research Request - Suppress private datasets from being published #1220
Labels
open-data: Work related to publishing, ingesting open data
research request: Issues that serve as a request for research (summary and handoff)
Complete the below when receiving a research request, and continue to add to this issue as you receive additional details and produce deliverables. Be sure to also add the appropriate project-level label to this issue (e.g., gtfs-rt, DLA).
Research Question
Single sentence description: All our published analyses (datasets, data products, etc.) should remove private datasets. We will still allow the feeds to go through data processing in our pipelines, and simply exclude them at the end. Add a function in shared_utils to handle the list we can include, in either gtfs_utils_v2 or publish_utils.
Detailed description:
data-infra GH issue -- PR has been merged in mart_transit_database.dim_gtfs_datasets, where an additional column appears. We create a crosswalk in gtfs_funnel and then bring that crosswalk in at the last stages of the analytics pipeline (when we add columns like caltrans_district / ntd_id, etc.), so this would be the step where we want to exclude private datasets.
Update these references:
ca_hq_transit_areas, ca_hq_transit_stops
ca_transit_routes / ca_transit_stops
speeds_by_stop_segments, speeds_by_route_timeofday
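A rough sketch of what the shared_utils helper could look like, to make the "exclude at the end" step concrete. This is only an illustration with pandas: the issue does not name the new column in dim_gtfs_datasets, so private_dataset, gtfs_dataset_key, and both function names below are placeholders, and the flag is assumed to be boolean-like with missing values meaning public.

```python
import pandas as pd

# Placeholder column names (assumptions, not confirmed by the issue).
PRIVATE_COL = "private_dataset"
KEY_COL = "gtfs_dataset_key"


def public_dataset_keys(crosswalk: pd.DataFrame) -> list:
    """Return the dataset keys we are allowed to publish.

    A dataset counts as private only when its flag equals True;
    missing flags are treated as public (an assumption).
    """
    is_private = crosswalk[PRIVATE_COL].eq(True)
    return crosswalk.loc[~is_private, KEY_COL].unique().tolist()


def exclude_private_datasets(
    df: pd.DataFrame, crosswalk: pd.DataFrame
) -> pd.DataFrame:
    """Keep only rows whose dataset key is on the public list."""
    keep = public_dataset_keys(crosswalk)
    return df[df[KEY_COL].isin(keep)].reset_index(drop=True)


# Toy example: dataset "b" is flagged private and is dropped.
crosswalk = pd.DataFrame(
    {KEY_COL: ["a", "b", "c"], PRIVATE_COL: [None, True, False]}
)
routes = pd.DataFrame(
    {KEY_COL: ["a", "b", "b", "c"], "route_id": ["1", "2", "3", "4"]}
)
published = exclude_private_datasets(routes, crosswalk)
```

Filtering against an include list means a dataset key that is missing from the crosswalk is also dropped, which errs on the side of not publishing. If that is too strict, the same helper could instead drop only the keys that are explicitly flagged private.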