Extract a subset of a GTFS feed
The extract command extends the basic copy command with additional options and transformations. It can pull out a single route or trip, interpolate stop times, or override a single value on an entity, among other operations. It is a separate command so that the basic copy command stays simple while extract can gain features over time.
transitland extract [flags] <reader> <writer>
# Extract a single trip from the BART GTFS, and rename the agency to "test".
% transitland extract --extract-trip "3050453" --set "agency.txt,BART,agency_id,test" "https://www.bart.gov/dev/schedules/google_transit.zip" output2.zip
# Note renamed agency
% unzip -p output2.zip agency.txt
agency_id,agency_name,agency_url,agency_timezone,agency_lang,agency_phone,agency_fare_url,agency_email
test,Bay Area Rapid Transit,https://www.bart.gov/,America/Los_Angeles,,510-464-6000,,
# Only entities related to the specified trip are included in the output.
% unzip -p output2.zip trips.txt
route_id,service_id,trip_id,trip_headsign,trip_short_name,direction_id,block_id,shape_id,wheelchair_accessible,bikes_allowed
1,2020_09_14-DX-MVS-Weekday-15,3050453,San Francisco International Airport,,1,,01_shp,0,0
% unzip -p output2.zip routes.txt
route_id,agency_id,route_short_name,route_long_name,route_desc,route_type,route_url,route_color,route_text_color,route_sort_order
1,test,YL-S,Antioch to SFIA/Millbrae,,1,http://www.bart.gov/schedules/bylineresults?route=1,FFFF33,,0
% unzip -p output2.zip stop_times.txt
trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled,timepoint
3050453,04:53:00,04:53:00,CONC,0,,0,0,0.00000,0
3050453,04:58:00,04:58:00,PHIL,2,,0,0,4.06000,0
3050453,05:01:00,05:02:00,WCRK,3,,0,0,5.77000,0
3050453,05:06:00,05:07:00,LAFY,4,,0,0,9.23000,0
3050453,05:11:00,05:12:00,ORIN,5,,0,0,12.99000,0
3050453,05:17:00,05:18:00,ROCK,6,,0,0,17.38000,0
...
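The --set value in the example above is a single comma-separated string in the order filename,id,key,value. A minimal sketch of assembling that string in a script follows; the set_value helper is hypothetical and not part of transitland, and it does not handle field values that themselves contain commas. The command is printed rather than executed so it can be reviewed first.

```shell
# Hypothetical helper (not part of transitland): join the four --set
# fields (filename, id, key, value) with commas.
set_value() (
  IFS=,
  printf '%s\n' "$*"
)

feed_url="https://www.bart.gov/dev/schedules/google_transit.zip"
set_arg=$(set_value agency.txt BART agency_id test)

# Print the command for review instead of running it; drop "echo" to execute.
echo transitland extract --extract-trip 3050453 --set "$set_arg" "$feed_url" output2.zip
```

Because --set is a stringArray flag, it can presumably be repeated to override more than one value, with one call to the helper per override.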
--allow-entity-errors Allow entities with errors to be copied
--allow-reference-errors Allow entities with reference errors to be copied
--bbox string Extract bbox as (min lon, min lat, max lon, max lat), e.g. -122.276,37.794,-122.259,37.834
--create Create a basic database schema if none exists
--create-missing-shapes Create missing Shapes from Trip stop-to-stop geometries
--deduplicate-stop-times Deduplicate StopTimes using Journey Patterns
--error-limit int Max number of detailed errors per error group (default 10)
--exclude-agency stringArray Exclude Agency
--exclude-calendar stringArray Exclude Calendar
--exclude-route stringArray Exclude Route
--exclude-route-type stringArray Exclude Routes matching route_type
--exclude-stop stringArray Exclude Stop
--exclude-trip stringArray Exclude Trip
--ext stringArray Include GTFS Extension
--extract-agency stringArray Extract Agency
--extract-calendar stringArray Extract Calendar
--extract-route stringArray Extract Route
--extract-route-type stringArray Extract Routes matching route_type
--extract-stop stringArray Extract Stop
--extract-trip stringArray Extract Trip
--fvid int Specify FeedVersionID when writing to a database
-h, --help help for extract
--interpolate-stop-times Interpolate missing StopTime arrival/departure values
--normalize-service-ids Create any missing Calendar entities for CalendarDate service_id's
--normalize-timezones Normalize timezones and apply default stop timezones based on agency and parent stops
--prefix string Prefix entities in this feed
--set stringArray Set values on output; format is filename,id,key,value
--simplify-calendars Attempt to simplify CalendarDates into regular Calendars
--simplify-shapes float Simplify shapes with this tolerance (ex. 0.000005)
--use-basic-route-types Collapse extended route_type's into basic GTFS values
--write-extra-columns Include extra columns in output
--write-extra-files Copy additional files found in source to destination
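The --bbox option expects coordinates in the order min lon, min lat, max lon, max lat, and swapping latitude and longitude is an easy mistake. The sketch below checks a bbox string before it is passed to the command. The check_bbox helper and the input.zip and subset.zip filenames are assumptions made for illustration; transitland may well perform its own validation.

```shell
# Hypothetical helper (not part of transitland): succeed only if the argument
# looks like "min_lon,min_lat,max_lon,max_lat" with sane ranges and ordering.
check_bbox() {
  printf '%s\n' "$1" | awk -F, '
    NF != 4 { exit 1 }
    $1 + 0 >= $3 + 0 || $2 + 0 >= $4 + 0 { exit 1 }
    $1 + 0 < -180 || $3 + 0 > 180 || $2 + 0 < -90 || $4 + 0 > 90 { exit 1 }
  '
}

bbox="-122.276,37.794,-122.259,37.834"
if check_bbox "$bbox"; then
  # Print the command for review instead of running it; drop "echo" to execute.
  echo transitland extract --bbox "$bbox" input.zip subset.zip
else
  echo "bbox is not in min lon, min lat, max lon, max lat order: $bbox" >&2
fi
```

With the example value from the option description the check passes. A swapped value such as 37.794,-122.276,37.834,-122.259 is rejected because its second field falls outside the latitude range.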
- transitland - transitland-lib utilities