From 0f5bfee6e3c369fff6ce7927d09bd9e73a7c3042 Mon Sep 17 00:00:00 2001 From: r-leyshon Date: Tue, 9 Jul 2024 10:31:30 +0100 Subject: [PATCH] chore: Rm gtfs tutorial from docs Prior to removal, checked that tutorial would render correctly by replacing local GTFS module import with assess-gtfs from PYPI install. Renders fine, though content is duplicated in assess-gtfs package, so removed from here. --- docs/tutorials/analyse_network/index.qmd | 2 +- docs/tutorials/gtfs/index.qmd | 631 ----------------------- docs/tutorials/metrics/index.qmd | 2 +- docs/tutorials/osm/index.qmd | 2 +- 4 files changed, 3 insertions(+), 634 deletions(-) delete mode 100644 docs/tutorials/gtfs/index.qmd diff --git a/docs/tutorials/analyse_network/index.qmd b/docs/tutorials/analyse_network/index.qmd index 8ed36bb4..4194c88d 100644 --- a/docs/tutorials/analyse_network/index.qmd +++ b/docs/tutorials/analyse_network/index.qmd @@ -1,5 +1,5 @@ --- -title: "5. Analyse Network" +title: "4. Analyse Network" description: Learn how to use the `transport_performance.analyse_network` module through examples. date-modified: 05/17/2024 # must be in MM/DD/YYYY format categories: ["Tutorial"] # see https://diataxis.fr/tutorials-how-to/#tutorials-how-to, delete as appropriate diff --git a/docs/tutorials/gtfs/index.qmd b/docs/tutorials/gtfs/index.qmd deleted file mode 100644 index 2483352a..00000000 --- a/docs/tutorials/gtfs/index.qmd +++ /dev/null @@ -1,631 +0,0 @@ ---- -title: "3. GTFS" -description: Learn how to inspect, validate, and clean General Transit Feed Specification inputs using the `transport_performance.gtfs` module. -date-modified: 06/06/2024 # must be in MM/DD/YYYY format -categories: ["Tutorial"] # see https://diataxis.fr/tutorials-how-to/#tutorials-how-to, delete as appropriate -toc: true -date-format: iso ---- - -## Introduction - -### Outcomes - -In this tutorial we will learn how to validate and clean -[General Transit Feed Specification (GTFS)](https://gtfs.org/schedule/) feeds. -This is an important step to ensure quality in the inputs and reduce the -computational cost of routing operations. - -While working towards this outcome, we will: - -* Download open source GTFS data. -* Carry out some basic checks across the entire GTFS feed. -* Visualise the GTFS feed's stop locations on an interactive map. -* Filter the GTFS feed to a specific bounding box. -* Filter the GTFS feed to a specific date range. -* Check if our filter operations have resulted in an empty feed. -* Reverse-engineer a calendar.txt if it is missing. -* Create summary tables of routes and trips in the feed. -* Attempt to clean the feed. -* Write the filtered feed out to file. - -### Requirements - -To complete this tutorial, you will need: - -* python 3.9 -* Stable internet connection -* Installed the `transport_performance` package (see the -[getting started explanation](/docs/getting_started/index.qmd) for help) -* The following requirements: - -```{.abc filename=requirements.txt} -geopandas -plotly -shapely -. # ensure transport_performance is installed - -``` - -## Working With GTFS - -Let's import the necessary dependencies: - -```{python} -import datetime -import os -import pathlib -import subprocess -import tempfile - -import geopandas as gpd -import plotly.express as px -from shapely.geometry import Polygon - -from transport_performance.gtfs.multi_validation import MultiGtfsInstance - -``` - -We require a source of public transit schedule data in GTFS format. The French -government publish all of their data, along with may useful validation tools to -the website [transport.data.gouv.fr](https://transport.data.gouv.fr/datasets/). - -:::{.panel-tabset} - -### Task - -Searching through this site for various regions and data types, you may be able -to find an example of GTFS for an area of interest. Make a note of the -transport modality of your GTFS, is it bus, rail or something else? - -You may wish to manually download at least one GTFS feed and store somewhere in -your file system. Alternatively you may programmatically download the data, as -in the solution here. - -### Hint - -```{python} -#| eval: false -BUS_URL = "" -RAIL_URL = "" - -BUS_PTH = "" -RAIL_PTH = "" - -subprocess.run(["curl", BUS_URL, "-o", BUS_PTH]) -subprocess.run(["curl", RAIL_URL, "-o", RAIL_PTH]) - -``` - -### Solution - -```{python} -BUS_URL = "https://tsvc.pilote4.cityway.fr/api/Export/v1/GetExportedDataFile?ExportFormat=Gtfs&OperatorCode=RTM" -RAIL_URL = "https://eu.ftp.opendatasoft.com/sncf/gtfs/export-intercites-gtfs-last.zip" -# using tmp for tutorial but not necessary -tmp_path = tempfile.TemporaryDirectory() -bus_path = os.path.join(tmp_path.name, "rtm_gtfs.zip") -rail_path = os.path.join(tmp_path.name, "intercity_rail_gtfs.zip") -subprocess.run(["curl", BUS_URL, "-o", bus_path]) -subprocess.run(["curl", RAIL_URL, "-o", rail_path]) - -``` - -::: - -Now that we have ingested the GTFS feed(s), you may wish to open the files up -on your file system and inspect the contents. GTFS feeds come in compressed -formats and contain multiple text files. These files can be read together, a -bit like a relational database, to produce a feed object that is useful when -undertaking routing with public transport modalities. - -To do this, we will need to use a class from the `transport_performance` -package called `MultiGtfsInstance`. Take a look at the -`MultiGtfsInstance` API documentation for full details on -how this class works. You may wish to keep this page open for reference in -later tasks. - -`MultiGtfsInstance`; as the name sounds; can cope with multiple GTFS feeds at a -time. If you have chosen to download several feeds, then point the `path` -parameter at a directory that contains all of the feeds. If you have chosen to -download a single feed, then you may pass the full path to the feed. - -:::{.panel-tabset} - -### Task - -Instantiate a `feed` object by pointing the `MultiGtfsInstance` class at a path -to the GTFS feed(s) that you have downloaded. Once you have successfully -instantiated `feed`, inspect the correct attribute in order to confirm the -number of separate feeds instances contained within it. - -### Hint - -```{python} -#| eval: false -gtfs_pth = "" -feed = MultiGtfsInstance(path=gtfs_pth) -print(len(feed.)) - -``` - -### Solution - -```{python} -gtfs_pth = pathlib.Path(tmp_path.name) # need to use pathlib for tmp_path -feed = MultiGtfsInstance(path=gtfs_pth) -print(f"There are {len(feed.instances)} feed instances") -``` - -::: - -Each individual feed can be accessed separately. Their contents should confirm -their file contents on disk. The -`GtfsInstance` api documentation can be used to view the -methods and attributes available for the following task. - -:::{.panel-tabset} - -### Task - -By accessing the appropriate attribute, print out the first 5 stops of the -first instance within the `feed` object. - -### Hint - -```{python} -#| eval: false -feed.[0].feed.stops.(5) -``` - -These records will match the contents of the stops.txt file within the feed -that you downloaded. - -*** - -### Solution - -```{python} -feed.instances[0].feed.stops.head(5) -``` - -::: - -## Checking Validity - -Transport routing operations require services that run upon a specified date. -It is a useful sanity check to confirm that the dates that you expect to -perform routing on exist within the GTFS feed. To do this, we can use the -`get_dates()` method to print out the first and last date in the available date -range, as below. - -```{python} -s0, e0 = feed.get_dates() -print(f"Feed starts at: {s0}\nFeed ends at: {e0}") -``` - -:::{.panel-tabset} - -### Task - -How can we have this method print out the full list of dates available within -the feed? - -### Hint - -Examine the `MultiGtfsInstance` docstring printed earlier and find the name of -the parameter that controls the behaviour of `get_dates()`. - -### Solution - -:::{.scrolling} - -```{python} -feed.get_dates(return_range=False) - -``` - -::: - -::: - -Openly published GTFS feeds from a variety of different providers have varying -degrees of quality and not all feeds strictly adhere to the defined -specification for this type of data. When working with new sources of GTFS, it -is advisable to investigate the types of errors or warnings associated with -your particular feed(s). - -:::{.panel-tabset} - -### Task - -Check if the feed you've instantiated contains valid GTFS. - -### Hint - -Check the docstring we printed earlier for an appropriate method. Depending -upon the source of the GTFS, you may need to ensure that checking of fast -travel is switched off. To turn off fast travel checks, pass the following -dictionary to the method's appropriate argument: `{"far_stops": False}`. See -the open issue [#183](https://github.com/datasciencecampus/transport-network-performance/issues/183). - -### Solution - -:::{.scrolling} - -```{python} -feed.is_valid(validation_kwargs={"far_stops": False}) -``` - -::: - -::: - -Note that it is common to come across multiple warnings when working with GTFS. -Many providers include additional columns that are not part of the GTFS. This -typically poses no problem when using the feed for routing operations. - -In certain feeds, you may notice errors flagged due to unrecognised route -types. This is because certain providers publish feeds that conform to -[Google's proposed GTFS extension](https://developers.google.com/transit/gtfs/reference/extended-route-types). -Although flagged as an error, valid codes associated with the proposed -extension typically do not cause problems with routing operations. - -## Viz Stops - -A sensible check when working with GTFS for an area of interest, is to -visualise the stop locations of your feed. - -:::{.panel-tabset} - -### Task - -By accessing an appropriate method for your feed, plot the stop locations on an -interactive folium map. - -### Hint - -Inspect the `MultiGtfsInstance` docstring for the appropriate method. - -```{python} -#| eval: false -feed.viz_...() -``` - -### Solution - -```{python} -feed.viz_stops() -``` - -::: - -By inspecting the location of the stops, you can visually assess that they -concur with the road network depicted on the folium basemap. - -## Filtering GTFS - -Cropping GTFS feeds can help optimise routing procedures. GTFS feeds can often -be much larger than needed for smaller, more constrained routing operations. -Holding an entire GTFS in memory may be unnecessary and burdensome. In this -section, we will crop our feeds in two ways: - -* Spatially by restricting the feed to a specified bounding box. -* Temporally by providing a date (or list of dates). - -Before undertaking the filter operations, examine the size of our feed on disk: - -```{python} -out = subprocess.run( - ["du", "-sh", tmp_path.name], capture_output=True, text=True) -size_out = out.stdout.strip().split("\t")[0] -print(f"Unfiltered feed is: {size_out}") -``` - -### By Bounding Box - -To help understand the requirements for spatially cropping a feed, inspect the -[API documentation for the `filter_to_bbox()`](../../reference/multi_validation.qmd#transport_performance.gtfs.multi_validation.MultiGtfsInstance.filter_to_bbox) -method. - -To perform this crop, we need to get a bounding box. This could be: - -- The boundary of the urban centre calculated using the `transport_performance.urban_centres` module. Or -- Any boundary from an open service such as [klokantech](https://boundingbox.klokantech.com/) in csv format. - -The bounding box should be in EPSG:4326 projection (longitude & latitude). - -Below I define a bounding box and visualise for context. Feel free to update -the code with your own bounding box values. - -```{python} -BBOX = [4.932916,43.121441,5.644253,43.546931] # crop around Marseille -xmin, ymin, xmax, ymax = BBOX -poly = Polygon(((xmin,ymin), (xmin,ymax), (xmax,ymax), (xmax,ymin))) -poly_gdf = gpd.GeoDataFrame({"geometry": poly}, crs=4326, index=[0]) -poly_gdf.explore() - -``` - -:::{.panel-tabset} - -#### Task - -Crop your feed to the extent of your bounding box. - -#### Hint - -Pass the `BBOX` list in [xmin, ymin, xmax, ymax] order to the -`filter_to_bbox()` method. - -#### Solution - -```{python} -feed.filter_to_bbox(BBOX) -``` - -::: - -Notice that a progress bar confirms the number of successful filter operations -performed depending on the number of separate GTFS zip files passed to -`MultiGtfsInstance`. - -Below I plot the filtered feed stop locations in relation to the bounding box -used to restrict the feed's extent. - -```{python} -imap = feed.viz_stops() -poly_gdf.explore(m=imap) - -``` - -Although there should be fewer stops observed, you will likely observe that -stops outside of the bounding box you provided remain in the filtered feed. -This is to be expected, particularly where GTFS feeds contain long-haul -schedules that intersect with the bounding box that you provided. - -### By Date - -If the routing analysis you wish to perform takes place over a specific time -window, we can further reduce the GTFS data volume by restricting to dates. -To do this, we need to specify either a single datestring, or a list of -datestrings. The format of the date should be "YYYYMMDD". -```{python} -today = datetime.datetime.today().strftime(format="%Y%m%d") -print(f"The date this document was updated at would be passed as: {today}") -``` - - -:::{.panel-tabset} - -#### Task - -Filter your GTFS feed to a date or range of dates. - -#### Hint - -Pass either a single date string in "YYYYMMDD" format, or a list of datestrings -in this format, to the `filter_to_date` method. Print out the new start and end -dates of your feed by calling the `get_dates()` method once more. - -#### Solution - -```{python} -feed.filter_to_date(today) -print(f"Filtered GTFS feed to {today}") -``` - - -```{python} -s1, e1 = feed.get_dates() -print(f"After filtering to {today}\nstart date: {s1}\nend date: {e1}") -``` - -::: - -Notice that even if we specify a single date to restrict the feed to, -`filter_to_date()` may still return a range of dates. The filtering method -restricts the GTFS to stops, trips or shapes active upon the specified date(s). -If your GTFS contains trips/routes that are still active across a range of -dates including the date you wish to restrict to, you will return the full -range of that stop's dates. - -## Check Empty Feeds - -After performing the filter operations on GTFS, it is advisable to check in -case any of the filtered feeds are now empty. Empty feeds can cause errors when -undertaking routing analysis. Empty feeds can easily arise when filtering GTFS -to mismatched dates or bounding boxes. - -We check for empty feeds in the following way: - -```{python} -feed.validate_empty_feeds() -``` - -## Create Calendar - -Occasionally, providers will publish feeds that use a calendar-dates.txt file -rather than a calendar.txt. This is permitted within GTFS and is an alternative -approach to encoding the same sort of information about the feed timetable. - -However, missing calendar.txt files currently cause exceptions when attempting -to route with these feeds with r5py. To avoid this issue, we can use a -calendar-dates.txt to populate the required calendar.txt file. - -We can check whether any of our feed instances have no calendar file: - -```{python} -for i, inst in enumerate(feed.instances): - if inst.feed.calendar is None: - problem_ind = i -print(f"Feed instance {i} has no calendar.txt") - -``` - -:::{.panel-tabset} - -### Task - -If any of your feeds are missing calendars, ensure that these files are created -from the calendar-dates files. Once complete, print out the head of the -calendar table to ensure it is populated. - -### Hint - -Examine the `MultiGtfsInstance` docstring to find the appropriate method. -Access the calendar DataFrame attribute from the same feed and print the first -few rows. - -```{python} -#| eval: false -feed.() -print(feed.instances[].feed.calendar.head()) -``` - -### Solution - -```{python} -feed.ensure_populated_calendars() -``` - -```{python} -print("Newly populated calendar table:") -print(feed.instances[problem_ind].feed.calendar.head()) -``` - -::: - -## Trip and Route Summaries - -Now that we have ensured all GTFS instances have a calendar table, we can -calculate tables of counts and other summary statistics on the routes and trips -within the feed. - -:::{.panel-tabset} - -### Task - -Print 2 summary tables: - -1. Counts for **routes** on every **date** in the feed. -2. Statistics for **trips** on every **day** in the feed. - -### Hint - -Examine the docstring help for `MultiGtfsInstance`. Use the appropriate methods -to produce the summaries. - -1. Use the default behaviour to produce the first table. -2. Ensure that the appropriate method allowing stats for days of the week is -toggled to `True` for the trips summary. - -### Solution - -```{python} -feed.summarise_routes() -``` - - -```{python} -feed.summarise_trips(to_days=True) -``` - -::: - -From these summaries we can also create visualisations, such as a timeseries -plot of trip counts by route type and date: - -```{python} -# sort by route_type and date to order plot correctly -df = feed.summarise_trips().sort_values(["route_type", "date"]) -fig = px.line( - df, - x="date", - y="trip_count", - color="route_type", - title="Trip Counts by Route Type and Date Across All Input GTFS Feeds", -) - -# set y axis min to zero, improve y axis formatting, and overall font style -fig.update_yaxes(rangemode="tozero", tickformat=",.0f") -fig.update_layout( - font_family="Arial", - title_font_family="Arial", -) -fig.show() -``` - -Visualisations like this can be very helpful when reviewing the quality of the -input GTFS feeds and determining a suitable routing analysis date. - -## Clean Feed - -We can attempt to remove common issues with GTFS feeds by running the -`clean_feeds()` method. This may remove problems associated with trips, routes -or where the specification is violated. - -:::{.scrolling} - -```{python} -feed.clean_feeds() -feed.is_valid(validation_kwargs={"far_stops": False}) -``` - -::: - -## Write Filtered Feed - -Once we have finished the filter and cleaning operations, we can now go ahead -and write the feed out to disk, for use in future routing operations. - - -:::{.panel-tabset} - -### Task - -Write your filtered feed out to a new location on disk. Confirm that the size -of the filtered feed on disk is smaller than that of the original feed. - -### Hint - -1. Pass a string or a pathlike object to the `save_feeds()` method of -`MultiGtfsInstance`. -2. Once the feed is written successfully, check the disk usage of the new -filtered feed. - -### Solution - -```{python} -filtered_pth = os.path.join(tmp_path.name, "filtered_feed") -try: - os.mkdir(filtered_pth) -except FileExistsError: - pass -feed.save_feeds(filtered_pth, overwrite=True) -``` - -Check filtered file size - -```{python} -out = subprocess.run( - ["du", "-sh", filtered_pth], capture_output=True, text=True) -filtered_size = out.stdout.strip().split("\t")[0] -print(f"After filtering, feed size reduced from {size_out} to {filtered_size}") -``` - -::: - -## Conclusion - -Congratulations, you have successfully completed this tutorial. We have -successfully examined the features, errors and warnings within a GTFS feed. We -have also filtered the feed by bounding box and by date in order to reduce its -size on disk. - -To continue learning how to work with the `transport_performance` package, it -is suggested that you continue with the -[OpenStreetMap tutorials](/docs/tutorials/osm/index.qmd) - -For any problems encountered with this tutorial or the `transport_performance` -package, please open an issue on our [GitHub repository](https://github.com/datasciencecampus/transport-network-performance/issues). diff --git a/docs/tutorials/metrics/index.qmd b/docs/tutorials/metrics/index.qmd index af7f5cea..1e10230c 100644 --- a/docs/tutorials/metrics/index.qmd +++ b/docs/tutorials/metrics/index.qmd @@ -1,5 +1,5 @@ --- -title: "6. Metrics" +title: "5. Metrics" description: Learn how to use the `transport_performance.metrics` module through examples. date-modified: 05/17/2024 # must be in MM/DD/YYYY format categories: ["Tutorial"] # see https://diataxis.fr/tutorials-how-to/#tutorials-how-to, delete as appropriate diff --git a/docs/tutorials/osm/index.qmd b/docs/tutorials/osm/index.qmd index 769d9bab..c30b97d4 100644 --- a/docs/tutorials/osm/index.qmd +++ b/docs/tutorials/osm/index.qmd @@ -1,5 +1,5 @@ --- -title: "4. OSM" +title: "3. OSM" description: Learn how to use the `transport_performance.osm` module through examples. date-modified: 06/06/2024 # must be in MM/DD/YYYY format categories: ["Tutorial"] # see https://diataxis.fr/tutorials-how-to/#tutorials-how-to, delete as appropriate