Arunav Gupta edited this page Jun 1, 2024 · 5 revisions

Metroscore SDK

This section of the wiki describes how to use the metroscore Python package to analyze transit health in a city.

Prerequisites:

  1. GTFS data for your chosen city
  2. Python 3.11
  3. Metroscore installed as per README.md

Concepts

Metroscore computes transit health scores as a function of three attributes:

  • Location (lat/lon coordinates)
  • Time of day
  • Travel duration (also called "cutoff")

Roughly speaking, the pointwise metroscore for a given combination of these three parameters can be interpreted as the probability that a traveler will choose public transit for a trip that starts at that location, departs at that time of day, and has a duration of "cutoff".

To calculate transit health scores, Metroscore builds three polygons:

  1. Transit Drive-Time Coverage (TDTC): This is the polygon covering the area a traveler can reach by car-based modes and can also reach by public transit, i.e. the overlap of the two.
  2. Transit Bonus (TB): This is the polygon covering the area a traveler can reach only by public transit, and not by car-based modes.
  3. Drive-Time Coverage (D): This is the polygon covering the total area a traveler can reach by car-based modes. Used as a normalization factor.

The areas of the three polygons are combined to compute the metroscore as follows:

$$Metroscore = \frac{Area(TDTC) + C \cdot Area(TB)}{Area(D)}$$

Where $C$ is a user-configurable scaling factor (defaults to 2.0). More information on the details of the algorithm can be found in the whitepaper.
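
As a worked example of the formula, the sketch below computes a score from three polygon areas. The function name `metroscore_from_areas` and the area values are made up for illustration and are not part of the metroscore package; only the formula and the default C = 2.0 come from the text above.

```python
def metroscore_from_areas(
    area_tdtc: float, area_tb: float, area_d: float, c: float = 2.0
) -> float:
    """Apply Metroscore = (Area(TDTC) + C * Area(TB)) / Area(D)."""
    if area_d <= 0:
        raise ValueError("Area(D) must be positive to normalize the score")
    return (area_tdtc + c * area_tb) / area_d


# Hypothetical areas in square kilometres (any unit works if used consistently).
score = metroscore_from_areas(area_tdtc=30.0, area_tb=5.0, area_d=50.0)
print(score)  # (30 + 2.0 * 5) / 50 = 0.8
```

Because the Transit Bonus area is weighted by C, raising C rewards places where transit reaches areas that driving does not.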

The Metroscore class

To start an analysis, the Metroscore object is initialized:

m = Metroscore(name="New York City", C=2.0)
  • name: This specifies the place of the analysis. Used for naming/logging purposes and to fetch road networks.
  • C: Scaling factor used in metroscore calculation. See above for explanation.

Returns: a Metroscore instance with its m._drive_graph attribute configured.

Next, the transit network is built:

m = m.build(bus=bus_feed_path, metro=train_feed_path, **kwargs)
  • bus: Path to the GTFS feed for buses
  • metro: Path to the GTFS feed for metro/rail-based transit
  • **kwargs: If the user would like to specify any additional transit modes, they may do so here.

Returns: a Metroscore instance with its m._transit_graph attribute configured.

Why build the transit and drive graphs separately? This lets the user try out different GTFS sources while reusing the same underlying road network for a city. Building a graph is expensive, so we want to avoid redoing work wherever possible.

Next, the metroscores are computed:

results = m.score(locations=[], time_of_days=[], cutoffs=[], overwrite=False)
  • locations: list of (lat, lon) coordinates to use as test locations
  • time_of_days: list of times (in seconds after midnight) to use as departure times
  • cutoffs: list of times (in seconds) to use as travel times
  • overwrite: Boolean that determines whether new results replace existing ones. If False, new results are appended to any existing results.

Returns: pandas DataFrame with 4 columns: location, time_of_day, cutoff, and metroscore.

This method returns the actual metroscores. Internally, a version of results which is multi-indexed by location, time_of_day, and cutoff is also saved (this internal version also contains the actual TDTC, TB, and D polygons for each row).
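
Since score expects departure times in seconds after midnight and cutoffs in seconds, a small conversion step can make the inputs easier to read. The sketch below is one way to prepare them; the helper `clock_to_seconds` and the sample coordinates are illustrative assumptions, not part of the metroscore package.

```python
def clock_to_seconds(hhmm: str) -> int:
    """Convert a 24-hour 'HH:MM' string to seconds after midnight."""
    hours, minutes = (int(part) for part in hhmm.split(":"))
    if not (0 <= hours < 24 and 0 <= minutes < 60):
        raise ValueError(f"not a valid 24-hour time: {hhmm!r}")
    return hours * 3600 + minutes * 60


# Sample (lat, lon) test locations; replace with points in your own city.
locations = [(40.7580, -73.9855), (40.6782, -73.9442)]

# Departure times as seconds after midnight: 08:30 and 17:00.
time_of_days = [clock_to_seconds(t) for t in ("08:30", "17:00")]  # [30600, 61200]

# Travel durations of 15, 30, and 45 minutes, expressed in seconds.
cutoffs = [minutes * 60 for minutes in (15, 30, 45)]  # [900, 1800, 2700]
```

These lists can then be passed as m.score(locations=locations, time_of_days=time_of_days, cutoffs=cutoffs).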

Some helper functions to get results also exist:

single_result = m.get_score(location, time_of_day, cutoff)

Returns a single metroscore given a location, time of day, and travel duration.

all_results = m.list_scores(locations, time_of_days, cutoffs)

Returns all metroscores with matching locations, time_of_days, and cutoffs. If nothing is specified, returns all results (without re-computing scores).
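
To illustrate what "matching" means here, the sketch below applies the same kind of filter to a plain pandas DataFrame with the four documented columns. It is not the package's implementation: `filter_scores` and the score values are hypothetical, and it assumes that an unspecified argument means "no filter on that dimension".

```python
import pandas as pd

# Made-up scores standing in for the DataFrame returned by m.score(...).
results = pd.DataFrame(
    {
        "location": [(40.7580, -73.9855)] * 2 + [(40.6782, -73.9442)] * 2,
        "time_of_day": [28800, 61200, 28800, 61200],
        "cutoff": [1800, 1800, 1800, 1800],
        "metroscore": [0.6, 0.4, 0.2, 0.2],
    }
)


def filter_scores(df, locations=None, time_of_days=None, cutoffs=None):
    """Keep rows matching every filter that was given (hypothetical helper)."""
    mask = pd.Series(True, index=df.index)
    if locations is not None:
        wanted = set(locations)
        # Locations are (lat, lon) tuples, so test membership row by row.
        mask &= df["location"].map(lambda loc: loc in wanted)
    if time_of_days is not None:
        mask &= df["time_of_day"].isin(time_of_days)
    if cutoffs is not None:
        mask &= df["cutoff"].isin(cutoffs)
    return df[mask]


morning = filter_scores(results, time_of_days=[28800])  # the two 08:00 rows
everything = filter_scores(results)  # no filters: all four rows
```

Note that this sketch matches coordinates by exact equality, so a location must be given with the same floating-point values that were used when scoring.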

Slicing

Metroscore also supports "slicing" across any of the given query dimensions.

sliced_results = m.slice(by="location", agg="mean")
  • by: Supports "location", "time_of_day", "cutoff", or "all".
  • agg: Supports "mean", "max", "min", and "median". Aggregation function to use across the other dimensions.

Returns scores grouped by the by dimension and aggregated across the other dimensions. For example, slicing by "time_of_day" produces a pandas Series whose index is every available time of day and whose values are the mean, min, max, or median (depending on agg) of all scores recorded at that time of day. This is a convenient way to see how metroscore varies with location, departure time, and travel duration. If by="all", the function returns a single float, which can be interpreted as the city-wide metroscore.
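
The behavior described above resembles a pandas group-by followed by an aggregation. The sketch below reproduces it on a small made-up DataFrame; `slice_scores` is a hypothetical stand-in for illustration, and the way m.slice is actually implemented may differ.

```python
import pandas as pd

# Made-up scores with the four columns that m.score(...) is documented to return.
results = pd.DataFrame(
    {
        "location": [(40.7580, -73.9855)] * 2 + [(40.6782, -73.9442)] * 2,
        "time_of_day": [28800, 61200, 28800, 61200],
        "cutoff": [1800, 1800, 1800, 1800],
        "metroscore": [0.6, 0.4, 0.2, 0.2],
    }
)


def slice_scores(df, by, agg="mean"):
    """Group by one dimension and aggregate scores over the others."""
    if by == "all":
        # Collapse every row into a single number.
        return float(df["metroscore"].agg(agg))
    return df.groupby(by)["metroscore"].agg(agg)


by_time = slice_scores(results, by="time_of_day", agg="mean")  # Series indexed by time
city_wide = slice_scores(results, by="all", agg="mean")  # single float
print(by_time)
print(city_wide)
```

In this toy data the morning average is higher than the evening one, which is the kind of pattern a time_of_day slice is meant to surface.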

Interventions

Coming soon!
