transit-data-dashboard

A dashboard for tracking transit data coverage and updates.

Loading Data

Data come from a variety of sources, mostly GTFS feeds and the National Transit Database (NTD). GTFS feeds come primarily from GTFS Data Exchange; they are processed, and information about them is stored in a database. Metro areas come from NTD's UZAs, with geometries from the Census Bureau; that is described in more detail below. Although the datasets are linked in the database, it is generally not possible to find them pre-linked (more on that below as well). The general procedure is to load unlinked data and then link it after the load. For more of a tutorial format, see the DataLoading page on the wiki.

Agencies

You'll need to get the latest Agency_Information.xls, Service.xls and Agency_UZA.xls files from NTD. Save them as CSV, for instance using LibreOffice. The CSV dialect is not critical; the Python csv module is very good at detecting the variant of CSV in use and adapting to it. For reference, I used LibreOffice 3.5.3.2 on Ubuntu 12.04. Save the CSV files in the same directory and then run loadAgenciesFromNtd.py in that directory. It will create a record for each agency in the NTD data.
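As a rough illustration of the dialect detection mentioned above, the standard library's csv.Sniffer can guess the delimiter from a sample of the file. This is only a sketch and not necessarily what loadAgenciesFromNtd.py does; the read_rows helper, the sample data and the column names are made up for the example.

```python
import csv
import io


def read_rows(text):
    """Guess the CSV dialect from the text, then parse it into dicts."""
    # Restrict the candidate delimiters so the guess stays predictable.
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t")
    return list(csv.DictReader(io.StringIO(text), dialect=dialect))


# Hypothetical sample; real NTD exports have different columns.
sample = "ntd_id;agency_name\n1;Example Transit\n2;Sample Metro\n"
rows = read_rows(sample)
print(rows[0]["agency_name"])
```

The same call works unchanged if the file was saved with commas or tabs instead of semicolons, which is why the choice made in the LibreOffice export dialog matters little.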

Metro Areas

Metro areas come from NTD's UZAs, with geometries from the Census Bureau. The load process works like this: when you load agencies into the database, each agency's UZAs are loaded with it (strictly speaking as a relation rather than a column, but that's beside the point). Running the mapAgenciesByUZAAndMerge mapper will merge metro areas that share agencies and map each constituent agency to the appropriate metro. Running autoNameMetroAreas after that will clean up the names.
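The merge step can be pictured as grouping UZAs that are connected through shared agencies. The following is a minimal sketch of that idea using a union-find structure. It is not the mapper's actual implementation, and the merge_uzas function, agency names and UZA names are invented for illustration.

```python
def merge_uzas(agency_uzas):
    """Group UZAs that share at least one agency.

    agency_uzas maps an agency name to the list of UZAs it serves.
    Returns (metros, agency_metro): metros is a list of frozensets of UZAs,
    and agency_metro maps each agency to the merged metro holding its UZAs.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Every UZA served by the same agency ends up in the same group.
    for uzas in agency_uzas.values():
        for uza in uzas:
            parent[find(uza)] = find(uzas[0])

    groups = {}
    for uza in list(parent):
        groups.setdefault(find(uza), set()).add(uza)
    metros = {root: frozenset(members) for root, members in groups.items()}

    agency_metro = {
        agency: metros[find(uzas[0])]
        for agency, uzas in agency_uzas.items()
        if uzas
    }
    return list(metros.values()), agency_metro


# Hypothetical input: Agency A spans two UZAs, so they merge into one metro.
metros, agency_metro = merge_uzas({
    "Agency A": ["UZA 1", "UZA 2"],
    "Agency B": ["UZA 2"],
    "Agency C": ["UZA 3"],
})
```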

GTFS data

There are two ways to load GTFS data. The first, described next, is the one we generally follow.

First, configure your application-context.xml with the updaters that you want (see the Configuration page on the wiki). Then, go to the admin console at /api/admin/index and click on 'Fetch GTFS'. It will take a while for all the feeds to download. Many of them will automatically be matched to agencies.

The second way is older. It should still work, although the first way is recommended to ensure that subsequent updates are painless.

Once you have a JSON file produced by the otp_gtfs tools, you can load it to the database using the utils/loadFeedsToDashboard.py script. This script is used like so:

loadFeedsToDashboard.py input.json

It loads to a server running on localhost:9000. It parses the JSON file, reformats it slightly for use in Dashboard, and then hits the API with a request to create a feed record for each feed in the file.

Linking data

Once you've loaded data, you'll want to link the different datasets together to make them more useful. The program has several tools, which must be used in order. The tool names are not links, because in some cases they are destructive. You'll have to copy and paste the URL. Be aware that many of these mappers were written before the automated review framework was created; many of them don't flag problems they find except in their HTML output.

mapFeedsToAgenciesByUrl - /api/mapper/mapFeedsToAgenciesByUrl:

This tool parses the URLs on both the feeds and the agencies and tries to map between them. You'll see a report of what matched. This should not be needed anymore, because we attempt to do this when loading feeds via updaters, but it could be useful if you loaded GTFS before NTD.

mapFeedsWithNoAgencies - /api/mapper/mapFeedsWithNoAgencies:

This tool takes all of the feeds with no agencies and creates agencies for them based on information in the GTFS. When a new agency is created, it will be: a) assigned to a metro area if it overlaps only one, b) assigned to a merged metro area if it overlaps several but only one has transit, c) assigned to a new metro area if it overlaps none, or d) flagged for review in the admin interface if it overlaps several with transit (to prevent intercity feeds from merging more than is wise).
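The placement rules above can be written down as a small decision function. This is a sketch of the rules as described here, not the tool's actual code; the place_new_agency function, the action names and the input shape are hypothetical. The text does not say what happens when an agency overlaps several metro areas and none of them has transit, so the sketch flags that case for review as a conservative assumption.

```python
def place_new_agency(overlapping):
    """Decide what to do with a newly created agency.

    overlapping is a list of (metro_name, has_transit) tuples for the metro
    areas the agency's feed overlaps. Returns (action, metro_names).
    """
    names = [name for name, _ in overlapping]
    if not overlapping:
        return ("create_metro", [])        # case c: overlaps none
    if len(overlapping) == 1:
        return ("assign", names)           # case a: overlaps only one
    with_transit = [name for name, has_transit in overlapping if has_transit]
    if len(with_transit) == 1:
        return ("merge_and_assign", names) # case b: several, one with transit
    # Case d: several with transit. The zero-with-transit case also lands
    # here, which is an assumption of this sketch, not stated in the text.
    return ("flag_for_review", names)


print(place_new_agency([("Metro X", True), ("Metro Y", False)]))
```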

clearAllAgencyFeedMappings - /api/mapper/clearAllAgencyFeedMappings:

DANGER! Clears all agency to feed mappings, regardless of whether the mapper created them.

mapAgenciesToMetroAreasSpatially - /api/mapper/mapAgenciesToMetroAreasSpatially:

This tries to set the metro area for each agency that has GTFS feeds by looking at which metro area (if any) the GTFS feeds lie in. Generally, mapAgenciesByUZAAndMerge is used nowadays.

clearAllAgencyMetroAreaMappings - /api/mapper/clearAllAgencyMetroAreaMappings:

DANGER! Clears all agency to metro area mappings, regardless of how they were created.

autoNameMetroAreas - /api/mapper/autoNameMetroAreas:

Attempt to auto name metro areas based on the agencies in them. DANGER! Will overwrite any already-defined names.

mapAgenciesByUZAAndMerge - /api/mapper/mapAgenciesByUZAAndMerge

Assign transit agencies to metro areas by their UZAs, then merge metro areas that share agencies. You must pass either ?commit=true or ?commit=false to this mapper; it is destructive and, since it maps based on free-form text, a bit tricky. It is recommended that it be run with ?commit=false and its output reviewed before being run with ?commit=true.
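The dry-run-then-commit workflow can be scripted rather than pasted into a browser. The sketch below assumes the server is on localhost:9000 (as in the feed loader section), that the mapper responds to a plain GET, and that no authentication is required; none of these is confirmed here, and mapper_url and run_mapper are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:9000"  # assumption: same host as the feed loader


def mapper_url(name, **params):
    """Build the URL for a mapper, with optional query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}/api/mapper/{name}" + (f"?{query}" if query else "")


def run_mapper(name, **params):
    """Fetch a mapper URL and return its report (assumes a plain GET works)."""
    with urlopen(mapper_url(name, **params)) as response:
        return response.read().decode("utf-8")


# Review the output of the dry run first; only then switch commit to "true".
dry_run_url = mapper_url("mapAgenciesByUZAAndMerge", commit="false")
print(dry_run_url)
```

Keeping URL construction separate from the request makes it easy to print and check the URL before anything destructive is sent.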

moveFeedsRemoveAgency - /api/mapper/moveFeedsRemoveAgency?from=<id>&to=<id>

This moves the feeds for the agency specified by from to the agency specified by to. It does not remove the initial agency or even disable it; that could be improved upon.

removeMetroAreasWithNoAgencies - /api/mapper/removeMetroAreasWithNoAgencies

This removes all metros with no transit. I recommend you do not do this, because future transit agencies that release data may need these metros.

createGtfsBundles - /api/mapper/createGtfsBundles?metro.id=<id>

This creates an OTP GTFS bundles section. It is deprecated because it does not use the deployment planner but instead uses a very simple algorithm.

recalculateFeedStats - /api/mapper/recalculateFeedStats

This recalculates the feed statistics for every feed.

createDeletedMetroAreas

This attempts to recreate deleted Google metro area placeholders based on the Google agencies in the file. It is not used very often, because it is very specialized. Generally, using database backups and transactions can avoid the need to use this one.

calculateFeedStats?storedId=id

This calculates statistics for the feed in storage described by ID (note that this is the ID in storage, not the ID of the GtfsFeed in the database; this is so that one can diagnose load crashes). It renders the statistics on the feed as JSON; these statistics are what drive the deployment planner (start date, end date, &c.).

setGoogleGtfsFromParse?name=...&areaName=...&lat=...&lon=...

Attempts to match an agency from the Google Transit text page to a metro area based on its location and name, using Postgres full-text search tools. It is used by loadGoogleTransit.js and should generally not be used directly.

connectFeedsAndAgencies?feed=...&feed=...&agency=...&agency=... . . .

This is used by /admin/mapfeeds.html, the bulk feed mapper, to connect feeds and agencies.
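The repeated feed and agency parameters in that signature can be produced with urlencode and doseq=True. This sketch builds only the query string; the connect_query helper and the IDs are made up, and the path the query is appended to is not assumed here.

```python
from urllib.parse import urlencode


def connect_query(feed_ids, agency_ids):
    """Build a query string with one feed= or agency= pair per ID."""
    return urlencode({"feed": feed_ids, "agency": agency_ids}, doseq=True)


# Hypothetical IDs: two feeds and one agency.
query = connect_query([12, 15], [3])
print(query)  # feed=12&feed=15&agency=3
```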

Many of these tasks can also be performed in a one-by-one fashion using the admin interface (see the wiki).

Serving map tiles

You need to install TileLite to serve the map tiles. Then, you can run liteserv.py -p 8001 path/to/app/dir/tiles/tiles.xml. You'll also need to edit public/javascripts/client.js to find the map.

Using the public web interface

Once you've loaded and linked your data, you can use the dashboard. Load up http://localhost:9000/ in your browser. There you will see the map on the first tab, showing the extent of your metro areas. On the next tab, the data tab, you will see a list of all the NtdAgencies you have loaded. You can sort them by clicking the column heads (click a second time to sort descending).