Matt Conway edited this page Sep 19, 2012 · 4 revisions

Loading data to the dashboard

This is the procedure I used to create the public-facing Transit Data Dashboard, using this software and data from a variety of places.

First off, you want to load several pieces of static data into your database. Load the National Transit Database by following the instructions in the README of this repository, then load the Census Urban Areas, also per the README. Finally, visit /api/mapper/mapAgenciesByUzaAndMerge?commit=true to merge connected urban areas.

Next, you want to set up your application-context.xml with an updater factory and a GTFS Data Exchange updater, like so:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springsource.org/dtd/spring-beans-2.0.dtd">
<beans>
  <bean id="updaters" class="updaters.UpdaterFactory">
    <property name="storer">
      <bean class="updaters.S3FeedStorer" >
        <property name="accessKey" value="YOUR S3 ACCESS KEY" />
        <property name="secretKey" value="YOUR S3 SECRET KEY" />
        <property name="bucket" value="S3 BUCKET ID" />
      </bean>
    </property>

    <property name="updaters">
      <list>
        <bean class="updaters.GtfsDataExchangeUpdater" />
      </list>
    </property>

    <property name="hooks">
      <list>
        <bean class="updaters.LoggingUpdaterHook" />
        <bean class="updaters.DeploymentPlanGeneratorHook" />
      </list>
    </property>
  </bean>
</beans>

In some cases you will want to include feeds that are not on the Exchange. This is considered a temporary situation, and as such the tools for dealing with it are a little strange. First off, you'll need to be using an S3FeedStorer as your feed storer (patches accepted). Upload the extra feeds you want to include to the same bucket as your stored feeds (I'd recommend putting them in a subfolder). Then add an S3WatcherUpdater to your updaters, with accessKey, secretKey, and bucket properties, plus a watchedFiles property: a list of the file names you uploaded. Those files will be fetched, and refetched whenever they change. After you first load them, you can use the CRUD interface to set the information that can't be retrieved from a bare GTFS feed (as opposed to a Data Exchange record). This information will be propagated forward when the feed is updated.
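A sketch of what that extra updater might look like in application-context.xml, assuming the S3WatcherUpdater bean exposes the settings described above as standard Spring properties (the subfolder name and file names here are just examples):

```xml
<!-- Goes inside the <list> under the "updaters" property,
     alongside the GtfsDataExchangeUpdater bean. -->
<bean class="updaters.S3WatcherUpdater">
  <property name="accessKey" value="YOUR S3 ACCESS KEY" />
  <property name="secretKey" value="YOUR S3 SECRET KEY" />
  <property name="bucket" value="S3 BUCKET ID" />
  <!-- File names within the bucket to fetch and watch for changes. -->
  <property name="watchedFiles">
    <list>
      <value>extra-feeds/some-agency.zip</value>
      <value>extra-feeds/another-agency.zip</value>
    </list>
  </property>
</bean>
```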

Now, access /api/mapper/fetchGtfs. A JSON response of {"status":"running"} will be returned almost immediately, which means the job has started. The job itself will take quite some time, depending on the speed of your Internet connection and the system resources you have available (I was able to pull this off on an EC2 small instance running Ubuntu 12.04, with almost all of its RAM dedicated to Play!, e.g. play run -Xmx1524m or so).
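Since the endpoint only reports {"status":"running"} and then works in the background, it can be convenient to poll it until the status changes. A minimal sketch, with two assumptions: that your app runs on Play's default port 9000, and that a finished job reports some status other than "running" (the fetch callable is injectable purely so the loop can be exercised without a live server):

```python
import json
import time
import urllib.request


def poll_job(url, fetch=None, interval=30.0, max_polls=100):
    """Poll a dashboard job endpoint until it stops reporting 'running'.

    Returns the final status string, or 'running' if max_polls is
    exhausted first. `fetch` defaults to a plain HTTP GET; it exists
    so the loop can be tested without a live server.
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u) as resp:
                return resp.read().decode("utf-8")

    for _ in range(max_polls):
        status = json.loads(fetch(url)).get("status")
        if status != "running":
            return status
        time.sleep(interval)
    return "running"
```

Usage would be something like `poll_job("http://localhost:9000/api/mapper/fetchGtfs")`, adjusting the host and port to match your deployment.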

Once that has completed, you can access the admin interface at /api/admin/index and figure out what to do about feeds that were not automatically mapped. /admin/mapfeeds.html and /api/gtfsfeeds/feedsNoAgencies will also be useful. (This whole situation is in need of improvement; in particular, clicking the unmapped-agency button in the admin interface should do something.)

Also, /api/mapper/mapFeedsWithNoAgencies will create agencies for all unmapped feeds and attempt to assign them metro areas. So, once you've manually matched everything you can, run this to prepare your database for the next step.

You'll then want to click 'Agencies matching multiple metro areas' and decide whether to split the agencies across all of their metro areas (i.e., every metro area will contain the agency, but they will still be built as separate graphs and remain otherwise separate) or to merge the metro areas. If you need something more complicated, you can merge them and then split them, as described in the README.

/api/mapper/autonameMetroAreas should be used liberally throughout this process to ensure that metros have semi-reasonable names. Bear in mind that the names are autogenerated, and the algorithm reflects the largest part of the metro, not necessarily the whole thing. For example, in my deployment the Northeast Corridor of the United States is called something along the lines of New York, NY, even though it extends from North Carolina to Massachusetts.

The hardest part is mapping the Google GTFS data, although there are a number of tools to help with it. First, load the Google GTFS data as described in Google Mapping. Watch it as it loads and take note of mismatched agencies (ignore unmatched ones for now). Then go to the main admin page and click the 'unmatched metro areas' link. Proceed through the page as directed, matching each of Google's metro areas to one of ours. Keep an eye on the server's WARN log output; it will tell you which additional agencies automatch, and you should ensure it makes no mistakes.

Once all of the metros are matched (if you leave some unmatched, the resulting agencies will not match a metro either), return to the admin screen. Go to the 'unmatched GTFS providers' section and, for each of Google's agencies, choose an existing agency or create a new one.

At the end, you may need to do a little manual cleanup of the data through psql or pgAdmin. Make sure you run a pg_dump backup first.

Finally, to create the app, I made a new folder and ran the utils/makeStatic.py script in it; that script fetches all the required assets from the server and puts them in the working directory. If you just load up index.html, however, not much will happen, because it won't find the vote counter API. The way we generally deploy is to drop the static app into the public/ folder of the vote counter, then run play war in the votecounter app to generate a unified WAR.
