RyanThomas/mta-bus-archive

mta bus archive

Download archived NYC MTA bus position data, and scrape gtfs-realtime data from the MTA.

Bus position data is archived at data.mytransit.nyc.

Requirements:

  • Python 3.x
  • PostgreSQL 9.5+

Set up

Create a set of tables in the Postgres database dbname:

make install PG_DATABASE=dbname

This command will create a number of tables whose names begin with rt_, notably rt_vehicle_positions, rt_alerts, and rt_trip_updates. It will also install the Python requirements, including the Google protobuf library.

You can pass connection flags for a remote database using the PSQLFLAGS or MYSQLFLAGS variables:

make install PG_DATABASE=dbname PSQLFLAGS="-U psql_user"

Download an MTA Bus Time archive file

Download a (UTC) day of data from data.mytransit.nyc and import it into the Postgres database dbname:

make download DATE=2016-12-31 PG_DATABASE=dbname
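To backfill several days, the download target can be wrapped in a small shell loop. A minimal sketch (the date range and dbname are placeholders; assumes GNU date, so adjust for BSD/macOS):

```shell
# Backfill a range of (UTC) days into the database "dbname".
start=2016-12-01
end=2016-12-03

d="$start"
stop=$(date -I -d "$end + 1 day")   # first day NOT to download
while [ "$d" != "$stop" ]; do
    make download DATE="$d" PG_DATABASE=dbname
    d=$(date -I -d "$d + 1 day")    # advance one day
done
```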

The same, for MySQL:

make mysql_download DATE=2016-12-31 PG_DATABASE=dbname

Scraping

Scrapers have been tested with Python 3.4 and above. Earlier versions of Python (e.g. 2.7) won't work.

Scrape

The scraper assumes that the environment variable BUSTIME_API_KEY contains an MTA Bus Time API key. Get a key from the MTA.

export BUSTIME_API_KEY=xyz123

Download the current vehicle positions from the MTA API and save them to a local PostgreSQL database named dbname:

make positions PG_DATABASE=dbname

Download current trip updates:

make tripupdates PG_DATABASE=dbname

Download current alerts:

make alerts PG_DATABASE=dbname

Scheduling

The included crontab shows an example setup for downloading data from the MTA API. It assumes that this repository is saved in ~/mta-bus-archive. Fill in the PG_DATABASE and BUSTIME_API_KEY variables before using.
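For illustration, a minimal entry along those lines might look like the following (the path, database name, key, and log file are placeholders; the repository's own crontab is the authoritative example):

```crontab
# m h dom mon dow  command — fetch current vehicle positions every minute
* * * * * cd ~/mta-bus-archive && BUSTIME_API_KEY=xyz123 make positions PG_DATABASE=dbname >> /tmp/mta-scrape.log 2>&1
```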

Setting up Postgres on CentOS

Pick a database name. In this example it's mydbname.

sudo make install  # downloads requirements
sudo make create  # initializes postgresql
sudo make init PG_DATABASE=mydbname PG_HOST= PG_USER=myusername

Uploading files to Google Cloud

Setup

Create a project in the Google API Console. Make sure to enable the "Google Cloud Storage API" for your application. Then set up a service account. This will download a file containing credentials named something like myprojectname-3e1f812da9ac.json.

Then run the following on the machine you'll use to scrape and upload, and follow the instructions:

gsutil config -e

Next, create a bucket for the data using the Google Cloud Console.

You're now authenticated to the Google API and can run a command like:

make -e gcloud DATE=2017-07-14 PG_DATABASE=mydbname MODE=upload

By default, the Google Cloud bucket will have the same name as the database. Use the variable GOOGLE_BUCKET to customize it.

Note the MODE=upload flag: it tells the Makefile to upload the exported data to the Google Cloud bucket rather than only keeping it locally.

License

Available under the Apache License.
