Orion Trip Generator

Eddy Ionescu edited this page Jul 12, 2018 · 21 revisions
  • takes in JSON files from S3 (raw API output) and generates trips from them

What is a trip?

  • a trip is uniquely identified by the tuple (vehicle_id, route_id, direction_id), which must stay the same across consecutive transit states.
  • If a tuple from the last state is missing from the next state, we end that trip. If we see a tuple that wasn't in the last state, we start a new trip.
  • the start/end times should be based on the timestamps of the raw data files being used (not on the time at which the trip is generated)
  • when a trip ends, we write it to a JSON file in S3 (agency_id/route_id/direction_id/start_year/start_month/start_day/start_hour/vehicle_id/start_epoch_utc/end_epoch_utc.json, where epoch is a UTC timestamp in seconds). We can worry about compressing it or loading it into some databases later.
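The rules above can be sketched in a few lines of Python. This is only an illustration, not the generator's actual code: `TripKey`, `diff_trip_keys` and `trip_s3_key` are hypothetical names, and zero-padding the month, day and hour in the path is an assumption the page does not state.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Set, Tuple


class TripKey(NamedTuple):
    """The tuple that identifies a trip (hypothetical helper type)."""
    vehicle_id: str
    route_id: str
    direction_id: str


def diff_trip_keys(previous: Set[TripKey], current: Set[TripKey]) -> Tuple[Set[TripKey], Set[TripKey]]:
    """Return (started, ended) trip keys between two consecutive states."""
    started = current - previous  # tuples we didn't see in the last state
    ended = previous - current    # tuples that are gone in this state
    return started, ended


def trip_s3_key(agency_id: str, key: TripKey, start_epoch: int, end_epoch: int) -> str:
    """Build the S3 object key for a finished trip from its UTC start time."""
    start = datetime.fromtimestamp(start_epoch, tz=timezone.utc)
    return (
        f"{agency_id}/{key.route_id}/{key.direction_id}/"
        # Zero-padded date parts are an assumption, chosen so keys sort by time.
        f"{start.year}/{start.month:02d}/{start.day:02d}/{start.hour:02d}/"
        f"{key.vehicle_id}/{start_epoch}/{end_epoch}.json"
    )
```

For example, a trip for vehicle `1234` on route `N` that starts at epoch `1531400400` (2018-07-12 13:00 UTC) would land under `.../2018/07/12/13/1234/1531400400/<end_epoch>.json`.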

What is a state?

  • it's a timestamped snapshot of where all Muni vehicles are

Why?

  • our goal is to get the speed & reliability of routes.

  • An easy way of approaching this problem is to look at individual trips and then get data about them along route segments.

  • Storing per-trip metrics also makes more complex analysis quicker later on, because it can start from already-aggregated metrics.

  • our goal is to make Muni's GPS data open and accessible to use.

  • making trip data accessible via S3 keeps individual file sizes small and is straightforward & logical for open-data users to retrieve.

What the output will look like:

agency:
startTime:
endTime:
route:
direction:
vid:
states: [{
    vtime:
    lat:
    lon:
}]

(json)
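As a rough sketch, a finished trip could be assembled into that shape as follows. The field names come from the outline above; the function name `build_trip_document` and the use of epoch seconds for the times are assumptions made for illustration.

```python
import json


def build_trip_document(agency, route, direction, vid, start_time, end_time, states):
    """Assemble the output document for one finished trip.

    `states` is a list of (vtime, lat, lon) tuples in chronological order.
    `start_time` and `end_time` are passed in explicitly because they come
    from the raw file timestamps, not from the vehicle positions.
    """
    return {
        "agency": agency,
        "startTime": start_time,
        "endTime": end_time,
        "route": route,
        "direction": direction,
        "vid": vid,
        "states": [{"vtime": t, "lat": lat, "lon": lon} for t, lat, lon in states],
    }


# Hypothetical sample values, only to show the serialized shape.
doc = build_trip_document(
    "muni", "N", "N____I_F00", "1234", 1531400400, 1531400460,
    [(1531400400, 37.77, -122.42), (1531400460, 37.78, -122.41)],
)
print(json.dumps(doc, indent=2))
```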

How it'll work:

  • how it'll persist state:
  • each time it advances to the next state, it dumps its in-memory trips to a state file
  • On first startup, read raw vehicle data from S3 and create trips (keyed in memory by the unique tuple)
  • Write trips to state file on disk (or S3?) on first startup
  • When a new raw vehicle file is put in S3, publish an SQS message that orion-trip-generator listens for
  • orion-trip-generator consumes sqs message which triggers:
    • updating trips in memory (states array)
    • write trips that no longer exist to agency trip s3 bucket and remove them from memory
    • add new trips to memory.
    • write current trips state to state file
    • all of the above should be atomic
  • if orion-trip-generator restarts, it gets its state from the state file
  • if the state file doesn't exist, it reads the latest raw file to get its state
  • If there is a lag between publishing raw files (hours? days?), this could lead to trips that appear to have taken too long to complete.
  • If the trip generator crashes and is restarted after hours or days, the same issue as above could occur (trips that appear to take days to complete)
  • Should we ignore writing trips that last longer than a certain period of time?
  • Should we still write suspect trips but mark them as possibly invalid?
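The per-message update and the state file can be sketched as below, with S3 and SQS left out so it runs locally. Everything here is an assumption for illustration: the function names, the dict layout, using the state's timestamp as `vtime`, and the four-hour `MAX_TRIP_SECONDS` cutoff, which shows the "mark as possibly invalid" option from the last question. Only the state file write is made atomic (temp file plus rename); making the whole step atomic would need more than this.

```python
import json
import os
import tempfile

MAX_TRIP_SECONDS = 4 * 60 * 60  # hypothetical cutoff for flagging suspect trips


def apply_state(trips, state_time, vehicles):
    """Fold one raw state into the in-memory trips and return finished trips.

    `trips` maps (vid, route, direction) -> trip dict and is mutated in place.
    `vehicles` is a list of dicts with vid, route, direction, lat, lon.
    """
    seen = set()
    for v in vehicles:
        key = (v["vid"], v["route"], v["direction"])
        seen.add(key)
        # A tuple we have not seen before starts a new trip.
        trip = trips.setdefault(key, {"startTime": state_time, "states": []})
        trip["endTime"] = state_time  # last state in which the tuple was seen
        # Assumption: vtime is the timestamp of the state itself.
        trip["states"].append({"vtime": state_time, "lat": v["lat"], "lon": v["lon"]})

    finished = []
    for key in [k for k in trips if k not in seen]:
        trip = trips.pop(key)  # the tuple disappeared, so the trip has ended
        trip["vid"], trip["route"], trip["direction"] = key
        # Keep the trip, but flag it if it looks implausibly long.
        trip["suspect"] = trip["endTime"] - trip["startTime"] > MAX_TRIP_SECONDS
        finished.append(trip)
    return finished


def save_state(trips, path):
    """Write current trips to the state file atomically (temp file + rename)."""
    payload = [{"key": list(key), "trip": trip} for key, trip in trips.items()]
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    os.replace(tmp_path, path)  # readers see the old file or the new one, never a partial write


def load_state(path):
    """Restore trips after a restart; returns {} if there is no state file."""
    if not os.path.exists(path):
        return {}  # the caller would then fall back to the latest raw file
    with open(path) as f:
        return {tuple(item["key"]): item["trip"] for item in json.load(f)}
```

In this sketch, a message handler would call `apply_state`, write each returned trip to the agency trip bucket, and then call `save_state`. A trip that spans a long gap between raw files still gets written, but with `suspect` set to true.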