Research Request - Nearest neighbor comparison #1162

Closed
tiffanychu90 opened this issue Jul 1, 2024 · 3 comments
Assignees
Labels
gtfs-rt Work related to GTFS-Realtime research request Issues that serve as a request for research (summary and handoff)

Comments

@tiffanychu90
Member

tiffanychu90 commented Jul 1, 2024

Complete the below when receiving a research request, and continue to add to this issue as you receive additional details and produce deliverables. Be sure to also add the appropriate project-level label to this issue (eg gtfs-rt, DLA).

Research Question

Single sentence description: Compare the 2 or 3 nearest neighbor vehicle positions selected for speedmaps under Eric's rt_analysis and my gtfs_segments.nearest_snap.

Detailed description: Reworked nearest neighbor methodology looks roughly like:

  • Find the 1 nearest neighbor vp. The candidate vp only exclude those traveling in the opposite direction; "Unknown" directions are left in. An Unknown direction means the bus is at the same lat/lon but has a different timestamp.
  • The 1 nearest neighbor vp then has up to 2 additional points added (vp_coords_trio)
  • Idea is that the stop position falls somewhere in the trio of points, either between point1-point2 or point2-point3, and the interpolated position occurs there. If it's out of bounds, then it's either pinned to point1 or point3.
  • Within a trip, there is a monotonicity check. All arrival times must be monotonically increasing.
    • If it fails this condition, the interpolated time is set to NaT
    • Within a trip, missing values are now assigned a time based on where the stop's stop_meters falls relative to the previous stop and the subsequent stop
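
A minimal sketch of the steps above, in plain Python rather than the actual gtfs_segments code. All function names here are hypothetical, `None` stands in for NaT, and two details are assumptions not stated above: equal consecutive arrival times are allowed to pass the monotonic check, and when the check fails it is the later, out-of-order stop that gets blanked.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence


def interpolate_arrival(
    stop_meters: float,
    trio_meters: Sequence[float],
    trio_times: Sequence[datetime],
) -> datetime:
    """Interpolate the arrival time at stop_meters from up to 3 vp.

    trio_meters must be sorted ascending. A stop outside the trio's
    range is pinned to the first or last point.
    """
    if stop_meters <= trio_meters[0]:
        return trio_times[0]
    if stop_meters >= trio_meters[-1]:
        return trio_times[-1]
    for i in range(len(trio_meters) - 1):
        m0, m1 = trio_meters[i], trio_meters[i + 1]
        if m0 <= stop_meters <= m1:
            if m1 == m0:
                return trio_times[i]
            frac = (stop_meters - m0) / (m1 - m0)
            return trio_times[i] + frac * (trio_times[i + 1] - trio_times[i])
    raise ValueError("trio_meters must be sorted ascending")


def enforce_monotonic(times: List[Optional[datetime]]) -> List[Optional[datetime]]:
    """Blank out (None ~ NaT) any arrival earlier than the latest kept arrival."""
    out: List[Optional[datetime]] = []
    latest: Optional[datetime] = None
    for t in times:
        if t is not None and latest is not None and t < latest:
            out.append(None)
        else:
            out.append(t)
            if t is not None:
                latest = t
    return out


def fill_missing(
    stop_meters: Sequence[float], times: Sequence[Optional[datetime]]
) -> List[Optional[datetime]]:
    """Fill missing arrivals by interpolating on stop_meters between
    the nearest previous and subsequent stops that have a time."""
    filled = list(times)
    known = [i for i, t in enumerate(times) if t is not None]
    for i, t in enumerate(times):
        if t is not None:
            continue
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is None or nxt is None:
            continue  # nothing to interpolate between; leave missing
        span = stop_meters[nxt] - stop_meters[prev]
        frac = (stop_meters[i] - stop_meters[prev]) / span if span else 0.0
        filled[i] = times[prev] + frac * (times[nxt] - times[prev])
    return filled


# Made-up numbers, for illustration only
t0 = datetime(2024, 7, 1, 8, 0, 0)
trio_m = [100.0, 200.0, 300.0]
trio_t = [t0, t0 + timedelta(seconds=30), t0 + timedelta(seconds=60)]
inside = interpolate_arrival(150.0, trio_m, trio_t)   # between point1 and point2
pinned = interpolate_arrival(350.0, trio_m, trio_t)   # out of bounds, pinned to point3

raw = [t0, t0 + timedelta(seconds=60), t0 + timedelta(seconds=30), t0 + timedelta(seconds=90)]
checked = enforce_monotonic(raw)
filled = fill_missing([0.0, 100.0, 200.0, 300.0], checked)
```

In this toy trip the third stop fails the monotonic check, is blanked, and is then filled halfway between its neighbors because its stop_meters sits halfway between theirs.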

Questions / Decision Points

  • How sensitive is the speed calculation to which vp is selected? If we are selecting slightly different vp, like being off by 1, does that matter? In theory, it shouldn't matter much.
  • What happens if we select a vp with direction Unknown? How sensitive is it then? Basically, if a bus arrives at a stop and dwells for 5 min, the result would be very sensitive to whether we select the first point on arrival or the last point on arrival.
  • Can we factor in dwell time and do a smarter speed calculation?
    • Eric's methodology handles this already by asking if the position has moved
    • I do not handle this explicitly. Whether the position has moved or not is not known unless positions are looked at sequentially, so I need an agnostic way to "know" this.
    • Eric's methodology handles filling in missing stop arrivals with a similar monotonic check, and I do the same. But a bus that's dwelling would have the same stop_meters, so the speed would probably be forced downward?
| stop | arrival | dwell | leave | speed_comparison |
|---|---|---|---|---|
| stop1 | 8:00 | 5 min | 8:05 | |
| stop2 | 8:10 | 2 min | 8:12 | 8:12 compared to 8:00 when we want to compare 8:10 to 8:05 |
  • Sensitivity of 20th percentile vs 80th percentile speeds
    • Is the motivation for removing outliers?
    • If there are 10 trips observed for a segment-time_of_day (offpeak vs peak), this is selecting the 2nd point. Do we usually have that many points in a distribution? If we have 5 points in a distribution, this is the average of the 1st and 2nd? What if the 1st point is close to 0 and we know it's wrong (but outlier removal only drops points that are too high)? Then we're always averaging between (0, something), effectively cutting that speed in half. Is that desired?
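
Two of the points above can be made concrete with small numbers. This is a sketch under stated assumptions, not the speedmaps code: the 1,500 m segment length and the five speeds are made up, the function names are hypothetical, and the percentile uses linear interpolation between ranks (the default convention in numpy and pandas), since the method the pipeline actually uses is not stated in this issue.

```python
from datetime import datetime
from typing import Sequence


def speed_mps(meters: float, start: datetime, end: datetime) -> float:
    """Average speed in meters per second between two timestamps."""
    return meters / (end - start).total_seconds()


def percentile_linear(values: Sequence[float], q: float) -> float:
    """q-th percentile using linear interpolation between sorted ranks."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100   # fractional rank, 0-based
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


# Dwell example, using the times from the table and a hypothetical
# 1,500 m between stop1 and stop2.
segment_meters = 1500.0
stop1_arrive = datetime(2024, 7, 1, 8, 0)
stop1_leave = datetime(2024, 7, 1, 8, 5)
stop2_arrive = datetime(2024, 7, 1, 8, 10)
stop2_leave = datetime(2024, 7, 1, 8, 12)

# 8:12 compared to 8:00: both dwells are counted as travel time.
speed_with_dwell = speed_mps(segment_meters, stop1_arrive, stop2_leave)
# 8:10 compared to 8:05: only the time spent moving between stops.
speed_moving = speed_mps(segment_meters, stop1_leave, stop2_arrive)

# Percentile example: 5 made-up trip speeds, the lowest one close to 0.
five_trips = [0.0, 10.0, 11.0, 12.0, 13.0]
p20_five = percentile_linear(five_trips, 20)

# With 10 trips the fractional rank is 1.8, between the 2nd and 3rd points.
ten_trips = [float(v) for v in range(1, 11)]
p20_ten = percentile_linear(ten_trips, 20)
```

With these numbers the dwell-inclusive speed is well under half of the moving speed. Under linear interpolation, the 20th percentile of 5 points lands 80% of the way from the 1st to the 2nd point rather than at their midpoint, so a near-zero 1st point still drags the value down, just by less than half.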

Deliverables

Notebooks

@tiffanychu90 tiffanychu90 added the research request Issues that serve as a request for research (summary and handoff) label Jul 1, 2024
@tiffanychu90 tiffanychu90 added the gtfs-rt Work related to GTFS-Realtime label Jul 1, 2024
@edasmalchi
Member

Thanks for this @tiffanychu90!

About to head off on vacation, but looking forward to digging into this in the second half of July.

I wonder if it would be helpful to define a small suite of "known trips" from rt_delay calculations (perhaps even including manual verification), that could then be a test set for accuracy. Could even start with the BBB trips you've already looked at, then add a few more covering different route typologies.

@tiffanychu90
Member Author

tiffanychu90 commented Jul 11, 2024

@edasmalchi: Aiming to check in a notebook for BBB early next week, after aligning to before/after a stop and incorporating dwell time too (which gets me closer to "after the bus leaves the stop" compared to "before the bus arrives at the next stop"), plus the existing interpolation method, so that will align the methodology in those steps first.

@tiffanychu90
Member Author

Projects
Status: Done