Calculating graphs on this #67

Open
Dabendorf opened this issue Dec 12, 2024 · 3 comments
Labels
question Further information is requested

Comments

@Dabendorf

Dabendorf commented Dec 12, 2024

Hello. This project is a really amazing one, thank you for sharing this with us.

I was wondering whether it is actually possible to calculate static distances on a database in a smart way? It would be similar to your old generate-vbb-graph project.

I would like to end up with a bunch of edge-times describing how long it takes between pairs of stations which are directly connected with a line and no station in between.

I guess I will have to come up with a smart solution here, but I was wondering if you are already aware of some function doing this? My first hunch after checking some of the views and the docs is that one could flatten out the connections view somehow?

So this is actually not an issue, but I couldn't find a help section in the readme.

Thanks in advance!

@derhuerst
Member

Hello 👋

I was wondering whether it is actually possible to calculate static distances on a database in a smart way?

By "static distances", do you mean geographic distance (i.e. length of the path the trip takes, e.g. in meters) or travel time from A to B?

total geographic extent of a trip's shape

Assuming that your GTFS dataset's shapes.txt file is correct, you can directly query the whole trip's shape's length from that.

distances_travelled[array_upper(distances_travelled, 1)] in shapes_aggregated will contain the highest shape_dist_traveled value for your shape (because distances_travelled is filled from shape_dist_traveled). Alternatively, you can directly query the highest shape_dist_traveled from shapes.

However, shape_dist_traveled is optional, so you could instead use ST_Length from PostGIS to measure the shape's geographic "length": ST_Length(shape).
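Both options can be put side by side in one query. This is only a sketch: 'some shape ID' is a placeholder, the first column is in whatever unit the feed uses for shape_dist_traveled (and NULL if the feed omits it), and the cast to geography assumes the shape's coordinates are WGS84 longitude/latitude.

```sql
SELECT
	shape_id,
	-- last entry of the aggregated shape_dist_traveled values;
	-- NULL if the feed doesn't provide shape_dist_traveled
	distances_travelled[array_upper(distances_travelled, 1)] AS length_from_feed,
	-- length computed by PostGIS from the shape itself, in meters
	ST_Length(shape::geography) AS length_from_shape
FROM shapes_aggregated
WHERE shape_id = 'some shape ID';
```

If both columns are filled, comparing them is a quick plausibility check of the feed's shape_dist_traveled values.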

geographic distance of a trip's shape between two stops

I would like to end up with a bunch of edge-times describing how long it takes between pairs of stations which are directly connected with a line and no station in between.

This sounds more as if you want to measure the trip's shape's geographic "length" between two adjacent stops.

Note that, according to the GTFS Schedule spec, the shape doesn't have to visit the stops exactly:

Shapes do not need to intercept the location of Stops exactly, but all Stops on a trip should lie within a small distance of the shape for that trip, i.e. close to straight line segments connecting the shape points.

The best approach is to find the point that's closest to each stop (using ST_LineLocatePoint()), respectively, and then measure the length between those points (using ST_LineSubstring() & ST_Length()). Because shapes_aggregated.shape is currently a geometry instead of a geography (this is a bug), we manually convert it to the latter.

WITH
	-- the two stops between which the shape's length is measured
	stop_a AS (
		SELECT *
		FROM stops
		WHERE stop_id = 'stop A ID'
	),
	stop_b AS (
		SELECT *
		FROM stops
		WHERE stop_id = 'stop B ID'
	)
SELECT
	trip_id,
	-- length of the part of the shape between the points closest to each stop;
	-- this assumes that stop A comes before stop B along the shape
	ST_Length(ST_LineSubstring(
		shape::geography,
		ST_LineLocatePoint(shape::geography, stop_a.stop_loc),
		ST_LineLocatePoint(shape::geography, stop_b.stop_loc)
	)) AS segment_length
FROM stop_a, stop_b, trips
JOIN shapes_aggregated ON shapes_aggregated.shape_id = trips.shape_id
WHERE trip_id = 'some trip ID';

If you don't already have a trip_id/stop_id/stop_id triple, you can use the connections view to find them.

travel time on a trip between two stops

gtfs-via-postgres supports two "mental models":

  • the time-unexpanded data that's (more or less) directly taken from the GTFS Schedule data – This is useful if you want to do network analysis.
  • the time-expanded view that "applies" every trip's schedule (service days) to all of its stop_times – This is useful for routing & queries from the individual's perspective.

For time-unexpanded access, you can use stop_times directly:

WITH
	-- departure at stop A (falls back to the arrival if there is no departure time)
	stop_time_a AS (
		SELECT
			coalesce(departure_time, arrival_time) AS time
		FROM stop_times
		WHERE stop_id = 'stop A ID'
		AND trip_id = 'some trip ID'
	),
	-- arrival at stop B (falls back to the departure if there is no arrival time)
	stop_time_b AS (
		SELECT
			coalesce(arrival_time, departure_time) AS time
		FROM stop_times
		WHERE stop_id = 'stop B ID'
		AND trip_id = 'some trip ID'
	)
SELECT
	stop_time_b.time - stop_time_a.time AS travel_time
FROM stop_time_a, stop_time_b;

If you don't already have a trip_id/stop_id/stop_id triple, you can use the time-expanded connections view to find them.

@derhuerst derhuerst added the question Further information is requested label Dec 14, 2024
@derhuerst
Member

If your dataset doesn't have correct shapes, you can estimate them using pfaedle.

@Dabendorf
Author

Hey,
Thank you for your answer, that's a very good and informative text.
You are right, it was a bit badly phrased which distance I meant, but nice that you answered for all types. Maybe you can add these useful queries to the docs? :)

What I was personally talking about was the distance in time.
I am trying to make a network where each edge is the travel distance in time between two stops which are next to each other on one line. That also means I somehow need to flatten out the data structure, especially for huge cities.
Like for Berlin, there would be an RE edge of 1min Alex-Friedrichstraße, but there would be none for the S-Bahn, because it has no direct neighbouring connection between these two (Hackescher Markt is in between). And there wouldn't be something like 50000 of these edges for this pair, there would be only one. Like a connected graph of the "generally fastest connection" between two nodes. But of course that raises the question of how to measure this, because timetables change over the day.

To your answer: I will skip over the geographical part and go directly to travel time. I assume I do not need shapes.txt at all for my use case.

I guess if I want to do this for the entire graph, I should use the time-unexpanded data and run your query above for all pairs of stops?
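
Instead of running the per-trip query once for every pair of stops, the whole edge list could probably be derived in a single pass over the time-unexpanded stop_times table. The following is only a sketch, not something confirmed in this thread: it assumes that stop_times has a stop_sequence column as in GTFS, it pairs each stop_time with its successor on the same trip, and it keeps the shortest travel time per pair of stops as the "generally fastest connection". The names hops, fastest_travel_time and nr_of_hops are made up for illustration.

```sql
WITH hops AS (
	SELECT
		trip_id,
		stop_id AS from_stop_id,
		-- the next stop on the same trip; NULL for a trip's last stop
		lead(stop_id) OVER w AS to_stop_id,
		-- arrival at the next stop minus departure at this stop
		lead(coalesce(arrival_time, departure_time)) OVER w
			- coalesce(departure_time, arrival_time) AS travel_time
	FROM stop_times
	WINDOW w AS (PARTITION BY trip_id ORDER BY stop_sequence)
)
SELECT
	from_stop_id,
	to_stop_id,
	-- one edge per pair of adjacent stops, weighted with the fastest hop
	min(travel_time) AS fastest_travel_time,
	count(*) AS nr_of_hops
FROM hops
WHERE to_stop_id IS NOT NULL
GROUP BY from_stop_id, to_stop_id;
```

Because only consecutive stop_times of a trip are paired, an RE trip that runs from Alex to Friedrichstraße without stopping would produce that edge, while S-Bahn trips would only produce edges to and from Hackescher Markt. Note that these edges are between stops (often platforms); for station-level edges, one would presumably have to map each stop to its parent station first. Other aggregates (e.g. a median) could replace min() if the fastest hop turns out not to be representative.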


2 participants