A GTFS file is a ZIP archive containing a number of CSV files. A difference can concern a file, a column in a file or a row in a file.

A “GTFS Diff” file is a CSV file with 8 columns that can express any difference found between two GTFS files. Each row describes one difference.

Columns description

Field Name	Type	Required	Description
id	String	Required	Uniquely identifies a row in the GTFS Diff file.
file	String	Required	The name of the file in the GTFS archive concerned by the change.
action	String. Enum: `add`, `delete`, `update`	Required	The type of change. Has something been added, deleted or updated? Files and columns can be added or deleted. Only rows can be updated.
target	String. Enum: `file`, `column`, `row`	Required	Specifies what the “action” applies to: a file, a column in a file, or a row in a file.
identifier	json	Required	How to uniquely identify the part of the data concerned by the change. - If target is set to file, the file is identified using the “filename” key. For example {"filename": "shapes.txt"} - If target is set to column, the column is identified using the “column” key. For example {"column": "bikes_allowed"} - If target is set to row, the row is identified using a list of keys, combined with a logical “AND”. For example, {"stop_id": "A"} identifies the row whose stop_id is “A”, and {"from_stop_id": "A", "to_stop_id": "B"} identifies the row where “from_stop_id” = “A” AND “to_stop_id” = “B”.
initial_value	json	Conditionally required	Required in case of an updated row. Lists the initial values. For example {"stop_name": "Xrain station", "stop_lat": ""}
new_value	json	Conditionally required	Required in case of an added or updated row. A JSON object whose keys are the column names and whose values are the row values. - For an added row, it contains all the column names and values. For example, for a new transfer between stations: {"from_stop_id": "A", "to_stop_id": "B", "transfer_type": 1, "min_transfer_time": 2} - For an updated row, it contains only the modified values. For example {"stop_name": "Train station", "stop_lat": "45.1"}
note	String	Optional	A free field where explanations about the change can be given.
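
The row identifier semantics described above (every key must match, combined with a logical AND) can be sketched in Python. This is only an illustration: the function name `matches_identifier` is hypothetical, and rows are assumed to be dictionaries of strings, as `csv.DictReader` would produce.

```python
import json


def matches_identifier(row, identifier_json):
    """Return True when every key/value pair of the identifier matches the row.

    Hypothetical helper: `row` is assumed to be a dict of strings.
    """
    identifier = json.loads(identifier_json)
    # Values read from a CSV file are strings, so compare as strings.
    # A key missing from the row never matches.
    return all(row.get(key) == str(value) for key, value in identifier.items())


transfers = [
    {"from_stop_id": "A", "to_stop_id": "B", "transfer_type": "1"},
    {"from_stop_id": "A", "to_stop_id": "C", "transfer_type": "2"},
]
identifier = '{"from_stop_id": "A", "to_stop_id": "B"}'
selected = [row for row in transfers if matches_identifier(row, identifier)]
```

Here `selected` keeps only the transfer from “A” to “B”, because both keys of the identifier have to match.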

Ordering

The requested order concerns the “target” column. The differences should be listed in the following order:

“target” of type “file” comes first, i.e. modifications to files in the archive
then “target” of type “column”, i.e. modifications to a column in a file
then “target” of type “row”, i.e. modifications to a row in a file

This order makes it easier for a human to grasp the differences between files, and for a computer to apply successive patches of changes (first create a file, then populate it).

For each given type of target, the row order is not specified.
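
As a rough sketch, this ordering can be obtained with a stable sort on the “target” column. The names `TARGET_ORDER` and `sort_differences` are assumptions made for this example, and each difference is assumed to be a dictionary with a `target` key.

```python
# Rank of each target type, following the order requested above.
TARGET_ORDER = {"file": 0, "column": 1, "row": 2}


def sort_differences(differences):
    """Order differences by target: files first, then columns, then rows."""
    # sorted() is stable, so differences sharing a target keep their
    # relative order (which the specification leaves unspecified anyway).
    return sorted(differences, key=lambda diff: TARGET_ORDER[diff["target"]])


unordered = [
    {"id": "a", "target": "row"},
    {"id": "b", "target": "file"},
    {"id": "c", "target": "column"},
    {"id": "d", "target": "row"},
]
ordered = sort_differences(unordered)
ordered_targets = [diff["target"] for diff in ordered]
```

With this input, `ordered_targets` is `["file", "column", "row", "row"]`.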

Example

Here is an example GTFS diff file, with some explanations in the note column about what each row means.

id	file	action	target	identifier	initial_value	new_value	note
1	shapes.txt	add	file	{"filename": "shapes.txt"}			creation of new file
2	readme.pdf	delete	file	{"filename": "readme.pdf"}			deletion of a file
3	transfers.txt	delete	column	{"column": "internal_id"}			delete the column “internal_id” in the “transfers.txt” file
4	transfers.txt	add	row			{"from_stop_id": "A", "to_stop_id": "B", "transfer_type": 1, "min_transfer_time": 2}	add a row in the transfers.txt file
5	stops.txt	delete	row	{"stop_id": "A"}	{"stop_id": "A", "stop_name": "town center", …}		delete the row in stops.txt where “stop_id” = “A”
6	stops.txt	update	row	{"stop_id": "B"}	{"stop_name": ""}	{"stop_name": "station"}	in stops.txt update the stop_name of the row identified by “stop_id” = “B”. The stop_name was empty, now it is “station”
7	calendar_dates.txt	update	row	{"service_id": "1", "date": "20220928"}	{"exception_type": "1"}	{"exception_type": "2"}	in calendar_dates.txt, update the exception_type of the row identified by “service_id” = “1” AND “date” = “20220928”. The exception_type was 1, now it is 2.
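
Because three of the columns hold JSON, their values contain quotes and commas that have to be escaped when stored as CSV. The sketch below writes and reads a GTFS Diff with Python's standard `csv` and `json` modules. It assumes standard CSV quoting (which this document does not spell out), and the helper names `write_diff` and `read_diff` are hypothetical.

```python
import csv
import io
import json

COLUMNS = ["id", "file", "action", "target",
           "identifier", "initial_value", "new_value", "note"]
JSON_COLUMNS = ("identifier", "initial_value", "new_value")


def write_diff(differences, stream):
    """Write differences as CSV, serializing the JSON columns."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    for difference in differences:
        encoded = dict(difference)
        for name in JSON_COLUMNS:
            value = difference.get(name)
            # An absent value is stored as an empty cell.
            encoded[name] = "" if value is None else json.dumps(value)
        writer.writerow(encoded)


def read_diff(stream):
    """Read differences from CSV, decoding the JSON columns."""
    differences = []
    for row in csv.DictReader(stream):
        for name in JSON_COLUMNS:
            row[name] = json.loads(row[name]) if row[name] else None
        differences.append(row)
    return differences


buffer = io.StringIO(newline="")
write_diff([{
    "id": "6", "file": "stops.txt", "action": "update", "target": "row",
    "identifier": {"stop_id": "B"},
    "initial_value": {"stop_name": ""},
    "new_value": {"stop_name": "station"},
    "note": "the stop_name was empty, now it is station",
}], buffer)
buffer.seek(0)
parsed = read_diff(buffer)
```

After the round trip, `parsed[0]["identifier"]` is again the dictionary `{"stop_id": "B"}`, and the comma inside the note survives thanks to CSV quoting.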

Example

If you shuffle the rows of the stops.txt file in a GTFS archive, the resulting GTFS Diff is empty, because row order carries no meaning in a GTFS file.
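
One way to get this order independence is to index rows by their identifying columns before comparing them. The sketch below is a minimal illustration, not a reference implementation: the function name `diff_rows` is hypothetical, `stop_id` is assumed to identify a row of stops.txt, and both versions of the file are assumed to have the same columns.

```python
def diff_rows(before, after, key_columns):
    """Compare two lists of row dicts, ignoring row order."""
    def index(rows):
        # Map each row to the tuple of its identifying values.
        return {tuple(row[k] for k in key_columns): row for row in rows}

    old, new = index(before), index(after)
    differences = []
    for key in sorted(old.keys() | new.keys()):
        identifier = dict(zip(key_columns, key))
        if key not in new:
            differences.append({"action": "delete", "identifier": identifier,
                                "initial_value": old[key], "new_value": None})
        elif key not in old:
            differences.append({"action": "add", "identifier": identifier,
                                "initial_value": None, "new_value": new[key]})
        elif old[key] != new[key]:
            # Keep only the columns whose value changed.
            changed = [c for c in new[key] if old[key].get(c) != new[key][c]]
            differences.append({
                "action": "update", "identifier": identifier,
                "initial_value": {c: old[key].get(c) for c in changed},
                "new_value": {c: new[key][c] for c in changed},
            })
    return differences


stops = [
    {"stop_id": "A", "stop_name": "town center"},
    {"stop_id": "B", "stop_name": ""},
    {"stop_id": "C", "stop_name": "harbour"},
]
shuffled = list(reversed(stops))
shuffle_diff = diff_rows(stops, shuffled, ["stop_id"])

renamed = [dict(row) for row in stops]
renamed[1]["stop_name"] = "station"
rename_diff = diff_rows(stops, renamed, ["stop_id"])
```

`shuffle_diff` is an empty list, while `rename_diff` contains a single update for the row identified by `{"stop_id": "B"}`.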

Full example

The examples folder contains simple GTFS files and the resulting GTFS Diff listing the differences between them.

Possible usages

Get a quick overview of the changes made to a GTFS file.
Communicate the changes made to a GTFS file effectively, with an explanation for each change.
Take two corrected GTFS files and merge them together.

Possible alternatives we thought about

Using text diff tools. CSV files are just text files, so powerful existing tools can compare them. But while a text diff is easy to produce, its results are harder to interpret. For example, if a column is deleted from a 1,000-row file, a text diff shows 1,000 differences, whereas the current proposal lists a single column deletion. A text diff is also order-dependent, whereas GTFS files are not.
At the opposite end from text diff is using a GTFS library to load the data into a model. The main advantage is the ability to interpret the changes in more depth, because the model understands the data: it could distinguish changes that impact routing calculations from changes to visual elements (colors, etc.). However, this approach makes it harder to handle data outside the specification (for example, a PDF file deleted from the archive) and requires constantly keeping up with GTFS extensions (Fares V2, pathways, etc.).
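
To illustrate the column deletion case mentioned for text diff tools, here is a minimal sketch that compares only the header rows of two versions of a file. The function name `diff_columns` is hypothetical and the headers are assumed to be lists of column names.

```python
def diff_columns(file_name, old_header, new_header):
    """Report added and deleted columns, one difference per column."""
    differences = []
    for column in old_header:
        if column not in new_header:
            differences.append({"file": file_name, "action": "delete",
                                "target": "column",
                                "identifier": {"column": column}})
    for column in new_header:
        if column not in old_header:
            differences.append({"file": file_name, "action": "add",
                                "target": "column",
                                "identifier": {"column": column}})
    return differences


old_header = ["from_stop_id", "to_stop_id", "transfer_type", "internal_id"]
new_header = ["from_stop_id", "to_stop_id", "transfer_type"]
column_diff = diff_columns("transfers.txt", old_header, new_header)
```

Whatever the number of rows in the file, `column_diff` contains a single entry describing the deletion of `internal_id`.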
