Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[META] Origin/destination analysis layer/Information extractor #126

Open
3 tasks done
aiAdrian opened this issue May 22, 2024 · 14 comments
Open
3 tasks done

[META] Origin/destination analysis layer/Information extractor #126

aiAdrian opened this issue May 22, 2024 · 14 comments
Labels

Comments

@aiAdrian
Copy link
Collaborator

aiAdrian commented May 22, 2024

Preflight Checklist

Request type

Request for a new component

Functionality

The Origin/Destination Analysis enables the analysis of travel times and connections in the network graph (timetable). In principle, the fastest connections are sought starting from a start node. In order to limit or penalize the number of transfers, the timetable planner has the option to penalize transfer connections with additional time. This allows the costs for transfers to be controlled and determines whether a travel chain may include more or fewer transfer operations. The feature should implement the following functions/requirements:

  • The data from the network graph (graphs/timetable) serves as input for searching the travel chain.
  • For each start node, the travel times and number of transfers are calculated for all destination nodes. The travel time is the calculated shortest connection, taking into account the actual travel times, actual waiting times (transfer times), and the user-controlled penalty time for transfer operations. The function uses a suitable algorithm such as Dijkstra's algorithm to calculate these best connections between the start and destination in the graph. Frequencies and minimum transfer times must be taken into account, with the minimum transfer time not to be undercut.
  • The found solution should be presented as a matrix (dialog window).
  • The found solution should be exportable as a CSV file, with three files being exported: Effective travel time, effective number of transfers, and costs (effective travel time + transfer penalty). (done - still implemented)
  • Only visible trains are used for the respective calculation, i.e., filters must be considered. For the calculation, G trains can also be used for P analysis, i.e., the train category does not have to be considered as it can be controlled via filters.
  • This function allows the timetable planner to quickly and easily analyze whether the current timetable meets the requirements. The interactive testing capability allows for trying out various scenarios to understand the effects of adjustments on the travel chains. The graphical representation facilitates the visual capture of the best connections and their spatial spread.

Embedded in a dialog

The origin/destination matrix is opened in a dialog window, similar to the trainrun dialog. The dialog box is centered in the window and can be toggled to full screen if requested by the user. Toggling the full screen mode off will reset it to the initial size. The dialog window can be freely moved. The dialog window can be closed, by clicking in the upper right corner. In the menu bar there will be a button to open/close the dialog window. The origin/destination matrix should be possible to open in the editor and in the Streckengrafik (time distance graph).

Design - proposals (TBD) for the matrix:

  • Variant 1: Travel time and number of transfer operations (symetric) image

  • Variant 2: Number of transfer operations focusedimage

  • Variant 3: Separate number of transfer operations and travel time image

  • Variant 4: Number of transfer operations focused
    image

Filtering

  • If trainruns are not visible (filtered), then the shortest distance (cost) will be calculated based only on the visible trainruns.
  • If some nodes are filtered (not visible), the calculation must include all trainruns passing through both visible and filtered nodes. This means that nodes will not be filtered in the "background" algorithm, but the nodes must not be visible in the origin/destination matrix.

Connectivty

  • If a node is not reachable through the network, especially if there is no path between node A and node B, the square in the origin/destination matrix must be empty and highlighted.

Interactivity

  • If a user clicks on a cell that represents a path in the origin/destination matrix, i.e., a cell with time and number of transfers, the cost-shortest path must be rendered in the Netzgrafik-Editor - as long as the cell is not unselected in the origin/destination matrix. This should be done in a manner similar to the current prototype v2.5.0. All nodes visited along the selected shortest path will have the information about the number of transfer operations (changes) and the time. Starting at the top-left side of the matrix (direction).

E.g. BN to ZF

Search

Cell: Enter from / to Node shortname - will reduce the matrix dimension. Either no node is specified (full matrix), or only from / to (filtered matrix) or both (just one path).

UI Prototype

Editor
image

Streckengrafik
image

(The times, ... in the example are not correct, just a Mockup.)

Real-time feedback
Once a path is selected in the origin/destination matrix, editing of the Netzgrafik is locked, which means no changes can be made. If no path is selected, any changes made in the Netzgrafik will result in an update of the matrix. It might be necessary to calculate it in a separate thread to ensure high responsiveness, but this needs to be tested during implementation.

Changing the filter immediately triggers an update.

If a path was selected and changing the filter causes the path to disappear (no longer feasible), it will result in the deselection of the path. However, if the previously selected path remains visible after changing the filter, it will remain selected as it was. The same behavior applies if the user deletes a trainrun and the path is no longer found or feasible. In such cases, the path will be deselected.

Open questions
The CVS export button is missing, and the position for the input field and button has not been fixed yet. The transfer penalty must also be insertable.

Key task
Work out a well designed user interface with high usability. The above ideas are juste early state: please answer:
Which design has a well readable and includes all requirements i a way that the user can greatly interact anc work with the o/d matrix - any ideas are welcome? The design has to fit into existing netzgrafik-software. The UX results should be published as a new issue which are ready to get implemented.

Dependancy

#77

@louisgreiner
Copy link
Collaborator

Thank you Adrian for the document, we will soon have a closer look at this feature and see how we can work from it !

@aiAdrian
Copy link
Collaborator Author

aiAdrian commented Jul 3, 2024

This comment can provide inspiration.

image

Please take a look at the feature (Analytics enabled / not yet well implemented) where the current version calculates the shortest path and uses the current network graphic for visualization.


image


image

The concept behind this online real-time analytics is to calculate the travel times from a specific node of interest to all other nodes in real-time. When the mouse pointer hovers over the "clock," the corresponding sector representing a 1-minute interval gets highlighted. If the user clicks on a sector, all paths are visualized. The visualized paths represent the shortest path taken to "implement" the travel chain, with the departure time at the node of interest set to the selected minute. Only the colors in the nodes are important, so the clocks are visualized with near-transparent opacity. If no sector/minute is selected, the colors in the nodes and the paths are not relevant, meaning the user can manipulate the network graphic normally, and the clocks will always adjust accordingly.

The business information behind this concept is that there is a passenger or goods ready to be transported at a specific node at a given time. Thus, the displayed travel chain (travel time) is the effective travel time for a passenger. The clock for #changes (Umsteigen) and #travel time (Reisezeit) gives a very quick overview of how well the concept is balanced.

Required interactions:

  • Activation of the special analyitcs travel chains visualisation
  • Selection of a node of interest (clicking) => subsequently, the selected node can be changed by clicking on a clock at a node
  • The selected node is marked as the node of interest (possibly highlighted with color or filled in)
  • Displaying the clocks
  • As long as no time is clicked on the clock (regardless of which one), only the clocks are displayed, and the user can manipulate it as usual. When a specific departure time is chosen at a node, i.e., "time ready to get transported," the paths are displayed similar to the current implementation, and the user can no longer manipulate it. Once the user clicks on the background of the network graphic, the selection is turned off, i.e., a specific time, and everything can be manipulated normally again.
  • Disabling of the special analyitcs travel chains visualisation

@aiAdrian
Copy link
Collaborator Author

aiAdrian commented Jul 3, 2024

Discussion: Meeting 03.07.2024

  • In addition to the shortest travel times, it is also interesting to know if there are any alternatives within a "time window", meaning connections that are not much worse. These calculations should also be possible so that the network planner can see: Okay, the passenger has not only the shortest option but also the possibility of n additional connections that are not much worse, for example, taking no more than 10 minutes longer and requiring only one extra transfer.
  • Connection / train switch penalty we like to introduce a global variable (penalty time)
  • Open Question: Can we count the frequcency of the shortest connection

Commitment
We agree that the Origin-Destination Matrix should be developed in small steps. We agree that it is currently very difficult to estimate how quickly the matrix can be calculated. Since each entry is a shortest path and the necessary information needs to be extracted for each of these paths, the computation time can potentially be very long. To get a sense of how quickly the many Dijkstra shortest paths can be calculated, we want to first develop or test the matrix as a pure CSV (comma-separated value) export.

Here's what we will do:

  • Calculate the shortest path (evaluated as value = travel time + transfer time + a penalty for each transfer) for each pair of nodes.
  • For each path, calculate the total travel time (travel time + transfer time).
  • For each path, calculate the number of transfers.
  • All information will be exported, meaning two files will be exported: one with travel times and one with the number of transfers (each in CSV format).

Ajdustment / Next steps

  • Once the exported information is ready, we will verify and cross-check it with the business (owner) users.

  • After finalizing the data and incorporating the feedback, we can proceed to build the UI integration. This involves designing and developing the user interface (UI) and user experience (UX) based on the insights and learnings gained from the CSV exports.

  • The implementation phase will follow the UI design. This involves translating the design into functional code and integrating it into the existing system. The implementation should align with the specifications and goals defined during the CSV export and UI/UX design phases.

  • Throughout the process, it is essential to maintain effective communication and collaboration with the business users to ensure their needs are met and any changes or adjustments are addressed in a timely manner.

@louisgreiner
Copy link
Collaborator

Thank you for the debriefing, we indeed agree on your above comment.

First, we will search for an algorithm (if there is!) that enable the computation of some nodes to some nodes only, to allow taking in account the filtering, otherwise we might stick to Dijkstra. We thought on a way to reduce computational complexity : re-use the calculations on sections already computed.

We still have a few questions:

  • How to proceed with filters? Should we let it simple for now or think about it now ? (Non-)visible elements taken in account in our algorithm might change significantly our algorithms, maybe we need to discuss about it ahead of time.
  • What to do if there is no path found?
  • About letting the user enter the penalty time. It isn't much work for us to allow that (we just need to find where the set the input field), but the user might not understand well what it does represent, nor the impact.

Note

For implementation: do not forget to take in account the minimal connection time (of a node) when trying to compute shortest path

@aiAdrian
Copy link
Collaborator Author

aiAdrian commented Jul 22, 2024

Test Scenario: The data and all lines are fictional, but having a medium-sized test network graphic is valuable:

Medium
Test_Scenario.json

Small
Test_Scenario_2.json

Small
Test_Scenario_3.json

@louisgreiner
Copy link
Collaborator

louisgreiner commented Jul 22, 2024

Technical workshop debrief (22/07/24)

Participants: Adrian, Jules, Louis, Maëlys, Stéphane

Optimization problem formulation

Constraints

  • Generate connections on the fly using the minimum connection time (i.e. if minimum time is 5min and there is only 3 minutes between two trains, the connection time will be 18min)
  • Attention to frequencies (15min, /!\ 20min, 30min, 60min, 120min). Assert that all frequencies are multiples of each other (this assumption simplifies the calculations at the beginning, but need to check if reincluding non-multiple frequencies won't make the algorithm break or too difficult to reimplement
  • Take into account whether the train stops or not, to determine whether or not if a connection is possible (be careful to calculate paths with implicit connections (not only where transitions are set)

Objective function

  • Travel time (including connections) + number of connections * penalty (fixed) per connection (penalty could be minimal connection time of the node + 5min or minimal connection time of the node * 2 i.e.
  • If equal, how to discriminate between 2 paths ? (a priori least connections)

Output for the optimal path of a pair

  • Path travel time (including connections)
  • Number of connections for that path
  • (How often does the optimal path occur? -> if different frequencies are involved i.e.)

Algorithms to test

To be done

  • Clarify the constraints
  • Proof of concept for the algorithm chosen
  • Export as CSV (mainly UI)
  • Full UI integration (but needs the POC to be done, to better identify the user needs)

@shenriotpro
Copy link
Contributor

About the algos: just wanted to mention Floyd–Warshall is an option in theory, but it may be too slow for many vertices (1000+, sparse graph).

@shenriotpro
Copy link
Contributor

shenriotpro commented Jul 23, 2024

For the graph model: each "inner" vertex will likely be a (node, "arrival" or "departure", time) tuple, each edge being derived from trainrun sections (departure, arrival) or connections/transitions (arrival, departure), including connection cost if suitable.
The algos may run more easily if we generate a departure and arrival "outer" vertex as well for each node (the expected output being shortest paths from departure to arrival outer vertices), connected to the "inner" vertices at this node (likely costing 0).
Memory may or may not be an issue, it may help to implicitly generate the graph (mostly if running dijkstra).

@louisgreiner
Copy link
Collaborator

louisgreiner commented Jul 31, 2024

Proof of concept

Author: @shenriotpro

Link : https://colab.research.google.com/drive/1Omx_qt2fnW1kJC6kQrhOgdkTZzmtibid?usp=sharing#scrollTo=EDXs4tO0kmDr

POC with input Scenario medium

Limitations of the POC (see TODO in code):

  • connections for non-stop trains are allowed
  • round trips are assymetrical (need to understand why, i.e. 3 -> 113 = 109 and 113 -> 3 = 110)
  • we count a connection for a train at each node even if this is the same train

To do

  • test the code and understand the results
  • build the graph "by hand" to compare the results and understand the algorithm

A filter has been implemented to select the nodes that we want for the output (but still search in all nodes)

Implementation

Ticket that describes:

  • Acceptance criteria
  • Implementation plan

@shenriotpro
Copy link
Contributor

A few notes:

  • The whole process takes about 2 s to run (dijkstras)
  • It is fairly easy to parallelize (0.4 s on my machine)
  • From 50 nodes we generate a graph of 24k vertices and 38k edges
  • For some reason pathfinding after the topological sort is slower

@shenriotpro
Copy link
Contributor

Fixed a few bugs, there are a few ways to optimize, but the performance got worse:

  • The whole process takes about 20 s to run (dijkstras)
  • From 50 nodes we generate a graph of 24k vertices and 192k edges

@aiAdrian aiAdrian added the enhancement New feature or request label Aug 13, 2024
@louisgreiner louisgreiner changed the title [Feature request]: Origin/destination analysis layer/Information extractor [META] Origin/destination analysis layer/Information extractor Oct 15, 2024
@louisgreiner
Copy link
Collaborator

I am trying to write down the feature for the User Interface part

I have several questions:

  • do we have a final mock-up ? It would be nice if you could also tell which interface components to re-use
  • "in the Streckengrafik, the used path is highlighted" (cf. above picture, path is in green/red), any more informations ?
  • considering the above comments, a lot of features needs to dynamically change the matrix, if opened ; do we still consider running O/D matrix calculations on any change ? which might slow the application

@shenriotpro
Copy link
Contributor

shenriotpro commented Oct 16, 2024

  • "in the Streckengrafik, the used path is highlighted" (cf. above picture, path is in green/red), any more informations ?

Note that the used path is currently not computed (it should not be too hard though).

  • considering the above comments, a lot of features needs to dynamically change the matrix, if opened ; do we still consider running O/D matrix calculations on any change ? which might slow the application

This is outside my expertise, but it could possibly be updated on a regular basis (when the user is inactive?) or on a separate worker.

@aiAdrian
Copy link
Collaborator Author

aiAdrian commented Oct 31, 2024

#199

Early experiment: Done with Version 2.9.1

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants