
Optimisations: Rerouting and Threads

Lara Codeca edited this page Aug 30, 2016 · 4 revisions

DISCONTINUED (LuST Scenario v1.0)

Multithreading

Special thanks to Stefan Neumeier (Technische Hochschule Ingolstadt).

Changes in the configuration file

Adding the line <device.rerouting.threads value="X" /> to the configuration file, with X replaced by the number of threads to use, makes sumo compute rerouting in parallel threads. To use this feature, sumo must be compiled with FOX.
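When several configuration files need this change, it can be scripted. The following is a minimal Python sketch, assuming the configuration is an XML file whose options are child elements carrying a value attribute, as in the line above. set_rerouting_threads is a hypothetical helper, and putting a newly created option under a <routing> section element is an assumption about the file layout; adjust it to match your own configuration file.

```python
import xml.etree.ElementTree as ET

OPTION = "device.rerouting.threads"  # option name taken from the text above


def set_rerouting_threads(config_xml: str, threads: int, section: str = "routing") -> str:
    """Return a copy of an XML configuration with the rerouting thread count set.

    If the option already exists anywhere in the file, its value is replaced.
    Otherwise it is appended under `section` (a hypothetical default section,
    created if it does not exist yet).
    """
    if threads < 0:
        raise ValueError("number of threads must be >= 0")
    root = ET.fromstring(config_xml)
    # iter() matches the literal tag name, so the dots in it are not a problem.
    option = next(root.iter(OPTION), None)
    if option is None:
        parent = root.find(section)
        if parent is None:
            parent = ET.SubElement(root, section)
        option = ET.SubElement(parent, OPTION)
    option.set("value", str(threads))
    return ET.tostring(root, encoding="unicode")


# Usage with a tiny, hypothetical configuration (the net file name is made up).
cfg = '<configuration><input><net-file value="example.net.xml"/></input></configuration>'
print(set_rerouting_threads(cfg, 4))
```

Running the helper a second time on its own output replaces the existing value instead of adding a duplicate option.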

FOX configuration

If FOX is configured correctly, after running ./configure the config.log file should contain something like:
configure:16387: checking for fox-config
configure:16405: found /usr/bin/fox-config
configure:16417: result: /usr/bin/fox-config
configure:16448: checking fxver.h usability
configure:16448: g++ -c -O2 -DNDEBUG -I/usr/include/fox-1.6 conftest.cpp >&5

When building with make, the compiler command lines in the output must contain "-I/usr/include/fox-1.6"; otherwise FOX is not actually being used.
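The same check on the build output can be sketched as follows. compile_lines_without_fox is a hypothetical helper. Treating any line that starts with g++ or c++ and contains -c as a compile command is a heuristic assumption. The input could be, for example, a log saved with make 2>&1 | tee build.log.

```python
def compile_lines_without_fox(make_output: str, flag: str = "-I/usr/include/fox-1.6"):
    """Return compile commands from make output that lack the FOX include flag."""
    missing = []
    for line in make_output.splitlines():
        tokens = line.split()
        # Heuristic: a compile command starts with g++/c++ and compiles with -c.
        is_compile = bool(tokens) and tokens[0].endswith(("g++", "c++")) and "-c" in tokens
        if is_compile and flag not in tokens:
            missing.append(line)
    return missing
```

An empty result means every detected compile command used the FOX include path.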

Threads

If FOX is included in the build and the configuration file has been adapted, rerouting should now run with the number of threads given as the value. The conf_threads_LuSTScenario.sumo.cfg file contains the complete configuration and can be run with sumo -c conf_threads_LuSTScenario.sumo.cfg. While sumo is running, the use of multiple threads can be verified with ps -o nlwp PID. The number shown is the number of lightweight processes (threads). (On Mac OS X, ps -M PID shows the threads belonging to each task.)

The number specified in the configuration file is the number of additional threads spawned alongside the main sumo thread. For example, setting the value to 4 makes sumo run with 5 threads in total (4 for rerouting and 1 for sumo itself).
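On Linux, the value reported by ps -o nlwp can also be read from the Threads: field of /proc/<pid>/status. The following Linux-only sketch uses hypothetical helper names. It reads that field and computes the X + 1 total described above for comparison.

```python
import os


def parse_thread_count(status_text: str) -> int:
    """Extract the 'Threads:' field from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' line found")


def thread_count(pid: int) -> int:
    """Number of threads of a running process (what ps -o nlwp reports)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_thread_count(f.read())


def expected_sumo_threads(rerouting_threads: int) -> int:
    # Per the text above: X rerouting threads plus the main sumo thread.
    return rerouting_threads + 1


print("this process has", thread_count(os.getpid()), "thread(s)")
```

For a sumo process started with value 4, thread_count(pid) would be expected to report expected_sumo_threads(4), that is 5, according to the description above.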

Improvements (provided by Stefan Neumeier)

The resulting performance improvements are shown below. The runs were executed on a machine with 32 cores and 60 GB of RAM. The baseline (100 percent of runtime) is the normal configuration without the threads parameter, in which no threads are spawned. "0" means that the threads parameter was present in the configuration, but set to 0. A configuration with 31 threads was also tested to fully use the 32-core server: 31 threads for rerouting plus 1 thread for sumo itself (see above for why) give 32 threads in total.

Rerouting threads       Time (s)   Percent of baseline runtime
No threads parameter    3,580      100
0                       3,575       99.86
1                       3,600      100.55
2                       2,301       64.27
4                       1,562       43.63
8                       1,152       32.17
16                      1,063       29.69
31                        946       26.42
32                        952       26.59
64                        941       26.28
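The percentages can be recomputed from the runtimes. They appear to be truncated, not rounded, to two decimals (3,600 / 3,580 = 100.558...%, listed as 100.55). This Python sketch reproduces that column. It also adds the speedup relative to the baseline, which is not part of the original table.

```python
# Runtimes in seconds, copied from the table above (None = no threads parameter).
RUNTIMES = {None: 3580, 0: 3575, 1: 3600, 2: 2301, 4: 1562,
            8: 1152, 16: 1063, 31: 946, 32: 952, 64: 941}
BASELINE = RUNTIMES[None]


def percent_of_baseline(seconds: int, baseline: int = BASELINE) -> float:
    """Runtime as a percentage of the baseline, truncated to two decimals."""
    # Integer floor division truncates; dividing by 100 then gives e.g. 100.55.
    return (seconds * 10000 // baseline) / 100


def speedup(seconds: int, baseline: int = BASELINE) -> float:
    """How many times faster than the baseline a run was."""
    return baseline / seconds


for threads, secs in RUNTIMES.items():
    label = "no parameter" if threads is None else f"{threads} threads"
    print(f"{label:>12}  {percent_of_baseline(secs):6.2f} %  {speedup(secs):4.2f}x")
```

From 8 to 64 threads, the speedup grows only from about 3.1x to about 3.8x.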

Multithreading and determinism

Note that using multiple threads leads to non-deterministic results, because it is not deterministic which thread performs the next computation. The measurements showed the following:

Running the simulation multiple times on a 4-core server with 1 thread always produced the same result:

 Inserted: 138361 (Loaded: 138613)
 Running: 104
 Waiting: 252
 Teleports: 46 (Collisions: 5, Jam: 10, Yield: 11, Wrong Lane: 20)
 Emergency Stops: 8

The same result is obtained on other servers, both with an unmodified configuration file (no threads parameter) and with the parameter set to 0 or 1 rerouting thread. With more than 1 rerouting thread, the results become non-deterministic.

Running the simulation on the same 4-core server with 4 threads produced the following results:

Run No 1:
 Inserted: 138361 (Loaded: 138613)
 Running: 100
 Waiting: 252
 Teleports: 97 (Collisions: 3, Jam: 52, Yield: 22, Wrong Lane: 20)
 Emergency Stops: 7

Run No 2:
 Inserted: 138361 (Loaded: 138613)
 Running: 101
 Waiting: 252
 Teleports: 41 (Collisions: 7, Jam: 9, Yield: 6, Wrong Lane: 19)
 Emergency Stops: 7

Running the simulation on another server (32 cores) with 32 threads produced the following results:

Run No 1:
 Inserted: 138361 (Loaded: 138613)
 Running: 100
 Waiting: 252
 Teleports: 47 (Collisions: 2, Jam: 1, Yield: 14, Wrong Lane: 30)
 Emergency Stops: 12

Run No 2:
 Inserted: 138361 (Loaded: 138613)
 Running: 96
 Waiting: 252
 Teleports: 35 (Collisions: 2, Jam: 1, Yield: 9, Wrong Lane: 23)
 Emergency Stops: 9

As can be seen, the results differ from run to run when more than one thread is used. Without the threads parameter (the configuration file does not contain it, so no threads are spawned), or with 0 or 1 thread, the results are deterministic.
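To check determinism across repeated runs, the end-of-run summaries can be compared field by field. The sketch below assumes the summary text has the "Name: value (Sub: value, ...)" shape quoted above. parse_summary and diff_runs are hypothetical helpers, not part of sumo.

```python
import re

# "Name: number" pairs, e.g. "Teleports: 97" or "Wrong Lane: 20" inside parentheses.
PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?):\s*(\d+)")


def parse_summary(text: str) -> dict[str, int]:
    """Turn an end-of-run summary, as quoted above, into {name: value}."""
    return {name.strip(): int(value) for name, value in PAIR.findall(text)}


def diff_runs(a: dict[str, int], b: dict[str, int]) -> dict[str, tuple]:
    """Return the statistics whose values differ between two runs."""
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


# The two 4-thread runs on the 4-core server, copied from above.
run1 = """Inserted: 138361 (Loaded: 138613)
Running: 100
Waiting: 252
Teleports: 97 (Collisions: 3, Jam: 52, Yield: 22, Wrong Lane: 20)
Emergency Stops: 7"""
run2 = """Inserted: 138361 (Loaded: 138613)
Running: 101
Waiting: 252
Teleports: 41 (Collisions: 7, Jam: 9, Yield: 6, Wrong Lane: 19)
Emergency Stops: 7"""

print(diff_runs(parse_summary(run1), parse_summary(run2)))
```

An empty dictionary means two runs produced identical summaries, as with the 0- or 1-thread runs described above.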

For normal simulation runs, this behaviour should not be a problem.
The improvements shown here depend heavily on how much rerouting the simulation performs: without rerouting there is no improvement, and the more rerouting there is, the greater the improvement.
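This dependence can be put in rough numbers with Amdahl's law. Amdahl's law is a standard model swapped in here; it was not part of the original measurements. If a fraction p of the runtime is rerouting work that splits evenly over n threads, the runtime shrinks to (1 - p) + p/n of the baseline. The sketch below evaluates this and back-solves p from the 31-thread row of the table above. Fitting a single row is an assumption, because the model ignores threading overhead (for example, the 1-thread run is slower than the baseline).

```python
def amdahl_runtime_ratio(p: float, n: int) -> float:
    """Fraction of baseline runtime when a share p of the work is split over n threads."""
    return (1.0 - p) + p / n


def estimate_parallel_fraction(ratio: float, n: int) -> float:
    """Invert amdahl_runtime_ratio for p, given a measured runtime ratio with n >= 2 threads."""
    if n < 2:
        raise ValueError("need at least 2 threads to estimate p")
    return (1.0 - ratio) / (1.0 - 1.0 / n)


# Measured: 946 s with 31 rerouting threads vs. 3,580 s baseline.
p = estimate_parallel_fraction(946 / 3580, 31)
print(f"implied rerouting share of runtime: {p:.2f}")
```

With p = 0 (no rerouting) the ratio stays at 1 for any n, and a larger p always gives a smaller ratio, which matches the statement above.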