
This project aims to solve the route optimisation problem for vehicles in a fleet with reinforcement learning.
Given a group of vehicles and a series of demands, it assigns a suitable vehicle to each demand, and every vehicle follows the route computed by RL.
The performance of RL for each episode executed in this project can be found in gmmrr/route-optim, which is a standalone version of this repo.
This repo is part of Guanming's capstone project.
Assume there are 20 vehicles, and 200 demands are randomly generated at 10 ~ 30 second intervals.
The visualised result is in the plot below:
where the grey part shows the time a vehicle spends commuting, and the black part shows the time a vehicle is on duty.
These demands are consumed as follows:
-- Total Time: 69.58 min
-- Average Waiting Time: 0.58 min
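For reference, the average waiting time can be read as the mean gap between a demand appearing and a vehicle starting to serve it. The sketch below uses illustrative field names, not necessarily those used in the code:

```python
def average_waiting_time(demands):
    """Mean minutes between each demand's creation and the start of its service."""
    waits = [(d["service_start"] - d["created"]) / 60 for d in demands]
    return sum(waits) / len(waits)
```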
The graph below the topic section roughly shows 10 of the demands and how vehicles find their routes, where yellow lines are congested edges.
The graph below is the complete one:
- Demands are randomly generated
  Predicting the next demand is an interesting problem in itself, but it is hard to obtain real data near NCKU. There is a dataset called Cabspotting (cab data released by Scott Snibbe, from San Francisco) on which many related studies are based. A sketch of the random generation is given after this list.
- Congestion is randomly generated
  Similar to the situation above, congested edges are randomly chosen from the edge space, and the level can be set in fleet_environment.py to low, medium, or high (see the sketch after this list).
- Speed is a constant
  The net downloaded from the OSM website classifies each edge type, such as primary, secondary, or residential highway, and each type has a defined speed. Acceleration is not taken into consideration in this project, so the result is still some distance from the practical case.
- Traffic lights are set at a 90-second interval
  Even though this is close to the practical case, it is still not real: actual signals run as programs rather than a constant pattern.
- The terminal condition of RL
  Convergence is declared when the time taken (rounded to the second decimal place) is identical over 5 consecutive episodes, as sketched after this list.
- Performance issue
  For performance reasons, part of the simulation is implemented with Dijkstra's algorithm. Even so, the most important part, namely route optimisation, is still implemented with RL.
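Below is a minimal sketch of how the random demands and congestion described above could be generated. The function names, the edge_ids argument, and the level ratios are illustrative assumptions, not the actual code in fleet_environment.py.

```python
import random

def generate_demands(num_demands, edge_ids, min_gap=10, max_gap=30):
    """Create demands at random 10~30 second intervals on randomly chosen edges."""
    demands, t = [], 0
    for _ in range(num_demands):
        t += random.randint(min_gap, max_gap)   # seconds until the next demand appears
        demands.append({"time": t, "edge": random.choice(edge_ids)})
    return demands

def generate_congestion(edge_ids, level="medium"):
    """Randomly pick edges to congest; the level controls how many edges are affected."""
    ratio = {"low": 0.05, "medium": 0.10, "high": 0.20}[level]   # assumed ratios
    chosen = random.sample(edge_ids, max(1, int(len(edge_ids) * ratio)))
    return [[edge, random.randint(1, 30)] for edge in chosen]    # ["edge_id", minutes]
```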
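The terminal condition can be expressed as a small check over the recorded episode times. This is only a sketch of the rule stated above, assuming the per-episode times are kept in a list:

```python
def has_converged(episode_times, window=5):
    """True when the last `window` episode times agree after rounding to 2 decimal places."""
    if len(episode_times) < window:
        return False
    return len({round(t, 2) for t in episode_times[-window:]}) == 1
```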
- Download SUMO (https://sumo.dlr.de/docs/Downloads.php)
- Clone this repository to your local machine
- Install the necessary packages with the following command:
$ pip3 install -r requirements.txt
- Update main.py with your SUMO directory to set the environment variable
def sumo_config():
    os.environ["SUMO_HOME"] = '$SUMO_HOME'  # -- change to your path to $SUMO_HOME
    ...
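Once SUMO_HOME points at your installation, a common follow-up step (shown here as an assumption about the setup, not necessarily the exact contents of main.py) is to put SUMO's bundled Python tools on sys.path so that traci and sumolib can be imported:

```python
import os
import sys

os.environ["SUMO_HOME"] = "/usr/local/opt/sumo/share/sumo"        # example path only
sys.path.append(os.path.join(os.environ["SUMO_HOME"], "tools"))   # makes traci / sumolib importable
```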
- Upload your netedit file and update the network_file variable
network_file = './network_files/ncku_network.net.xml'
More on the OSM website: https://www.openstreetmap.org/
The config command is saved in ./network_files/config.txt
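If you export your own map from OSM, the conversion to a .net.xml is typically done with SUMO's netconvert tool; the exact command used for this network is the one saved in ./network_files/config.txt, so the line below is only a generic example with placeholder file names.
$ netconvert --osm-files map.osm -o ./network_files/my_network.net.xml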
- Upload your traffic_light file
tls = tls_from_tllxml('./network_files/ncku_network.tll.xml')
This file can be exported from Netedit; see https://sumo.dlr.de/docs/Netedit/index.html
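For reference, a SUMO .tll.xml file stores tlLogic programs whose phases carry a duration and a state string. The snippet below is only an assumption about what reading such a file could look like with the standard library; it is not the implementation of tls_from_tllxml.

```python
import xml.etree.ElementTree as ET

def read_tls_programs(tll_path):
    """Map each traffic-light id to its list of (phase duration, phase state) pairs."""
    programs = {}
    for tl in ET.parse(tll_path).getroot().iter("tlLogic"):
        programs[tl.get("id")] = [(float(p.get("duration")), p.get("state"))
                                  for p in tl.iter("phase")]
    return programs
```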
- Edit the following parameters of the environment in main.py
# 03 Initiate Environment
...
num_vehicle = 20
evaluation = "time"
num_demands = 200
congestion_assigned = []  # Type: ["edge_id", int(minute)]; it will be assigned randomly if not customised
...
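If you want to customise the congestion instead of letting it be generated randomly, congestion_assigned can be filled with such pairs (assuming it accepts a list of them; the edge ids below are placeholders):

```python
congestion_assigned = [["edge_123", 10], ["edge_456", 25]]   # congest these edges for 10 and 25 minutes
```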
- Run the code
$ python3 main.py
- In main.py, we can set
  num_vehicle = 20
  evaluation = "time"
  num_demands = 200
  congestion_assigned = []
- In agent.py, we can set
  # Hyperparameters for Q_Learning
  learning_rate = 0.9      # alpha
  discount_factor = 0.1    # gamma
  and we have
  # Hyperparameters for SARSA
  learning_rate = 0.9      # alpha
  discount_factor = 0.1    # gamma
  exploration_rate = 0.1   # ratio of exploration and exploitation
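For context, alpha and gamma enter the standard tabular update rules. The functions below are a generic sketch of Q-learning and SARSA updates, not the repo's exact agent code.

```python
# Q-learning: off-policy, bootstraps from the best next action
def q_learning_update(Q, s, a, r, s_next, alpha=0.9, gamma=0.1):
    Q[s][a] += alpha * (r + gamma * max(Q[s_next].values()) - Q[s][a])

# SARSA: on-policy, bootstraps from the action actually taken next
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.9, gamma=0.1):
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])
```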
The rewards are defined as
  reward_lst = [-50, -50, -30, 100, 50, -1]
which correspond to
  [invalid_action_reward, dead_end_reward, loop_reward, completion_reward, bonus_reward, continue_reward]
respectively.
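To make the ordering explicit, the list can be unpacked into the named rewards listed above:

```python
reward_lst = [-50, -50, -30, 100, 50, -1]
(invalid_action_reward, dead_end_reward, loop_reward,
 completion_reward, bonus_reward, continue_reward) = reward_lst
```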