Dynamic Channel Allocation by Reinforcement Learning

This project implements an RL agent for Dynamic Channel Allocation in a simulated mobile caller environment.

The implementation is in Haskell and uses Accelerate for numerical work. It is a near-complete port of the best-performing agent (AA-VNet) from https://github.com/tsoernes/dca. The agent uses a linear neural network as its state-value function approximator. The network is trained with a newly proposed average-reward variant of TDC gradients, which were originally defined for discounted returns in Sutton et al. 2009: "Fast gradient-descent methods for temporal-difference learning with linear function approximation."
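For orientation, the discounted TDC update from Sutton et al. 2009 is written out below for a linear value function $V(s) = \theta^\top \phi(s)$ with feature vector $\phi$, discount $\gamma$, and secondary weights $w$. The average-reward form after it is only a sketch of what such a variant plausibly looks like, assuming the discount is dropped and rewards are centered by a running estimate $\bar{r}$. It is not taken from this repository's code. The step sizes $\alpha$, $\beta$ and $\eta$ are assumed to correspond to --learning_rate, --learning_rate_grad and --learning_rate_avg.

```latex
% Discounted TDC (Sutton et al. 2009)
\begin{align*}
\delta_t     &= r_{t+1} + \gamma\,\theta_t^\top \phi_{t+1} - \theta_t^\top \phi_t \\
\theta_{t+1} &= \theta_t + \alpha \left[ \delta_t \phi_t - \gamma\,\phi_{t+1} \left( \phi_t^\top w_t \right) \right] \\
w_{t+1}      &= w_t + \beta \left( \delta_t - \phi_t^\top w_t \right) \phi_t
\end{align*}

% Average-reward analogue (assumed form: no discount, reward centered by \bar{r})
\begin{align*}
\delta_t      &= r_{t+1} - \bar{r}_t + \theta_t^\top \phi_{t+1} - \theta_t^\top \phi_t \\
\theta_{t+1}  &= \theta_t + \alpha \left[ \delta_t \phi_t - \phi_{t+1} \left( \phi_t^\top w_t \right) \right] \\
w_{t+1}       &= w_t + \beta \left( \delta_t - \phi_t^\top w_t \right) \phi_t \\
\bar{r}_{t+1} &= \bar{r}_t + \eta\,\delta_t
\end{align*}
```

The second term in the $\theta$ update is the gradient correction that distinguishes TDC from plain TD(0). With $w = 0$ both forms reduce to an ordinary semi-gradient TD step.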

For an introduction to the channel allocation problem and how RL is applied to solving it, see: Torstein Sørnes 2018: Contributions to centralized dynamic channel allocation reinforcement learning agents

See also the versions written in Rust and Python.

How to build

The following command builds the project with -O2 and other optimizations:

stack build --stack-yaml stack-release.yaml

To build without optimizations but with profiling flags, drop the --stack-yaml option and its argument.

How to run

stack exec --stack-yaml stack-release.yaml dca-exe -- --backend cpu

This runs the project. On startup it generates a full computational graph containing both the call network simulator and the agent's neural network. The graph is compiled using Accelerate.LLVM.Native and executed on the CPU. To use Accelerate's built-in interpreter instead, omit the --backend cpu flag. GPU support can be obtained by adding the accelerate-llvm-ptx dependency and switching out the imports in AccUtils.hs.

To see available options, run:

stack exec --stack-yaml stack-release.yaml dca-exe -- --help
Available options:
  --call_dur MINUTES       Call duration for new calls. (default: 3.0)
  --call_dur_hoff MINUTES  Call duration for handed-off calls. (default: 1.0)
  --call_rate PER_HOUR     Call arrival rate (new calls). (default: 200.0)
  --hoff_prob PROBABILITY  Hand-off probability. Set to 0 to disable
                           hand-offs. (default: 0.0)
  --n_events N             Simulation duration, in number of processed
                           events. (default: 10000)
  --log_iter N             How often to show run time statistics such as call
                           blocking probability. (default: 1000)
  --learning_rate F        For neural net, i.e. state value
                           update. (default: 2.52e-6)
  --learning_rate_avg F    Learning rate for the average reward
                           estimate. (default: 6.0e-2)
  --learning_rate_grad F   Learning rate for gradient
                           correction. (default: 5.0e-6)
  --backend ARG            Accepted backends are 'interp' for 'Interpreter' and
                           'cpu' for 'LLVM.Native'.The interpreter yields better
                           error messages. (default: Interpreter)
  --min_loss F             Abort simulation if loss goes below given absolute
                           value. Set to 0 to disable. (default: 0.0)
  --fixed_rng              Use a fixed (at 0) seed for the RNG. If this switch
                           is not enabled, the seed is selected at random.
  -h,--help                Show this help text
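
As a rough aid for reading the defaults above, the offered traffic implied by the default arrival rate and mean call duration can be worked out as follows. This assumes --call_rate applies per cell, which the help text does not state, and it ignores hand-offs (the default --hoff_prob is 0).

```latex
% Offered traffic A = arrival rate (lambda) times mean call duration (h)
A = \lambda \cdot h
  = 200\ \tfrac{\text{calls}}{\text{hour}} \times \frac{3.0}{60}\ \text{hours}
  = 10\ \text{Erlang}
```

Under that assumption, an average of 10 calls would be in progress per cell if none were blocked, which may help when judging the reported call blocking probability.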

TODO

  • Implement hand-off look-ahead