A toy reinforcement learning model of a walker on a two-dimensional grid. The task is to reach the treats (positive rewards and fixed points) while avoiding the mines (negative rewards, or penalties).
The learning algorithm is canonical Q-learning.
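For orientation, here is a minimal sketch of the standard tabular Q-learning update. It is not taken from this project's source: the function name `qLearningUpdate`, the `QTable` layout (one row per state, one column per action), and the `terminal` flag are all assumptions made for illustration.

```typescript
// Hypothetical sketch of one tabular Q-learning step (names are illustrative).
// Assumed layout: q[state][action] holds the current value estimate.
type QTable = number[][];

function qLearningUpdate(
  q: QTable,
  state: number,
  action: number,
  reward: number,
  nextState: number,
  alpha: number, // learning rate
  gamma: number, // discount applied to the forecast future reward
  terminal: boolean // true if nextState ends the episode (e.g. a treat)
): void {
  // A terminal state has no future, so its forecast value is zero.
  const bestNext = terminal ? 0 : Math.max(...q[nextState]);
  const target = reward + gamma * bestNext;
  // Move the old estimate a fraction alpha toward the target.
  q[state][action] += alpha * (target - q[state][action]);
}
```

With `alpha` close to 0 the estimates change slowly, and with `gamma` close to 0 the walker only cares about the immediate reward.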
The user can experiment with hyperparameters such as the learning rate, the reward forecast discount, and the exploration-exploitation trade-off, on grids of varying size and with different reward and penalty densities.
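One common way to expose the exploration-exploitation trade-off as a single number is epsilon-greedy action selection. The sketch below assumes that scheme; the project may implement the trade-off differently. The names `chooseAction`, `epsilon`, and the injectable `rng` parameter are hypothetical.

```typescript
// Hypothetical epsilon-greedy selection (an assumed scheme, not the project's code).
// qRow: value estimates for each action in the current state.
// epsilon: probability of exploring with a uniformly random action.
// rng: source of uniform numbers in [0, 1); injectable so the function is testable.
function chooseAction(
  qRow: number[],
  epsilon: number,
  rng: () => number = Math.random
): number {
  if (rng() < epsilon) {
    // Explore: pick any action uniformly at random.
    return Math.floor(rng() * qRow.length);
  }
  // Exploit: pick the action with the highest estimate (first one wins ties).
  let best = 0;
  for (let a = 1; a < qRow.length; a++) {
    if (qRow[a] > qRow[best]) best = a;
  }
  return best;
}
```

Setting `epsilon` to 0 gives a purely greedy walker, while setting it to 1 gives a purely random one.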
The panel of settings:
The grid and the learned policy can be visualized.
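As an illustration of what a learned policy is, the sketch below reads the greedy action out of a Q-table for each cell and prints it as an arrow. It assumes row-major state indexing and four actions ordered up, right, down, left; these conventions, and the names `ARROWS` and `greedyPolicy`, are hypothetical and unrelated to how the app renders with roughjs.

```typescript
// Hypothetical text rendering of a greedy policy (conventions are assumed).
// Assumed action order: 0 = up, 1 = right, 2 = down, 3 = left.
const ARROWS: string[] = ["↑", "→", "↓", "←"];

// q[state][action], with states numbered row by row across a grid of the given width.
function greedyPolicy(q: number[][], width: number): string[] {
  const rows: string[] = [];
  for (let start = 0; start < q.length; start += width) {
    const cells = q
      .slice(start, start + width)
      // For each cell, pick the arrow of the highest-valued action.
      .map((row) => ARROWS[row.indexOf(Math.max(...row))]);
    rows.push(cells.join(" "));
  }
  return rows;
}
```

Calling `greedyPolicy(q, width).join("\n")` gives a quick console view of the direction the walker would take from each cell.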
This project was generated with Angular 7, and visualization uses roughjs.
Play with me here, or locally:
Install dependencies with:
npm install
then run with
ng serve