Introduction to Reinforcement Learning

1 Multi-Armed Bandit
2 Markov Decision Process
    Markov process
    Bellman equation: Bellman expectation equation & Bellman optimality equation
3 Dynamic Programming
    To solve: the optimal policy in Markov decision processes
    Policy Iteration & Value Iteration
4 Temporal Difference Model
    Sarsa Algorithm & Q-Learning