---
title: Q-Learning
date: 2022-11-08
lastmod: 2022-11-21
---

# Q Learning

Use temporal-difference (TD) updates to revise the Q-values at each time step as the agent interacts with the environment.

*(image: `Pasted image 20220415193044`)*

**New sample:** the maximum utility that can be achieved from the state we are entering, i.e. $V(S_{t+1})$, computed using the previous estimates (the previous iteration, if updates are synchronous).
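For reference, a sketch of the standard tabular Q-learning update that the worked numbers below appear to follow (the learning rate $\alpha=0.1$ and discount $\gamma=0.9$ are read off the trial calculations):

$$
Q_{new}(s_t, a_t) = Q(s_t, a_t) + \alpha \Big( \underbrace{r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a')}_{\text{new sample}} - Q(s_t, a_t) \Big)
$$

Here $\max_{a'} Q(s_{t+1}, a')$ plays the role of $V(S_{t+1})$ described above.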

## Grid World Scenario: Trial 1

| (1,1), right | (1,2), right                                        |
| ------------ | --------------------------------------------------- |
| 0            | $Q_{new}=0+0.1\times(-5+0.9\times0-0)=-0.5$         |

## Trial 2

| (1,1), right | (1,2), right                                        |
| ------------ | --------------------------------------------------- |
| 0            | $Q_{new}=-0.5+0.1\times(0+0.9\times0-(-0.5))=-0.45$ |
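
A minimal Python sketch reproducing the two updates above, assuming the standard tabular rule with $\alpha=0.1$ and $\gamma=0.9$; the rewards ($-5$ in trial 1, $0$ in trial 2) are read off the worked equations, while the successor state `(1, 3)` and the action set are illustrative assumptions:

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.9
Q = defaultdict(float)                  # Q[(state, action)], all entries start at 0
ACTIONS = ["up", "down", "left", "right"]

def q_update(state, action, reward, next_state):
    """One tabular Q-learning step: move Q(s, a) toward the TD sample."""
    sample = reward + gamma * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (sample - Q[(state, action)])
    return Q[(state, action)]

# Trial 1: in (1,2) take "right", observe reward -5; all successor Q-values are still 0.
print(q_update((1, 2), "right", reward=-5, next_state=(1, 3)))   # -0.5

# Trial 2: the same transition again, this time with reward 0.
print(q_update((1, 2), "right", reward=0, next_state=(1, 3)))    # ≈ -0.45
```

The (1,1), right entry stays at 0 in both trials, presumably because its own update sees a reward of 0 and a successor whose best Q-value is still 0.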