<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>RL Grid World Demo</title>
<style>
body {
font-family: Arial, sans-serif;
max-width: 800px;
margin: 0 auto;
padding: 20px;
}
#grid {
display: grid;
grid-template-columns: repeat(5, 100px);
gap: 1px;
margin-bottom: 20px;
}
.cell {
width: 100px;
height: 100px;
border: 1px solid #ccc;
display: flex;
flex-direction: column;
align-items: center;
justify-content: center;
font-size: 12px;
}
.agent { background-color: #ff9999; }
.goal { background-color: #99ff99; }
.highlighted-path { border: 2px solid #ff00ff; }
.q-values {
display: grid;
grid-template-columns: repeat(3, 1fr);
grid-template-rows: repeat(3, 1fr);
width: 100%;
height: 100%;
}
.q-value {
display: flex;
align-items: center;
justify-content: center;
font-size: 10px;
}
.explanation {
background-color: #f0f0f0;
padding: 10px;
margin-bottom: 20px;
border-radius: 5px;
}
.controls {
display: flex;
gap: 10px;
margin-bottom: 20px;
}
</style>
</head>
<body>
<h1>Reinforcement Learning Grid World Demo</h1>
<div class="explanation">
<h2>How it works:</h2>
<p>This demo shows a simple reinforcement learning (RL) agent learning to navigate a 5x5 grid world. The agent starts at the top-left corner (0,0) and aims to reach the goal at the bottom-right corner (4,4).</p>
<p>The agent learns using Q-learning, a type of RL algorithm. Each cell shows the Q-values for moving in each direction (up, right, down, left). Higher Q-values indicate more promising actions.</p>
</div>
<div id="grid"></div>
<div class="controls">
<button id="step">Step</button>
<button id="train">Train</button>
<button id="demonstrate">Demonstrate</button>
</div>
<p id="info"></p>
<div class="explanation">
<h3>Controls:</h3>
<ul>
<li><strong>Step:</strong> Make the agent take one action based on its current policy.</li>
<li><strong>Train:</strong> Run 1000 episodes of training to improve the agent's policy.</li>
<li><strong>Demonstrate:</strong> Show the agent's learned behavior by moving to the goal.</li>
</ul>
<h3>Grid Legend:</h3>
<ul>
<li><strong>A:</strong> Agent's current position</li>
<li><strong>G:</strong> Goal position</li>
<li><strong>Numbers:</strong> Q-values for each action (up, right, down, left)</li>
<li><strong>Magenta border:</strong> Path taken during demonstration</li>
</ul>
</div>
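<div class="explanation">
<p>The Step and Demonstrate controls both rely on picking an action from the current Q-values and moving the agent. The sketch below shows one way this could look. It is not taken from app.js: the names <code>ACTIONS</code>, <code>greedyAction</code>, and <code>move</code>, the row/column coordinates, the tie-breaking rule, and the choice to keep the agent in place when it walks into a wall are all assumptions made for illustration.</p>
<pre>
```javascript
const GRID_SIZE = 5;

// Action order matches the grid legend: up, right, down, left.
const ACTIONS = [
  { name: 'up', dRow: -1, dCol: 0 },
  { name: 'right', dRow: 0, dCol: 1 },
  { name: 'down', dRow: 1, dCol: 0 },
  { name: 'left', dRow: 0, dCol: -1 },
];

// Index of the largest Q-value; ties go to the first action (an assumption).
function greedyAction(qValues) {
  return qValues.indexOf(Math.max(...qValues));
}

// Apply an action, clamping to the grid so walls block movement (assumed).
function move(position, actionIndex) {
  const { dRow, dCol } = ACTIONS[actionIndex];
  const clamp = (v) => Math.min(GRID_SIZE - 1, Math.max(0, v));
  return { row: clamp(position.row + dRow), col: clamp(position.col + dCol) };
}

// From the top-left cell, the best of these Q-values is "right".
const next = move({ row: 0, col: 0 }, greedyAction([0.1, 0.4, 0.2, -0.3]));
console.log(next); // { row: 0, col: 1 }
```
</pre>
<p>A purely greedy choice like this suits demonstration. During training, an agent usually also needs some random exploration, which this sketch leaves out.</p>
</div>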
<div class="explanation">
<h2>How Q-values are updated:</h2>
<p>Q-values are updated using the Q-learning algorithm:</p>
<p><strong>Q(s, a) ← Q(s, a) + α · [R + γ · max<sub>a'</sub> Q(s', a') − Q(s, a)]</strong></p>
<p>Where:</p>
<ul>
<li>Q(s, a) is the Q-value for the current state s and action a</li>
<li>α (alpha) is the learning rate, set by the <code>LEARNING_RATE</code> constant in this demo</li>
<li>R is the reward received for taking action a in state s</li>
<li>γ (gamma) is the discount factor, set by the <code>DISCOUNT_FACTOR</code> constant in this demo</li>
<li>max<sub>a'</sub> Q(s', a') is the maximum Q-value for the next state s' over all possible actions a'</li>
</ul>
<p>This update happens after each action the agent takes, gradually improving its policy.</p>
</div>
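<div class="explanation">
<p>The update rule above can be sketched as a single function. This is a minimal illustration, not the code in app.js: the values 0.1 and 0.9 are placeholders for the demo's actual learning rate and discount factor, and the Q-table layout (one array of four values per state), the function name <code>updateQ</code>, and the <code>done</code> flag for terminal states are assumptions.</p>
<pre>
```javascript
// Placeholder values; the demo's real constants may differ.
const LEARNING_RATE = 0.1;   // alpha (assumed)
const DISCOUNT_FACTOR = 0.9; // gamma (assumed)

// Hypothetical layout: qTable[state] holds four Q-values,
// ordered up, right, down, left.
function updateQ(qTable, state, action, reward, nextState, done) {
  const current = qTable[state][action];
  // A terminal next state contributes no future value (a choice of this sketch).
  const bestNext = done ? 0 : Math.max(...qTable[nextState]);
  const target = reward + DISCOUNT_FACTOR * bestNext;
  qTable[state][action] = current + LEARNING_RATE * (target - current);
  return qTable[state][action];
}

// Worked example: target = -0.1 + 0.9 * 0.5 = 0.35,
// so the new value is 0 + 0.1 * (0.35 - 0) = 0.035.
const qTable = [[0, 0, 0, 0], [0, 0.5, 0, 0]];
const updated = updateQ(qTable, 0, 1, -0.1, 1, false);
console.log(updated.toFixed(3)); // 0.035
```
</pre>
<p>Each call nudges one Q-value a fraction α of the way toward the target R + γ · max<sub>a'</sub> Q(s', a'), which is why many episodes are needed before the values settle.</p>
</div>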
<div class="explanation">
<h2>Reward Structure:</h2>
<p>In this demo, the reward is determined as follows:</p>
<ul>
<li><strong>Reaching the goal (4,4):</strong> +1 reward</li>
<li><strong>Any other action:</strong> -0.1 reward</li>
</ul>
<p>Because every step that does not reach the goal costs 0.1, the agent earns the highest total reward by taking the shortest path to the goal.</p>
<p>Note: The reward is not based on the distance to the goal. This simple structure is common in introductory RL problems.</p>
</div>
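<div class="explanation">
<p>To make the effect of the step penalty concrete, the sketch below encodes the reward rule described above and compares the total (undiscounted) reward of two paths. The names <code>GOAL</code>, <code>reward</code>, and <code>pathReturn</code> and the row/column coordinates are hypothetical, and the sketch assumes the reward is judged on the cell the agent lands in.</p>
<pre>
```javascript
const GOAL = { row: 4, col: 4 };

// +1 when the move lands on the goal, -0.1 for any other move.
function reward(nextPosition) {
  const atGoal = nextPosition.row === GOAL.row && nextPosition.col === GOAL.col;
  return atGoal ? 1 : -0.1;
}

// Undiscounted total for a path that reaches the goal in `steps` moves:
// every move but the last costs 0.1, and the last one earns +1.
function pathReturn(steps) {
  return 1 - 0.1 * (steps - 1);
}

console.log(pathReturn(8).toFixed(1));  // 0.3, shortest path from (0,0)
console.log(pathReturn(12).toFixed(1)); // -0.1, a detour of 4 extra moves
```
</pre>
<p>The shortest route from (0,0) to (4,4) takes 8 moves, so any detour lowers the total by 0.1 per extra move.</p>
</div>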
<script src="app.js"></script>
</body>
</html>