Deploying to gh-pages from @ 6c36145 🚀
PaParaZz1 committed Apr 2, 2024
1 parent 4507d30 commit 1834a8c
Showing 15 changed files with 820 additions and 17 deletions.
4 changes: 2 additions & 2 deletions 13_envs/bitflip_zh.html
@@ -35,7 +35,7 @@
<link rel="stylesheet" href="../_static/css/style.css" type="text/css" />
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
-<link rel="next" title="LunarLander" href="lunarlander_zh.html" />
+<link rel="next" title="FrozenLake" href="frozen_lake_zh.html" />
<link rel="prev" title="Acrobot" href="acrobot_zh.html" />
<link href="../_static/css/style.css" rel="stylesheet" type="text/css">

@@ -435,7 +435,7 @@ <h2>References<a class="headerlink" href="#id4" title="Permalink to this headl

<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">

-<a href="lunarlander_zh.html" class="btn btn-neutral float-right" title="LunarLander" accesskey="n"
+<a href="frozen_lake_zh.html" class="btn btn-neutral float-right" title="FrozenLake" accesskey="n"
rel="next">Next <img src="../_static/images/chevron-right-blue.svg"
class="next-page"></a>

605 changes: 605 additions & 0 deletions 13_envs/frozen_lake_zh.html

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions 13_envs/index_zh.html
@@ -321,6 +321,7 @@ <h1>Reinforcement Learning Environment Examples<a class="headerlink" href="#id1" title="Permalink t
<li class="toctree-l1"><a class="reference internal" href="pendulum_zh.html">Pendulum</a></li>
<li class="toctree-l1"><a class="reference internal" href="acrobot_zh.html">Acrobot</a></li>
<li class="toctree-l1"><a class="reference internal" href="bitflip_zh.html">BitFlip</a></li>
+<li class="toctree-l1"><a class="reference internal" href="frozen_lake_zh.html">FrozenLake</a></li>
<li class="toctree-l1"><a class="reference internal" href="lunarlander_zh.html">LunarLander</a></li>
<li class="toctree-l1"><a class="reference internal" href="bipedalwalker_zh.html">BipedalWalker</a></li>
<li class="toctree-l1"><a class="reference internal" href="minigrid_zh.html">MiniGrid</a></li>
4 changes: 2 additions & 2 deletions 13_envs/lunarlander_zh.html
@@ -36,7 +36,7 @@
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="BipedalWalker" href="bipedalwalker_zh.html" />
-<link rel="prev" title="BitFlip" href="bitflip_zh.html" />
+<link rel="prev" title="FrozenLake" href="frozen_lake_zh.html" />
<link href="../_static/css/style.css" rel="stylesheet" type="text/css">


@@ -622,7 +622,7 @@ <h2>Benchmark Algorithm Performance<a class="headerlink" href="#id23" title="Permalink to thi
class="next-page"></a>


-<a href="bitflip_zh.html" class="btn btn-neutral" title="BitFlip" accesskey="p"
+<a href="frozen_lake_zh.html" class="btn btn-neutral" title="FrozenLake" accesskey="p"
rel="prev"><img src="../_static/images/chevron-right-blue.svg" class="previous-page"> Previous</a>

</div>
10 changes: 8 additions & 2 deletions 13_envs/mujoco.html
@@ -309,7 +309,11 @@ <h2>Overview<a class="headerlink" href="#overview" title="Permalink to this head
<h2>Install<a class="headerlink" href="#install" title="Permalink to this headline"></a></h2>
<section id="installation-method">
<h3>Installation Method<a class="headerlink" href="#installation-method" title="Permalink to this headline"></a></h3>
-<p>install the gym, mujoco and mujoco-py libraries, which can be installed by one-click pip or combined with DI-engine.</p>
+<p>First, install the required version of the MuJoCo library on your operating system.
+Then install three Python libraries, gym, mujoco and mujoco-py, either with a single pip command or together with DI-engine.</p>
+<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>pip<span class="w"> </span>install<span class="w"> </span>DI-engine<span class="o">[</span>common_env,video<span class="o">]</span>
+</pre></div>
+</div>
<p>Note:</p>
<ol class="arabic simple">
<li><p>The mujoco library is open-source and free to public, and thus no longer requires an activation license. You can use Deepmind’s latest mujoco library, or use OpenAI’s mujoco-py.</p></li>
@@ -487,7 +491,9 @@ <h3>Store Video<a class="headerlink" href="#store-video" title="Permalink to thi
<p>After the environment is created, but before it is reset, call the <code class="docutils literal notranslate"><span class="pre">enable_save_replay</span></code> method to specify the path where the replay is saved. The environment automatically saves a local video file after each episode ends (by default this is implemented with <code class="docutils literal notranslate"><span class="pre">gym.wrappers.RecordVideo</span></code>). The code below runs one environment episode and saves its result in the folder <code class="docutils literal notranslate"><span class="pre">./video/</span></code>:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">easydict</span> <span class="kn">import</span> <span class="n">EasyDict</span>
<span class="kn">from</span> <span class="nn">dizoo.mujoco.envs</span> <span class="kn">import</span> <span class="n">MujocoEnv</span>
-<span class="n">env</span> <span class="o">=</span> <span class="n">MujocoEnv</span><span class="p">(</span><span class="n">EasyDict</span><span class="p">({</span><span class="s1">&#39;env_id&#39;</span><span class="p">:</span> <span class="s1">&#39;Hoopper-v3&#39;</span> <span class="p">}))</span>
+<span class="n">config</span> <span class="o">=</span> <span class="n">MujocoEnv</span><span class="o">.</span><span class="n">default_config</span><span class="p">()</span>
+<span class="n">config</span><span class="o">.</span><span class="n">env_id</span><span class="o">=</span><span class="s2">&quot;Hopper-v3&quot;</span>
+<span class="n">env</span> <span class="o">=</span> <span class="n">MujocoEnv</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>
<span class="n">env</span><span class="o">.</span><span class="n">enable_save_replay</span><span class="p">(</span><span class="n">replay_path</span><span class="o">=</span><span class="s1">&#39;./video&#39;</span><span class="p">)</span>
<span class="n">obs</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="n">reset</span><span class="p">()</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
11 changes: 7 additions & 4 deletions 13_envs/mujoco_zh.html
@@ -309,7 +309,10 @@ <h2>Overview<a class="headerlink" href="#id1" title="Permalink to this headline">
<h2>Install<a class="headerlink" href="#id2" title="Permalink to this headline"></a></h2>
<section id="id3">
<h3>Installation Method<a class="headerlink" href="#id3" title="Permalink to this headline"></a></h3>
-<p>Install gym, mujoco and mujoco-py, either with a single pip command or together with DI-engine.</p>
+<p>First install the required version of MuJoCo. Then install the three Python libraries gym, mujoco and mujoco-py, either with a single pip command or together with DI-engine:</p>
+<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>pip<span class="w"> </span>install<span class="w"> </span>DI-engine<span class="o">[</span>common_env,video<span class="o">]</span>
+</pre></div>
+</div>
<p>Note:</p>
<ol class="arabic simple">
<li><p>The latest version of mujoco is now open source and free, and no longer requires an activation license. You can use DeepMind’s latest mujoco library or OpenAI’s mujoco-py.</p></li>
@@ -490,11 +493,11 @@ <h3>Store Video<a class="headerlink" href="#id20" title="Permalink to this head
<p>After the environment is created, but before it is reset, call the <code class="docutils literal notranslate"><span class="pre">enable_save_replay</span></code> method to specify the path where the replay is saved. The environment automatically saves the video file of each episode after it ends (by default this is implemented with <code class="docutils literal notranslate"><span class="pre">gym.wrappers.RecordVideo</span></code>). The code below runs one environment episode and saves its result in <code class="docutils literal notranslate"><span class="pre">./video/</span></code>:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">easydict</span> <span class="kn">import</span> <span class="n">EasyDict</span>
<span class="kn">from</span> <span class="nn">dizoo.mujoco.envs</span> <span class="kn">import</span> <span class="n">MujocoEnv</span>
-
-<span class="n">env</span> <span class="o">=</span> <span class="n">MujocoEnv</span><span class="p">(</span><span class="n">EasyDict</span><span class="p">({</span><span class="s1">&#39;env_id&#39;</span><span class="p">:</span> <span class="s1">&#39;Hoopper-v3&#39;</span> <span class="p">}))</span>
+<span class="n">config</span> <span class="o">=</span> <span class="n">MujocoEnv</span><span class="o">.</span><span class="n">default_config</span><span class="p">()</span>
+<span class="n">config</span><span class="o">.</span><span class="n">env_id</span><span class="o">=</span><span class="s2">&quot;Hopper-v3&quot;</span>
+<span class="n">env</span> <span class="o">=</span> <span class="n">MujocoEnv</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>
<span class="n">env</span><span class="o">.</span><span class="n">enable_save_replay</span><span class="p">(</span><span class="n">replay_path</span><span class="o">=</span><span class="s1">&#39;./video&#39;</span><span class="p">)</span>
<span class="n">obs</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="n">reset</span><span class="p">()</span>
-
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="n">action</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="n">random_action</span><span class="p">()</span>
<span class="n">timestep</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="n">step</span><span class="p">(</span><span class="n">action</span><span class="p">)</span>
Binary file added _images/FrozenLake.gif
Binary file added _images/frozen_lake_dqn.jpg
175 changes: 175 additions & 0 deletions _sources/13_envs/frozen_lake_zh.rst.txt
@@ -0,0 +1,175 @@
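FrozenLake
~~~~~~~~~~~~~~~~~~

Overview
========
FrozenLake is a classic control problem in reinforcement learning. The agent must walk across a frozen lake from the start to the goal without falling into any hole in the ice, as shown in the figure below.

.. image:: ./images/FrozenLake.gif
   :align: center
   :scale: 80%
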
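Install
=======

Installation Method
-------------------

The FrozenLake environment is built into gymnasium, so installing gymnasium is enough. Its environment id is ``FrozenLake-v1``.

.. code:: shell

   pip install gymnasium

Verify Installation
-------------------

Run the following commands in a Python shell to verify that the installation succeeded.

.. code:: python

   import gymnasium
   env = gymnasium.make('FrozenLake-v1', desc=None, map_name="4x4", is_slippery=True)
   # gymnasium's reset() returns a tuple (observation, info)
   obs, info = env.reset()
   print(obs)
   assert env.observation_space == gymnasium.spaces.Discrete(16)
   assert env.action_space == gymnasium.spaces.Discrete(4)
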
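Environment Introduction
========================

Action Space
------------

FrozenLake has a discrete action space. The action shape is (1,), with values in {0, 1, 2, 3}, indicating the direction in which the player moves.

- ``0``: move left

- ``1``: move down

- ``2``: move right

- ``3``: move up

In terms of gymnasium spaces, this can be expressed as:

.. code:: python

   action_space = gymnasium.spaces.Discrete(4)
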
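State Space
-----------

The state is a single value representing the player's current position, computed as ``current_row * ncols + current_col`` (rows and columns both start at 0).


- ``15``: the goal position in the 4x4 map, computed as ``3 * 4 + 3 = 15``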
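
The state encoding above can be sketched in a few lines of Python. The helper names ``to_state`` and ``to_position`` are hypothetical and are not part of gymnasium or DI-engine; the sketch assumes the default 4x4 map.

```python
# Hypothetical helpers (not part of gymnasium or DI-engine) illustrating
# the state encoding for the default 4x4 map.
NCOLS = 4


def to_state(row, col, ncols=NCOLS):
    """Encode a zero-based (row, col) position as a single integer state."""
    return row * ncols + col


def to_position(state, ncols=NCOLS):
    """Decode an integer state back into a zero-based (row, col) pair."""
    return divmod(state, ncols)


print(to_state(3, 3))   # goal tile of the 4x4 map
print(to_position(15))  # inverse of the encoding above
```

Because the map is row-major, ``divmod`` by the number of columns recovers the row and the column in one step.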

Reward Space
------------
- ``+1``: reaching the goal

- ``0``: reaching a hole

- ``0``: moving forward onto a frozen tile


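Termination Conditions
----------------------
An episode of the FrozenLake environment ends when any of the following happens:

- The player falls into a hole.
- The player reaches the goal (position ``nrow * ncol - 1``).
- The maximum number of steps per episode is reached; the default is ``100``.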
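
As a rough illustration of how the rewards and termination conditions fit together, the sketch below maps the tile the player lands on to a ``(reward, done)`` pair. The function ``outcome`` is a hypothetical helper, not the gymnasium or DI-engine implementation; it only assumes the tile letters used in gymnasium map descriptions (``S`` start, ``F`` frozen, ``H`` hole, ``G`` goal) and the default limit of 100 steps quoted above.

```python
MAX_STEPS = 100  # default episode length quoted in this document


def outcome(tile, step_count, max_steps=MAX_STEPS):
    """Return (reward, done) after landing on `tile` at step `step_count`.

    Hypothetical helper for illustration only.
    tile: 'G' goal, 'H' hole, 'F' frozen, 'S' start.
    """
    if tile == 'G':
        return 1.0, True   # reached the goal: the only positive reward
    if tile == 'H':
        return 0.0, True   # fell into a hole: episode ends without reward
    # Frozen or start tile: no reward; the episode ends only at the step limit.
    return 0.0, step_count >= max_steps


print(outcome('F', 5))
print(outcome('G', 6))
```

Since a hole and an ordinary step both give a reward of 0, the reward signal is sparse: only a successful episode produces a nonzero return.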


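DI-zoo Runnable Code Example
============================


The complete training configuration files are available at `github
link <https://github.com/opendilab/DI-engine/tree/main/dizoo/frozen_lake/config>`__.
For a specific configuration file, such as ``frozen_lake_dqn_config.py``, the following demo can be run: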

.. code:: python

   from easydict import EasyDict

   frozen_lake_dqn_config = dict(
       exp_name='frozen_lake_seed0',
       env=dict(
           collector_env_num=8,
           evaluator_env_num=5,
           n_evaluator_episode=10,
           env_id='FrozenLake-v1',
           desc=None,
           map_name="4x4",
           is_slippery=False,
           save_replay_gif=False,
       ),
       policy=dict(
           cuda=True,
           load_path='frozen_lake_seed0/ckpt/ckpt_best.pth.tar',
           model=dict(
               obs_shape=16,
               action_shape=4,
               encoder_hidden_size_list=[128, 128, 64],
               dueling=True,
           ),
           nstep=3,
           discount_factor=0.97,
           learn=dict(
               update_per_collect=5,
               batch_size=256,
               learning_rate=0.001,
           ),
           collect=dict(n_sample=10),
           eval=dict(evaluator=dict(eval_freq=40, )),
           other=dict(
               eps=dict(
                   type='exp',
                   start=0.8,
                   end=0.1,
                   decay=10000,
               ),
               replay_buffer=dict(replay_buffer_size=20000, ),
           ),
       ),
   )
   frozen_lake_dqn_config = EasyDict(frozen_lake_dqn_config)
   main_config = frozen_lake_dqn_config

   frozen_lake_dqn_create_config = dict(
       env=dict(
           type='frozen_lake',
           import_names=['dizoo.frozen_lake.envs.frozen_lake_env'],
       ),
       env_manager=dict(type='base'),
       policy=dict(type='dqn'),
       replay_buffer=dict(type='deque', import_names=['ding.data.buffer.deque_buffer_wrapper']),
   )
   frozen_lake_dqn_create_config = EasyDict(frozen_lake_dqn_create_config)
   create_config = frozen_lake_dqn_create_config

   if __name__ == "__main__":
       # or you can enter `ding -m serial -c frozen_lake_dqn_config.py -s 0`
       from ding.entry import serial_pipeline
       serial_pipeline((main_config, create_config), max_env_step=5000, seed=0)

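
To get a feel for the ``eps`` settings in the config above (``type='exp'``, ``start=0.8``, ``end=0.1``, ``decay=10000``), the sketch below evaluates a common exponential decay schedule at a few step counts. The function ``exp_epsilon`` is a hypothetical helper and the formula is an assumption about what an exponential schedule of this kind usually looks like; DI-engine's actual implementation may differ.

```python
import math


def exp_epsilon(step, start=0.8, end=0.1, decay=10000):
    """Exponentially decayed exploration rate (assumed form, for illustration).

    Starts at `start` for step 0 and approaches `end` as `step` grows;
    `decay` sets how many steps it takes to close about 63% of the gap.
    """
    return (start - end) * math.exp(-step / decay) + end


for step in (0, 5000, 10000, 50000):
    print(step, round(exp_epsilon(step), 4))
```

Under this assumed form, the 5000 environment steps used by ``max_env_step`` in the demo would end well before epsilon gets close to its floor of 0.1.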
Benchmark Algorithm Performance
===============================
The experimental results of the DQN algorithm are shown below. The horizontal axis is ``step`` and the vertical axis is ``reward_mean``.

.. image:: ./images/frozen_lake_dqn.jpg
   :align: center
   :scale: 80%


References
=====================
- FrozenLake `source code <https://github.com/opendilab/DI-engine/tree/main/dizoo/frozen_lake>`__

1 change: 1 addition & 0 deletions _sources/13_envs/index_zh.rst.txt
@@ -27,6 +27,7 @@
pendulum_zh
acrobot_zh
bitflip_zh
+frozen_lake_zh

lunarlander_zh
bipedalwalker_zh
11 changes: 9 additions & 2 deletions _sources/13_envs/mujoco.rst.txt
@@ -16,7 +16,12 @@ Install
Installation Method
--------------------

-install the gym, mujoco and mujoco-py libraries, which can be installed by one-click pip or combined with DI-engine.
+First, install the required version of the MuJoCo library on your operating system.
+Then install three Python libraries, gym, mujoco and mujoco-py, either with a single pip command or together with DI-engine.
+
+.. code:: shell
+
+   pip install DI-engine[common_env,video]
Note:

@@ -210,7 +215,9 @@ After the environment is created, but before reset, call the\ ``enable_save_repl
from easydict import EasyDict
from dizoo.mujoco.envs import MujocoEnv
-env = MujocoEnv(EasyDict({'env_id': 'Hoopper-v3' }))
+config = MujocoEnv.default_config()
+config.env_id="Hopper-v3"
+env = MujocoEnv(config)
env.enable_save_replay(replay_path='./video')
obs = env.reset()
while True:
12 changes: 8 additions & 4 deletions _sources/13_envs/mujoco_zh.rst.txt
@@ -17,7 +17,11 @@ Mujoco aims to facilitate robotics, biomechanics, graphics, animation, and other areas that require fast
Installation Method
-------------------

-Install gym, mujoco and mujoco-py, either with a single pip command or together with DI-engine
+First install the required version of MuJoCo. Then install the three Python libraries gym, mujoco and mujoco-py, either with a single pip command or together with DI-engine:
+
+.. code:: shell
+
+   pip install DI-engine[common_env,video]
Note:

@@ -220,11 +224,11 @@ hub <https://hub.docker.com/r/opendilab/ding>`_ for more images
from easydict import EasyDict
from dizoo.mujoco.envs import MujocoEnv
-env = MujocoEnv(EasyDict({'env_id': 'Hoopper-v3' }))
+config = MujocoEnv.default_config()
+config.env_id="Hopper-v3"
+env = MujocoEnv(config)
env.enable_save_replay(replay_path='./video')
obs = env.reset()
while True:
action = env.random_action()
timestep = env.step(action)
1 change: 1 addition & 0 deletions index_zh.html
@@ -424,6 +424,7 @@ <h1>Welcome to the DI-engine Chinese Documentation<a class="headerlink" href="#di-engine" t
<li class="toctree-l2"><a class="reference internal" href="13_envs/pendulum_zh.html">Pendulum</a></li>
<li class="toctree-l2"><a class="reference internal" href="13_envs/acrobot_zh.html">Acrobot</a></li>
<li class="toctree-l2"><a class="reference internal" href="13_envs/bitflip_zh.html">BitFlip</a></li>
+<li class="toctree-l2"><a class="reference internal" href="13_envs/frozen_lake_zh.html">FrozenLake</a></li>
<li class="toctree-l2"><a class="reference internal" href="13_envs/lunarlander_zh.html">LunarLander</a></li>
<li class="toctree-l2"><a class="reference internal" href="13_envs/bipedalwalker_zh.html">BipedalWalker</a></li>
<li class="toctree-l2"><a class="reference internal" href="13_envs/minigrid_zh.html">MiniGrid</a></li>
Binary file modified objects.inv
Binary file not shown.
2 changes: 1 addition & 1 deletion searchindex.js

Large diffs are not rendered by default.
