Deploying to gh-pages from @ a45d051 🚀
PaParaZz1 committed May 8, 2024
1 parent 2cd61d8 commit b5020cf
Showing 315 changed files with 6,167 additions and 19,155 deletions.
2 changes: 1 addition & 1 deletion .buildinfo
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 1d5a9bfa96ae3725f2dd845dc376559c
config: af8074679ce869171777ff5aa94c8d3a
tags: 645f666f9bcd5a90fca523b33c5a78b7
11 changes: 5 additions & 6 deletions 00_intro/index.html
@@ -9,7 +9,7 @@

<head>
<meta charset="utf-8">
<meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />

<meta name="viewport" content="width=device-width, initial-scale=1.0">

@@ -297,9 +297,9 @@
<article itemprop="articleBody" id="pytorch-article" class="pytorch-article">

<section id="introduction">
<h1>Introduction<a class="headerlink" href="#introduction" title="Permalink to this headline"></a></h1>
<h1>Introduction<a class="headerlink" href="#introduction" title="Permalink to this heading"></a></h1>
<section id="what-is-di-engine">
<h2>What is DI-engine?<a class="headerlink" href="#what-is-di-engine" title="Permalink to this headline"></a></h2>
<h2>What is DI-engine?<a class="headerlink" href="#what-is-di-engine" title="Permalink to this heading"></a></h2>
<p>DI-engine is a decision intelligence engine for PyTorch and JAX built by a group of enthusiastic researchers and engineers.</p>
<p>It provides python-first and asynchronous-native task and middleware abstractions, and modularly integrates several of the most important decision-making concepts: Env, Policy and Model. Based on these mechanisms, DI-engine supports various deep reinforcement learning (DRL) algorithms with superior performance, high efficiency, and well-organized documentation and unit tests, providing professional and convenient support for your reinforcement learning research and development work, mainly including:</p>
<ol class="arabic simple">
@@ -312,7 +312,7 @@ <h2>What is DI-engine?<a class="headerlink" href="#what-is-di-engine" title="Per
<img alt="../_images/system_layer.png" src="../_images/system_layer.png" />
</section>
<section id="key-concepts">
<h2>Key Concepts<a class="headerlink" href="#key-concepts" title="Permalink to this headline"></a></h2>
<h2>Key Concepts<a class="headerlink" href="#key-concepts" title="Permalink to this heading"></a></h2>
<p>If you are not familiar with reinforcement learning, you can go to our <a class="reference external" href="../10_concepts/index_zh.html">reinforcement learning tutorial</a> for a glimpse into the wonderful world of reinforcement learning.</p>
<p>If you have already been exposed to reinforcement learning, you will already be familiar with the basic interaction objects of reinforcement learning: <strong>environments</strong> and <strong>agents (or the policies that make them up)</strong>.</p>
<p>Instead of creating more concepts, the DI-engine abstracts the complex interaction logic between the two into declarative middleware, such as <strong>collect</strong>, <strong>train</strong>, <strong>evaluate</strong>, and <strong>save_ckpt</strong>. You can adapt each part of the process in the most natural way.</p>
@@ -385,9 +385,8 @@ <h2>Key Concepts<a class="headerlink" href="#key-concepts" title="Permalink to t
<script type="text/javascript" id="documentation_options" data-url_root="../"
src="../_static/documentation_options.js"></script>
<script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
<script src="../_static/jquery.js"></script>
<script src="../_static/underscore.js"></script>
<script src="../_static/doctools.js"></script>
<script src="../_static/sphinx_highlight.js"></script>



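The Key Concepts hunk above describes DI-engine's declarative middleware abstraction (collect, train, evaluate, save_ckpt) composed onto a task. As an illustration of that idea only, here is a minimal middleware pipeline in plain Python; the `Task`, `Context`, `collect`, and `train` names here are hypothetical sketches, not DI-engine's actual API:

```python
# Minimal sketch of a declarative middleware pipeline, in the spirit of
# DI-engine's task.use(...) composition. All names are illustrative,
# not DI-engine's real API.
from typing import Callable, List


class Context:
    """Shared state passed through every middleware on each iteration."""

    def __init__(self) -> None:
        self.step = 0
        self.collected = []   # data gathered by the collect stage
        self.train_log = []   # records written by the train stage


class Task:
    def __init__(self) -> None:
        self._middleware: List[Callable[[Context], None]] = []

    def use(self, fn: Callable[[Context], None]) -> "Task":
        # Declare a pipeline stage; stages run in registration order.
        self._middleware.append(fn)
        return self

    def run(self, ctx: Context, max_step: int) -> Context:
        while ctx.step < max_step:
            for fn in self._middleware:
                fn(ctx)
            ctx.step += 1
        return ctx


def collect(ctx: Context) -> None:
    # Stand-in for environment interaction: record one transition per step.
    ctx.collected.append(f"transition-{ctx.step}")


def train(ctx: Context) -> None:
    # Stand-in for a learner: log how much data is available so far.
    if ctx.collected:
        ctx.train_log.append(len(ctx.collected))


task = Task()
task.use(collect).use(train)
result = task.run(Context(), max_step=3)
print(result.train_log)  # → [1, 2, 3]
```

The point of the pattern is that each stage only reads and writes the shared context, so stages can be added, removed, or reordered declaratively without touching the interaction loop itself.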
11 changes: 5 additions & 6 deletions 00_intro/index_zh.html
@@ -9,7 +9,7 @@

<head>
<meta charset="utf-8">
<meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />

<meta name="viewport" content="width=device-width, initial-scale=1.0">

@@ -297,9 +297,9 @@
<article itemprop="articleBody" id="pytorch-article" class="pytorch-article">

<section id="di-engine">
<h1>DI-engine 简介<a class="headerlink" href="#di-engine" title="Permalink to this headline"></a></h1>
<h1>DI-engine 简介<a class="headerlink" href="#di-engine" title="Permalink to this heading"></a></h1>
<section id="id1">
<h2>了解 DI-engine<a class="headerlink" href="#id1" title="Permalink to this headline"></a></h2>
<h2>了解 DI-engine<a class="headerlink" href="#id1" title="Permalink to this heading"></a></h2>
<p>DI-engine 是由一群充满活力的研究员和工程师打造的开源决策智能平台,它将为您的强化学习算法研究和开发工作提供最专业最便捷的帮助,主要包括:</p>
<ol class="arabic simple">
<li><p>完整的算法支持,例如 DQN,PPO,SAC 以及许多研究子领域的相关算法——多智能体强化学习中的 QMIX,逆强化学习中的 GAIL,探索问题中的 RND 等等。</p></li>
@@ -309,7 +309,7 @@ <h2>了解 DI-engine<a class="headerlink" href="#id1" title="Permalink to this h
<img alt="../_images/system_layer.png" src="../_images/system_layer.png" />
</section>
<section id="id2">
<h2>核心概念<a class="headerlink" href="#id2" title="Permalink to this headline"></a></h2>
<h2>核心概念<a class="headerlink" href="#id2" title="Permalink to this heading"></a></h2>
<p>假如您尚未了解强化学习,可以转至我们的 <a class="reference external" href="../10_concepts/index_zh.html">强化学习教程</a> 一窥强化学习的奇妙世界。</p>
<p>假如您已经接触过强化学习,想必已经非常了解强化学习的基本交互对象: <strong>环境</strong><strong>智能体(或者构成智能体的策略)</strong></p>
<p>DI-engine 没有创造更多的概念,而是将这两者之间复杂的交互逻辑抽象成了声明式的中间件,例如 <strong>采集数据(collect)</strong><strong>训练模型(train)</strong><strong>评估模型(evaluate)</strong><strong>保存模型(save_ckpt)</strong>
@@ -383,9 +383,8 @@ <h2>核心概念<a class="headerlink" href="#id2" title="Permalink to this headl
<script type="text/javascript" id="documentation_options" data-url_root="../"
src="../_static/documentation_options.js"></script>
<script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
<script src="../_static/jquery.js"></script>
<script src="../_static/underscore.js"></script>
<script src="../_static/doctools.js"></script>
<script src="../_static/sphinx_highlight.js"></script>



17 changes: 8 additions & 9 deletions 01_quickstart/first_rl_program.html
@@ -9,7 +9,7 @@

<head>
<meta charset="utf-8">
<meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />

<meta name="viewport" content="width=device-width, initial-scale=1.0">

@@ -299,14 +299,14 @@
<article itemprop="articleBody" id="pytorch-article" class="pytorch-article">

<section id="first-reinforcement-learning-program">
<h1>First Reinforcement Learning Program<a class="headerlink" href="#first-reinforcement-learning-program" title="Permalink to this headline"></a></h1>
<h1>First Reinforcement Learning Program<a class="headerlink" href="#first-reinforcement-learning-program" title="Permalink to this heading"></a></h1>
<div class="toctree-wrapper compound">
</div>
<p>Among the many machine learning algorithms, reinforcement learning is a promising approach for building decision-intelligent agents.
CartPole is the ideal environment for an introduction to reinforcement learning, and the DQN algorithm allows CartPole to converge (maintain equilibrium) in a very short time. We will introduce the usage of DI-engine based on CartPole + DQN.</p>
<a class="reference internal image-reference" href="../_images/cartpole_cmp.gif"><img alt="../_images/cartpole_cmp.gif" class="align-center" src="../_images/cartpole_cmp.gif" style="width: 1000px;" /></a>
<section id="using-the-configuration-file">
<h2>Using the Configuration File<a class="headerlink" href="#using-the-configuration-file" title="Permalink to this headline"></a></h2>
<h2>Using the Configuration File<a class="headerlink" href="#using-the-configuration-file" title="Permalink to this heading"></a></h2>
<p>DI-engine uses a global configuration file to control all variables of the environment and policy. Each has a corresponding default configuration, which can be found in <a class="reference external" href="https://github.com/opendilab/DI-engine/blob/main/dizoo/classic_control/cartpole/config/cartpole_dqn_config.py">cartpole_dqn_config</a>; in this tutorial we use the default configuration directly:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">dizoo.classic_control.cartpole.config.cartpole_dqn_config</span> <span class="kn">import</span> <span class="n">main_config</span><span class="p">,</span> <span class="n">create_config</span>
<span class="kn">from</span> <span class="nn">ding.config</span> <span class="kn">import</span> <span class="n">compile_config</span>
@@ -316,7 +316,7 @@ <h2>Using the Configuration File<a class="headerlink" href="#using-the-configura
</div>
</section>
<section id="initialize-the-environments">
<h2>Initialize the Environments<a class="headerlink" href="#initialize-the-environments" title="Permalink to this headline"></a></h2>
<h2>Initialize the Environments<a class="headerlink" href="#initialize-the-environments" title="Permalink to this heading"></a></h2>
<p>In reinforcement learning, the way environment data is collected may differ between training and evaluation: training typically runs one training step after every n collected steps, while evaluation must complete an entire episode to obtain a score. We therefore recommend initializing the collection and evaluation environments separately, as follows.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">ding.envs</span> <span class="kn">import</span> <span class="n">DingEnvWrapper</span><span class="p">,</span> <span class="n">BaseEnvManagerV2</span>

@@ -336,7 +336,7 @@ <h2>Initialize the Environments<a class="headerlink" href="#initialize-the-envir
</div>
</section>
<section id="select-policy">
<h2>Select Policy<a class="headerlink" href="#select-policy" title="Permalink to this headline"></a></h2>
<h2>Select Policy<a class="headerlink" href="#select-policy" title="Permalink to this heading"></a></h2>
<p>DI-engine covers most of the common reinforcement learning policies; using them only requires selecting the right policy and model.
Since DQN is off-policy, we also need to instantiate a buffer module.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">ding.model</span> <span class="kn">import</span> <span class="n">DQN</span>
@@ -350,7 +350,7 @@ <h2>Select Policy<a class="headerlink" href="#select-policy" title="Permalink to
</div>
</section>
<section id="build-the-pipeline">
<h2>Build the Pipeline<a class="headerlink" href="#build-the-pipeline" title="Permalink to this headline"></a></h2>
<h2>Build the Pipeline<a class="headerlink" href="#build-the-pipeline" title="Permalink to this heading"></a></h2>
<p>With the various middleware provided by DI-engine, we can easily build the entire pipeline:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">ding.framework</span> <span class="kn">import</span> <span class="n">task</span>
<span class="kn">from</span> <span class="nn">ding.framework.context</span> <span class="kn">import</span> <span class="n">OnlineRLContext</span>
@@ -370,7 +370,7 @@ <h2>Build the Pipeline<a class="headerlink" href="#build-the-pipeline" title="Pe
</div>
</section>
<section id="run-the-code">
<h2>Run the Code<a class="headerlink" href="#run-the-code" title="Permalink to this headline"></a></h2>
<h2>Run the Code<a class="headerlink" href="#run-the-code" title="Permalink to this heading"></a></h2>
<p>The full example can be found in <a class="reference external" href="https://github.com/opendilab/DI-engine/blob/main/ding/example/dqn.py">DQN example</a> and can be run via <code class="docutils literal notranslate"><span class="pre">python</span> <span class="pre">dqn.py</span></code>.
In addition, we also provide a <a class="reference external" href="https://colab.research.google.com/drive/1K3DGi3dOT9fhFqa6bBtinwCDdWkOM3zE?usp=sharing">Colab Running Example</a> covering everything from DI-engine installation to training, for reference.</p>
<a class="reference internal image-reference" href="../_images/train_dqn.gif"><img alt="../_images/train_dqn.gif" class="align-center" src="../_images/train_dqn.gif" style="width: 1000px;" /></a>
@@ -446,9 +446,8 @@ <h2>Run the Code<a class="headerlink" href="#run-the-code" title="Permalink to t
<script type="text/javascript" id="documentation_options" data-url_root="../"
src="../_static/documentation_options.js"></script>
<script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
<script src="../_static/jquery.js"></script>
<script src="../_static/underscore.js"></script>
<script src="../_static/doctools.js"></script>
<script src="../_static/sphinx_highlight.js"></script>



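The quickstart diff above notes that DQN is off-policy and therefore needs a buffer module. As a sketch of what such a buffer does, here is a plain deque-based replay buffer; this is an illustration of the concept only, and the `ReplayBuffer` class and its methods are hypothetical, not DI-engine's actual buffer implementation:

```python
# Minimal replay-buffer sketch: off-policy methods such as DQN reuse
# previously collected transitions, so data is stored and sampled later.
# Illustrative only; not DI-engine's buffer API.
import random
from collections import deque


class ReplayBuffer:
    def __init__(self, size: int) -> None:
        # Bounded FIFO storage: once full, the oldest transition is evicted.
        self._data = deque(maxlen=size)

    def push(self, transition: dict) -> None:
        self._data.append(transition)

    def sample(self, batch_size: int) -> list:
        # Uniform random minibatch without replacement.
        return random.sample(list(self._data), batch_size)

    def __len__(self) -> int:
        return len(self._data)


buf = ReplayBuffer(size=4)
for i in range(6):
    buf.push({"obs": i, "reward": float(i)})

print(len(buf))    # → 4 (the two oldest transitions were evicted)
batch = buf.sample(2)
print(len(batch))  # → 2
```

Sampling uniformly from a bounded window of recent experience is what lets an off-policy learner train on data gathered by older versions of the policy.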
17 changes: 8 additions & 9 deletions 01_quickstart/first_rl_program_zh.html
@@ -9,7 +9,7 @@

<head>
<meta charset="utf-8">
<meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />

<meta name="viewport" content="width=device-width, initial-scale=1.0">

@@ -299,15 +299,15 @@
<article itemprop="articleBody" id="pytorch-article" class="pytorch-article">

<section id="id1">
<h1>揭秘第一个强化学习程序<a class="headerlink" href="#id1" title="Permalink to this headline"></a></h1>
<h1>揭秘第一个强化学习程序<a class="headerlink" href="#id1" title="Permalink to this heading"></a></h1>
<div class="toctree-wrapper compound">
</div>
<p>强化学习算法是众多获得决策智能体的机器学习算法之一。
CartPole 是强化学习入门的理想学习环境,使用 DQN 算法可以在很短的时间内让 CartPole 收敛(保持平衡)。
我们将基于 CartPole + DQN 介绍一下 DI-engine 的用法。</p>
<a class="reference internal image-reference" href="../_images/cartpole_cmp.gif"><img alt="../_images/cartpole_cmp.gif" class="align-center" src="../_images/cartpole_cmp.gif" style="width: 1000px;" /></a>
<section id="id2">
<h2>使用配置文件<a class="headerlink" href="#id2" title="Permalink to this headline"></a></h2>
<h2>使用配置文件<a class="headerlink" href="#id2" title="Permalink to this heading"></a></h2>
<p>DI-engine 使用一个全局的配置文件来控制环境和策略的所有变量,每个环境和策略都有对应的默认配置,这个样例使用的完整配置可以在
<a class="reference external" href="https://github.com/opendilab/DI-engine/blob/main/dizoo/classic_control/cartpole/config/cartpole_dqn_config.py">cartpole_dqn_config</a>
看到,在教程中我们直接调用即可:</p>
@@ -319,7 +319,7 @@ <h2>使用配置文件<a class="headerlink" href="#id2" title="Permalink to this
</div>
</section>
<section id="id3">
<h2>初始化采集环境和评估环境<a class="headerlink" href="#id3" title="Permalink to this headline"></a></h2>
<h2>初始化采集环境和评估环境<a class="headerlink" href="#id3" title="Permalink to this heading"></a></h2>
<p>在强化学习中,训练阶段和评估阶段和环境交互的策略可能有区别,例如训练阶段往往是采集 n 个步骤就训练一次,且需要一些额外信息帮助训练
而评估阶段则需要完成整局游戏才能得到评分,且只考虑性能评价指标本身。我们推荐将采集和评估环境分开初始化:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">ding.envs</span> <span class="kn">import</span> <span class="n">DingEnvWrapper</span><span class="p">,</span> <span class="n">BaseEnvManagerV2</span>
@@ -341,7 +341,7 @@ <h2>初始化采集环境和评估环境<a class="headerlink" href="#id3" title=
</div>
</section>
<section id="id4">
<h2>选择策略<a class="headerlink" href="#id4" title="Permalink to this headline"></a></h2>
<h2>选择策略<a class="headerlink" href="#id4" title="Permalink to this heading"></a></h2>
<p>DI-engine 集成了大部分强化学习策略,使用它们只需要选择相应的模型和策略即可(完整的策略列表可以参考 <a class="reference external" href="https://github.com/opendilab/DI-engine#algorithm-versatility">Policy Zoo</a> )。
由于 DQN 是一个 off-policy 策略,所以我们还需要实例化一个 buffer 模块。</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">ding.model</span> <span class="kn">import</span> <span class="n">DQN</span>
@@ -355,7 +355,7 @@ <h2>选择策略<a class="headerlink" href="#id4" title="Permalink to this headl
</div>
</section>
<section id="id5">
<h2>构建训练管线<a class="headerlink" href="#id5" title="Permalink to this headline"></a></h2>
<h2>构建训练管线<a class="headerlink" href="#id5" title="Permalink to this heading"></a></h2>
<p>利用 DI-engine 提供的各类中间件,我们可以很容易的构建整个训练管线,各个中间件的功能和使用方法可以参考 <a class="reference external" href="https://di-engine-docs.readthedocs.io/zh_CN/latest/03_system/middleware_zh.html#id1">中间件入门</a></p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">ding.framework</span> <span class="kn">import</span> <span class="n">task</span>
<span class="kn">from</span> <span class="nn">ding.framework.context</span> <span class="kn">import</span> <span class="n">OnlineRLContext</span>
@@ -373,7 +373,7 @@ <h2>构建训练管线<a class="headerlink" href="#id5" title="Permalink to this
</div>
</section>
<section id="id7">
<h2>运行代码<a class="headerlink" href="#id7" title="Permalink to this headline"></a></h2>
<h2>运行代码<a class="headerlink" href="#id7" title="Permalink to this heading"></a></h2>
<p>完整的示例代码可以在 <a class="reference external" href="https://github.com/opendilab/DI-engine/blob/main/ding/example/dqn.py">DQN example</a> 中找到,通过 <code class="docutils literal notranslate"><span class="pre">python3</span> <span class="pre">-u</span> <span class="pre">dqn.py</span></code> 即可运行代码,下面的 gif 便是一个具体运行的例子。
此外,我们提供了从 DI-engine 安装到训练的全过程 <a class="reference external" href="https://colab.research.google.com/drive/1K3DGi3dOT9fhFqa6bBtinwCDdWkOM3zE?usp=sharing">Colab 运行示例</a> 作为参考。</p>
<a class="reference internal image-reference" href="../_images/train_dqn.gif"><img alt="../_images/train_dqn.gif" class="align-center" src="../_images/train_dqn.gif" style="width: 1000px;" /></a>
@@ -450,9 +450,8 @@ <h2>运行代码<a class="headerlink" href="#id7" title="Permalink to this headl
<script type="text/javascript" id="documentation_options" data-url_root="../"
src="../_static/documentation_options.js"></script>
<script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
<script src="../_static/jquery.js"></script>
<script src="../_static/underscore.js"></script>
<script src="../_static/doctools.js"></script>
<script src="../_static/sphinx_highlight.js"></script>


