docs fix and v0.2.5 (#156)
* pre

* update docs

* update docs

* $ in bash

* size -> hidden_layer_size

* doctest

* doctest again

* filter a warning

* fix bug

* fix examples

* test fail

* test succ
Trinkle23897 authored Jul 22, 2020
1 parent 089b85b commit bd9c3c7
Showing 21 changed files with 139 additions and 122 deletions.
9 changes: 2 additions & 7 deletions .github/ISSUE_TEMPLATE.md
@@ -3,15 +3,10 @@
+ [ ] RL algorithm bug
+ [ ] documentation request (i.e. "X is missing from the documentation.")
+ [ ] new feature request
- [ ] I have visited the [source website], and in particular read the [known issues]
- [ ] I have searched through the [issue tracker] and [issue categories] for duplicates
- [ ] I have visited the [source website](https://github.com/thu-ml/tianshou/)
- [ ] I have searched through the [issue tracker](https://github.com/thu-ml/tianshou/issues) for duplicates
- [ ] I have mentioned version numbers, operating system and environment, where applicable:
```python
import tianshou, torch, sys
print(tianshou.__version__, torch.__version__, sys.version, sys.platform)
```

[source website]: https://github.com/thu-ml/tianshou/
[known issues]: https://github.com/thu-ml/tianshou/#faq-and-known-issues
[issue categories]: https://github.com/thu-ml/tianshou/projects/2
[issue tracker]: https://github.com/thu-ml/tianshou/issues?q=
9 changes: 2 additions & 7 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -7,15 +7,10 @@

Less important but also useful:

- [ ] I have visited the [source website], and in particular read the [known issues]
- [ ] I have searched through the [issue tracker] and [issue categories] for duplicates
- [ ] I have visited the [source website](https://github.com/thu-ml/tianshou)
- [ ] I have searched through the [issue tracker](https://github.com/thu-ml/tianshou/issues) for duplicates
- [ ] I have mentioned version numbers, operating system and environment, where applicable:
```python
import tianshou, torch, sys
print(tianshou.__version__, torch.__version__, sys.version, sys.platform)
```

[source website]: https://github.com/thu-ml/tianshou
[known issues]: https://github.com/thu-ml/tianshou/#faq-and-known-issues
[issue categories]: https://github.com/thu-ml/tianshou/projects/2
[issue tracker]: https://github.com/thu-ml/tianshou/issues?q=
@@ -1,4 +1,4 @@
name: PEP8 Check
name: PEP8 and Docs Check

on: [push, pull_request]

@@ -11,9 +11,20 @@ jobs:
      uses: actions/setup-python@v2
      with:
        python-version: 3.8
    - name: Upgrade pip
      run: |
        python -m pip install --upgrade pip setuptools wheel
    - name: Install dependencies
      run: |
        python -m pip install flake8
    - name: Lint with flake8
      run: |
        flake8 . --count --show-source --statistics
    - name: Install dependencies
      run: |
        pip install ".[dev]" --upgrade
    - name: Documentation test
      run: |
        cd docs
        make html SPHINXOPTS="-W"
        cd ..
60 changes: 29 additions & 31 deletions README.md
@@ -38,7 +38,7 @@ Here is Tianshou's other features:
- Support any type of environment state (e.g. a dict, a self-defined class, ...) [Usage](https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html#user-defined-environment-and-different-state-representation)
- Support customized training process [Usage](https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html#customize-training-process)
- Support n-step return estimation for all Q-learning based algorithms (see the formula below this list)
- Support multi-agent RL easily [Usage](https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html##multi-agent-reinforcement-learning)
- Support multi-agent RL [Usage](https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html##multi-agent-reinforcement-learning)
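For reference, the n-step return mentioned above is the standard bootstrapped estimate (written out here for clarity; this formula is supplied by the editor, not copied from the repo):

```latex
% n-step return: n discounted rewards plus a bootstrapped Q-value tail
G_t^{(n)} = \sum_{i=0}^{n-1} \gamma^i r_{t+i} + \gamma^n \max_{a} Q\left(s_{t+n}, a\right)
```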

In Chinese, Tianshou means "divinely ordained", referring to a gift one is born with. Tianshou is a reinforcement learning platform: its algorithms do not learn from humans. Taking the name "Tianshou" means that there is no teacher to study with; instead, the agent learns by itself through constant interaction with the environment.

@@ -49,24 +49,27 @@ In Chinese, Tianshou means divinely ordained and is derived to the gift of being
Tianshou is currently hosted on [PyPI](https://pypi.org/project/tianshou/). It requires Python >= 3.6. You can simply install Tianshou with the following command:

```bash
pip3 install tianshou
$ pip install tianshou
```

You can also install the newest version from GitHub:

```bash
pip3 install git+https://github.com/thu-ml/tianshou.git@master
# latest release
$ pip install git+https://github.com/thu-ml/tianshou.git@master
# develop version
$ pip install git+https://github.com/thu-ml/tianshou.git@dev
```

If you use Anaconda or Miniconda, you can install Tianshou through the following command lines:

```bash
# create a new virtualenv and install pip, change the env name if you like
conda create -n myenv pip
$ conda create -n myenv pip
# activate the environment
conda activate myenv
$ conda activate myenv
# install tianshou
pip install tianshou
$ pip install tianshou
```

After installation, open your Python console and type
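a quick check, mirroring the version-print snippet in the issue template above:

```python
import tianshou
print(tianshou.__version__)  # a successful import and version print means the install works
```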
If no error occurs, you have successfully installed Tianshou.

@@ -82,9 +85,9 @@

The tutorials and API documentation are hosted on [tianshou.readthedocs.io](https://tianshou.readthedocs.io/).

The example scripts are under [test/](https://github.com/thu-ml/tianshou/blob/master/test) folder and [examples/](https://github.com/thu-ml/tianshou/blob/master/examples) folder. It may fail to run with PyPI installation, so please re-install the github version through `pip3 install git+https://github.com/thu-ml/tianshou.git@master`.
The example scripts are in the [test/](https://github.com/thu-ml/tianshou/blob/master/test) and [examples/](https://github.com/thu-ml/tianshou/blob/master/examples) folders.

The Chinese documentation is available at [https://tianshou.readthedocs.io/zh/latest/](https://tianshou.readthedocs.io/zh/latest/).

<!-- A short Chinese introduction to the Tianshou platform: https://www.zhihu.com/question/377263715 -->

@@ -95,7 +98,7 @@ The example scripts are under [test/](https://github.com/thu-ml/tianshou/blob/ma
Tianshou is a lightweight but high-speed reinforcement learning platform. For example, here is a test on a laptop (i7-8750H + GTX1060): training a vanilla policy gradient agent on the CartPole-v0 task takes only about 3 seconds (the seed may differ across platforms and devices).

```bash
python3 test/discrete/test_pg.py --seed 0 --render 0.03
$ python3 test/discrete/test_pg.py --seed 0 --render 0.03
```

<div align="center">
@@ -108,10 +111,10 @@ We select some of famous reinforcement learning platforms: 2 GitHub repos with m

| RL Platform | Tianshou | Baselines | Stable-Baselines | Ray/RLlib | PyTorch-DRL | rlpyt |
| --------------- | -------- | --------- | ---------------- | --------- | ----------- | ----- |
| GitHub Stars | [![GitHub stars](https://img.shields.io/github/stars/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/openai/baselines)](https://github.com/openai/baselines/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/hill-a/stable-baselines)](https://github.com/hill-a/stable-baselines/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/ray-project/ray)](https://github.com/ray-project/ray/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch)](https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/astooke/rlpyt)](https://github.com/astooke/rlpyt/stargazers) |
| Algo - Task | PyTorch | TensorFlow | TensorFlow | TF/PyTorch | PyTorch | PyTorch |
| PG - CartPole | 6.09±4.60s | None | None | 19.26±2.29s | None | ? |
| DQN - CartPole | 6.09±0.87s | 1046.34±291.27s | 93.47±58.05s | 28.56±4.60s | 31.58±11.30s \*\* | ? |
| A2C - CartPole | 10.59±2.04s | \*(~1612s) | 57.56±12.87s | 57.92±9.94s | \*(Not converged) | ? |
| PPO - CartPole | 31.82±7.76s | \*(~1179s) | 34.79±17.02s | 44.60±17.04s | 23.99±9.26s \*\* | ? |
| PG - CartPole | 9.02±6.79s | None | None | 19.26±2.29s | None | ? |
| DQN - CartPole | 6.72±1.28s | 1046.34±291.27s | 93.47±58.05s | 28.56±4.60s | 31.58±11.30s \*\* | ? |
| A2C - CartPole | 15.33±4.48s | \*(~1612s) | 57.56±12.87s | 57.92±9.94s | \*(Not converged) | ? |
| PPO - CartPole | 6.01±1.14s | \*(~1179s) | 34.79±17.02s | 44.60±17.04s | 23.99±9.26s \*\* | ? |
| PPO - Pendulum | 16.18±2.49s | 745.43±160.82s | 259.73±27.37s | 123.62±44.23s | Runtime Error | ? |
| DDPG - Pendulum | 37.26±9.55s | \*(>1h) | 277.52±92.67s | 314.70±7.92s | 59.05±10.03s \*\* | 172.18±62.48s |
| TD3 - Pendulum | 44.04±6.37s | None | 99.75±21.63s | 149.90±7.54s | 57.52±17.71s \*\* | 210.31±76.30s |
@@ -142,7 +145,7 @@ We decouple all of the algorithms into 4 parts:
- `process_fn`: to preprocess data from replay buffer (since we have reformulated all algorithms to replay-buffer based algorithms);
- `learn`: to learn from a given batch data.

Within these API, we can interact with different policies conveniently.
Within this API, we can interact with different policies conveniently.
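For instance, a minimal custom policy only needs to fill in these parts. This is a hedged sketch: `RandomPolicy` is illustrative, not part of Tianshou, and the exact signatures may differ slightly across versions.

```python
import numpy as np
from tianshou.data import Batch
from tianshou.policy import BasePolicy

class RandomPolicy(BasePolicy):
    """A toy policy: uniform random actions, no learning."""

    def __init__(self, action_num, **kwargs):
        super().__init__(**kwargs)
        self.action_num = action_num

    def forward(self, batch, state=None, **kwargs):
        # compute actions from the observations in `batch`
        act = np.random.randint(0, self.action_num, len(batch.obs))
        return Batch(act=act)

    def process_fn(self, batch, buffer, indice):
        # preprocess data sampled from the replay buffer (e.g. compute returns);
        # here a no-op
        return batch

    def learn(self, batch, **kwargs):
        # update the policy from a batch and return a dict of training statistics
        return {}
```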

### Elegant and Flexible

@@ -182,17 +185,12 @@ Define some hyper-parameters:

```python
task = 'CartPole-v0'
lr = 1e-3
gamma = 0.9
n_step = 3
eps_train, eps_test = 0.1, 0.05
epoch = 10
step_per_epoch = 1000
collect_per_step = 10
target_freq = 320
batch_size = 64
lr, epoch, batch_size = 1e-3, 10, 64
train_num, test_num = 8, 100
gamma, n_step, target_freq = 0.9, 3, 320
buffer_size = 20000
eps_train, eps_test = 0.1, 0.05
step_per_epoch, collect_per_step = 1000, 10
writer = SummaryWriter('log/dqn') # tensorboard is also supported!
```
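The collapsed lines here create the vectorized training and test environments used later. A minimal sketch, assuming the v0.2-era `ts.env.VectorEnv` wrapper and the hyper-parameters defined above:

```python
# hypothetical reconstruction of the elided setup: parallel train/test envs
train_envs = ts.env.VectorEnv([lambda: gym.make(task) for _ in range(train_num)])
test_envs = ts.env.VectorEnv([lambda: gym.make(task) for _ in range(test_num)])
```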

@@ -208,7 +206,8 @@ Define the network:

```python
from tianshou.utils.net.common import Net

# you can define other net by following the API:
# https://tianshou.readthedocs.io/en/latest/tutorials/dqn.html#build-the-network
env = gym.make(task)
state_shape = env.observation_space.shape or env.observation_space.n
action_shape = env.action_space.shape or env.action_space.n
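# the collapsed lines build the network; a sketch assuming v0.2.5's
# Net(layer_num, state_shape, action_shape, ..., hidden_layer_size=128) signature
net = Net(layer_num=2, state_shape=state_shape, action_shape=action_shape)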
```

@@ -219,8 +218,7 @@ optim = torch.optim.Adam(net.parameters(), lr=lr)
Setup policy and collectors:

```python
policy = ts.policy.DQNPolicy(net, optim, gamma, n_step, target_update_freq=target_freq)
train_collector = ts.data.Collector(policy, train_envs, ts.data.ReplayBuffer(buffer_size))
test_collector = ts.data.Collector(policy, test_envs)
```
@@ -236,7 +234,7 @@

```python
result = ts.trainer.offpolicy_trainer(
    ...)  # argument list elided in this hunk; see the sketch after this block
print(f'Finished training! Use {result["duration"]}')
```
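For reference, a sketch of the collapsed argument list, assuming the v0.2 `offpolicy_trainer` signature; the `train_fn`/`test_fn`/`stop_fn` lambdas are illustrative:

```python
result = ts.trainer.offpolicy_trainer(
    policy, train_collector, test_collector, epoch, step_per_epoch,
    collect_per_step, test_num, batch_size,
    train_fn=lambda e: policy.set_eps(eps_train),  # epsilon-greedy exploration in training
    test_fn=lambda e: policy.set_eps(eps_test),    # smaller epsilon for evaluation
    stop_fn=lambda r: r >= env.spec.reward_threshold,
    writer=writer)
```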

Save / load the trained policy (it's exactly the same as saving / loading a PyTorch `nn.Module`):

```python
torch.save(policy.state_dict(), 'dqn.pth')
policy.load_state_dict(torch.load('dqn.pth'))
```

@@ -254,26 +252,26 @@ collector.close()
Look at the results saved in TensorBoard (with a bash command in your terminal):

```bash
tensorboard --logdir log/dqn
$ tensorboard --logdir log/dqn
```

You can check out the [documentation](https://tianshou.readthedocs.io) for advanced usage.

## Contributing

Tianshou is still under development. More algorithms and features are going to be added and we always welcome contributions to help make Tianshou better. If you would like to contribute, please check out [docs/contributing.rst](https://github.com/thu-ml/tianshou/blob/master/docs/contributing.rst).
Tianshou is still under development. More algorithms and features are going to be added and we always welcome contributions to help make Tianshou better. If you would like to contribute, please check out [this link](https://tianshou.readthedocs.io/en/latest/contributing.html).

## TODO

Check out the [Issue/PR Categories](https://github.com/thu-ml/tianshou/projects/2) and [Support Status](https://github.com/thu-ml/tianshou/projects/3) page for more detail.
Check out the [Project](https://github.com/thu-ml/tianshou/projects) page for more detail.

## Citing Tianshou

If you find Tianshou useful, please cite it in your publications.

```latex
@misc{tianshou,
author = {Jiayi Weng, Minghao Zhang, Dong Yan, Hang Su, Jun Zhu},
author = {Jiayi Weng, Minghao Zhang, Alexis Duburcq, Kaichao You, Dong Yan, Hang Su, Jun Zhu},
title = {Tianshou},
year = {2020},
publisher = {GitHub},
6 changes: 4 additions & 2 deletions docs/conf.py
@@ -41,7 +41,7 @@
    'sphinx.ext.doctest',
    'sphinx.ext.intersphinx',
    'sphinx.ext.coverage',
    'sphinx.ext.imgmath',
    # 'sphinx.ext.imgmath',
    'sphinx.ext.mathjax',
    'sphinx.ext.ifconfig',
    'sphinx.ext.viewcode',
@@ -58,7 +58,9 @@
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
autodoc_default_options = {'special-members': '__call__, __getitem__, __len__'}
autodoc_default_options = {'special-members': ', '.join([
    '__len__', '__call__', '__getitem__', '__setitem__',
    '__getattr__', '__setattr__'])}

# -- Options for HTML output -------------------------------------------------

19 changes: 11 additions & 8 deletions docs/contributing.rst
@@ -8,13 +8,14 @@ To install Tianshou in an "editable" mode, run

.. code-block:: bash

    pip3 install -e ".[dev]"
    $ git checkout dev
    $ pip install -e ".[dev]"
in the main directory. This installation is removable by

.. code-block:: bash

    python3 setup.py develop --uninstall
    $ python setup.py develop --uninstall
PEP8 Code Style Check
---------------------
@@ -23,7 +24,7 @@ We follow PEP8 python code style. To check, in the main directory, run:

.. code-block:: bash

    flake8 . --count --show-source --statistics
    $ flake8 . --count --show-source --statistics
Test Locally
------------
Expand All @@ -32,7 +33,7 @@ This command will run automatic tests in the main directory

.. code-block:: bash

    pytest test --cov tianshou -s --durations 0 -v
    $ pytest test --cov tianshou -s --durations 0 -v
Test by GitHub Actions
----------------------
@@ -65,11 +66,13 @@ To compile documentation into webpages, run

.. code-block:: bash

    make html
    $ make html
under the ``docs/`` directory. The generated webpages are in ``docs/_build`` and can be viewed with browsers.

Chinese Documentation
---------------------
The Chinese documentation is at https://tianshou.readthedocs.io/zh/latest/, and the development version of the documentation is at https://tianshou.readthedocs.io/en/dev/.

Pull Request
------------

Chinese documentation is in https://tianshou.readthedocs.io/zh/latest/
All commits should be merged into the ``dev`` branch through pull requests. A pull request must have 2 approvals before merging.
2 changes: 1 addition & 1 deletion docs/contributor.rst
@@ -6,4 +6,4 @@ We always welcome contributions to help make Tianshou better. Below are an incom
* Jiayi Weng (`Trinkle23897 <https://github.com/Trinkle23897>`_)
* Minghao Zhang (`Mehooz <https://github.com/Mehooz>`_)
* Alexis Duburcq (`duburcqa <https://github.com/duburcqa>`_)
* Kaichao You (`youkaichao <https://github.com/youkaichao>`_)