
Releases: thu-ml/tianshou

0.3.0rc0

23 Sep 13:07
dcfcbb3
Pre-release

This is a pre-release for testing the Anaconda packaging.

0.2.7

08 Sep 13:38
64af7ea

API Change

  1. collect an exact n_episode when a list of per-env n_episode limits is given, and save fake data in cache_buffer when self.buffer is None (#184)
  2. add save_only_last_obs to the replay buffer to reduce memory usage (#184)
  3. remove the default value in batch.split() and add a merge_last argument (#185)
  4. fix TensorBoard logging: the x-axis now stands for env step instead of gradient step; add test results to TensorBoard (#189)
  5. add max_batchsize to on-policy algorithms (#189)
  6. keep only the sum tree in the segment tree implementation (#193)
  7. add __contains__ and pop to Batch: key in batch, batch.pop(key, default) (#189); see the sketch after this list
  8. remove dict return support from the collector's preprocess_fn (#189)
  9. remove **kwargs from ReplayBuffer (#189)
  10. add a no_grad argument to collector.collect (#204)
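
A minimal sketch of the Batch-related changes in items 3 and 7 above, assuming the 0.2.7 API; the keys and sizes are arbitrary:

```python
import numpy as np
from tianshou.data import Batch

batch = Batch(obs=np.zeros((5, 4)), rew=np.arange(5))

# item 7: __contains__ tests membership of top-level keys
assert "obs" in batch and "act" not in batch

# item 7: pop mirrors dict.pop and accepts a default
rew = batch.pop("rew", None)
assert "rew" not in batch

# item 3: split no longer has a default size; merge_last folds the
# remainder into the final mini-batch instead of yielding a short one
for mini in batch.split(2, shuffle=False, merge_last=True):
    print(len(mini.obs))  # prints 2, then 3
```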

Enhancement

  1. add DQN Atari examples (#187)
  2. change the type-checking order in batch.py and converter.py so that the most common case is checked first (#189)
  3. Numba acceleration for GAE, n-step return, and segment tree (#193)
  4. add policy.eval() to the "watch performance" section of all test scripts (#189)
  5. add test_returns covering both GAE and n-step returns (#189)
  6. improve code coverage (from 90% to 95%) and remove dead code (#189)
  7. polish examples/box2d/bipedal_hardcore_sac.py (#207)

Bug fix

  1. fix a bug in MAPolicy: buffer.rew = Batch() did not change buffer.rew (thanks, mypy) (#207)
  2. set policy.eval() before collector.collect (#204); the previous behavior was a bug
  3. fix shape inconsistency for torch.Tensor in replay buffer (#189)
  4. potential bugfix for subproc.wait (#189)
  5. fix RecurrentActorProb (#189)
  6. fix some incorrect type annotations (#189)
  7. fix a bug in tictactoe set_eps (#193)
  8. dirty fix for the async vector env check_id test

0.2.6

19 Aug 07:21
a9f9940

API Change

  1. Replay buffer allows stack_num = 1 (#165)
  2. add policy.update to enable post-processing and remove collector.sample (#180)
  3. Remove collector.close and rename VectorEnv to DummyVectorEnv (#179); see the sketch after this list
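
A rough end-to-end sketch of the renamed DummyVectorEnv and the new policy.update entry point, built on a DQN setup; the environment, network shape, and hyper-parameters are illustrative placeholders:

```python
import gym
import torch
from tianshou.data import Collector, ReplayBuffer
from tianshou.env import DummyVectorEnv  # formerly VectorEnv (#179)
from tianshou.policy import DQNPolicy
from tianshou.utils.net.common import Net

env = gym.make("CartPole-v0")
train_envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(4)])

net = Net(layer_num=2, state_shape=env.observation_space.shape,
          action_shape=env.action_space.n)
optim = torch.optim.Adam(net.parameters(), lr=1e-3)
policy = DQNPolicy(net, optim, discount_factor=0.99, target_update_freq=320)

buffer = ReplayBuffer(size=20000)
collector = Collector(policy, train_envs, buffer)  # no collector.close needed
collector.collect(n_step=64)

# policy.update samples from the buffer and runs post-processing internally,
# replacing the removed collector.sample (#180)
losses = policy.update(64, buffer)
```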

Enhancement

  1. Enable async simulation for all vector envs (#179)
  2. Improve PER (#159): use a segment tree and enable all Q-learning algorithms to use PER
  3. unify single-env and multi-env in collector (#157)
  4. Make the replay buffer pickle-compatible and improve buffer.get (#182): fix #84 and make the buffer more efficient
  5. Add ShmemVectorEnv implementation (#174); see the sketch after this list
  6. Add Dueling DQN implementation (#170)
  7. Add profile workflow (#143)
  8. Add BipedalWalkerHardcore-v3 SAC example (#177); it becomes well-trained in about an hour
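
A small sketch of ShmemVectorEnv from item 5, assuming it is a drop-in replacement for the other vector envs that passes observations through shared memory; the environment and actions are arbitrary:

```python
import gym
import numpy as np
from tianshou.env import ShmemVectorEnv

# same interface as the other vector envs, but observations travel
# through shared memory instead of pipes
envs = ShmemVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(8)])
obs = envs.reset()
obs, rew, done, info = envs.step(np.zeros(8, dtype=int))
envs.close()
```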

Bug fix

  1. fix #162, the multi-dimensional action issue (#160)

Note: 0.3 is coming soon!

0.2.5

22 Jul 06:59
bd9c3c7

New feature

Multi-agent Reinforcement Learning: https://tianshou.readthedocs.io/en/latest/tutorials/tictactoe.html (#122)

Documentation

Add a tutorial of the Batch class to standardize the behavior of Batch: https://tianshou.readthedocs.io/en/latest/tutorials/batch.html (#142)
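
A minimal illustration of the Batch behavior the tutorial standardizes; the keys and shapes below are arbitrary:

```python
import numpy as np
from tianshou.data import Batch

# Batch is a dict-like container that supports nested Batches and
# numpy-style indexing over every leaf array
data = Batch(obs=np.zeros((3, 4)), info=Batch(env_id=np.arange(3)))
print(data.obs.shape)    # (3, 4)
print(data.info.env_id)  # [0 1 2]
print(data[0])           # indexing slices obs and info.env_id together
```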

Bugfix

  • Fix inconsistent shape in A2CPolicy and PPOPolicy. Please be careful when dealing with log_prob (#155)
  • Fix list of tensors inside Batch, e.g., Batch(a=[np.zeros(3), torch.zeros(3)]) (#147)
  • Fix buffer update when stack_num > 0 (#154)
  • Remove useless kwargs

0.2.4.post1

14 Jul 00:00

Several bug fixes and enhancements:

  • remove the deprecated append API (#126)
  • Batch.cat_ and Batch.stack_ now work well with inconsistent keys (#130)
  • Batch.is_empty now correctly recognizes a Batch that contains only empty Batches (#128)
  • reconstruct the collector: remove the multiple-buffer case, change the internal data to Batch, and add reward_metric for MARL usage (#125)
  • add Batch.update to mimic dict.update (#128); see the sketch after this list
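
A rough sketch of Batch.cat_ and Batch.update from the list above, assuming that missing keys are padded automatically when concatenating; the array contents are arbitrary:

```python
import numpy as np
from tianshou.data import Batch

a = Batch(x=np.zeros((2, 3)))
b = Batch(x=np.ones((2, 3)), y=np.ones(2))

# in-place concatenation now tolerates inconsistent keys: the missing
# `y` entries of `a` are padded rather than raising an error
a.cat_(b)
print(a.x.shape)  # (4, 3)

# update mimics dict.update
a.update(z=np.arange(4))
print(list(a.keys()))  # includes 'x', 'y' and 'z'
```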

0.2.4

10 Jul 09:50
47e8e26

Algorithm Implementation

  1. n-step returns for all Q-learning based algorithms (#51)
  2. Auto alpha tuning in SAC (#80)
  3. Reserve policy._state to support saving hidden states in the replay buffer (#19)
  4. Add a sample_avail argument to ReplayBuffer to sample only the available indices in RNN training mode (#19); see the sketch after this list
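
A small sketch of sample_avail from item 4, assuming a buffer configured for frame stacking; the sizes, rewards, and observations are placeholders:

```python
import numpy as np
from tianshou.data import ReplayBuffer

# sample_avail restricts sampling to indices where a full stack of
# stack_num frames is available, which RNN-style training needs
buf = ReplayBuffer(size=100, stack_num=4, sample_avail=True)
for i in range(10):
    buf.add(obs=np.array([i]), act=0, rew=1.0, done=(i == 9),
            obs_next=np.array([i + 1]))

batch, indices = buf.sample(batch_size=4)
print(batch.obs.shape)  # stacked observations, roughly (4, 4, 1)
```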

New Feature

  1. Batch.cat (#87), Batch.stack (#93), Batch.empty (#106, #110)
  2. Advanced slicing method of Batch (#106)
  3. Batch(kwargs, copy=True) will perform a deep copy (#110); see the sketch after this list
  4. Add a random=True argument to collector.collect to sample actions with a random policy (#78)
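
A brief sketch of the new Batch features from items 1 to 3, assuming the 0.2.4 API; the keys and values are arbitrary:

```python
import numpy as np
from tianshou.data import Batch

a = Batch(obs=np.zeros((2, 3)), rew=np.zeros(2))
b = Batch(obs=np.ones((2, 3)), rew=np.ones(2))

c = Batch.cat([a, b])    # concatenate along the first axis -> length 4
s = Batch.stack([a, b])  # stack along a new first axis

# advanced slicing behaves like numpy fancy indexing on every leaf array
print(c[[0, 3]].rew)     # [0. 1.]

# copy=True performs a deep copy of the constructor arguments
d = Batch(obs=a.obs, copy=True)
d.obs[0, 0] = 5.0
print(a.obs[0, 0])       # still 0.0
```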

API Change

  1. Batch.append -> Batch.cat
  2. Move the Atari wrapper to examples, since it is not a key feature of tianshou (#124)
  3. Add some pre-defined networks in tianshou.utils.net. Since we only define an API rather than a core class, we do not put them in tianshou.net. (#123)

Docs

Add cheatsheet: https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html

0.2.3

01 Jun 01:50

Enhancement

  1. Multimodal observations (observations of any type are also supported) (#38, #69)
  2. Nested Batch (Batch over Batch)
  3. preprocess_fn (#42)
  4. Type annotations
  5. batch.to_torch, batch.to_numpy; see the sketch after this list
  6. Pickle support for Batch
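
A small sketch of items 5 and 6, assuming the conversion methods work in place; the data is arbitrary:

```python
import pickle
import numpy as np
from tianshou.data import Batch

batch = Batch(obs=np.zeros((2, 3)), act=np.array([0, 1]))

# convert every leaf array to a torch tensor, and back to numpy
batch.to_torch()
print(type(batch.obs))  # <class 'torch.Tensor'>
batch.to_numpy()

# Batch is picklable
restored = pickle.loads(pickle.dumps(batch))
print(restored.act)  # [0 1]
```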

Fixed Bugs

  1. SAC/PPO diagonal Gaussian
  2. PPO orthogonal initialization
  3. DQN zero eps
  4. Fix type inference in the replay buffer

0.2.2

26 Apr 07:25

Algorithm Implementation

  1. Generalized Advantage Estimation (GAE);
  2. Update PPO algorithm with arXiv:1811.02553 and arXiv:1912.09729;
  3. Vanilla Imitation Learning (BC & DA, with continuous/discrete action space);
  4. Prioritized DQN;
  5. RNN-style policy network;
  6. Fix SAC with torch==1.5.0

API change

  1. change __call__ to forward in policy;
  2. Add save_fn to the trainer (see the sketch after this list);
  3. Add __repr__ to tianshou.data, e.g. print(buffer)
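
A brief sketch of save_fn and __repr__ from items 2 and 3, assuming an off-policy trainer call; policy, train_collector, test_collector, and the hyper-parameters are placeholders built elsewhere:

```python
import torch
from tianshou.trainer import offpolicy_trainer

# save_fn is invoked when the policy reaches a new best test reward
result = offpolicy_trainer(
    policy, train_collector, test_collector,
    max_epoch=10, step_per_epoch=1000, collect_per_step=10,
    episode_per_test=10, batch_size=64,
    save_fn=lambda policy: torch.save(policy.state_dict(), "best_policy.pth"),
)

# containers in tianshou.data now implement __repr__
print(train_collector.buffer)
```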

0.2.1

07 Apr 03:52

First version with full documentation.
Supported algorithms: DQN/VPG/A2C/DDPG/PPO/TD3/SAC