Releases: thu-ml/tianshou

0.4.5

28 Nov 15:14
3592f45

Bug Fix

  1. Fix tqdm issue (#481)
  2. Fix atari wrapper to be deterministic (#467)
  3. Add writer.flush() in TensorboardLogger to ensure real-time logging results (#485)

Enhancement

  1. Implement set_env_attr and get_env_attr for vector environments (#478); see the sketch after this list
  2. Implement BCQPolicy and offline_bcq example (#480)
  3. Enable test_collector=None in 3 trainers to turn off testing during training (#485)
  4. Fix an inconsistency in the implementation of Discrete CRR: it now uses the Critic class for its critic, following the convention of other actor-critic policies (#485)
  5. Update several offline policies to use the ActorCritic class for their optimizers to eliminate randomness caused by parameter sharing between actor and critic (#485)
  6. Move Atari offline RL examples to examples/offline and tests to test/offline (#485)
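
A minimal sketch of the new vector-environment attribute accessors from item 1, assuming the signatures get_env_attr(key, id=None) and set_env_attr(key, value, id=None); the attribute names are illustrative.

    import gym
    from tianshou.env import DummyVectorEnv

    # two parallel CartPole environments
    venv = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(2)])

    # read an attribute from every sub-environment
    specs = venv.get_env_attr("spec")            # list with one entry per env

    # set an attribute, optionally restricted to selected sub-environments
    venv.set_env_attr("my_flag", True)           # hypothetical attribute name
    venv.set_env_attr("my_flag", False, id=[0])  # assumed optional id argument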

0.4.4

13 Oct 16:30

API Change

  1. add a new class DataParallelNet for multi-GPU training (#461)
  2. add ActorCritic for deterministic parameter grouping of shared-head actor-critic networks (#458); see the sketch after this list
  3. collector.collect() now returns 4 extra keys: rew/rew_std/len/len_std (previously this was done in the logger) (#459)
  4. rename WandBLogger -> WandbLogger (#441)
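
A minimal sketch of the ActorCritic grouping from item 2, assuming the Net/Actor/Critic helpers under tianshou.utils.net; sizes and the learning rate are placeholders. Wrapping a shared-head actor and critic in ActorCritic gives one optimizer a deterministic parameter list without duplicating the shared parameters.

    import torch
    from tianshou.utils.net.common import ActorCritic, Net
    from tianshou.utils.net.discrete import Actor, Critic

    # shared feature extractor (illustrative sizes for a 4-dim obs, 2-action task)
    net = Net(state_shape=4, hidden_sizes=[64, 64])
    actor = Actor(net, action_shape=2)
    critic = Critic(net)

    # group the shared-head actor and critic; shared parameters are counted once
    actor_critic = ActorCritic(actor, critic)
    optim = torch.optim.Adam(actor_critic.parameters(), lr=3e-4)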

Bug Fix

  1. fix logging in atari examples (#444)

Enhancement

  1. save_fn() is now called at the beginning of the trainer (#459)
  2. create a new documentation page for the logger (#463)
  3. add save_data and restore_data in wandb, allow more input arguments for wandb init, and integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py (#441)

0.4.3

02 Sep 21:20
fc251ab

Bug Fix

  1. fix a2c/ppo optimizer bug when sharing head (#428)
  2. fix ppo dual clip implementation (#435)

Enhancement

  1. add Rainbow (#386)
  2. add WandbLogger (#427)
  3. add env_id in preprocess_fn (#391); see the sketch after this list
  4. update README, add new chart and bibtex (#406)
  5. add a Makefile; you can now use make commit-checks to automatically run almost all checks (#432)
  6. add isort and yapf and apply them to the existing codebase (#432)
  7. add spelling check by using make spelling (#432)
  8. update contributing.rst (#432)
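
A minimal sketch of a preprocess_fn that uses the new env_id argument from item 3; the exact keyword arguments the Collector passes at each stage are treated as an assumption here, so the function accepts them generically.

    import numpy as np
    from tianshou.data import Batch

    def preprocess_fn(**kwargs):
        # env_id (new) identifies which sub-environments produced this data
        env_id = kwargs.get("env_id")
        if env_id is not None and "obs" in kwargs:
            # illustrative: cast observations to float32
            return Batch(obs=np.asarray(kwargs["obs"], dtype=np.float32))
        return Batch()  # return nothing to keep the data unchanged

    # then pass it to the collector:
    # collector = Collector(policy, venv, buffer, preprocess_fn=preprocess_fn)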

0.4.2

26 Jun 10:24
ebaca6f

Enhancement

  1. Add model-free DQN-family algorithms: IQN (#371), FQF (#376)
  2. Add model-free on-policy algorithms: NPG (#344, #347), TRPO (#337, #340)
  3. Add offline RL algorithms: CQL (#359), CRR (#367)
  4. Support deterministic evaluation for on-policy algorithms (#354)
  5. Make trainer resumable (#350)
  6. Support different state sizes and fix an exception in venv.__del__ (#352, #384)
  7. Add vizdoom example (#384)
  8. Add numerical analysis tool and interactive plot (#335, #341)

0.4.1

04 Apr 09:36
dd4a011

API Change

  1. Add observation normalization in BaseVectorEnv (norm_obs, obs_rms, update_obs_rms and RunningMeanStd) (#308)
  2. Add policy.map_action to bound raw actions (e.g., mapping from (-inf, inf) to [-1, 1] by clipping or tanh squashing); the mapped action is not stored in the replay buffer (#313)
  3. Add lr_scheduler in on-policy algorithms, typically for LambdaLR (#318)

Note

To adapt to this version, change action_range=... to action_space=env.action_space in policy initialization.
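
A minimal sketch of the change, using DDPGPolicy and the Net/Actor/Critic helpers under tianshou.utils.net as an assumed setup; network sizes and learning rates are placeholders.

    import gym
    import torch
    from tianshou.policy import DDPGPolicy
    from tianshou.utils.net.common import Net
    from tianshou.utils.net.continuous import Actor, Critic

    env = gym.make("Pendulum-v0")
    state_shape = env.observation_space.shape
    action_shape = env.action_space.shape

    net_a = Net(state_shape, hidden_sizes=[64, 64])
    actor = Actor(net_a, action_shape, max_action=env.action_space.high[0])
    actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-3)
    net_c = Net(state_shape, action_shape, hidden_sizes=[64, 64], concat=True)
    critic = Critic(net_c)
    critic_optim = torch.optim.Adam(critic.parameters(), lr=1e-3)

    # before: DDPGPolicy(..., action_range=(low, high))
    # now: pass the action space; policy.map_action bounds raw actions internally
    policy = DDPGPolicy(actor, actor_optim, critic, critic_optim,
                        action_space=env.action_space)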

Bug Fix

  1. Fix incorrect behaviors of on-policy algorithms (an error when n/ep==0 and the reward shown in tqdm) (#306, #328)
  2. Fix q-value mask_action error for obs_next (#310)

Enhancement

  1. Release SOTA Mujoco benchmark (DDPG/TD3/SAC: #305, REINFORCE: #320, A2C: #325, PPO: #330) and add corresponding notes in /examples/mujoco/README.md
  2. Fix numpy>=1.20 typing issue (#323)
  3. Add cross-platform unittest (#331)
  4. Add a test on how to deal with finite env (#324)
  5. Add value normalization in on-policy algorithms (#319, #321)
  6. Separate advantage normalization and value normalization in PPO (#329)

0.4.0

02 Mar 12:40
389bdb7

This release contains several API and behavior changes.

API Change

Buffer

  1. Add ReplayBufferManager, PrioritizedReplayBufferManager, VectorReplayBuffer, PrioritizedVectorReplayBuffer, CachedReplayBuffer (#278, #280);
  2. Change the buffer.add API from buffer.add(obs, act, rew, done, obs_next, info, policy, ...) to buffer.add(batch, buffer_ids) in order to add data more efficiently (#280); see the sketch after this list;
  3. Add set_batch method in buffer (#278);
  4. Add sample_index method, the same as sample but returning only the index instead of both the index and the batch data (#278);
  5. Add prev (one-step previous transition index), next (one-step next transition index) and unfinished_index (the last modified index whose done==False) (#278);
  6. Add internal method _alloc_by_keys_diff in batch to support any newly appearing keys (#280);
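
A minimal sketch of the new buffer.add call from item 2, assuming a VectorReplayBuffer shared by two parallel environments; the transition values are dummies and only the required keys are shown.

    import numpy as np
    from tianshou.data import Batch, VectorReplayBuffer

    # total capacity 1000, split across 2 parallel environments
    buf = VectorReplayBuffer(total_size=1000, buffer_num=2)

    # one transition per environment, packed into a single Batch
    batch = Batch(
        obs=np.zeros((2, 4)),
        act=np.zeros((2, 1)),
        rew=np.zeros(2),
        done=np.array([False, True]),
        obs_next=np.zeros((2, 4)),
    )
    # returns current indices plus per-episode statistics (assumed 4-tuple)
    ptr, ep_rew, ep_len, ep_idx = buf.add(batch, buffer_ids=[0, 1])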

Collector

  1. Rewrite the original Collector and split the async logic into AsyncCollector: Collector only supports sync mode, while AsyncCollector supports both modes (#280);
  2. Drop collector.collect(n_episode=List[int]) because the new collector can collect episodes without bias (#280);
  3. Move reward_metric from Collector to trainer (#280);
  4. Change Collector.collect logic: AsyncCollector.collect keeps the previous semantics, where collect(n_step or n_episode) may not collect exactly n_step or n_episode transitions; Collector.collect(n_step or n_episode) now collects exactly that many steps or episodes (#280); see the sketch after this list;
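
A minimal sketch of the sync Collector semantics from item 4, assuming a small DQN setup on CartPole; sizes and hyperparameters are placeholders.

    import gym
    import torch
    from tianshou.data import Collector, VectorReplayBuffer
    from tianshou.env import DummyVectorEnv
    from tianshou.policy import DQNPolicy
    from tianshou.utils.net.common import Net

    venv = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(4)])
    net = Net(state_shape=4, action_shape=2, hidden_sizes=[64, 64])
    policy = DQNPolicy(net, torch.optim.Adam(net.parameters(), lr=1e-3))
    collector = Collector(policy, venv, VectorReplayBuffer(20000, 4),
                          exploration_noise=True)

    # the sync Collector now collects exactly what is requested
    result = collector.collect(n_step=64)      # exactly 64 transitions
    result = collector.collect(n_episode=10)   # exactly 10 complete episodes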

Policy

  1. Add a policy.exploration_noise(action, batch) -> action method instead of implementing exploration noise inside policy.forward() (#280); see the sketch after this list;
  2. Add TimeLimit.truncated handler in compute_*_returns (#296);
  3. remove ignore_done flag (#296);
  4. remove the reward_normalization option in off-policy algorithms (it will raise an error if set to True) (#298);
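
A minimal sketch of the new hook from item 1: a custom policy (here an illustrative DDPGPolicy subclass) adds exploration noise in exploration_noise() rather than inside forward(); the noise scale is a placeholder.

    import numpy as np
    from tianshou.policy import DDPGPolicy

    class GaussianNoiseDDPG(DDPGPolicy):
        """Illustrative subclass using the new exploration_noise hook."""

        def exploration_noise(self, act, batch):
            # called by the Collector on top of the action returned by forward()
            if isinstance(act, np.ndarray):
                act = act + np.random.normal(0.0, 0.1, size=act.shape)
            return act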

Trainer

  1. Change collect_per_step to step_per_collect (#293);
  2. Add update_per_step and episode_per_collect (#293); see the sketch after this list;
  3. onpolicy_trainer now supports specifying either step_per_collect or episode_per_collect (#293)
  4. Add BasicLogger and LazyLogger to log data more conveniently (#295)
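
A minimal sketch of the renamed trainer arguments from items 1 and 2, reusing a small DQN setup; all values are placeholders and the keyword names follow the assumed 0.4.0 signature.

    import gym
    import torch
    from tianshou.data import Collector, VectorReplayBuffer
    from tianshou.env import DummyVectorEnv
    from tianshou.policy import DQNPolicy
    from tianshou.trainer import offpolicy_trainer
    from tianshou.utils.net.common import Net

    train_envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(4)])
    test_envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(2)])
    net = Net(state_shape=4, action_shape=2, hidden_sizes=[64, 64])
    policy = DQNPolicy(net, torch.optim.Adam(net.parameters(), lr=1e-3))
    train_collector = Collector(policy, train_envs, VectorReplayBuffer(20000, 4))
    test_collector = Collector(policy, test_envs)

    result = offpolicy_trainer(
        policy, train_collector, test_collector,
        max_epoch=5, step_per_epoch=10000,
        step_per_collect=16,     # renamed from collect_per_step
        update_per_step=0.25,    # new argument
        episode_per_test=10, batch_size=64,
    )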

Bug Fix

  1. Fix VectorEnv action_space seed randomness: calling env.seed(seed) now also calls env.action_space.seed(seed); otherwise Collector.collect(..., random=True) would produce different results each time (#300, #303).

0.3.2

16 Feb 01:41
cb65b56

Bug Fix

  1. fix networks under utils/discrete and utils/continuous that did not work correctly with CUDA and torch<=1.6.0 (#289)
  2. fix two Batch bugs: creating keys in Batch.__setitem__ now raises ValueError instead of KeyError, and _create_value now allows a placeholder with the stack=False option (#284)

Enhancement

  1. Add QR-DQN algorithm (#276)
  2. small optimization of Batch.cat and Batch.stack (#284); they are now almost as fast as in v0.2.3

0.3.1

20 Jan 10:24
a511cb4

API Change

  1. change utils.network args to support any form of MLP by default (#275): remove layer_num and hidden_layer_size, add hidden_sizes (a list of ints indicating the network architecture)
  2. add HDF5 save/load methods for ReplayBuffer (#261); see the sketch after this list
  3. add offline_trainer (#263)
  4. move Atari-related network to examples/atari/atari_network.py (#275)
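
A minimal sketch of the HDF5 round trip from item 2; the method names save_hdf5 and load_hdf5 and the 0.3.x add signature are assumptions here, and the stored transition is a dummy.

    import numpy as np
    from tianshou.data import ReplayBuffer

    buf = ReplayBuffer(size=100)
    # the 0.3.x add API still takes individual fields
    buf.add(obs=np.zeros(4), act=0, rew=1.0, done=False, obs_next=np.zeros(4))

    buf.save_hdf5("buffer.hdf5")                      # write the whole buffer
    restored = ReplayBuffer.load_hdf5("buffer.hdf5")  # classmethod, returns a new buffer
    assert len(restored) == len(buf)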

Bug Fix

  1. fix a potential bug in discrete behavior cloning policy (#263)

Enhancement

  1. update SAC mujoco result (#246)
  2. add C51 algorithm with benchmark result (#266)
  3. enable type checking in utils.network (#275)

0.3.0.post1

08 Oct 15:24

Several bug fixes (trainer, tests, and docs)

0.3.0

26 Sep 08:39
710966e

Since the code has changed substantially from v0.2.0, we are releasing this as version 0.3.

API Change

  1. add policy.updating and clarify collecting state and updating state in training (#224)
  2. change train_fn(epoch) to train_fn(epoch, env_step) and test_fn(epoch) to test_fn(epoch, env_step) (#229); see the sketch after this list
  3. remove outdated APIs: collector.sample, collector.render, collector.seed, VectorEnv (#210)
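
For instance, a step-based epsilon schedule for DQN can now key off env_step; a minimal sketch, with the decay constants as placeholders.

    from tianshou.policy import DQNPolicy

    def make_schedules(policy: DQNPolicy):
        """Return (train_fn, test_fn) implementing a simple epsilon schedule."""

        def train_fn(epoch: int, env_step: int):
            # env_step (the global environment step) is now passed alongside epoch
            policy.set_eps(max(0.05, 0.5 - env_step / 100000))  # illustrative decay

        def test_fn(epoch: int, env_step: int):
            policy.set_eps(0.0)  # greedy evaluation

        return train_fn, test_fn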

Bug Fix

  1. fix a bug in DDQN: target_q could not be sampled from np.random.rand (#224)
  2. fix a bug in the DQN Atari net: a ReLU was missing before the last layer (#224)
  3. fix a bug in collector timing (#224)
  4. fix a bug in the converter of Batch: deepcopy a Batch in to_numpy and to_torch (#213)
  5. ensure buffer.rew has a type of float (#229)

Enhancement

  1. Anaconda support: conda install -c conda-forge tianshou (#228)
  2. add PSRL (#202)
  3. add SAC discrete (#216)
  4. add type check in unit test (#200)
  5. format code and update function signatures (#213)
  6. add pydocstyle and doc8 check (#210)
  7. several documentation fixes (#210)