
0.4.0

@Trinkle23897 released this 02 Mar 12:40
· 608 commits to master since this release
389bdb7

This release contains several API and behavior changes.

API Changes

Buffer

  1. Add ReplayBufferManager, PrioritizedReplayBufferManager, VectorReplayBuffer, PrioritizedVectorReplayBuffer, CachedReplayBuffer (#278, #280);
  2. Change the buffer.add API from buffer.add(obs, act, rew, done, obs_next, info, policy, ...) to buffer.add(batch, buffer_ids) in order to add data more efficiently (#280); see the sketch after this list;
  3. Add set_batch method in buffer (#278);
  4. Add sample_index method, which behaves like sample but returns only the indices instead of both indices and batch data (#278);
  5. Add prev (one-step previous transition index), next (one-step next transition index) and unfinished_index (the last modified index whose done==False) (#278);
  6. Add internal method _alloc_by_keys_diff in Batch to support incoming data whose keys differ from those already stored (#280);
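
For reference, a minimal sketch of the new buffer.add call style and the index helpers described above (shapes and keys kept deliberately simple; a real workload would store full observations):

```python
import numpy as np
from tianshou.data import Batch, ReplayBuffer

buf = ReplayBuffer(size=20)

# old (pre-0.4.0) call style, no longer supported:
#   buf.add(obs=obs, act=act, rew=rew, done=done, obs_next=obs_next, info={})
# new style: wrap the transition(s) in a Batch
for i in range(5):
    buf.add(Batch(obs=i, act=i, rew=i, done=(i == 4), obs_next=i + 1, info={}))

indices = buf.sample_index(3)     # indices only, no batch data
prev_idx = buf.prev(indices)      # one-step previous transition indices
next_idx = buf.next(indices)      # one-step next transition indices
print(buf.unfinished_index())     # last modified indices whose done == False
```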

Collector

  1. Rewrite the original Collector and split the async functionality into AsyncCollector: Collector now only supports sync mode, while AsyncCollector supports both modes (#280);
  2. Drop collector.collect(n_episode=List[int]) because the new collector can collect episodes without bias (#280);
  3. Move reward_metric from Collector to trainer (#280);
  4. Change Collector.collect semantics: AsyncCollector.collect keeps the previous behavior, where collect(n_step or n_episode) may not collect exactly n_step or n_episode transitions; Collector.collect(n_step or n_episode) now collects exactly n_step transitions or n_episode episodes (#280); see the sketch after this list;
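
A rough usage sketch of the sync Collector's exact-count semantics (`policy` stands for any already-constructed Tianshou policy; the environment and sizes are arbitrary placeholders):

```python
import gym
from tianshou.data import Collector, VectorReplayBuffer
from tianshou.env import DummyVectorEnv

# `policy` is assumed to be an already-constructed Tianshou policy
envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(8)])
buf = VectorReplayBuffer(total_size=20000, buffer_num=8)
collector = Collector(policy, envs, buf, exploration_noise=True)

collector.collect(n_step=1000)   # collects exactly 1000 transitions
collector.collect(n_episode=16)  # collects exactly 16 episodes

# AsyncCollector keeps the old "at least n_step / n_episode" semantics:
# from tianshou.data import AsyncCollector
# collector = AsyncCollector(policy, envs, buf)
```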

Policy

  1. Add a policy.exploration_noise(action, batch) -> action method so that exploration noise is applied there instead of inside policy.forward() (#280); see the sketch after this list;
  2. Add a TimeLimit.truncated handler in compute_*_returns (#296);
  3. Remove the ignore_done flag (#296);
  4. Remove the reward_normalization option in off-policy algorithms (an Error is raised if it is set to True) (#298);
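
As a hypothetical illustration of where exploration noise now lives (the class, attribute names, and placeholder greedy action below are made up for the sketch, not part of the library):

```python
import numpy as np
from tianshou.data import Batch
from tianshou.policy import BasePolicy


class EpsGreedyPolicy(BasePolicy):
    """Hypothetical discrete policy showing the exploration_noise hook."""

    def __init__(self, n_actions, eps=0.1, **kwargs):
        super().__init__(**kwargs)
        self.n_actions = n_actions
        self.eps = eps

    def forward(self, batch, state=None, **kwargs):
        # forward() now returns only the raw (deterministic) action
        act = np.zeros(len(batch.obs), dtype=int)  # placeholder greedy action
        return Batch(act=act)

    def exploration_noise(self, act, batch):
        # called by the Collector (exploration_noise=True) after forward()
        mask = np.random.rand(len(act)) < self.eps
        act[mask] = np.random.randint(self.n_actions, size=mask.sum())
        return act

    def learn(self, batch, **kwargs):
        return {}
```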

Trainer

  1. Rename collect_per_step to step_per_collect (#293);
  2. Add update_per_step and episode_per_collect (#293);
  3. onpolicy_trainer now supports either step_per_collect or episode_per_collect (#293);
  4. Add BasicLogger and LazyLogger to log data more conveniently (#295); see the sketch after this list.
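
A hedged sketch of how the renamed arguments and the new logger fit together (`policy`, `train_collector`, and `test_collector` are assumed to exist already; the numeric values are placeholders):

```python
from torch.utils.tensorboard import SummaryWriter
from tianshou.trainer import offpolicy_trainer
from tianshou.utils import BasicLogger

logger = BasicLogger(SummaryWriter("log/run"))

# `policy`, `train_collector`, `test_collector` are assumed to exist
result = offpolicy_trainer(
    policy, train_collector, test_collector,
    max_epoch=10, step_per_epoch=10000,
    step_per_collect=10,   # renamed from collect_per_step
    update_per_step=0.1,   # gradient updates per collected environment step
    episode_per_test=100, batch_size=64,
    logger=logger,
)
```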

Bug Fix

  1. Fix VectorEnv action_space seeding: env.seed(seed) now also calls env.action_space.seed(seed); without this, Collector.collect(..., random=True) produced different results on each run (#300, #303). A short illustration follows.
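
For example (using one of the vectorized env classes; the environment choice is arbitrary):

```python
import gym
from tianshou.env import DummyVectorEnv

envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(4)])
envs.seed(0)  # now also seeds each env.action_space, so that
              # Collector.collect(..., random=True) is reproducible
```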