
0.4.0

@Trinkle23897 released this 02 Mar 12:40
· 608 commits to master since this release
389bdb7

This release contains several API and behavior changes.

API Changes

Buffer

  1. Add ReplayBufferManager, PrioritizedReplayBufferManager, VectorReplayBuffer, PrioritizedVectorReplayBuffer, CachedReplayBuffer (#278, #280);
  2. Change the buffer.add API from buffer.add(obs, act, rew, done, obs_next, info, policy, ...) to buffer.add(batch, buffer_ids) in order to add data more efficiently (#280); see the sketch after this list;
  3. Add set_batch method in buffer (#278);
  4. Add sample_index method, which behaves like sample but returns only the indices instead of both indices and batch data (#278);
  5. Add prev (one-step previous transition index), next (one-step next transition index) and unfinished_index (the last modified index whose done==False) (#278);
  6. Add internal method _alloc_by_keys_diff in Batch to support incoming data whose keys differ from those already stored (#280);
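
For reference, a minimal sketch of the new buffer.add call style and the index helpers described above (shapes and keys kept deliberately simple; a real workload would store full observations):

```python
import numpy as np
from tianshou.data import Batch, ReplayBuffer

buf = ReplayBuffer(size=20)

# old (pre-0.4.0) call style, no longer supported:
#   buf.add(obs=obs, act=act, rew=rew, done=done, obs_next=obs_next, info={})
# new style: wrap the transition(s) in a Batch
for i in range(5):
    buf.add(Batch(obs=i, act=i, rew=i, done=(i == 4), obs_next=i + 1, info={}))

indices = buf.sample_index(3)     # indices only, no batch data
prev_idx = buf.prev(indices)      # one-step previous transition indices
next_idx = buf.next(indices)      # one-step next transition indices
print(buf.unfinished_index())     # last modified indices whose done == False
```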

Collector

  1. Rewrite the original Collector and split the async functionality into AsyncCollector: Collector now only supports sync mode, while AsyncCollector supports both modes (#280);
  2. Drop collector.collect(n_episode=List[int]) because the new collector can collect episodes without bias (#280);
  3. Move reward_metric from Collector to trainer (#280);
  4. Change Collector.collect semantics: AsyncCollector.collect keeps the previous behavior, where collect(n_step or n_episode) may not collect exactly n_step or n_episode transitions; Collector.collect(n_step or n_episode) now collects exactly n_step transitions or n_episode episodes (#280); see the sketch after this list;
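
A rough usage sketch of the sync Collector's exact-count semantics (`policy` stands for any already-constructed Tianshou policy; the environment and sizes are arbitrary placeholders):

```python
import gym
from tianshou.data import Collector, VectorReplayBuffer
from tianshou.env import DummyVectorEnv

# `policy` is assumed to be an already-constructed Tianshou policy
envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(8)])
buf = VectorReplayBuffer(total_size=20000, buffer_num=8)
collector = Collector(policy, envs, buf, exploration_noise=True)

collector.collect(n_step=1000)   # collects exactly 1000 transitions
collector.collect(n_episode=16)  # collects exactly 16 episodes

# AsyncCollector keeps the old "at least n_step / n_episode" semantics:
# from tianshou.data import AsyncCollector
# collector = AsyncCollector(policy, envs, buf)
```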

Policy

  1. Add a policy.exploration_noise(action, batch) -> action method so that exploration noise is applied there instead of inside policy.forward() (#280); see the sketch after this list;
  2. Add a TimeLimit.truncated handler in compute_*_returns (#296);
  3. Remove the ignore_done flag (#296);
  4. Remove the reward_normalization option in off-policy algorithms (an Error is raised if it is set to True) (#298);
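
As a hypothetical illustration of where exploration noise now lives (the class, attribute names, and placeholder greedy action below are made up for the sketch, not part of the library):

```python
import numpy as np
from tianshou.data import Batch
from tianshou.policy import BasePolicy


class EpsGreedyPolicy(BasePolicy):
    """Hypothetical discrete policy showing the exploration_noise hook."""

    def __init__(self, n_actions, eps=0.1, **kwargs):
        super().__init__(**kwargs)
        self.n_actions = n_actions
        self.eps = eps

    def forward(self, batch, state=None, **kwargs):
        # forward() now returns only the raw (deterministic) action
        act = np.zeros(len(batch.obs), dtype=int)  # placeholder greedy action
        return Batch(act=act)

    def exploration_noise(self, act, batch):
        # called by the Collector (exploration_noise=True) after forward()
        mask = np.random.rand(len(act)) < self.eps
        act[mask] = np.random.randint(self.n_actions, size=mask.sum())
        return act

    def learn(self, batch, **kwargs):
        return {}
```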

Trainer

  1. Rename collect_per_step to step_per_collect (#293);
  2. Add update_per_step and episode_per_collect (#293);
  3. onpolicy_trainer now supports either step_per_collect or episode_per_collect (#293);
  4. Add BasicLogger and LazyLogger to log data more conveniently (#295); see the sketch after this list.
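
A hedged sketch of how the renamed arguments and the new logger fit together (`policy`, `train_collector`, and `test_collector` are assumed to exist already; the numeric values are placeholders):

```python
from torch.utils.tensorboard import SummaryWriter
from tianshou.trainer import offpolicy_trainer
from tianshou.utils import BasicLogger

logger = BasicLogger(SummaryWriter("log/run"))

# `policy`, `train_collector`, `test_collector` are assumed to exist
result = offpolicy_trainer(
    policy, train_collector, test_collector,
    max_epoch=10, step_per_epoch=10000,
    step_per_collect=10,   # renamed from collect_per_step
    update_per_step=0.1,   # gradient updates per collected environment step
    episode_per_test=100, batch_size=64,
    logger=logger,
)
```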

Bug Fix

  1. Fix VectorEnv action_space seeding: env.seed(seed) now also calls env.action_space.seed(seed); without this, Collector.collect(..., random=True) produced different results on each run (#300, #303). A short illustration follows.
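
For example (using one of the vectorized env classes; the environment choice is arbitrary):

```python
import gym
from tianshou.env import DummyVectorEnv

envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(4)])
envs.seed(0)  # now also seeds each env.action_space, so that
              # Collector.collect(..., random=True) is reproducible
```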