# 0.4.0
This release contains several API and behavior changes.
## API Change

### Buffer

- Add `ReplayBufferManager`, `PrioritizedReplayBufferManager`, `VectorReplayBuffer`, `PrioritizedVectorReplayBuffer`, and `CachedReplayBuffer` (#278, #280);
- Change the `buffer.add` API from `buffer.add(obs, act, rew, done, obs_next, info, policy, ...)` to `buffer.add(batch, buffer_ids)` in order to add data more efficiently (#280), as sketched after this list;
- Add a `set_batch` method in buffer (#278);
- Add a `sample_index` method, the same as `sample` but returning only the index instead of both the index and the batch data (#278);
- Add `prev` (one-step previous transition index), `next` (one-step next transition index), and `unfinished_index` (the last modified index whose `done==False`) (#278);
- Add an internal method `_alloc_by_keys_diff` in Batch to support any form of newly appearing keys (#280);
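A minimal sketch of the new buffer interface (the method names follow this changelog; the transition keys are the ones from the old `buffer.add` signature):

```python
from tianshou.data import Batch, ReplayBuffer

buf = ReplayBuffer(size=10)
for i in range(3):
    # new API: pass a single Batch instead of separate keyword arguments
    buf.add(Batch(obs=i, act=i, rew=i, done=(i == 2), obs_next=i + 1, info={}))

indices = buf.sample_index(batch_size=2)    # returns only the index
batch, indices2 = buf.sample(batch_size=2)  # returns both index and batch data

print(buf.prev(indices))       # one-step previous transition index
print(buf.next(indices))       # one-step next transition index
print(buf.unfinished_index())  # last modified index whose done == False
```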
### Collector

- Rewrite the original Collector and split the async logic into `AsyncCollector`: `Collector` only supports sync mode, while `AsyncCollector` supports both modes (#280);
- Drop `collector.collect(n_episode=List[int])` because the new collector can collect episodes without bias (#280);
- Move `reward_metric` from Collector to the trainer (#280);
- Change the `Collector.collect` logic: `AsyncCollector.collect` keeps the semantics of the previous version, where `collect(n_step or n_episode)` will not collect exactly n_step or n_episode transitions; `Collector.collect(n_step or n_episode)` now collects exactly n_step or n_episode transitions (#280), as sketched after this list;
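A minimal sketch of the exact-collect semantics, assuming `policy` is any `BasePolicy` instance built elsewhere and the environment/buffer sizes are placeholders:

```python
import gym
from tianshou.data import Collector, VectorReplayBuffer
from tianshou.env import DummyVectorEnv

train_envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(4)])
buffer = VectorReplayBuffer(total_size=20000, buffer_num=4)  # one sub-buffer per env
collector = Collector(policy, train_envs, buffer)  # `policy` assumed built elsewhere

collector.collect(n_step=64)     # now collects exactly 64 transitions
collector.collect(n_episode=10)  # now collects exactly 10 episodes
```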
### Policy

- Add a `policy.exploration_noise(action, batch) -> action` method instead of implementing the noise inside `policy.forward()` (#280), as sketched after this list;
- Add a `Timelimit.truncate` handler in `compute_*_returns` (#296);
- Remove the `ignore_done` flag (#296);
- Remove the `reward_normalization` option in off-policy algorithms (will raise an Error if set to True) (#298);
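A hypothetical subclass illustrating the new hook; the class name, noise scale, and the ndarray-action assumption are illustrative, not part of the release:

```python
import numpy as np
from tianshou.policy import DDPGPolicy

class GaussianNoiseDDPG(DDPGPolicy):
    """Hypothetical example: noise lives in exploration_noise, not forward()."""

    def exploration_noise(self, act, batch):
        # called by the Collector on the action produced by forward();
        # assumes `act` is a numpy array of continuous actions
        return act + np.random.normal(0.0, 0.1, size=np.shape(act))
```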
### Trainer

- Change `collect_per_step` to `step_per_collect` (#293);
- Add `update_per_step` and `episode_per_collect` (#293); `onpolicy_trainer` now supports either step-based collect (`step_per_collect`) or episode-based collect (`episode_per_collect`) (#293);
- Add `BasicLogger` and `LazyLogger` to log data more conveniently (#295), as sketched after this list.