Release v1.1.0
MDPDataset
The timestep alignment is now exactly the same as D4RL:
```python
import numpy as np

# observations = [o_1, o_2, ..., o_n]
observations = np.random.random((1000, 10))
# actions = [a_1, a_2, ..., a_n]
actions = np.random.random((1000, 10))
# rewards = [r(o_1, a_1), r(o_2, a_2), ...]
rewards = np.random.random(1000)
# terminals = [t(o_1, a_1), t(o_2, a_2), ...]
terminals = np.random.randint(2, size=1000)  # illustrative binary terminal flags
```
where `r(o, a)` is the reward function and `t(o, a)` is the terminal function.
The reason for this change is that many users were confused by the difference between d3rlpy and D4RL; the two now share the same alignment. Note that this change might break your existing datasets.
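Arrays laid out this way are passed to `MDPDataset` exactly as before; here is a minimal sketch, assuming the arrays defined above:

```python
from d3rlpy.dataset import MDPDataset

# build the dataset from the aligned arrays above
dataset = MDPDataset(observations, actions, rewards, terminals)

# episodes are split wherever a terminal flag is set
print(len(dataset.episodes))
```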
Algorithms
- Neural Fitted Q-iteration (NFQ)
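A minimal training sketch for the new algorithm, assuming the built-in CartPole demo dataset and an arbitrary `n_steps` budget:

```python
import d3rlpy

# small discrete-action demo dataset shipped with d3rlpy
dataset, env = d3rlpy.datasets.get_cartpole()

# NFQ is a discrete-action Q-learning algorithm
nfq = d3rlpy.algos.NFQ()
nfq.fit(dataset, n_steps=10000)
```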
Enhancements
- AWAC, CRR, and IQL now use a non-squashed Gaussian policy function.
- More tutorial pages have been added to the documentation.
- The software design page has been added to the documentation.
- The reproduction script for IQL has been added.
- The progress bar in online training has been visually improved in Jupyter Notebook #161 (thanks, @aiueola)
- NaN checks have been added to `MDPDataset`.
- The `target_reduction_type` and `bootstrap` options have been removed (see the migration sketch below).
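For migration, a sketch of what the removal looks like in user code, assuming the removed options were previously passed to an algorithm constructor (the choice of `CQL` here is just an illustration):

```python
import d3rlpy

# before v1.1.0 (no longer accepted):
# cql = d3rlpy.algos.CQL(bootstrap=True, target_reduction_type="min")

# from v1.1.0 on, simply drop the removed options
cql = d3rlpy.algos.CQL()
```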