
Release v1.1.0

@takuseno released this 27 Apr 15:37
· 445 commits to master since this release

MDPDataset

The timestep alignment is now exactly the same as D4RL:

import numpy as np

# observations = [o_1, o_2, ..., o_n]
observations = np.random.random((1000, 10))

# actions = [a_1, a_2, ..., a_n]
actions = np.random.random((1000, 10))

# rewards = [r(o_1, a_1), r(o_2, a_2), ...]
rewards = np.random.random(1000)

# terminals = [t(o_1, a_1), t(o_2, a_2), ...]
terminals = ...

where r(o, a) is the reward function and t(o, a) is the terminal function.

The reason for this change is that many users were confused by the difference between d3rlpy and D4RL; the two now share the same alignment. Note that this change might break your existing datasets.
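
As a quick reference, here is a minimal sketch of how arrays with this alignment are passed to MDPDataset (assuming the standard v1.x constructor taking observations, actions, rewards, and terminals; the episode boundaries in this toy data are placed arbitrarily):

import numpy as np

from d3rlpy.dataset import MDPDataset

# toy arrays following the D4RL-style alignment described above
observations = np.random.random((1000, 10))
actions = np.random.random((1000, 10))
rewards = np.random.random(1000)

# terminal flag for each (o_i, a_i) pair; a 1.0 marks the end of an episode
terminals = np.zeros(1000)
terminals[499] = 1.0
terminals[999] = 1.0

dataset = MDPDataset(observations, actions, rewards, terminals)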

Algorithms

Enhancements

  • AWAC, CRR, and IQL now use a non-squashed Gaussian policy function.
  • More tutorial pages have been added to the documentation.
  • The software design page has been added to the documentation.
  • The reproduction script for IQL has been added.
  • The progress bar in online training is visually improved in Jupyter Notebook #161 (thanks, @aiueola)
  • NaN checks have been added to MDPDataset (see the sketch after this list).
  • The target_reduction_type and bootstrap options have been removed.
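
As a rough illustration of the kind of validation the new NaN checks perform (this is a hypothetical helper, not d3rlpy's actual implementation), the idea is to reject arrays containing NaN before the dataset is built:

import numpy as np

def assert_no_nan(name, array):
    # hypothetical helper: raise early if any element is NaN
    if np.isnan(array).any():
        raise ValueError(f"{name} contains NaN values")

observations = np.random.random((100, 10))
actions = np.random.random((100, 2))
rewards = np.random.random(100)
terminals = np.zeros(100)

for name, array in [("observations", observations), ("actions", actions),
                    ("rewards", rewards), ("terminals", terminals)]:
    assert_no_nan(name, array)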

Bugfix

  • Unnecessary test conditions have been removed.
  • A typo in dataset.pyx has been fixed #167 (thanks, @zbzhu99).
  • Details of the IQL implementation have been fixed.