Release v1.1.0
MDPDataset
The timestep alignment is now exactly the same as D4RL:
```python
import numpy as np

# observations = [o_1, o_2, ..., o_n]
observations = np.random.random((1000, 10))
# actions = [a_1, a_2, ..., a_n]
actions = np.random.random((1000, 10))
# rewards = [r(o_1, a_1), r(o_2, a_2), ...]
rewards = np.random.random(1000)
# terminals = [t(o_1, a_1), t(o_2, a_2), ...]
terminals = np.random.randint(2, size=1000)  # illustrative binary terminal flags
```
where `r(o, a)` is the reward function and `t(o, a)` is the terminal function.
The reason for this change is that many users were confused by the difference between d3rlpy and D4RL; the two now share the same alignment. Note that this change might break your existing datasets.
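Arrays laid out this way are passed to `MDPDataset` exactly as before; here is a minimal sketch, assuming the arrays defined above:

```python
from d3rlpy.dataset import MDPDataset

# build the dataset from the aligned arrays above
dataset = MDPDataset(observations, actions, rewards, terminals)

# episodes are split wherever a terminal flag is set
print(len(dataset.episodes))
```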
Algorithms
- Neural Fitted Q-iteration (NFQ)
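A minimal training sketch for the new algorithm, assuming the built-in CartPole demo dataset and an arbitrary `n_steps` budget:

```python
import d3rlpy

# small discrete-action demo dataset shipped with d3rlpy
dataset, env = d3rlpy.datasets.get_cartpole()

# NFQ is a discrete-action Q-learning algorithm
nfq = d3rlpy.algos.NFQ()
nfq.fit(dataset, n_steps=10000)
```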
Enhancements
- AWAC, CRR, and IQL now use a non-squashed Gaussian policy function.
- More tutorial pages have been added to the documentation.
- The software design page has been added to the documentation.
- The reproduction script for IQL has been added.
- The progress bar in online training has been visually improved in Jupyter Notebook #161 (thanks, @aiueola)
- NaN checks have been added to `MDPDataset`.
- The `target_reduction_type` and `bootstrap` options have been removed (see the migration sketch below).
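For migration, a sketch of what the removal looks like in user code, assuming the removed options were previously passed to an algorithm constructor (the choice of `CQL` here is just an illustration):

```python
import d3rlpy

# before v1.1.0 (no longer accepted):
# cql = d3rlpy.algos.CQL(bootstrap=True, target_reduction_type="min")

# from v1.1.0 on, simply drop the removed options
cql = d3rlpy.algos.CQL()
```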