Release v2.0.2
The major update has finally been released! Since the start of the project, d3rlpy has earned almost 1K GitHub stars ⭐, which is a great milestone for the project. This update includes many major changes.
Upgrade Gym version
From this version, d3rlpy only supports the latest Gym version, 0.26.0. This change allows us to support Gymnasium in a future update.
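For reference, Gym 0.26 changed the core interaction API that d3rlpy now targets: reset() returns an (observation, info) tuple, and step() returns a five-element tuple with separate terminated and truncated flags.

import gym

env = gym.make("CartPole-v1")
# reset() returns (observation, info) in Gym 0.26
observation, info = env.reset(seed=0)
# step() returns (observation, reward, terminated, truncated, info)
observation, reward, terminated, truncated, info = env.step(env.action_space.sample())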
Algorithm
Clear separation between configuration and algorithm
From this version, each algorithm (e.g. "DQN") has a config class (e.g. "DQNConfig"). This allows us to serialize and deserialize algorithms as described later.
import d3rlpy

dqn = d3rlpy.algos.DQNConfig(learning_rate=3e-4).create(device="cuda:0")
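The same config-then-create pattern applies across algorithms. As a sketch, a CQL learner can be built the same way (the hyperparameter name below is assumed from the same pattern; check each config class for its exact options):

cql = d3rlpy.algos.CQLConfig(actor_learning_rate=3e-4).create(device="cuda:0")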
Decision Transformer
Decision Transformer is finally available! You can check the reproduction code to see how to use it.
import d3rlpy

# prepare the dataset and evaluation environment
dataset, env = d3rlpy.datasets.get_pendulum()

dt = d3rlpy.algos.DecisionTransformerConfig(
    batch_size=64,
    learning_rate=1e-4,
    optim_factory=d3rlpy.models.AdamWFactory(weight_decay=1e-4),
    encoder_factory=d3rlpy.models.VectorEncoderFactory(
        [128],
        exclude_last_activation=True,
    ),
    observation_scaler=d3rlpy.preprocessing.StandardObservationScaler(),
    reward_scaler=d3rlpy.preprocessing.MultiplyRewardScaler(0.001),
    context_size=20,
    num_heads=1,
    num_layers=3,
    warmup_steps=10000,
    max_timestep=1000,
).create(device="cuda:0")

# offline training with periodic online evaluation
dt.fit(
    dataset,
    n_steps=100000,
    n_steps_per_epoch=1000,
    save_interval=10,
    eval_env=env,
    eval_target_return=0.0,
)
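Once trained, the Decision Transformer can act as a regular step-by-step policy through a stateful wrapper that maintains the context internally. A minimal sketch, assuming the as_stateful_wrapper API and a target return of 0.0:

# wrap the trained model so it can be queried one step at a time
actor = dt.as_stateful_wrapper(target_return=0.0)

observation, _ = env.reset()
reward = 0.0
while True:
    # the wrapper consumes the latest reward to track the return-to-go
    action = actor.predict(observation, reward)
    observation, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        break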
Serialization
In this version, d3rlpy introduces a compact serialization format, the d3 format, that includes both hyperparameters and model parameters in a single file. This makes it easy to save checkpoints and reconstruct algorithms for evaluation and deployment.
import d3rlpy
dataset, env = d3rlpy.datasets.get_cartpole()
dqn = d3rlpy.algos.DQNConfig().create()
dqn.fit(dataset, n_steps=10000)
# save as d3 file
dqn.save("model.d3")
# reconstruct exactly the same DQN
new_dqn = d3rlpy.load_learnable("model.d3")
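Because the d3 file carries the hyperparameters as well, the same checkpoint can be reconstructed on a different device at deployment time. A sketch, assuming load_learnable accepts a device argument in the same style as create:

# load the checkpoint onto a GPU for deployment
deployed_dqn = d3rlpy.load_learnable("model.d3", device="cuda:0")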
ReplayBuffer
From this version, there is no longer a clear separation between ReplayBuffer and MDPDataset. Instead, ReplayBuffer is flexible enough to support all kinds of algorithms and experiments. Please check the documentation for details.
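As a rough sketch of what this unification looks like in practice, the same buffer abstraction can back both online and offline workflows. The factory function and array shapes below are illustrative; check the documentation for the exact API:

import gym
import numpy as np
import d3rlpy

# online use: a FIFO buffer bound to an environment
env = gym.make("CartPole-v1")
buffer = d3rlpy.dataset.create_fifo_replay_buffer(limit=100000, env=env)

# offline use: MDPDataset is now a ReplayBuffer built from logged arrays
observations = np.random.random((1000, 4)).astype(np.float32)
actions = np.random.randint(2, size=(1000, 1))
rewards = np.random.random((1000, 1)).astype(np.float32)
terminals = np.zeros(1000)
terminals[-1] = 1.0
dataset = d3rlpy.dataset.MDPDataset(observations, actions, rewards, terminals)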