How to introduce other optimizers into DI-engine? #813
Comments
Thanks for your interest in DI-engine. Here are some basic explanations for your questions:
In most RL problems, the
If you want to use a different optimizer, you can indeed modify the relevant policy file to suit your specific requirements. The DI-engine framework is designed with flexibility in mind, allowing users to customize the
There are indeed design patterns for composing different modules elegantly. However, in the early stage of DI-engine, we considered it important to be able to hack the optimizer for large-scale RL training such as DI-star. In that scenario, we modified the optimizer in place to implement some special gradient clipping operations with little GPU memory overhead, and this historical implementation has remained in today's version. From the viewpoint of most classical RL algorithms, I think it is feasible to decouple the RL algorithm from the optimizer, but it needs some detailed design work to resolve the problems I mentioned above. If you are interested, we can provide support for the refactoring plan.
Hello, thank you so much for your explanation. The With this implementation, we can separate the

However, after carefully studying the implementation of DI-engine, I feel that refactoring this part would introduce a lot of complexity. If the maintenance and modification costs outweigh the benefits, keeping the existing implementation may be the better choice. I have still provided a sketch of the idea below. If you evaluate it and think it is unnecessary for any reason, you can just close this issue. Alternatively, if you think this refactoring is valuable, I'd be willing to discuss it further. Thanks again for your reply. By the way, your code is very good, and reading it is a real pleasure.

```python
from typing import Optional

import torch
from typing_extensions import override  # available as typing.override on Python 3.12+


class TemplateOptimizer(torch.optim.Optimizer):

    def __init__(self, optimizer: torch.optim.Optimizer):
        self.optimizer = optimizer
        # Initialize the base Optimizer with the parameters of the passed optimizer
        super(TemplateOptimizer, self).__init__(
            optimizer.param_groups, optimizer.defaults
        )

    def before_step(self):
        # This method can be overridden by subclasses if needed
        ...

    def step(self, closure=None):
        self.before_step()
        # Return the closure's loss, as torch.optim.Optimizer.step() does
        return self.optimizer.step(closure)

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself
        return getattr(self.optimizer, name)


class DingOptimizer(TemplateOptimizer):

    def __init__(
        self,
        optimizer: torch.optim.Optimizer,
        optim_type: str = "",
        grad_clip_type: Optional[str] = None,
        clip_value: Optional[float] = None,
        clip_coef: float = 5,
        clip_norm_type: float = 2.0,
        clip_momentum_timestep: int = 100,
        grad_norm_type: Optional[str] = None,
        grad_ignore_type: Optional[str] = None,
        ignore_value: Optional[float] = None,
        ignore_coef: float = 5,
        ignore_norm_type: float = 2.0,
        ignore_momentum_timestep: int = 100,
    ):
        super(DingOptimizer, self).__init__(optimizer)
        self.optim_type = optim_type
        self.grad_clip_type = grad_clip_type
        self.clip_value = clip_value
        self.clip_coef = clip_coef
        self.clip_norm_type = clip_norm_type
        self.clip_momentum_timestep = clip_momentum_timestep
        self.grad_norm_type = grad_norm_type
        self.grad_ignore_type = grad_ignore_type
        self.ignore_value = ignore_value
        self.ignore_coef = ignore_coef
        self.ignore_norm_type = ignore_norm_type
        self.ignore_momentum_timestep = ignore_momentum_timestep

    def clipping(self):
        ...

    def ignoring(self):
        ...

    @override
    def before_step(self):
        # Run the gradient hooks before delegating to the wrapped optimizer
        self.clipping()
        self.ignoring()
```
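To make the "hook before step" idea concrete, here is a minimal, self-contained sketch in which the hook performs plain global-norm clipping through PyTorch's `torch.nn.utils.clip_grad_norm_`. The class name `ClippingOptimizer` and its `max_norm` parameter are hypothetical and exist only for this illustration; this is not DI-engine's clipping logic, which supports more modes than are shown here.

```python
import torch
from torch import nn


class ClippingOptimizer:
    """Hypothetical wrapper: clip gradients by global norm, then delegate to any optimizer."""

    def __init__(self, optimizer: torch.optim.Optimizer, max_norm: float = 1.0):
        self.optimizer = optimizer
        self.max_norm = max_norm

    def before_step(self) -> None:
        # Gather every parameter managed by the wrapped optimizer that has a gradient.
        params = [
            p
            for group in self.optimizer.param_groups
            for p in group["params"]
            if p.grad is not None
        ]
        if params:
            nn.utils.clip_grad_norm_(params, self.max_norm)

    def step(self, closure=None):
        # Note: if a closure is passed, it recomputes gradients after this hook has run.
        self.before_step()
        return self.optimizer.step(closure)

    def zero_grad(self, set_to_none: bool = True) -> None:
        self.optimizer.zero_grad(set_to_none=set_to_none)


model = nn.Linear(4, 1)
optimizer = ClippingOptimizer(torch.optim.SGD(model.parameters(), lr=0.1), max_norm=1.0)

loss = (model(torch.ones(8, 4)) * 1000.0).sum()  # deliberately large gradients
loss.backward()
optimizer.step()

# Global gradient norm after the step; the hook clipped it to at most max_norm.
grad_norm = torch.sqrt(sum((p.grad ** 2).sum() for p in model.parameters()))
```

Because the wrapper only touches `param_groups` and `step`, the wrapped object could be any `torch.optim.Optimizer` subclass. Composition is used here instead of inheritance so that the wrapper does not keep a second copy of the optimizer state.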
Hi,
I would like to incorporate optimizers from the parameterfree library into RL training.
However, I've noticed that DI-engine has hard-coded the optimizer in most of its RL algorithms.
For example:
DI-engine/ding/policy/offppo_collect_traj.py
Line 92 in 7f95159
DI-engine/ding/policy/qgpo.py
Line 83 in 7f95159
DI-engine/ding/policy/a2c.py
Line 110 in 7f95159
I have a few questions regarding this:
Why is the optimizer hard-coded in most of DI-engine's RL algorithms?
What is the best practice if I really want to use a different optimizer?
Why does DI-engine implement its own optimizers, different from PyTorch's, instead of using a design pattern such as the strategy pattern or the template method pattern, which would let users easily compose or implement their own optimizers?
Based on my understanding, the optimizer and the "thing" to clip are independent. Therefore, is decoupling the RL algorithm and optimizer feasible?
Feel free to correct any misunderstandings or gaps in my knowledge. If you have plans to refactor this part, I am willing to contribute code.
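One possible way to avoid a hard-coded optimizer, sketched under assumptions: the policy receives an optimizer factory (a callable from parameters to an optimizer) instead of constructing a fixed class. `TinyPolicy`, `OptimizerFactory`, and `learn` below are hypothetical names for illustration only and do not reflect DI-engine's actual policy API.

```python
from functools import partial
from typing import Callable, Iterable

import torch
from torch import nn

# Hypothetical alias: anything that turns parameters into an optimizer.
OptimizerFactory = Callable[[Iterable[nn.Parameter]], torch.optim.Optimizer]


class TinyPolicy:
    """Hypothetical stand-in for a policy class; not DI-engine's Policy interface."""

    def __init__(self, model: nn.Module, optimizer_factory: OptimizerFactory):
        self.model = model
        # The policy never names a concrete optimizer class.
        self.optimizer = optimizer_factory(self.model.parameters())

    def learn(self, obs: torch.Tensor, target: torch.Tensor) -> float:
        loss = nn.functional.mse_loss(self.model(obs), target)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.item()


# Swapping the optimizer is a one-line change at the call site.
policy = TinyPolicy(nn.Linear(4, 1), partial(torch.optim.RMSprop, lr=1e-2))
loss_value = policy.learn(torch.randn(16, 4), torch.zeros(16, 1))
```

Any class that follows the `torch.optim.Optimizer` interface could be passed the same way, which is roughly what the strategy pattern mentioned above amounts to.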