
Gradient clipping across multiple GPUs #16882

The example I found uses gluonnlp.utils.clip_grad_global_norm as follows:

trainer.allreduce_grads()                        # sum gradients across all GPUs
nlp.utils.clip_grad_global_norm(params, 1)       # clip by global norm (max_norm=1)
trainer.update(accumulate if accumulate else 1)  # apply a single optimizer step
step_num += 1
if accumulate and accumulate > 1:
    # set grad to zero for gradient accumulation
    all_model_params.zero_grad()
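
For context, here is a minimal sketch of how these calls can fit into a complete multi-GPU training loop. The two-GPU setup, toy model, loss, and data_iter are assumptions for illustration, not code from this thread. The important detail is creating the gluon.Trainer with update_on_kvstore=False, which lets allreduce_grads() and update() be called separately so that the clipping runs on the summed gradients in between.

import mxnet as mx
import gluonnlp as nlp
from mxnet import autograd, gluon

ctxs = [mx.gpu(i) for i in range(2)]   # assumption: two GPUs
net = gluon.nn.Dense(10)               # assumption: toy model
net.initialize(ctx=ctxs)
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

# update_on_kvstore=False is required so that allreduce_grads() and update()
# can be called manually, with clipping in between.
trainer = gluon.Trainer(net.collect_params(), 'adam',
                        {'learning_rate': 1e-3},
                        update_on_kvstore=False)
params = [p for p in net.collect_params().values() if p.grad_req != 'null']

for data, label in data_iter:          # assumption: iterator of (data, label) batches
    data_list = gluon.utils.split_and_load(data, ctxs)
    label_list = gluon.utils.split_and_load(label, ctxs)
    with autograd.record():
        losses = [loss_fn(net(x), y) for x, y in zip(data_list, label_list)]
    for l in losses:
        l.backward()
    trainer.allreduce_grads()                   # sum gradients across GPUs
    nlp.utils.clip_grad_global_norm(params, 1)  # clip the global gradient norm
    trainer.update(data.shape[0])               # one optimizer step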
