
Gradient clipping across multiple GPUs #16882

The example I found uses gluonnlp.utils.clip_grad_global_norm as follows:

trainer.allreduce_grads()                        # sum gradients across all GPUs
nlp.utils.clip_grad_global_norm(params, 1)       # clip by global norm (max_norm=1)
trainer.update(accumulate if accumulate else 1)  # apply a single optimizer step
step_num += 1
if accumulate and accumulate > 1:
    # set grad to zero for gradient accumulation
    all_model_params.zero_grad()
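
For context, here is a minimal sketch of how these calls can fit into a complete multi-GPU training loop. The two-GPU setup, toy model, loss, and data_iter are assumptions for illustration, not code from this thread. The important detail is creating the gluon.Trainer with update_on_kvstore=False, which lets allreduce_grads() and update() be called separately so that the clipping runs on the summed gradients in between.

import mxnet as mx
import gluonnlp as nlp
from mxnet import autograd, gluon

ctxs = [mx.gpu(i) for i in range(2)]   # assumption: two GPUs
net = gluon.nn.Dense(10)               # assumption: toy model
net.initialize(ctx=ctxs)
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

# update_on_kvstore=False is required so that allreduce_grads() and update()
# can be called manually, with clipping in between.
trainer = gluon.Trainer(net.collect_params(), 'adam',
                        {'learning_rate': 1e-3},
                        update_on_kvstore=False)
params = [p for p in net.collect_params().values() if p.grad_req != 'null']

for data, label in data_iter:          # assumption: iterator of (data, label) batches
    data_list = gluon.utils.split_and_load(data, ctxs)
    label_list = gluon.utils.split_and_load(label, ctxs)
    with autograd.record():
        losses = [loss_fn(net(x), y) for x, y in zip(data_list, label_list)]
    for l in losses:
        l.backward()
    trainer.allreduce_grads()                   # sum gradients across GPUs
    nlp.utils.clip_grad_global_norm(params, 1)  # clip the global gradient norm
    trainer.update(data.shape[0])               # one optimizer step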
