Torch multiple simultaneous gradient_checkpoint_scope #1583

albertz opened this issue Jul 15, 2024 · 0 comments

There is only one saved_tensors_hooks active at a time, namely the one of the most recent gradient_checkpoint_scope. So when there are multiple simultaneous gradient_checkpoint_scopes, the pack hooks of the earlier scopes will not be used.

Example code:

# Imports/variables added for completeness; the import path assumes RETURNN's Torch gradient_checkpoint util.
import torch
from returnn.torch.util.gradient_checkpoint import gradient_checkpoint_scope

var1 = torch.nn.Parameter(torch.randn(10))
var2 = torch.nn.Parameter(torch.randn(10))

def get_var1():
    with gradient_checkpoint_scope():
        return var1 + torch.randn_like(var1)

def get_var2():
    with gradient_checkpoint_scope():
        return var2 + torch.randn_like(var2)

x = get_var1() * get_var2()
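
For reference, this is plain PyTorch behavior, independent of gradient_checkpoint_scope: when several saved_tensors_hooks context managers are pushed, only the innermost pair of hooks is applied. A minimal standalone demonstration:

# Standalone demonstration (not RETURNN code): with stacked saved_tensors_hooks,
# only the innermost pair of hooks is applied to newly saved tensors.
import torch
from torch.autograd.graph import saved_tensors_hooks

def make_hooks(name):
    def pack(x):
        print(f"pack hook of {name} called")  # only the innermost hooks print
        return x
    def unpack(x):
        return x
    return pack, unpack

a = torch.randn(3, requires_grad=True)
with saved_tensors_hooks(*make_hooks("outer")):
    with saved_tensors_hooks(*make_hooks("inner")):
        y = a * a  # saves `a` for backward; only the "inner" pack hook is called
y.sum().backward()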

A solution is to keep a global weak tensor key dictionary over the registered tensors of all gradient_checkpoint_scope instances, and in the pack hook, check that global dictionary instead of the scope-local one (see the sketch below).
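
A rough sketch of that idea (not the actual RETURNN implementation; GradCheckpointScopeSketch, register_tensor and recompute_fn are hypothetical names — the point is only the global WeakTensorKeyDictionary lookup in the pack hook):

import torch
from torch.utils.weak import WeakTensorKeyDictionary

# One global map shared by all gradient_checkpoint_scope instances:
# tensor -> (owning scope, function to recompute that tensor).
_registered_tensors_global = WeakTensorKeyDictionary()

class GradCheckpointScopeSketch:
    def register_tensor(self, tensor, recompute_fn):
        # Hypothetical registration point: record globally how to recompute
        # this tensor, instead of keeping it only in a per-scope dict.
        _registered_tensors_global[tensor] = (self, recompute_fn)

    def pack_hook(self, tensor):
        # Consult the global dict, not only this scope's own registered
        # tensors, so tensors registered by earlier scopes are handled too.
        entry = _registered_tensors_global.get(tensor)
        if entry is not None:
            _scope, recompute_fn = entry
            return recompute_fn  # save the recompute fn instead of the tensor
        return tensor  # unrelated tensor: save as usual

    @staticmethod
    def unpack_hook(packed):
        # Recompute on demand in backward if we stored a recompute fn.
        return packed() if callable(packed) else packed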

It's currently maybe not so important, as this is a case we likely do not run into (yet; I guess).
