linear: clear row-wise weight at the end of forward #1770

kshitij12345 · 2025-05-12T12:00:39Z

Description

Per the image above, the row-wise quantized weight can be freed after the forward pass. However, this does not happen without an explicit update_usage(columnwise_usage=True, rowwise_usage=False) call, because with the default arguments the row-wise copy is preserved:

TransformerEngine/transformer_engine/pytorch/module/linear.py

Lines 357 to 359 in 51cd441

    
           if inp.requires_grad: 
        
               if isinstance(weightmat, QuantizedTensorBase): 
        
                   weightmat.update_usage(columnwise_usage=True)

Implementation of update_usage for MXFP8TensorBase.

TransformerEngine/transformer_engine/pytorch/tensor/_internal/mxfp8_tensor_base.py

Line 159 in 51cd441

def update_usage(

Implementation of update_usage for Float8TensorBase.

TransformerEngine/transformer_engine/pytorch/tensor/_internal/float8_tensor_base.py

Line 163 in 51cd441

def update_usage(
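To make the default-preserving behaviour described above concrete, here is a toy model of the semantics. It is only a sketch: `ToyQuantizedWeight` is a hypothetical stand-in, not TransformerEngine's `Float8TensorBase` or `MXFP8TensorBase`, and the `rowwise_usage` keyword is assumed by analogy with the `columnwise_usage` keyword shown in `linear.py`. The real implementations do more work (for example, managing scale factors), which this sketch ignores.

```python
from typing import Optional


class ToyQuantizedWeight:
    """Hypothetical stand-in for a quantized weight that holds two copies."""

    def __init__(self, rowwise_data: bytes, columnwise_data: bytes) -> None:
        self.rowwise_data: Optional[bytes] = rowwise_data
        self.columnwise_data: Optional[bytes] = columnwise_data

    def update_usage(
        self,
        rowwise_usage: Optional[bool] = None,
        columnwise_usage: Optional[bool] = None,
    ) -> None:
        # Assumed semantics: None leaves a copy untouched, False drops it.
        if rowwise_usage is False:
            self.rowwise_data = None
        if columnwise_usage is False:
            self.columnwise_data = None


w = ToyQuantizedWeight(b"row", b"col")

# Mirrors the existing call in linear.py: only column-wise usage is requested,
# so the row-wise copy is left in place.
w.update_usage(columnwise_usage=True)
kept_by_default = w.rowwise_data is not None

# Explicitly declaring the row-wise copy unused lets it be released.
w.update_usage(rowwise_usage=False, columnwise_usage=True)
freed_explicitly = w.rowwise_data is None
```

In this model the first call keeps both copies alive, and only the second call releases the row-wise one, which is the behaviour the patch relies on.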

Example Script

import torch

import transformer_engine
from transformer_engine.pytorch import fp8_autocast, Linear, fp8_model_init

dim = 1024 * 22 # Large input for demonstration of memory change.

with fp8_model_init(enabled=False):
    linear = Linear(dim, dim, bias=False, params_dtype=torch.bfloat16)

# 1015.021568 MB PARAM
print(torch.cuda.memory_allocated() / 1e6, "MB PARAM")

x = torch.randn(dim, dim, requires_grad=True, device="cuda", dtype=torch.bfloat16)

# 2030.043136 MB X
print(torch.cuda.memory_allocated() / 1e6, "MB X")

for _ in range(10):
    with fp8_autocast():
        o = linear(x)
        g_o = torch.randn_like(o)

    o.backward(g_o)

# Peak memory without patch: 10661.943808 MB
# Peak memory with patch:    10154.433024 MB
print(torch.cuda.max_memory_allocated() / 1e6, "MB")
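As a sanity check on the numbers in the script's comments, the arithmetic below compares the reported saving with the size of one FP8 copy of the weight. It assumes 1 byte per element for the FP8 payload and ignores scale factors and any other small allocations; the peak values are simply the ones quoted above, converted to bytes.

```python
dim = 1024 * 22  # same size as in the example script

elements = dim * dim
bf16_weight_bytes = elements * 2  # bfloat16: 2 bytes per element
fp8_copy_bytes = elements * 1     # assumed FP8 payload: 1 byte per element

# Peak values quoted in the script's comments (MB = 1e6 bytes).
peak_without_patch = 10_661_943_808
peak_with_patch = 10_154_433_024
saved_bytes = peak_without_patch - peak_with_patch

print(bf16_weight_bytes / 1e6, "MB bf16 weight")
print(saved_bytes / 1e6, "MB saved by the patch")
print(saved_bytes == fp8_copy_bytes)  # True
```

The bf16 weight size matches the "MB PARAM" figure, and the reported saving equals exactly one byte-per-element copy of the dim x dim weight, which is consistent with the row-wise FP8 copy being released after the forward pass.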

NOTE: Will add a test if the patch makes sense.

Signed-off-by: kshitij12345 <[email protected]>


kshitij12345 · 2025-05-12T13:29:05Z

Pinging @ksivaman and @ptrendx to check whether this change makes sense.

kshitij12345 added 4 commits May 12, 2025 13:42

linear: clear row-wise weight at the end of forward

fa1459c

Signed-off-by: kshitij12345 <[email protected]>

Merge branch 'main' of github.com:NVIDIA/TransformerEngine into clear…

43d15eb

…-rowwise-weight-forward

update

cbdc872

Signed-off-by: kshitij12345 <[email protected]>

remove unused import

4fd89b9

Signed-off-by: kshitij12345 <[email protected]>
