
Misc 2.4 #1780


Open · wants to merge 12 commits into main

Conversation

cyanguwa
Collaborator

Description

This PR addresses a few minor issues with TE-PyTorch:

  • add missing args such as cu_seqlens and max_seqlen to cross-attention in TransformerLayer (see the usage sketch after this list)
  • allow attn_input_format=thd in TransformerLayer
  • add a note regarding token reordering for context parallelism
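
As a rough illustration of the first two items, here is a minimal sketch of packed (thd) cross-attention through TransformerLayer. The constructor arguments shown (layer_type, attn_input_format, params_dtype) exist in TE-PyTorch, but the forward keyword names cu_seqlens_q/cu_seqlens_kv and max_seqlen_q/max_seqlen_kv, as well as the tensor shapes, are assumptions based on this description and may differ from the final API:

```python
# Minimal sketch (not taken from this PR): cross-attention with packed THD inputs.
# The forward kwargs and tensor shapes below are assumptions and may differ.
import torch
import transformer_engine.pytorch as te

layer = te.TransformerLayer(
    hidden_size=1024,
    ffn_hidden_size=4096,
    num_attention_heads=16,
    layer_type="decoder",            # decoder layer -> includes cross-attention
    attn_input_format="thd",         # packed variable-length sequences
    params_dtype=torch.bfloat16,
).cuda()

# Two packed sequences: query lengths 5 and 3, key/value lengths 4 and 6.
cu_seqlens_q = torch.tensor([0, 5, 8], dtype=torch.int32, device="cuda")
cu_seqlens_kv = torch.tensor([0, 4, 10], dtype=torch.int32, device="cuda")

hidden = torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16)    # total_q x hidden (illustrative shape)
enc_out = torch.randn(10, 1024, device="cuda", dtype=torch.bfloat16)  # total_kv x hidden (illustrative shape)

out = layer(
    hidden,
    encoder_output=enc_out,         # enables the cross-attention path
    cu_seqlens_q=cu_seqlens_q,      # assumed kwarg name
    cu_seqlens_kv=cu_seqlens_kv,    # assumed kwarg name
    max_seqlen_q=5,                 # assumed kwarg name
    max_seqlen_kv=6,                # assumed kwarg name
)
```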

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • See description.

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

cyanguwa added 3 commits May 12, 2025 11:58
Signed-off-by: Charlene Yang <[email protected]>
Signed-off-by: Charlene Yang <[email protected]>
Signed-off-by: Charlene Yang <[email protected]>
@cyanguwa
Collaborator Author

/te-ci pytorch L1 L2

@cyanguwa cyanguwa requested a review from xrennvidia May 13, 2025 21:46
Signed-off-by: Charlene Yang <[email protected]>
Signed-off-by: Charlene Yang <[email protected]>
Context parallelism distributes chunks of the sequence onto different GPUs. To help with
load balancing, users are expected to reorder their tokens before entering this function.
For example, given cp_size = 2, we divide each sequence in a batch into 4 chunks, and
distribute chunk 0 and chunk 3 onto GPU 0, and chunk 1 and chunk 2 onto GPU 1. This requires
Member

I would say there should be some small example of what the end result here should be. For example, this note does not really tell me that those chunks need to be parts of a single tensor, laid out one after another in memory.
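
Along the lines of this suggestion, a minimal sketch (using a hypothetical helper, not part of TE) of the reordering described in the note for cp_size = 2, where the two chunks kept by each rank end up concatenated into a single contiguous tensor:

```python
import torch

def reorder_for_cp(seq: torch.Tensor, cp_size: int, cp_rank: int) -> torch.Tensor:
    """Hypothetical helper: split the sequence dim into 2*cp_size chunks and
    return the two chunks assigned to this rank as one contiguous tensor."""
    chunks = seq.chunk(2 * cp_size, dim=0)            # chunk along the sequence dim
    # Rank r keeps chunk r and chunk (2*cp_size - 1 - r) for load balancing.
    kept = torch.cat([chunks[cp_rank], chunks[2 * cp_size - 1 - cp_rank]], dim=0)
    return kept.contiguous()

# cp_size = 2: tokens 0..7 -> GPU 0 gets chunks 0 and 3, GPU 1 gets chunks 1 and 2.
tokens = torch.arange(8)
print(reorder_for_cp(tokens, cp_size=2, cp_rank=0))   # tensor([0, 1, 6, 7])
print(reorder_for_cp(tokens, cp_size=2, cp_rank=1))   # tensor([2, 3, 4, 5])
```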

xrennvidia previously approved these changes May 15, 2025
@cyanguwa cyanguwa added the 2.4.0 label May 15, 2025