
Misc 2.4 #1780


Open · wants to merge 12 commits into main

Conversation

cyanguwa
Collaborator

Description

This PR addresses a few minor issues with TE-PyTorch:

  • add missing args such as cu_seqlens and max_seqlen to cross-attention in TransformerLayer (see the usage sketch after this list)
  • allow attn_input_format=thd in TransformerLayer
  • add a note regarding token reordering for context parallelism
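
As a rough illustration of the first two items, here is a minimal sketch of packed (thd) cross-attention through TransformerLayer. The constructor arguments shown (layer_type, attn_input_format, params_dtype) exist in TE-PyTorch, but the forward keyword names cu_seqlens_q/cu_seqlens_kv and max_seqlen_q/max_seqlen_kv, as well as the tensor shapes, are assumptions based on this description and may differ from the final API:

```python
# Minimal sketch (not taken from this PR): cross-attention with packed THD inputs.
# The forward kwargs and tensor shapes below are assumptions and may differ.
import torch
import transformer_engine.pytorch as te

layer = te.TransformerLayer(
    hidden_size=1024,
    ffn_hidden_size=4096,
    num_attention_heads=16,
    layer_type="decoder",            # decoder layer -> includes cross-attention
    attn_input_format="thd",         # packed variable-length sequences
    params_dtype=torch.bfloat16,
).cuda()

# Two packed sequences: query lengths 5 and 3, key/value lengths 4 and 6.
cu_seqlens_q = torch.tensor([0, 5, 8], dtype=torch.int32, device="cuda")
cu_seqlens_kv = torch.tensor([0, 4, 10], dtype=torch.int32, device="cuda")

hidden = torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16)    # total_q x hidden (illustrative shape)
enc_out = torch.randn(10, 1024, device="cuda", dtype=torch.bfloat16)  # total_kv x hidden (illustrative shape)

out = layer(
    hidden,
    encoder_output=enc_out,         # enables the cross-attention path
    cu_seqlens_q=cu_seqlens_q,      # assumed kwarg name
    cu_seqlens_kv=cu_seqlens_kv,    # assumed kwarg name
    max_seqlen_q=5,                 # assumed kwarg name
    max_seqlen_kv=6,                # assumed kwarg name
)
```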

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • See description.

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

cyanguwa added 3 commits May 12, 2025 11:58
Signed-off-by: Charlene Yang <[email protected]>
Signed-off-by: Charlene Yang <[email protected]>
Signed-off-by: Charlene Yang <[email protected]>
@cyanguwa
Collaborator Author

/te-ci pytorch L1 L2

@cyanguwa cyanguwa requested a review from xrennvidia May 13, 2025 21:46
Signed-off-by: Charlene Yang <[email protected]>
Signed-off-by: Charlene Yang <[email protected]>
Context parallelism distributes chunks of the sequence onto different GPUs. To help with
load balancing, users are expected to reorder their tokens before entering this function.
For example, given cp_size = 2, we divide each sequence in a batch into 4 chunks, and
distribute chunk 0 and chunk 3 onto GPU 0, and chunk 1 and chunk 2 onto GPU 1. This requires
Member

I would say there should be some small example of what the end result here should be. For example, this note does not really tell me that those chunks need to be parts of a single tensor, laid out one after another in memory.
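
Along the lines of this suggestion, a minimal sketch (using a hypothetical helper, not part of TE) of the reordering described in the note for cp_size = 2, where the two chunks kept by each rank end up concatenated into a single contiguous tensor:

```python
import torch

def reorder_for_cp(seq: torch.Tensor, cp_size: int, cp_rank: int) -> torch.Tensor:
    """Hypothetical helper: split the sequence dim into 2*cp_size chunks and
    return the two chunks assigned to this rank as one contiguous tensor."""
    chunks = seq.chunk(2 * cp_size, dim=0)            # chunk along the sequence dim
    # Rank r keeps chunk r and chunk (2*cp_size - 1 - r) for load balancing.
    kept = torch.cat([chunks[cp_rank], chunks[2 * cp_size - 1 - cp_rank]], dim=0)
    return kept.contiguous()

# cp_size = 2: tokens 0..7 -> GPU 0 gets chunks 0 and 3, GPU 1 gets chunks 1 and 2.
tokens = torch.arange(8)
print(reorder_for_cp(tokens, cp_size=2, cp_rank=0))   # tensor([0, 1, 6, 7])
print(reorder_for_cp(tokens, cp_size=2, cp_rank=1))   # tensor([2, 3, 4, 5])
```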

xrennvidia previously approved these changes May 15, 2025
@cyanguwa cyanguwa added the 2.4.0 label May 15, 2025