Ulysses SP for HF Integration #7268

Open
wants to merge 28 commits into base: master

stas00 (Collaborator) commented Apr 30, 2025

This is the DeepSpeed counterpart of snowflakedb/ArcticTraining#45, since the new features require changes on both sides.

For PR reviewers:

Readiness status:

  • Code
  • Tests
  • Docs - working on it

Features:

  • added support for delaying grad addition via the param.ds_grad_is_ready flag (used when performing tiled compute in an autograd function)
  • added a light SP-only mpu version (Jeff Rasley)
  • improved debug
  • added all_gather_object to dist
  • UlyssesSPAttentionHF (port of UlyssesAttention from Megatron-DeepSpeed plus modern MHA variations)
  • UlyssesSPDataLoaderAdapter - DataLoader adapter that shards normal DataLoader batches for use by UlyssesSPAttentionHF
  • SequenceTiledCompute - generic autograd function that performs compute after tiling on the sequence dimension
  • TiledMLP - a specific autograd function that performs tiled MLP (it's much easier to understand first, before trying to grok SequenceTiledCompute)
  • added a differentiable _DimZeroAllToAll (Samyam Rajbhandari)
  • torch-dist-check now allows torch.distributed.nn (which is needed since DeepSpeed's dist is not up to date with torch.distributed.nn)
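
For reviewers new to the idea, here is a minimal pure-Python sketch of the two sequence-dimension concepts in the list above: sharding a sequence across SP ranks and running a position-wise computation tile by tile. It is not the code in this PR. The names `toy_mlp`, `tiled_compute` and `shard_sequence` are hypothetical, the contiguous-chunk sharding is an assumption, and the real TiledMLP / SequenceTiledCompute are autograd functions that also handle the backward pass per tile.

```python
from typing import Callable, List

Tokens = List[List[float]]  # [seqlen][hidden]


def toy_mlp(tokens: Tokens) -> Tokens:
    """Hypothetical stand-in for an MLP: position-wise, so every token
    is transformed independently of all other tokens."""
    return [[max(0.0, 2.0 * x + 1.0) for x in tok] for tok in tokens]


def tiled_compute(fn: Callable[[Tokens], Tokens], tokens: Tokens, num_tiles: int) -> Tokens:
    """Apply `fn` to one tile of the sequence at a time and concatenate.

    Because `fn` is position-wise, the result equals fn(tokens), but only
    one tile's worth of intermediate values is live at any moment.
    """
    if num_tiles < 1:
        raise ValueError("num_tiles must be >= 1")
    seqlen = len(tokens)
    if seqlen == 0:
        return []
    tile_size = -(-seqlen // num_tiles)  # ceiling division
    out: Tokens = []
    for start in range(0, seqlen, tile_size):
        out.extend(fn(tokens[start:start + tile_size]))
    return out


def shard_sequence(tokens: Tokens, sp_rank: int, sp_world_size: int) -> Tokens:
    """Return the contiguous slice of the sequence owned by `sp_rank`
    (assumed layout; the adapter in the PR may shard differently)."""
    if len(tokens) % sp_world_size != 0:
        raise ValueError("seqlen must be divisible by sp_world_size")
    chunk = len(tokens) // sp_world_size
    return tokens[sp_rank * chunk:(sp_rank + 1) * chunk]


seq = [[float(i), float(-i)] for i in range(8)]
# Tiling does not change the result of a position-wise function.
assert tiled_compute(toy_mlp, seq, num_tiles=3) == toy_mlp(seq)
# Each of 4 SP ranks sees 2 of the 8 tokens; together they cover the sequence.
shards = [shard_sequence(seq, rank, 4) for rank in range(4)]
assert sum(shards, []) == seq
```

The tiling trick only applies to computations that do not mix information across sequence positions, which is why it fits the MLP but attention needs the all-to-all based approach instead.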

@stas00 stas00 changed the title from "add ds_grad_is_ready flag support" to "Ulysses SP for HF Integration" Apr 30, 2025
jeffra and others added 19 commits May 8, 2025 00:33
Signed-off-by: Stas Bekman <[email protected]>
@stas00 stas00 marked this pull request as ready for review May 21, 2025 23:39
3 participants