deepspeedai / DeepSpeed Public

Fork 4.2k
Star 36.6k


Issues: deepspeedai/DeepSpeed

[Roadmap] DeepSpeed Roadmap Q1 2025

#6946 opened Jan 13, 2025 by loadams

Open


999 Open 1,961 Closed


Issues list

[REQUEST] option to shard weights only in each node

Labels: enhancement (New feature or request)

#7019 opened Feb 8, 2025 by cyr0930

[Question] How to control the parallelism way of modules

#7017 opened Feb 8, 2025 by mengniwang95

[BUG] Fix ds_chat regression

Labels: bug (Something isn't working), training

#7014 opened Feb 7, 2025 by tjruwase

[REQUEST] Support Offload deepspeed engine in RLHF training

Labels: enhancement (New feature or request)

#7013 opened Feb 7, 2025 by hijkzzz

nv-ds-chat CI test failure

Labels: ci-failure

#7012 opened Feb 7, 2025 by github-actions bot

How to make allreduce fully-overlapped in ZeRO-2

#7009 opened Feb 6, 2025 by 2012zzhao

[REQUEST] Possiblity of integrating LongVU with DeepSpeed

Labels: enhancement (New feature or request)

#7006 opened Feb 5, 2025 by xiaoqian-shen

For Flux inpaint distribute inference (Multi-GPU)

#7005 opened Feb 5, 2025 by DENGBOYU-REX

[BUG] mpi based training error

Labels: bug (Something isn't working), training

#6997 opened Feb 4, 2025 by cyr0930

How can I log accuracy after each epoch?

#6996 opened Feb 4, 2025 by Tess314

AttributeError: 'DeepSpeedZeroOptimizer' object has no attribute 'ipg_index'

#6995 opened Feb 3, 2025 by Tengxf

[BUG] loading model error

Labels: bug (Something isn't working), training

#6994 opened Feb 3, 2025 by tengwang0318

TypeError: DeepSpeedZeroOptimizer is not an Optimizer

#6992 opened Feb 3, 2025 by Tengxf

[REQUEST] adding type hints and py.typed metadata

Labels: enhancement (New feature or request)

#6988 opened Jan 31, 2025 by jamesbraza

model.parameters() return [Parameter containing: tensor([], device='cuda:0', dtype=torch.bfloat16, requires_grad=True)] when using zero3

Labels: bug (Something isn't working), training

#6987 opened Jan 31, 2025 by fanfanffff1

[BUG] Invalidate trace cache warning

Labels: bug (Something isn't working), training

#6985 opened Jan 30, 2025 by leachim

[BUG] pdsh runner doesn't work with tqdm bar

Labels: bug (Something isn't working), training

#6978 opened Jan 29, 2025 by Superskyyy

[BUG] Errors in GPT-MoE models Inferences

Labels: bug (Something isn't working), inference

#6973 opened Jan 25, 2025 by 1155157110

[BUG] libaio on amd node

Labels: bug (Something isn't working), training

#6972 opened Jan 25, 2025 by GuanhuaWang

GPUUtil-0 remains 0 during the process loading a 72B model

#6970 opened Jan 24, 2025 by NivinaNull

[BUG] the input variables may be changed to scalars when use activation checkpoint

Labels: bug (Something isn't working), training

#6969 opened Jan 23, 2025 by zhangvia

[BUG] z3+compile+gradient checkpoint uses more memory

Labels: bug (Something isn't working), training

#6966 opened Jan 22, 2025 by oraluben

Is "Hierarchical All-to-all" feat available in current version?

#6957 opened Jan 16, 2025 by GalanPei

[REQUEST] FPDT backward test

Labels: enhancement (New feature or request)

#6955 opened Jan 16, 2025 by YizhouZ

[REQUEST] Pipeline Parallelism support multi optimizer to train

Labels: enhancement (New feature or request)

#6951 opened Jan 15, 2025 by whcjb
