Issues: deepspeedai/DeepSpeed
Issues list
[REQUEST] option to shard weights only in each node
enhancement
New feature or request
#7019
opened Feb 8, 2025 by
cyr0930
[BUG] Fix ds_chat regression
bug
Something isn't working
training
#7014
opened Feb 7, 2025 by
tjruwase
[REQUEST] Support Offload deepspeed engine in RLHF training
enhancement
New feature or request
#7013
opened Feb 7, 2025 by
hijkzzz
[REQUEST] Possibility of integrating LongVU with DeepSpeed
enhancement
New feature or request
#7006
opened Feb 5, 2025 by
xiaoqian-shen
[BUG] mpi based training error
bug
Something isn't working
training
#6997
opened Feb 4, 2025 by
cyr0930
AttributeError: 'DeepSpeedZeroOptimizer' object has no attribute 'ipg_index'
#6995
opened Feb 3, 2025 by
Tengxf
[BUG] loading model error
bug
Something isn't working
training
#6994
opened Feb 3, 2025 by
tengwang0318
[REQUEST] adding type hints and py.typed metadata
enhancement
New feature or request
#6988
opened Jan 31, 2025 by
jamesbraza
model.parameters() return [Parameter containing: tensor([], device='cuda:0', dtype=torch.bfloat16, requires_grad=True)] when using zero3
bug
Something isn't working
training
#6987
opened Jan 31, 2025 by
fanfanffff1
[BUG] Invalidate trace cache warning
bug
Something isn't working
training
#6985
opened Jan 30, 2025 by
leachim
[BUG] pdsh runner doesn't work with tqdm bar
bug
Something isn't working
training
#6978
opened Jan 29, 2025 by
Superskyyy
[BUG] Errors in GPT-MoE models Inferences
bug
Something isn't working
inference
#6973
opened Jan 25, 2025 by
1155157110
[BUG] libaio on amd node
bug
Something isn't working
training
#6972
opened Jan 25, 2025 by
GuanhuaWang
[BUG] the input variables may be changed to scalars when use activation checkpoint
bug
Something isn't working
training
#6969
opened Jan 23, 2025 by
zhangvia
[BUG] z3+compile+gradient checkpoint uses more memory
bug
Something isn't working
training
#6966
opened Jan 22, 2025 by
oraluben
Is "Hierarchical All-to-all" feat available in current version?
#6957
opened Jan 16, 2025 by
GalanPei
[REQUEST] FPDT backward test
enhancement
New feature or request
#6955
opened Jan 16, 2025 by
YizhouZ
[REQUEST] Pipeline Parallelism support multi optimizer to train
enhancement
New feature or request
#6951
opened Jan 15, 2025 by
whcjb