Issues: intelligent-machine-learning/dlrover
Issues list
Can DLRover be applied to diffusion transformer training, and combined with DeepSpeed?
#1314
opened Oct 29, 2024 by
TomSuen
How does DLRover make sure all the nodes in one job are on one switch?
#1298
opened Oct 17, 2024 by
gangxie112
Error occurs in load_checkpoint when recovering with Megatron distributed flash-checkpoint
#1233
opened Aug 13, 2024 by
deepcoldfish
Why is model_optim_rng.pt saved in a separate directory?
stale
#1225
opened Aug 2, 2024 by
zhaoyang-star
Scaling down an allreduce PyTorch job does not complete and reports an error
stale
#1215
opened Jul 29, 2024 by
cocodee
Megatron-LM flash-ckpt cannot save checkpoints to disk when using pipeline parallelism
#1146
opened May 29, 2024 by
Lzhang-hub
[Example] Expand the nanogpt example to babyllama.
enhancement
New feature or request
example
This issue requests a new usage example for users
stale
#543
opened Jul 28, 2023 by
Antlera
Develop algorithms for auto-tuning both GPU memory usage and training performance.
stale
#470
opened Jul 1, 2023 by
workingloong
A document to deploy the Brain service to optimize a job.
dlrover-brain
stale
#357
opened Mar 30, 2023 by
workingloong
Make Job's Brain Relevant Parameters Configurable in Job Yaml
enhancement
New feature or request
stale
#294
opened Mar 8, 2023 by
samplise
Use ray placement group for resource allocation and actor scheduling
stale
#242
opened Feb 22, 2023 by
hxdtest
Provide definition and suggested usage for NodeGroupResource, lunch_nodes, removed_nodes in Scaler.
stale
#218
opened Feb 9, 2023 by
hxdtest