
Pull requests: intelligent-machine-learning/dlrover


Pull requests list

Fix known issue of job context using.
#1326 opened Nov 7, 2024 by BalaBalaYi
Job exit when all nodecheck failed
#1323 opened Nov 6, 2024 by majieyue
feat: generate pyi files for protobuf definitions
#1322 opened Nov 6, 2024 by Peefy
Expose ckpt events
#1321 opened Nov 5, 2024 by samplise
fix: typo RayJobSubmitter in ray_job_submitter.py
#1320 opened Nov 5, 2024 by Peefy
WIP: Refactor diagnosis manager
#1318 opened Nov 5, 2024 by samplise
[WIP] Refactor diagnosis manager
#1302 opened Oct 18, 2024 by samplise
[WIP] Training hang detection based on XPU Timer metric. (label: enhancement)
#1288 opened Oct 11, 2024 by BalaBalaYi
【WIP】add pod diagnosis feature
#1219 opened Aug 1, 2024 by xiaochaoren
Add sockct close v2
#1168 opened Jun 26, 2024 by yangrudan
add util for loss spike save and decode.
#1044 opened Mar 21, 2024 by haikuotiankong1212