Issues: rwth-i6/returnn
Issues list
Train proc manager restarts after Bus error crash, still consumes GPU memory, get OutOfMemoryError
#1649
opened Nov 15, 2024 by
albertz
Tensor Dim, support Dim.capacity > max(Dim.dyn_size_ext)
JAX
TPU
#1641
opened Nov 6, 2024 by
albertz
Potential timeout during data caching in multi-node trainings
bug
#1638
opened Oct 23, 2024 by
NeoLegends
RF cross_entropy (matmul, gather) should maybe have allow_broadcast?
#1636
opened Oct 19, 2024 by
albertz
rf.BatchNorm keeps updating statistics when used in eval mode in training
returnn-frontend
#1625
opened Sep 12, 2024 by
mnghiap
rf.RelPosCausalSelfAttention fails with single_step_dim
returnn-frontend
#1585
opened Jul 17, 2024 by
LucaG1
Torch gradient_checkpoint_scope could trigger segmentation fault?
#1581
opened Jul 12, 2024 by
albertz
RuntimeError: CUDA error: an illegal memory access was encountered
#1577
opened Jul 9, 2024 by
albertz
Datasets: blocklist in addition to allowlist for segment list file
#1566
opened Jul 2, 2024 by
NeoLegends
SlowMo (BMUF) support for PyTorch distributed training
MultiGPU
PyTorch
#1553
opened Jun 26, 2024 by
albertz