rwth-i6 / returnn Public

Issues: rwth-i6/returnn

165 open, 491 closed

Issues list

Train proc manager restarts after Bus error crash, still consumes GPU memory, get OutOfMemoryError

#1649 opened Nov 15, 2024 by albertz

Unexpected bus error encountered in worker

#1648 opened Nov 14, 2024 by albertz

remove Nose dependency

#1647 opened Nov 14, 2024 by albertz

Plan for packed dims

#1645 opened Nov 13, 2024 by albertz

RF (PT) meaning of losses with as_error

#1642 opened Nov 8, 2024 by albertz

Tensor Dim, support Dim.capacity > max(Dim.dyn_size_ext) (labels: JAX, TPU)

#1641 opened Nov 6, 2024 by albertz

Potential timeout during data caching in multi-node trainings (label: bug)

#1638 opened Oct 23, 2024 by NeoLegends

RF cross_entropy (matmul, gather) should maybe have allow_broadcast?

#1636 opened Oct 19, 2024 by albertz

Remove outdated Python header attribs?

#1635 opened Oct 18, 2024 by albertz

Sharding for multi-GPU training

#1634 opened Oct 15, 2024 by albertz

rf.BatchNorm keeps updating statistics when used in eval mode in training (label: returnn-frontend)

#1625 opened Sep 12, 2024 by mnghiap

TF end layer independent of batch causes error in beam search

#1606 opened Aug 26, 2024 by albertz

rf.RelPosCausalSelfAttention fails with single_step_dim (label: returnn-frontend)

#1585 opened Jul 17, 2024 by LucaG1

Torch multiple simultaneous gradient_checkpoint_scope

#1583 opened Jul 15, 2024 by albertz

Torch gradient_checkpoint_scope potential memory leak

#1582 opened Jul 12, 2024 by albertz

Torch gradient_checkpoint_scope could trigger segmentation fault?

#1581 opened Jul 12, 2024 by albertz

RuntimeError: CUDA error: an illegal memory access was encountered

#1577 opened Jul 9, 2024 by albertz

Torch: print model at log verbosity 3

#1575 opened Jul 5, 2024 by NeoLegends

multiprocessing: OSError: AF_UNIX path too long

#1571 opened Jul 3, 2024 by michelwi

Ignore a single broken gradient

#1568 opened Jul 2, 2024 by JackTemaki

Datasets: blocklist in addition to allowlist for segment list file

#1566 opened Jul 2, 2024 by NeoLegends

Hang in training (often with multi GPU training)

#1558 opened Jun 28, 2024 by albertz

RF scaled_dot_product_attention

#1555 opened Jun 27, 2024 by albertz

SlowMo (BMUF) support for PyTorch distributed training (labels: MultiGPU, PyTorch)

#1553 opened Jun 26, 2024 by albertz

Tensor deepcopy does not copy raw_tensor

#1541 opened Jun 17, 2024 by albertz
