Pull requests: NVIDIA/TransformerEngine
Avoid searching unnecessary dirs for shared libs
bug
Something isn't working
#1801
opened May 20, 2025 by timmoon10
4 of 13 tasks
Check cublas/cuda versions for deprecated cublasLt GEMM attributes
#1800
opened May 19, 2025 by ksivaman
5 of 13 tasks
Use an empty torch tensor to indicate no fp8 information in extra_state
2.4.0
#1799
opened May 19, 2025 by pstjohn
1 of 13 tasks
fix model parallel encoder to be properly sharded params
#1794
opened May 16, 2025 by sudhakarsingh27
6 of 13 tasks
[PyTorch][MoE] Reduce CPU Overhead By Fuse Torch Empty Calls
performance
Performance issues
#1793
opened May 16, 2025 by zhongbozhu
1 of 13 tasks
[JAX][Draft] PR #1666 rebase, squash, and debug
#1792
opened May 16, 2025 by huanghua1994
7 of 13 tasks
Add huggingface from_pretrained / save_pretrained tests
#1787
opened May 14, 2025 by pstjohn
[common] Added support of FP4 data type
#1779
opened May 13, 2025 by Oleg-Goncharov
6 of 13 tasks
[PyTorch] Update PyTorch FSDP2 test to cover all TE layer types
2.4.0
testing
Improvements to tests or testing infrastructure
#1777
opened May 12, 2025 by denera
8 of 13 tasks
linear: clear row-wise weight at the end of forward
#1770
opened May 12, 2025 by kshitij12345 (Draft)
[PyTorch] Refactor activation offloading of quantized tensors.
#1738
opened Apr 30, 2025 by pggPL
8 of 13 tasks
fix: update grad_output quant to avoid redundant work
#1736
opened Apr 30, 2025 by kshitij12345
correct weight quantizer for grouped_linear/layernorm_linear and layernorm_mlp
#1733
opened Apr 29, 2025 by HuangHunag-MT
8 of 13 tasks
Support Context Parallel for Multi Latent Attention (MLA)
#1729
opened Apr 29, 2025 by yuzhongw-nvidia
13 tasks
[JAX] Decouple Recipe and ScalingMode
#1728
opened Apr 29, 2025 by jberchtold-nvidia
8 of 13 tasks
Add variance calculation from FusedAdam optimizer states
#1726
opened Apr 28, 2025 by kwyss-nvidia
7 of 13 tasks
[JAX] Updated: unbalanced CP with THD format
#1709
opened Apr 22, 2025 by huanghua1994
8 of 13 tasks
[PyTorch] FP8 Subchannel Recipe With FP8 Gather And Configurable Scaling Factor Tensor Swizzling
#1707
opened Apr 21, 2025 by zhongbozhu
1 of 13 tasks