
[pull] main from NVIDIA:main #17

Merged
merged 10 commits into from
Jul 8, 2024

Conversation


@pull pull bot commented Jul 2, 2024

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

eee4017 and others added 2 commits July 2, 2024 10:40
#957)

* NVTE_OVERRIDE_MAX_SEQ_LEN

Signed-off-by: Frank Lin <[email protected]>

* small fix

Signed-off-by: Frank Lin <[email protected]>

* preserve old amax_and_scale_update_inplace and new amax_and_scale_update_inplace

Signed-off-by: Frank Lin <[email protected]>
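The amax_and_scale_update commits above touch Transformer Engine's delayed-scaling FP8 recipe, in which each step reduces a rolling amax history and recomputes the scaling factor. A minimal single-process sketch of that idea (the names `amax_history`, `fp8_max`, and `margin` are illustrative, not TE's actual API):

```python
import numpy as np

def amax_and_scale_update(amax_history, scale, fp8_max=448.0, margin=0):
    """Hypothetical in-place scale update from a rolling amax history."""
    amax = np.max(amax_history)  # reduce over the history window
    if amax > 0:
        # scale maps tensor values into the representable FP8 range
        scale[...] = (fp8_max / amax) / (2.0 ** margin)
    # roll the history so the oldest slot is reused for the next step
    amax_history[...] = np.roll(amax_history, shift=1)
    amax_history[0] = 0.0
    return scale
```

The in-place form (mirroring the `_inplace` suffix in the commit message) lets the update run without allocating new tensors each step.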

* remove useless code path; try to simplify logic within the baseline

Signed-off-by: Frank Lin <[email protected]>

* simplify logic

Signed-off-by: Frank Lin <[email protected]>

* small fix

Signed-off-by: Frank Lin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comments from Timmoon

Signed-off-by: Frank Lin <[email protected]>

* fix comments from Timmoon

Signed-off-by: Frank Lin <[email protected]>

* Update transformer_engine/paddle/distributed.py

Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Frank Lin <[email protected]>

* disable bw fp8 update

Signed-off-by: Frank Lin <[email protected]>

* fix lint

Signed-off-by: Frank Lin <[email protected]>

* fix ci error

Signed-off-by: Frank Lin <[email protected]>

---------

Signed-off-by: Frank Lin <[email protected]>
Co-authored-by: Frank Lin (Engrg-Hardware 1) <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>
Fix typo when selecting tuned RMSNorm kernels

Signed-off-by: Tim Moon <[email protected]>
@pull pull bot added the ⤵️ pull label Jul 2, 2024
zlsh80826 and others added 8 commits July 2, 2024 19:24
* Integrate experimental ragged offset

Signed-off-by: Reese Wang <[email protected]>

* Use per sequence based offsets

Signed-off-by: Reese Wang <[email protected]>
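"Per sequence based offsets" here refers to the cumulative token offsets used for ragged (packed variable-length) attention batches: each sequence's start position is the running sum of the preceding sequence lengths. A small illustrative sketch (the name `seqlens` is an assumption; the real API takes framework tensors):

```python
import numpy as np

def seq_offsets(seqlens):
    """Return cumulative offsets: offsets[i] is where sequence i starts
    in the packed token buffer, and offsets[-1] is the total length."""
    return np.concatenate([[0], np.cumsum(seqlens)])

offsets = seq_offsets([3, 5, 2])
# sequence i occupies tokens offsets[i]:offsets[i+1] of the packed buffer
```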

* Format

Signed-off-by: Reese Wang <[email protected]>

* Remove v/o_seq_offsets

Signed-off-by: Reese Wang <[email protected]>

* Add FP16 sanity tests and remove forward tests from the automatically run tests

Signed-off-by: Reese Wang <[email protected]>

* Enhance input checks

Signed-off-by: Reese Wang <[email protected]>

* Separate fused attn into 2 different APIs and add the docs

Signed-off-by: Reese Wang <[email protected]>

* Add experimental to the docs

Signed-off-by: Reese Wang <[email protected]>

* Fix lint

Signed-off-by: Reese Wang <[email protected]>

* Add runtime segments check

Signed-off-by: Reese Wang <[email protected]>

* Remove finished TODO

Signed-off-by: Reese Wang <[email protected]>

---------

Signed-off-by: Reese Wang <[email protected]>
* removed libcuda.so link at compile time for TE/PyTorch extension

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* linting fixes

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated get_symbol() in TE/common/cuda_utils.h to new impl based on cudaGetDriverEntryPoint

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix duplicate quotation

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alp Dener <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
* update to FE 1.5.1 and add bottom right causal

Signed-off-by: Charlene Yang <[email protected]>
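"Bottom right causal" is the causal-mask variant in which the diagonal is anchored at the bottom-right corner of the attention matrix, so when there are more keys than queries the last query can attend every key. A toy sketch of the mask shape (illustrative only, not the cuDNN frontend implementation):

```python
import numpy as np

def bottom_right_causal_mask(s_q, s_kv):
    """True entries are attended. Query i may attend key j when
    j - i <= s_kv - s_q, i.e. the diagonal ends at the bottom-right."""
    i = np.arange(s_q)[:, None]
    j = np.arange(s_kv)[None, :]
    return j - i <= s_kv - s_q

m = bottom_right_causal_mask(2, 4)
# with 2 queries and 4 keys: query 0 attends keys 0..2, query 1 attends 0..3
```

When s_q == s_kv this reduces to the ordinary (top-left) causal mask.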

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adjust logic for backend selection

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update FE to 1.5.2

Signed-off-by: Charlene Yang <[email protected]>

* add get_attention_backend function

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update get_attention_backend

Signed-off-by: Charlene Yang <[email protected]>

* fix get_attention_backend

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* tweak get_attention_backend and fix unit tests

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fixes for unfused, get_backend, etc

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update transformer_engine/pytorch/attention.py

Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Charlene Yang <[email protected]>

* fix cpu offload

Signed-off-by: Charlene Yang <[email protected]>

* minor fixes for get_attention_backend

Signed-off-by: Charlene Yang <[email protected]>

* explicitly skip FP32 and padding tests because there is no support

Signed-off-by: Charlene Yang <[email protected]>

* minor fix for window size check

Signed-off-by: Charlene Yang <[email protected]>

* update check_set_window_size and add enc_dec_attn_mask_type/enc_dec_window_size

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fixes

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>
Fix size mismatch error in fp8 transpose.

Signed-off-by: Dennis Liu <[email protected]>
)

* remove implicit padding and unpadding

Signed-off-by: Xin Yao <[email protected]>
---------

Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Phuong Nguyen <[email protected]>
[Paddle] Fix forward and backward of Linear(parallel_mode='column') to adapt DiT of PaddleMIX (#963)

[Paddle] Fix forward and backward of Linear(parallel_mode='column')

When te.Linear(parallel_mode='column') is not paired with a following te.Linear(parallel_mode='row'), its output should be all-gathered in the forward pass and its gradient reduce-scattered in the backward pass.

Signed-off-by: minyu <[email protected]>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
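The communication pattern described in that commit can be illustrated with a toy single-process sketch, where "ranks" are simulated by splitting the weight along its output (column) dimension; there is no real distributed backend here and all names are illustrative:

```python
import numpy as np

def column_parallel_forward(x, w_shards):
    """Each simulated rank computes its output shard x @ w_i, then the
    shards are concatenated, standing in for the all-gather."""
    local_outs = [x @ w for w in w_shards]
    return np.concatenate(local_outs, axis=-1)

def column_parallel_backward_dx(grad_out, w_shards):
    """Each rank uses its slice of grad_out to form a partial input
    gradient; summing the partials stands in for the reduction."""
    cols = np.cumsum([w.shape[1] for w in w_shards])[:-1]
    grad_shards = np.split(grad_out, cols, axis=-1)
    return sum(g @ w.T for g, w in zip(grad_shards, w_shards))
```

Splitting a weight into shards and recombining this way reproduces the unsharded matmul exactly, which is why the pairing (or, as fixed here, the explicit gather/scatter when unpaired) matters for correctness.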
* add parallel build without pyproject

Signed-off-by: Phuong Nguyen <[email protected]>

---------

Signed-off-by: Phuong Nguyen <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Build for python < 3.8

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
@phu0ngng phu0ngng merged commit 8062ac5 into phu0ngng:main Jul 8, 2024
10 participants