
[pull] main from NVIDIA:main #17

Merged
merged 10 commits into from
Jul 8, 2024

Conversation


@pull pull bot commented Jul 2, 2024

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

eee4017 and others added 2 commits July 2, 2024 10:40
#957)

* NVTE_OVERRIDE_MAX_SEQ_LEN

Signed-off-by: Frank Lin <[email protected]>

* small fix

Signed-off-by: Frank Lin <[email protected]>

* preserve old amax_and_scale_update_inplace and new amax_and_scale_update_inplace

Signed-off-by: Frank Lin <[email protected]>
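The amax_and_scale_update commits above touch Transformer Engine's delayed-scaling FP8 recipe, in which each step reduces a rolling amax history and recomputes the scaling factor. A minimal single-process sketch of that idea (the names `amax_history`, `fp8_max`, and `margin` are illustrative, not TE's actual API):

```python
import numpy as np

def amax_and_scale_update(amax_history, scale, fp8_max=448.0, margin=0):
    """Hypothetical in-place scale update from a rolling amax history."""
    amax = np.max(amax_history)  # reduce over the history window
    if amax > 0:
        # scale maps tensor values into the representable FP8 range
        scale[...] = (fp8_max / amax) / (2.0 ** margin)
    # roll the history so the oldest slot is reused for the next step
    amax_history[...] = np.roll(amax_history, shift=1)
    amax_history[0] = 0.0
    return scale
```

The in-place form (mirroring the `_inplace` suffix in the commit message) lets the update run without allocating new tensors each step.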

* remove useless code path; try to simplify logic within the baseline

Signed-off-by: Frank Lin <[email protected]>

* simplify logic

Signed-off-by: Frank Lin <[email protected]>

* small fix

Signed-off-by: Frank Lin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comments from Timmoon

Signed-off-by: Frank Lin <[email protected]>

* fix comments from Timmoon

Signed-off-by: Frank Lin <[email protected]>

* Update transformer_engine/paddle/distributed.py

Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Frank Lin <[email protected]>

* disable bw fp8 update

Signed-off-by: Frank Lin <[email protected]>

* fix lint

Signed-off-by: Frank Lin <[email protected]>

* fix ci error

Signed-off-by: Frank Lin <[email protected]>

---------

Signed-off-by: Frank Lin <[email protected]>
Co-authored-by: Frank Lin (Engrg-Hardware 1) <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>
Fix typo when selecting tuned RMSNorm kernels

Signed-off-by: Tim Moon <[email protected]>
@pull pull bot added the ⤵️ pull label Jul 2, 2024
zlsh80826 and others added 8 commits July 2, 2024 19:24
* Integrate experimental ragged offset

Signed-off-by: Reese Wang <[email protected]>

* Use per sequence based offsets

Signed-off-by: Reese Wang <[email protected]>
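"Per sequence based offsets" here refers to the cumulative token offsets used for ragged (packed variable-length) attention batches: each sequence's start position is the running sum of the preceding sequence lengths. A small illustrative sketch (the name `seqlens` is an assumption; the real API takes framework tensors):

```python
import numpy as np

def seq_offsets(seqlens):
    """Return cumulative offsets: offsets[i] is where sequence i starts
    in the packed token buffer, and offsets[-1] is the total length."""
    return np.concatenate([[0], np.cumsum(seqlens)])

offsets = seq_offsets([3, 5, 2])
# sequence i occupies tokens offsets[i]:offsets[i+1] of the packed buffer
```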

* Format

Signed-off-by: Reese Wang <[email protected]>

* Remove v/o_seq_offsets

Signed-off-by: Reese Wang <[email protected]>

* Add FP16 sanity tests and remove forward tests from the automatically run tests

Signed-off-by: Reese Wang <[email protected]>

* Enhance input checks

Signed-off-by: Reese Wang <[email protected]>

* Separate fused attn into 2 different APIs and add the docs

Signed-off-by: Reese Wang <[email protected]>

* Add experimental to the docs

Signed-off-by: Reese Wang <[email protected]>

* Fix lint

Signed-off-by: Reese Wang <[email protected]>

* Add runtime segments check

Signed-off-by: Reese Wang <[email protected]>

* Remove finished TODO

Signed-off-by: Reese Wang <[email protected]>

---------

Signed-off-by: Reese Wang <[email protected]>
* removed libcuda.so link at compile time for TE/PyTorch extension

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* linting fixes

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated get_symbol() in TE/common/cuda_utils.h to new impl based on cudaGetDriverEntryPoint

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix duplicate quotation

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alp Dener <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
* update to FE 1.5.1 and add bottom right causal

Signed-off-by: Charlene Yang <[email protected]>
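"Bottom right causal" is the causal-mask variant in which the diagonal is anchored at the bottom-right corner of the attention matrix, so when there are more keys than queries the last query can attend every key. A toy sketch of the mask shape (illustrative only, not the cuDNN frontend implementation):

```python
import numpy as np

def bottom_right_causal_mask(s_q, s_kv):
    """True entries are attended. Query i may attend key j when
    j - i <= s_kv - s_q, i.e. the diagonal ends at the bottom-right."""
    i = np.arange(s_q)[:, None]
    j = np.arange(s_kv)[None, :]
    return j - i <= s_kv - s_q

m = bottom_right_causal_mask(2, 4)
# with 2 queries and 4 keys: query 0 attends keys 0..2, query 1 attends 0..3
```

When s_q == s_kv this reduces to the ordinary (top-left) causal mask.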

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adjust logic for backend selection

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update FE to 1.5.2

Signed-off-by: Charlene Yang <[email protected]>

* add get_attention_backend function

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update get_attention_backend

Signed-off-by: Charlene Yang <[email protected]>

* fix get_attention_backend

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* tweak get_attention_backend and fix unit tests

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fixes for unfused, get_backend, etc

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update transformer_engine/pytorch/attention.py

Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Charlene Yang <[email protected]>

* fix cpu offload

Signed-off-by: Charlene Yang <[email protected]>

* minor fixes for get_attention_backend

Signed-off-by: Charlene Yang <[email protected]>

* explicitly skip FP32 and padding tests because there is no support

Signed-off-by: Charlene Yang <[email protected]>

* minor fix for window size check

Signed-off-by: Charlene Yang <[email protected]>

* update check_set_window_size and add enc_dec_attn_mask_type/enc_dec_window_size

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fixes

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>
Fix size mismatch error in fp8 transpose.

Signed-off-by: Dennis Liu <[email protected]>
)

* remove implicit padding and unpadding

Signed-off-by: Xin Yao <[email protected]>
---------

Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Phuong Nguyen <[email protected]>
[Paddle] Fix forward and backward of Linear(parallel_mode='column') to adapt DiT of PaddleMIX (#963)

[Paddle] Fix forward and backward of Linear(parallel_mode='column')

When te.Linear(parallel_mode='column') is not paired with a following te.Linear(parallel_mode='row'), its output should be all-gathered in the forward pass and its gradient reduce-scattered in the backward pass.

Signed-off-by: minyu <[email protected]>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
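The communication pattern described in that commit can be illustrated with a toy single-process sketch, where "ranks" are simulated by splitting the weight along its output (column) dimension; there is no real distributed backend here and all names are illustrative:

```python
import numpy as np

def column_parallel_forward(x, w_shards):
    """Each simulated rank computes its output shard x @ w_i, then the
    shards are concatenated, standing in for the all-gather."""
    local_outs = [x @ w for w in w_shards]
    return np.concatenate(local_outs, axis=-1)

def column_parallel_backward_dx(grad_out, w_shards):
    """Each rank uses its slice of grad_out to form a partial input
    gradient; summing the partials stands in for the reduction."""
    cols = np.cumsum([w.shape[1] for w in w_shards])[:-1]
    grad_shards = np.split(grad_out, cols, axis=-1)
    return sum(g @ w.T for g, w in zip(grad_shards, w_shards))
```

Splitting a weight into shards and recombining this way reproduces the unsharded matmul exactly, which is why the pairing (or, as fixed here, the explicit gather/scatter when unpaired) matters for correctness.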
* add parallel build without pyproject

Signed-off-by: Phuong Nguyen <[email protected]>

---------

Signed-off-by: Phuong Nguyen <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Build for python < 3.8

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
@phu0ngng phu0ngng merged commit 8062ac5 into phu0ngng:main Jul 8, 2024
10 participants