[pull] main from NVIDIA:main #90

Open · wants to merge 13 commits into main from NVIDIA:main

Conversation

@pull[bot] commented Mar 6, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

timmoon10 and others added 4 commits March 6, 2025 08:57
* Enable MXFP8 LayerNorm and RMSNorm

Signed-off-by: Tim Moon <[email protected]>
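
As a sketch of what this commit enables: running a TE normalization layer under an MXFP8 recipe. A minimal sketch, assuming TE 2.x with the `MXFP8BlockScaling` recipe class and a Blackwell GPU; names may differ across versions.

```python
# Hedged sketch: exercising MXFP8 LayerNorm via fp8_autocast (assumes TE 2.x
# exposes MXFP8BlockScaling and the hardware supports MXFP8).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import MXFP8BlockScaling

recipe = MXFP8BlockScaling()  # MXFP8 block-scaled quantization recipe
layer = te.LayerNormLinear(1024, 1024, params_dtype=torch.bfloat16, device="cuda")

x = torch.randn(32, 1024, device="cuda", dtype=torch.bfloat16)
with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    y = layer(x)  # the fused LayerNorm output is produced in MXFP8
```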

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix compilation

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* Fix envvar

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Signed-off-by: Nicolas Castet <[email protected]>
* fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* test

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more sensitive tests

Signed-off-by: Pawel Gadzinski <[email protected]>

* typo fix; fix test skipping on Blackwell

Signed-off-by: Pawel Gadzinski <[email protected]>

---------

Signed-off-by: Pawel Gadzinski <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@pull[bot] added the ⤵️ pull label Mar 6, 2025
vasunvidia and others added 9 commits March 6, 2025 14:56
* Remove cudaStreamSynchronize call

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Use cudaMemsetAsync instead of cudaMemcpyAsync

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Update transformer_engine/common/transformer_engine.cpp

Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

---------

Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Add NVTX ranges

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
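
For readers unfamiliar with NVTX: ranges are named begin/end markers that show up as labeled spans in Nsight Systems timelines. A minimal user-side illustration with PyTorch's NVTX bindings (TE's own ranges are added internally; this only shows the mechanism):

```python
# Illustration of the NVTX range mechanism via torch.cuda.nvtx.
import torch

def profiled_forward(module, x):
    torch.cuda.nvtx.range_push("forward")  # open a named range on the timeline
    y = module(x)
    torch.cuda.nvtx.range_pop()            # close the range
    return y
```
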
Don't set data to null

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Fix incorrect docstrings in tensor saving functions

Signed-off-by: Tim Moon <[email protected]>
* fix recompilation of out and lse correction in p2p+bshd/sbhd

Signed-off-by: Xiaowei Ren <[email protected]>

* fix recompilation of get_seq_chunk_ids_for_reordering

Signed-off-by: Xiaowei Ren <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix recompilation of reorder_seq_chunks_for_a2a

Signed-off-by: Xiaowei Ren <[email protected]>

* recover a change

Signed-off-by: Xiaowei Ren <[email protected]>

* typo fix

Signed-off-by: Xiaowei Ren <[email protected]>

* minor change to softmax_lse correction

Signed-off-by: Xiaowei Ren <[email protected]>
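
The "out and lse correction" these commits touch is the standard log-sum-exp merge used when context parallelism computes attention over chunks: each rank produces a partial output plus its softmax log-sum-exp, and the partials are combined by rescaling. A plain-torch sketch of that correction (TE's version is fused and layout-aware, so this is illustrative only):

```python
import torch

def merge_attn_partials(out1, lse1, out2, lse2):
    # out*: partial attention outputs [..., seq, head_dim]
    # lse*: log-sum-exp of the attention logits [..., seq]
    lse = torch.logaddexp(lse1, lse2)         # combined normalizer
    w1 = torch.exp(lse1 - lse).unsqueeze(-1)  # rescale weight for partial 1
    w2 = torch.exp(lse2 - lse).unsqueeze(-1)  # rescale weight for partial 2
    return w1 * out1 + w2 * out2, lse
```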

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cache cu_seqlens for BSHD/SBHD format

Signed-off-by: Xiaowei Ren <[email protected]>
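
cu_seqlens is the vector of cumulative sequence offsets. For fixed-shape BSHD/SBHD batches it is fully determined by batch size and sequence length, so it can be built once and cached instead of being recreated (and recompiled against) every step. A hypothetical cache illustrating the idea, not TE's actual code:

```python
import torch

_cu_seqlens_cache = {}

def get_cu_seqlens(batch_size: int, seqlen: int, device: str = "cuda"):
    # For fixed-length batches, cu_seqlens is just [0, s, 2s, ..., b*s].
    key = (batch_size, seqlen)
    if key not in _cu_seqlens_cache:
        _cu_seqlens_cache[key] = torch.arange(
            0, (batch_size + 1) * seqlen, seqlen,
            dtype=torch.int32, device=device)
    return _cu_seqlens_cache[key]
```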

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* do not need to allocate out buffer for BSHD/SBHD

Signed-off-by: Xiaowei Ren <[email protected]>

* code refactoring

Signed-off-by: Xiaowei Ren <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix

Signed-off-by: Xiaowei Ren <[email protected]>

* refactor init out correction

Signed-off-by: Xiaowei Ren <[email protected]>

* fix a docstring

Signed-off-by: Xiaowei Ren <[email protected]>

* typo fix

Signed-off-by: Xiaowei Ren <[email protected]>

* code refactoring

Signed-off-by: Xiaowei Ren <[email protected]>

* fix dtype in init out correction

Signed-off-by: Xiaowei Ren <[email protected]>

* add pad_between_seqs to DPA API

Signed-off-by: Xiaowei Ren <[email protected]>

* add pad_between_seqs to the API of MHA and transformer layer

Signed-off-by: Xiaowei Ren <[email protected]>

* add pad_between_seqs to the API of MHA and transformer layer

Signed-off-by: Xiaowei Ren <[email protected]>
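
A hedged usage sketch of the new flag (the surrounding calls follow the public DotProductAttention API, but the exact placement of pad_between_seqs should be checked against the merged code):

```python
import torch
import transformer_engine.pytorch as te

attn = te.DotProductAttention(num_attention_heads=16, kv_channels=64)

# Four sequences of 1024 tokens packed in THD format.
q = k = v = torch.randn(4096, 16, 64, device="cuda", dtype=torch.bfloat16)
cu_seqlens = torch.tensor([0, 1024, 2048, 3072, 4096],
                          dtype=torch.int32, device="cuda")

out = attn(q, k, v, qkv_format="thd",
           cu_seqlens_q=cu_seqlens, cu_seqlens_kv=cu_seqlens,
           pad_between_seqs=True)  # sequences may be separated by padding
```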

---------

Signed-off-by: Xiaowei Ren <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* check in per-tensor current scaling full recipe

Signed-off-by: zhongboz <[email protected]>
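
Current scaling derives each tensor's FP8 scale from its live amax rather than from a history of past amaxes (delayed scaling). A hedged usage sketch; the Float8CurrentScaling name is taken from this PR's recipe naming and may differ in the released API:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import Float8CurrentScaling

recipe = Float8CurrentScaling()  # scale computed from each tensor's current amax
linear = te.Linear(1024, 1024, params_dtype=torch.bfloat16, device="cuda")

x = torch.randn(32, 1024, device="cuda", dtype=torch.bfloat16)
with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    y = linear(x)
```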

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: zhongboz <[email protected]>

setup basics of current scaling quantizer in python level

Signed-off-by: zhongboz <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: zhongboz <[email protected]>

add test case for current scaling dequantize

Signed-off-by: zhongboz <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: zhongboz <[email protected]>

finish linear layer fwd bwd test, determined error with bf16

Signed-off-by: zhongboz <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: zhongboz <[email protected]>

achieved zero tolerance for Linear by specifying the gemm use_split_accumulator config

Signed-off-by: zhongboz <[email protected]>
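
use_split_accumulator controls how FP8 GEMM partial products are accumulated: partial sums are periodically promoted into a wider accumulator rather than accumulating everything in reduced precision, which is what lets the test match the reference bitwise. A conceptual sketch of split accumulation (not TE's GEMM code):

```python
import numpy as np

def split_accumulate(products, chunk: int = 16):
    # Accumulate in low precision within a chunk, then flush to float32.
    total = np.float32(0)
    for i in range(0, len(products), chunk):
        partial = np.float16(0)                # narrow in-chunk accumulator
        for p in products[i:i + chunk]:
            partial = np.float16(partial + p)  # low-precision adds
        total += np.float32(partial)           # promote once per chunk
    return total
```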

enable layernormlinear with current scaling, pass bitwise test

Signed-off-by: zhongboz <[email protected]>

refactor test case code

Signed-off-by: zhongboz <[email protected]>

make current scaling quantizers distributed, pass distributed linear & layernormlinear tests

Signed-off-by: zhongboz <[email protected]>

bug fix: use cached fp8 recipe in backward

Signed-off-by: zhongboz <[email protected]>
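
The bug class here: the globally active recipe can change between a module's forward and its backward (e.g. with interleaved pipeline schedules), so backward must reuse the recipe captured at forward time. A hypothetical illustration of the pattern, not TE's module code:

```python
import torch

class RecipeAwareOp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, recipe):
        ctx.recipe = recipe  # cache the recipe active during forward
        return x.clone()

    @staticmethod
    def backward(ctx, grad_out):
        recipe = ctx.recipe  # reuse the cached recipe, not global state
        return grad_out, None
```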

fix layernorm_mlp with current scaling, fix activation_helper with current scaling

Signed-off-by: zhongboz <[email protected]>

support passing detailed numerical settings from the recipe to the quantization kernel

Signed-off-by: zhongboz <[email protected]>

resolving MR comments

Signed-off-by: zhongboz <[email protected]>

recipe naming

Signed-off-by: zhongboz <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve mr comments, remove IS_CURRENT_SCALING template from kernels

Signed-off-by: zhongboz <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve mr comments, make current scaling c++ test cases

Signed-off-by: zhongboz <[email protected]>

* add current scaling to test_numerics.py, skip act recomp and grouped linear

Signed-off-by: zhongboz <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add benchmark for quantizer

Signed-off-by: zhongboz <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add benchmarks for linear layer

Signed-off-by: zhongboz <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bug fix, typo

Signed-off-by: zhongboz <[email protected]>

* resolve more mr comments

Signed-off-by: zhongboz <[email protected]>

* avoid potential race condition by not using from_blob to construct amax tensor in C++

Signed-off-by: zhongboz <[email protected]>

* resolve more comments

Signed-off-by: zhongboz <[email protected]>

* Debug linter warnings and license check

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Debug import error in FP8 tensor test

Signed-off-by: Tim Moon <[email protected]>

* Debug compilation error with CUDA 12.1 for Turing

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve mr comments, fix activation cast fusion

Signed-off-by: zhongboz <[email protected]>

* resolve comments, add NVTEQuantizationParams for compute scale

Signed-off-by: zhongboz <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove is_current_scaling check totally from common folder

Signed-off-by: zhongboz <[email protected]>

* remove benchmarks, will contribute in another repo

Signed-off-by: zhongboz <[email protected]>

* adjust current scaling default recipe config

Signed-off-by: zhongboz <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adjust comments in test

Signed-off-by: zhongboz <[email protected]>

* Remove current scaling mode from core lib

Signed-off-by: Tim Moon <[email protected]>

* Refactor current-scaling-specific logic in core C++ lib

Move the amax and scale update functions out of the casting functions and into a dedicated current-scaling source file. Add a general API for accessing the quantization config object.

Signed-off-by: Tim Moon <[email protected]>
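
For reference, the amax/scale update being factored out is the current-scaling math itself: amax = max|x| and scale = fp8_max / amax. Written out in plain torch (TE performs this in fused CUDA kernels; torch.float8_e4m3fn needs a recent PyTorch):

```python
import torch

FP8_E4M3_MAX = 448.0  # largest representable magnitude in float8_e4m3

def compute_scale(x: torch.Tensor, eps: float = 1e-12):
    amax = x.abs().amax().clamp(min=eps)  # current tensor's amax
    scale = FP8_E4M3_MAX / amax           # map the data onto the FP8 range
    return amax, scale

def quantize(x: torch.Tensor, scale: torch.Tensor):
    data = (x * scale).to(torch.float8_e4m3fn)
    return data, scale.reciprocal()       # FP8 data plus dequantization scale
```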

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add missing header in C++ tests

Signed-off-by: Tim Moon <[email protected]>

* Disable test config with FP8 transpose on Blackwell

Signed-off-by: Tim Moon <[email protected]>

* Fix compilation error in C++ test

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: zhongboz <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: zhongboz <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
* Verified TE2.0 with offloading

Signed-off-by: Selvaraj Anandaraj <[email protected]>
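
A hedged sketch of the offloading path these commits exercise, using TE's CPU offload context (argument names vary across versions; check transformer_engine.pytorch for the current signature):

```python
import torch
import transformer_engine.pytorch as te

offload_ctx, sync_fn = te.get_cpu_offload_context(enabled=True, num_layers=1)
layer = te.Linear(1024, 1024, params_dtype=torch.bfloat16, device="cuda")

x = torch.randn(32, 1024, device="cuda", dtype=torch.bfloat16, requires_grad=True)
with offload_ctx:    # activations saved inside are offloaded to CPU
    y = layer(x)
y = sync_fn(y)       # synchronize offloaded tensors before use
y.sum().backward()   # activations are reloaded to GPU for backward
```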

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Skipping tests for Ampere and removing child class preparation

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* offloading support for MXFP8 dtype

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changed quantized tensor detection mechanism

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* Fix mxfp8 offload, lint errors, and var name

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* Supported disabling offloading for quantized tensors

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* bug fix

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bugs

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added support for None in the list of quantized data tensors

Signed-off-by: root <[email protected]>

* Hopper backward compatibility cleanup

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Coding style nit

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added guards

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <[email protected]>
Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Internal quantizer for input to the modules

Signed-off-by: Przemek Tredak <[email protected]>