[PyTorch] Enabling Per-Tensor Current Scaling Recipe (NVIDIA#1471)
* check in per-tensor current scaling full recipe (squashed; a minimal sketch of the scheme, a distributed amax sketch, and an end-to-end usage example follow the change list):
  - set up basics of the current scaling quantizer at the Python level
  - add test case for current scaling dequantize
  - finish Linear layer forward/backward test; determined error with BF16
  - achieve zero tolerance for Linear by specifying the GEMM use_split_accumulator config
  - enable LayerNormLinear with current scaling; pass bitwise test
  - refactor test case code
  - make current scaling quantizers distributed; pass distributed Linear and LayerNormLinear tests (see the distributed amax sketch below)
  - bug fix: use cached FP8 recipe in backward
  - fix layernorm_mlp with current scaling; fix activation_helper with current scaling
  - support detailed numerical settings from recipe to quantization kernel
  - resolve MR comments
  - recipe naming
* [pre-commit.ci] auto fixes from pre-commit.com hooks, applied after several of the commits below; for more information, see https://pre-commit.ci
* resolve MR comments; remove the IS_CURRENT_SCALING template from kernels
* resolve MR comments; add current scaling C++ test cases
* add current scaling to test_numerics.py; skip activation recompute and grouped linear
* add benchmark for quantizer
* add benchmarks for Linear layer
* fix bug and typo
* resolve more MR comments
* avoid a potential race condition by not using from_blob to construct the amax tensor in C++
* resolve more comments
* Debug linter warnings and license check
* Debug import error in FP8 tensor test
* Debug compilation error with CUDA 12.1 for Turing
* resolve MR comments; fix activation cast fusion
* resolve comments; add NVTEQuantizationParams for compute scale
* remove the is_current_scaling check entirely from the common folder
* remove benchmarks; will contribute them in another repo
* adjust the current scaling default recipe config
* adjust comments in test
* Remove current scaling mode from core lib
* Refactor current-scaling-specific logic in core C++ lib: move the amax and scale update functions out of the casting functions and into a dedicated current-scaling source file; add a general API for accessing the quantization config object
* Add missing header in C++ tests
* Disable test config with FP8 transpose on Blackwell
* Fix compilation error in C++ test

---------

Signed-off-by: zhongboz <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: zhongboz <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
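For context on the recipe itself: per-tensor current scaling derives the FP8 scale from the amax of the tensor being quantized right now, rather than from a history of past amaxes as in delayed scaling. Below is a minimal PyTorch sketch of that idea, not the library's implementation; the `FP8_E4M3_MAX` constant, the `eps` floor, and the function name are illustrative assumptions.

```python
import torch

FP8_E4M3_MAX = 448.0  # largest magnitude representable in FP8 E4M3

def current_scaling_quantize(x: torch.Tensor, eps: float = 0.0):
    """Illustrative sketch: the scale comes from the current tensor's own amax."""
    amax = x.abs().amax().float()
    # Floor the amax so an all-zero tensor does not produce an infinite scale.
    amax = torch.clamp(amax, min=max(eps, torch.finfo(torch.float32).tiny))
    scale = FP8_E4M3_MAX / amax
    # Saturate before the cast so values at amax map to the FP8 max, not inf/nan.
    x_fp8 = (x.float() * scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX).to(torch.float8_e4m3fn)
    return x_fp8, scale  # dequantize as x_fp8.float() / scale
```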
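The "make current scaling quantizers distributed" item implies that ranks sharing a shard of a tensor must agree on a single scale. A hedged sketch of the usual approach, a MAX all-reduce over the local amaxes before computing the scale; the function name and group handling are assumptions:

```python
import torch
import torch.distributed as dist

def global_amax(x: torch.Tensor, group=None) -> torch.Tensor:
    """Local amax, then a MAX all-reduce so every rank derives the same FP8 scale."""
    amax = x.abs().amax().float()
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(amax, op=dist.ReduceOp.MAX, group=group)
    return amax
```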
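End to end, the recipe is selected through the usual `fp8_autocast` entry point. The sketch below assumes the recipe class is exposed as `Float8CurrentScaling` under `transformer_engine.common.recipe`; consult the release's recipe module for the exact name and default numerics.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import Float8CurrentScaling  # assumed export

recipe = Float8CurrentScaling()  # per-tensor current scaling, default numerics
model = te.Linear(1024, 1024).cuda()
inp = torch.randn(64, 1024, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    out = model(inp)   # forward quantizes with scales from current amaxes
out.sum().backward()   # backward reuses the cached FP8 recipe, per the fix above
```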