Support fp8 w8a8 for pt backend #2959

RunningLeon · 2024-12-26T07:27:41Z

Motivation

Support fp8 w8a8 quantization for the PyTorch (pt) backend.

Modification

Support fp8 w8a8 for pt backend

BC-breaking (Optional)

None

Use cases (Optional)

fp8 quant

lmdeploy lite smooth_quant \
  meta-llama/Meta-Llama-3-8B-Instruct \
  --work-dir Meta-Llama-3-8B-Instruct-fp8-w8a8 \
  --quant-dtype fp8
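For readers unfamiliar with the term, here is a rough pure-Python sketch of what "fp8 w8a8" means numerically: both weights and activations are scaled into the fp8 range, rounded, multiplied, and rescaled. It is not the lmdeploy implementation. The e4m3 format (max 448), the per-row scaling scheme, and all function names (`round_to_e4m3`, `quantize_rows`, `w8a8_linear`) are assumptions made for illustration only.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value of the e4m3 format (assumed here)


def round_to_e4m3(x: float) -> float:
    """Simulate rounding x to the nearest fp8 e4m3 value, saturating at the max."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so the binary exponent is e - 1.
    # Exponents below -6 fall into the subnormal range, which shares one step size.
    exp = max(math.frexp(mag)[1] - 1, -6)
    step = 2.0 ** (exp - 3)  # 3 mantissa bits: 8 steps per power of two
    return sign * round(mag / step) * step


def quantize_rows(matrix):
    """Quantize each row with its own scale; returns (fp8-valued rows, scales)."""
    q_rows, scales = [], []
    for row in matrix:
        amax = max(abs(v) for v in row)
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        q_rows.append([round_to_e4m3(v / scale) for v in row])
        scales.append(scale)
    return q_rows, scales


def w8a8_linear(x, w):
    """Approximate y = x @ w.T with both operands quantized, then rescaled."""
    xq, xs = quantize_rows(x)  # one scale per token (row of x)
    wq, ws = quantize_rows(w)  # one scale per output channel (row of w)
    return [[sum(a * b for a, b in zip(xr, wr)) * sx * sw
             for wr, sw in zip(wq, ws)]
            for xr, sx in zip(xq, xs)]
```

Each quantized element carries a relative error of up to about 6%, so the result matches the full-precision product only approximately.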

chat

lmdeploy chat Meta-Llama-3-8B-Instruct-fp8-w8a8 --backend pytorch

Checklist

Pre-commit or other linting tools are used to fix the potential lint issues.
The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
The documentation has been modified accordingly, like docstring or example tutorials.

docs/zh_cn/quantization/w8a8.md

lmdeploy/pytorch/kernels/cuda/w8a8_triton_kernels.py

AllentDan · 2024-12-27T06:18:12Z

lmdeploy/pytorch/kernels/cuda/w8a8_triton_kernels.py

-        accumulator += tl.dot(a, b)
+        a = tl.load(a_ptrs, mask=offs_k[None, :] < K - k * BLOCK_K, other=None)
+        b = tl.load(b_ptrs, mask=offs_k[:, None] < K - k * BLOCK_K, other=None)
+        accumulator = tl.dot(a, b, accumulator, out_dtype=ACCUMULATOR_DTYPE)


Does it help improve our other kernels that use tl.dot too? @grimoire

Most GEMM kernels should benefit from this.
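The loop structure in the diff above can be sketched in plain Python: K is walked in blocks, lanes past the end of K are masked out, and the running accumulator is passed into the dot step instead of being added afterwards. This is only an illustration of the pattern, not the Triton kernel. The helper names (`load_a_tile`, `load_b_tile`, `dot_acc`, `tiled_matmul`) are hypothetical, and padding masked lanes with zeros is an assumption of this sketch.

```python
def load_a_tile(a, k0, block_k):
    """Masked load of a[:, k0:k0+block_k]; lanes past the end of K read as 0.0."""
    return [[row[k0 + t] if k0 + t < len(row) else 0.0 for t in range(block_k)]
            for row in a]


def load_b_tile(b, k0, block_k):
    """Masked load of b[k0:k0+block_k, :]; rows past the end of K read as 0.0."""
    n = len(b[0])
    return [[b[k0 + t][j] if k0 + t < len(b) else 0.0 for j in range(n)]
            for t in range(block_k)]


def dot_acc(a_tile, b_tile, acc):
    """Return acc + a_tile @ b_tile, standing in for a fused dot-accumulate."""
    return [[acc[i][j] + sum(a_tile[i][t] * b_tile[t][j]
                             for t in range(len(b_tile)))
             for j in range(len(acc[0]))]
            for i in range(len(acc))]


def tiled_matmul(a, b, block_k):
    """Compute a @ b by iterating over K in chunks of block_k."""
    m, k_dim, n = len(a), len(b), len(b[0])
    acc = [[0.0] * n for _ in range(m)]
    for k0 in range(0, k_dim, block_k):
        a_tile = load_a_tile(a, k0, block_k)
        b_tile = load_b_tile(b, k0, block_k)
        # The accumulator goes into the dot step, mirroring tl.dot(a, b, accumulator).
        acc = dot_acc(a_tile, b_tile, acc)
    return acc
```

With K = 5 and `block_k = 2`, the last block has only one valid lane. The masked loads keep it from reading out of range.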

RunningLeon and others added 4 commits December 26, 2024 12:12

support w8a8 smooth_quant and loading

d663d4c

optimize int8

77f5fb8

fix fp8 kernels

a1081b3

update docs for w8a8

e71c4cc

lvhan028 requested review from grimoire and AllentDan December 26, 2024 08:41

grimoire reviewed Dec 27, 2024

docs/zh_cn/quantization/w8a8.md (outdated)

grimoire approved these changes Dec 27, 2024

AllentDan reviewed Dec 27, 2024

resolve comments

2dba1a7
