[Neo][vLLM] Accept quant options for awq, fp8 #2382
Merged
Description
This PR adds additional pass-through options for configuring AWQ and FP8 quantization.
For AWQ, we add the following options. These map to the options defined here.

- `option.awq_zero_point`: toggles zero-point quantization
- `option.awq_block_size`: (existing field) group/block size for AWQ quantization
- `option.awq_weight_bit_width`: bit width for quantization; currently only 4 is supported
- `option.awq_mm_version`: AWQ matmul implementation
- `option.awq_ignore_layers`: layers to ignore during quantization
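As an illustration only, these AWQ options might appear together in `serving.properties` as sketched below. The values shown (zero point enabled, group size 128, the `GEMM` kernel, `lm_head` as an ignored layer) are assumptions for the sketch, not defaults confirmed by this PR.

```properties
# Hypothetical serving.properties snippet for AWQ quantization.
# All values are illustrative assumptions, not confirmed defaults.
option.awq_zero_point=true
option.awq_block_size=128
option.awq_weight_bit_width=4
option.awq_mm_version=GEMM
# List-of-strings field; the comma-separated form is an assumption
# (see the list-handling note in the FP8 section below).
option.awq_ignore_layers=lm_head
```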
For FP8, we add the following options. These are defined here.

- `option.fp8_activation_scheme`: static or dynamic activation scaling factors
- `option.fp8_kv_cache_quant_targets`: modules to target for kv cache quantization (currently unused)
- `option.fp8_ignore_patterns`: layers to ignore
- `option.calib_size`: (existing field) number of samples for activation-scale calibration

For fields that are read by the underlying library as a list of strings, we accept them as shown in the sketch below.
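A minimal sketch of the FP8 options in `serving.properties`, also illustrating list-valued fields. The comma-separated encoding and the specific values (`k_proj`/`v_proj` targets, the `lm_head` pattern, a calibration size of 512) are assumptions for this sketch, not formats or defaults confirmed by the PR.

```properties
# Hypothetical serving.properties snippet for FP8 quantization.
# All values are illustrative assumptions, not confirmed defaults.
option.fp8_activation_scheme=static
# List-of-strings fields, assumed here to be comma-separated:
option.fp8_kv_cache_quant_targets=k_proj,v_proj
option.fp8_ignore_patterns=lm_head
option.calib_size=512
```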
This PR also removes the FP8 configuration options that were previously added in the original PR (#2272).