[Neo][vLLM] Accept quant options for awq, fp8 #2382

Merged · 1 commit merged into deepjavalibrary:master on Sep 13, 2024

Conversation

@a-ys (Contributor) commented Sep 12, 2024

Description

This PR adds additional pass-through options to configure AWQ and FP8 quantization.

For AWQ, we add the following options, which map to the options defined here; an illustrative configuration sketch follows the list.

  • option.awq_zero_point: toggles zero-point quantization
  • option.awq_block_size: (existing field) group/block size for AWQ quantization
  • option.awq_weight_bit_width: bit width for quantization; currently only 4 is supported
  • option.awq_mm_version: AWQ matmul implementation
  • option.awq_ignore_layers: layers to ignore during quantization
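
As a rough sketch only (not taken from this PR or its docs), a serving.properties file passing these options might look like the lines below. The specific values, and the use of option.quantize=awq to select the scheme, are assumptions to verify against the underlying AWQ library.

# Illustrative sketch; the values and the option.quantize key are assumptions, not from this PR.
option.quantize=awq
option.awq_zero_point=true
option.awq_block_size=128
option.awq_weight_bit_width=4
option.awq_mm_version=GEMM
option.awq_ignore_layers=lm_head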

For FP8, we add the following options, which are defined here; again, an illustrative sketch follows the list.

  • option.fp8_activation_scheme: static or dynamic activation scaling factors
  • option.fp8_kv_cache_quant_targets: modules to target for KV cache quantization (currently unused)
  • option.fp8_ignore_patterns: layers to ignore
  • option.calib_size: (existing field) number of samples for activation-scale calibration
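
Likewise, a hedged sketch for FP8; the values below, and option.quantize=fp8 as the scheme selector, are assumptions rather than values taken from this PR.

# Illustrative sketch; the values and the option.quantize key are assumptions, not from this PR.
option.quantize=fp8
option.fp8_activation_scheme=static
option.fp8_kv_cache_quant_targets=k_proj, v_proj
option.fp8_ignore_patterns=lm_head
option.calib_size=512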

For fields that are read by the underlying library as a list of strings, we accept them like so:

option.fp8_kv_cache_quant_targets=k_proj, v_proj

This PR also removes the FP8 configuration options previously added in the original PR (#2272).

@a-ys requested review from zachgk, frankfliu, and a team as code owners on September 12, 2024 at 23:44
@siddvenk (Contributor) commented:

Changes LGTM, but I don't have much context. Have these been tested? Do we have tests in lmi-distro?

@a-ys (Contributor, Author) commented Sep 13, 2024 via email

@siddvenk (Contributor) commented:

Can you make a cherry-pick PR for this on the 0.29.0-dlc branch as well? At the very least, let's ensure no regression with our automated tests here before the patch release.

@a-ys (Contributor, Author) commented Sep 13, 2024 via email

@siddvenk merged commit 5c46335 into deepjavalibrary:master on Sep 13, 2024
9 checks passed
a-ys added a commit to a-ys/djl-serving-1 that referenced this pull request Sep 13, 2024