
How can I know the QuantizationConfig when doing PTQ on my own model? #1305

Open
CYL0089 opened this issue Dec 31, 2024 · 3 comments
CYL0089 commented Dec 31, 2024

Issue Type

Performance

Source

source

MCT Version

2.2.2

OS Platform and Distribution

No response

Python version

No response

Describe the issue

when do "Post-Training Quantization using MCT" , there are many parameters can adjust:
image
image

I want to know how I can find the most suitable value of them?

Expected behaviour

No response

Code to reproduce the issue

# Imports needed to run this snippet standalone; MIN_THRESHOLD is defined
# in MCT's constants module in the library source.
import math
from dataclasses import dataclass
from enum import Enum

from model_compression_toolkit.constants import MIN_THRESHOLD


class QuantizationErrorMethod(Enum):
    """
    Method for quantization threshold selection:

    NOCLIPPING - Use min/max values as thresholds.

    MSE - Use mean square error for minimizing quantization noise.

    MAE - Use mean absolute error for minimizing quantization noise.

    KL - Use KL-divergence to make signal distributions as similar as possible.

    Lp - Use Lp-norm for minimizing quantization noise.

    HMSE - Use Hessian-based mean squared error for minimizing quantization noise. This method uses Hessian scores to give more weight to the more valuable parameters when computing the error induced by quantization.

    """

    NOCLIPPING = 0
    MSE = 1
    MAE = 2
    KL = 4
    LP = 5
    HMSE = 6


@dataclass
class QuantizationConfig:
    """
    A class that encapsulates all the different parameters used by the library to quantize a model.

    Examples:
        You can create a quantization configuration to apply to a model. For example, to quantize a model's weights and
        activations using thresholds, with weight threshold selection based on MSE and activation threshold selection
        using NOCLIPPING (min/max), while enabling relu_bound_to_power_of_2 and weights_bias_correction,
        you can instantiate a quantization configuration like this:

        >>> import model_compression_toolkit as mct
        >>> qc = mct.core.QuantizationConfig(activation_error_method=mct.core.QuantizationErrorMethod.NOCLIPPING, weights_error_method=mct.core.QuantizationErrorMethod.MSE, relu_bound_to_power_of_2=True, weights_bias_correction=True)


        The QuantizationConfig instance can then be used in the quantization workflow,
        such as with Keras in the function: :func:`~model_compression_toolkit.ptq.keras_post_training_quantization`.

    """

    activation_error_method: QuantizationErrorMethod = QuantizationErrorMethod.MSE
    weights_error_method: QuantizationErrorMethod = QuantizationErrorMethod.MSE
    relu_bound_to_power_of_2: bool = False
    weights_bias_correction: bool = True
    weights_second_moment_correction: bool = False
    input_scaling: bool = False
    softmax_shift: bool = False
    shift_negative_activation_correction: bool = True
    activation_channel_equalization: bool = False
    z_threshold: float = math.inf
    min_threshold: float = MIN_THRESHOLD
    l_p_value: int = 2
    linear_collapsing: bool = True
    residual_collapsing: bool = True
    shift_negative_ratio: float = 0.05
    shift_negative_threshold_recalculation: bool = False
    shift_negative_params_search: bool = False
    concat_threshold_update: bool = False
    activation_bias_correction: bool = False
    activation_bias_correction_threshold: float = 0.0


# Default quantization configuration used by the library.
DEFAULTCONFIG = QuantizationConfig(QuantizationErrorMethod.MSE, QuantizationErrorMethod.MSE,
                                   relu_bound_to_power_of_2=False, weights_bias_correction=True,
                                   weights_second_moment_correction=False, input_scaling=False, softmax_shift=False)

Log output

No response

CYL0089 (Author) commented Dec 31, 2024

Post-training quantization with PyTorch.
Looking forward to your reply, thank you!

ofirgo assigned elad-c and ofirgo and unassigned elad-c Dec 31, 2024
ofirgo (Collaborator) commented Dec 31, 2024

Hi @CYL0089 ,

Thank you for the question and for using MCT.
Defining the QuantizationConfig depends on what you're trying to achieve and on the model you are trying to compress.
The simplest approach is to use the default config, which is set automatically when running PTQ without providing a specific config. This should give you a decent result.

In addition, if you want to enable or disable any of the available features that might be valuable for compressing your own model, you can define and provide your own configuration (see the sketch below).
You can find more information about the different features that are available via the quantization config in our documentation.
Feel free to ask about any of the options for more details!
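
To make this concrete, here is a minimal sketch of both options for PyTorch PTQ. It assumes MCT's mct.ptq.pytorch_post_training_quantization entry point and mct.core.CoreConfig, and uses a torchvision MobileNetV2 plus a hypothetical random-tensor representative_data_gen purely as stand-ins; substitute your own model and real calibration samples.

import model_compression_toolkit as mct
import torch
from torchvision.models import mobilenet_v2

model = mobilenet_v2(weights='DEFAULT')  # stand-in for your own model

# PTQ needs a representative dataset generator for calibration.
# Random tensors are used here only for illustration; use real samples.
def representative_data_gen():
    for _ in range(20):
        yield [torch.randn(1, 3, 224, 224)]

# Option 1: run PTQ without a config; the default QuantizationConfig
# (DEFAULTCONFIG above) is applied automatically.
quant_model, quant_info = mct.ptq.pytorch_post_training_quantization(
    model, representative_data_gen)

# Option 2: override only the knobs you care about, e.g. MAE-based weight
# threshold selection, and pass the config through CoreConfig.
qc = mct.core.QuantizationConfig(
    weights_error_method=mct.core.QuantizationErrorMethod.MAE,
    weights_bias_correction=True)
core_config = mct.core.CoreConfig(quantization_config=qc)
quant_model, quant_info = mct.ptq.pytorch_post_training_quantization(
    model, representative_data_gen, core_config=core_config)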

You can also find examples and a deep dive into some of the features in our tutorials (see 1, 2). Note that these are written in Keras, but the demonstrated features are available for PyTorch as well.

Contact us if you have any more questions.

Ofir

CYL0089 (Author) commented Jan 3, 2025

Thank you very much! I'll follow your tutorials and try some improvements.
