
How can I know the QuantizationConfig when doing PTQ on my own model? #1305

Open
CYL0089 opened this issue Dec 31, 2024 · 3 comments
CYL0089 commented Dec 31, 2024

Issue Type

Performance

Source

source

MCT Version

2.2.2

OS Platform and Distribution

No response

Python version

No response

Describe the issue

when do "Post-Training Quantization using MCT" , there are many parameters can adjust:
image
image

I want to know how I can find the most suitable value of them?

Expected behaviour

No response

Code to reproduce the issue

# Imports needed to run this snippet standalone; MIN_THRESHOLD is defined
# in MCT's constants module in the library source.
import math
from dataclasses import dataclass
from enum import Enum

from model_compression_toolkit.constants import MIN_THRESHOLD


class QuantizationErrorMethod(Enum):
    """
    Method for quantization threshold selection:

    NOCLIPPING - Use min/max values as thresholds.

    MSE - Use mean square error for minimizing quantization noise.

    MAE - Use mean absolute error for minimizing quantization noise.

    KL - Use KL-divergence to make signal distributions as similar as possible.

    Lp - Use Lp-norm for minimizing quantization noise.

    HMSE - Use Hessian-based mean squared error for minimizing quantization noise. This method uses Hessian scores to give more weight to the more valuable parameters when computing the error induced by quantization.

    """

    NOCLIPPING = 0
    MSE = 1
    MAE = 2
    KL = 4
    LP = 5
    HMSE = 6


@dataclass
class QuantizationConfig:
    """
    A class that encapsulates all the different parameters used by the library to quantize a model.

    Examples:
        You can create a quantization configuration to apply to a model. For example, to quantize a model's weights and
        activations using thresholds, with weight threshold selection based on MSE and activation threshold selection
        using NOCLIPPING (min/max), while enabling relu_bound_to_power_of_2 and weights_bias_correction,
        you can instantiate a quantization configuration like this:

        >>> import model_compression_toolkit as mct
        >>> qc = mct.core.QuantizationConfig(activation_error_method=mct.core.QuantizationErrorMethod.NOCLIPPING, weights_error_method=mct.core.QuantizationErrorMethod.MSE, relu_bound_to_power_of_2=True, weights_bias_correction=True)


        The QuantizationConfig instance can then be used in the quantization workflow,
        such as with Keras in the function: :func:`~model_compression_toolkit.ptq.keras_post_training_quantization`.

    """

    activation_error_method: QuantizationErrorMethod = QuantizationErrorMethod.MSE
    weights_error_method: QuantizationErrorMethod = QuantizationErrorMethod.MSE
    relu_bound_to_power_of_2: bool = False
    weights_bias_correction: bool = True
    weights_second_moment_correction: bool = False
    input_scaling: bool = False
    softmax_shift: bool = False
    shift_negative_activation_correction: bool = True
    activation_channel_equalization: bool = False
    z_threshold: float = math.inf
    min_threshold: float = MIN_THRESHOLD
    l_p_value: int = 2
    linear_collapsing: bool = True
    residual_collapsing: bool = True
    shift_negative_ratio: float = 0.05
    shift_negative_threshold_recalculation: bool = False
    shift_negative_params_search: bool = False
    concat_threshold_update: bool = False
    activation_bias_correction: bool = False
    activation_bias_correction_threshold: float = 0.0


# Default quantization configuration used by the library.
DEFAULTCONFIG = QuantizationConfig(QuantizationErrorMethod.MSE, QuantizationErrorMethod.MSE,
                                   relu_bound_to_power_of_2=False, weights_bias_correction=True,
                                   weights_second_moment_correction=False, input_scaling=False, softmax_shift=False)

Log output

No response

CYL0089 (Author) commented Dec 31, 2024

Post-training quantization with PyTorch.
Looking forward to your reply, thank you!

ofirgo assigned elad-c and ofirgo and unassigned elad-c Dec 31, 2024
ofirgo (Collaborator) commented Dec 31, 2024

Hi @CYL0089 ,

Thank you for the question and for using MCT.
Defining the QuantizationConfig depends on what you're trying to achieve and on the model you are trying to compress.
The simplest approach is to use the default config, which is set automatically when running PTQ without providing a specific config. This should give you a decent result.

In addition, if you want to enable or disable any of the available features that might be valuable for compressing your own model, you can define and provide your own configuration (see the sketch below).
You can find more information about the different features that are available via the quantization config in our documentation.
Feel free to ask about any of the options for more details!
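
To make this concrete, here is a minimal sketch of both options for PyTorch PTQ. It assumes MCT's mct.ptq.pytorch_post_training_quantization entry point and mct.core.CoreConfig, and uses a torchvision MobileNetV2 plus a hypothetical random-tensor representative_data_gen purely as stand-ins; substitute your own model and real calibration samples.

import model_compression_toolkit as mct
import torch
from torchvision.models import mobilenet_v2

model = mobilenet_v2(weights='DEFAULT')  # stand-in for your own model

# PTQ needs a representative dataset generator for calibration.
# Random tensors are used here only for illustration; use real samples.
def representative_data_gen():
    for _ in range(20):
        yield [torch.randn(1, 3, 224, 224)]

# Option 1: run PTQ without a config; the default QuantizationConfig
# (DEFAULTCONFIG above) is applied automatically.
quant_model, quant_info = mct.ptq.pytorch_post_training_quantization(
    model, representative_data_gen)

# Option 2: override only the knobs you care about, e.g. MAE-based weight
# threshold selection, and pass the config through CoreConfig.
qc = mct.core.QuantizationConfig(
    weights_error_method=mct.core.QuantizationErrorMethod.MAE,
    weights_bias_correction=True)
core_config = mct.core.CoreConfig(quantization_config=qc)
quant_model, quant_info = mct.ptq.pytorch_post_training_quantization(
    model, representative_data_gen, core_config=core_config)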

You can also find examples and a deep dive into some of the features in our tutorials (see 1, 2). Note that these are written in Keras, but the demonstrated features are available for PyTorch as well.

Contact us if you have any more questions.

Ofir

CYL0089 (Author) commented Jan 3, 2025

Thank you very much! I'll follow your tutorials and try some improvements.
