INC3 common module design collections #1445

yiliu30 · 2023-12-05T09:07:06Z

yiliu30
Dec 5, 2023
Collaborator

Note: INC3 is currently under heavy development, APIs are subject to change.

INC3 `autotune` Module Design

Goal

Enable users to easily utilize the predefined tuning space and tuning order with just a few lines of code.
Allow users to tune a single algorithm or combine multiple algorithms.
Empower developers to register framework-specific tuning space and tuning order.
Provide developers with the flexibility to customize the framework-specific autotune pipeline according to their needs.

Design Overview

Maintain a thin autotune layer that offers the common components to drive the tuning pipeline while allowing framework developers to customize it. These components include:
- TuningConfig: Used by users to set tuning spaces, tuning order, and stop conditions.
- TuningLogger: Record the tuning process log, facilitating the automatic collection of tuning results by validation teams.
- ConfigLoader: Takes the config set and sampler, yielding quantization configuration one by one.
- TuningMonitor: Records trial information and provides interfaces to check stop conditions.
- Evaluator: Wraps user-provided evaluation functions into a unified interface to obtain the final evaluation score.

The `autotune` API

One of the major changes in INC 3 is the separation of quantization and autotune into two distinct APIs. Quantization APIs align with stock frameworks. The autotune is the tuning interface used by all framework extensions, with framework-specific arguments. The autotune API accepts two common arguments: tune_config and eval_fns.

def autotune(
    model,
    tune_config: TuningConfig,
    eval_fns: Optional[Union[Callable, Dict, List[Dict]]] = None,
    ...
) -> Optional[torch.nn.Module]:
    """The main entry of auto-tune."""

Usage Examples

# User scripts
from typing import Union
import torch

class UserModel(torch.nn.Module):
    def __init__(self, dims):
        self.fc1 = torch.nn.Linear(dims, dims)
        self.fc2 = torch.nn.Linear(dims, dims)
        self.fc3 = torch.nn.Linear(dims, dims)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x


def build_simple_model(dims=10):
    return UserModel(dims)


def eval_fn(model) -> Union[float, int]:
    # Define a function to evaluate the model and return a score(float or int).
    # ...
    return 1.0


def calib_fn(model):
    #  Define a function to calibrate the model.
    # ...
    pass


# 1. Use the default tuning space.
from neural_compressor.torch.quantization import GPTQConfig, RTNConfig, TuningConfig, autotune

float_model = build_simple_model()
rtn_default_config_set = RTNConfig.get_config_set_for_tuning()
q_model = autotune(model=float_model, tune_config=TuningConfig(rtn_default_config_set), eval_fns=eval_fn)

# 2.1 Customize the tuning space for a single algorithm.
float_model = build_simple_model()
customize_config_set = TuningConfig(config_set=RTNConfig(bits=[4, 8]))
q_model = autotune(model=float_model, tune_config=TuningConfig(customize_config_set), eval_fns=eval_fn)

# 2.2 Combine multiple algorithms into the tuning space.
float_model = build_simple_model()
customize_config_set = [RTNConfig(bits=[4, 8]), GPTQConfig(bits=[4, 8])]
q_model = autotune(model=float_model, tune_config=TuningConfig(customize_config_set), run_fn=calib_fn, eval_fns=eval_fn)

Advantage Topics [WIP]

Define multiple evaluation functions
The relationship between ConfigLoader, Sampler, ConfigSet, and XxxAgloConfig.
Customize the tuning order

-- END

yiliu30 · 2023-12-26T03:13:12Z

yiliu30
Dec 26, 2023
Collaborator Author

Notes

Integration Example:
- With DeepSpeed
Consider TP/DP/PP for eval/calibrtate/prepare/quantize
Refine the naming of Runner
Support register different objectives
Decompose the quantize into calibrtate/prepare/quantize

0 replies

yiliu30 · 2023-12-26T03:14:14Z

yiliu30
Dec 26, 2023
Collaborator Author

Task List

1st PR Introduce the overview of the auto-tune pipeline and main interface. Introduce INC3.0 auto-tune #1497

Main components and their declaration, relationship: Tuner, TaskManager, FrameworkWrapper, and TorchWrapper
Tune-related config BaseTuningConfig, TuneConfig

2nd PR

Merge user config with framework registered config
Expand the config with local
Register the default tuning order, which needs input from fwk owners

3rd PR

Docs

0 replies

yiliu30 · 2024-01-12T03:22:10Z

yiliu30
Jan 12, 2024
Collaborator Author

About the tuning space and tuning order.

Case 1. Use the default tuning space and tuning order

# for rtn
default_rtn_tuning_config = TuningConfig(quant_configs=get_default_rtn_quant_configs(), max_trials=100)

Case 2. The user specifies the tuning space and tuning order

The concept of sampler: a sampler takes a set of quant configs and yields the config for one quantization process with a specific order.

User customize the tuning space(use the default sampler to sample the quant_config)

customized_rtn_configs = RTNWeightQuantConfig(weight_bits=[2, 4, 6, 8])
tuning_config = TuningConfig(quant_configs=customized_rtn_configs, max_trials=100)

User customizes both tuning space and tuning order (sampler)

def customized_sampler(config: RTNWeightQuantConfig) -> List[RTNWeightQuantConfig]:
    ...

customized_rtn_configs = RTNWeightQuantConfig(weight_bits=[2, 4, 6, 8])
customized_rtn_configs.set_sampler(customized_sampler)
tuning_config = TuningConfig(quant_configs=customized_rtn_configs, max_trials=100)

3 replies

yiliu30 Jan 12, 2024
Collaborator Author

One item that may confuse users is that:

RTNWeightQuantConfig(weight_bits=2) used by the quantize API represents one trial
RTNWeightQuantConfig(weight_bits=[2, 4, 6, 8]) used by the autotune API represents four trials

yiliu30 Jan 12, 2024
Collaborator Author

@ftian1 please have a look, thank you.

yiliu30 Jan 15, 2024
Collaborator Author

After syncing with @ftian1 offline, here is the summary:

Keep using one class RTNWeightQuantConfig in both quantize and autotune APIs.

RTNWeightQuantConfig(weight_bits=2) #used by the quantize API represents one trial
RTNWeightQuantConfig(weight_bits=[2, 4, 6, 8]) #used by the autotune API represents four trials

The agrs of TuningConfig:

Add sampler for sampling config from space
Rename quant_configs to tuning_space

tuning_space = get_default_tuning_space() 
# return all built-in spaces  {rtn_tuning_space, gptq_tuning_space, sq_tuning_space, static_tuning_space}
tuning_config = TuningConfig(tuning_space=tuning_space, max_trials=100, sampler:Optional[Sampler])

yiliu30 · 2024-01-15T03:10:04Z

yiliu30
Jan 15, 2024
Collaborator Author

ConfigRegistry

@register_config(framework_name=FRAMEWORK_NAME, algo_name=GPTQ, priority=100)

#registered_configs
class ConfigRegistry:
    
    
FRAMEWORK_NAME:
    ALGORITHM_NAME:
        PRIORITY
        CLS
        
config_registry = ConfigRegistry

config_registry.get_all_configs()
config_registry.get_sorted_configs()


1. add `priority` into `register_config`
2. replace `registered_configs` with `config_registry`

2 replies

yiliu30 Jan 16, 2024
Collaborator Author

@Kaihui-intel , please add the PR link here.

Kaihui-intel Jan 16, 2024
Collaborator

#1543

yiliu30 · 2024-01-15T08:07:20Z

yiliu30
Jan 15, 2024
Collaborator Author

About the implementation details of `Tuning`

def autotune(...):
    ....
    #####  Option 1.  Original Impl
    tuning_objectives.set_eval_fn_registry(eval_fns)
    torch_wrapper = TorchWrapper(model, run_fn, run_args)
    tuner = Tuner(tune_config=tune_config, tuning_objectives=tuning_objectives, fwk_wrapper=torch_wrapper)
    best_qmodel = tuner.search()
    return  best_qmodel
    
    #####  Option 2. New impl
    from neural_compressor.torch import quantize
    from nueral_compressor.common import init_tuning
    sampler, evaluator, recorder = init_tuning(tune_config)
    for quant_config: BaseConfig in sampler.next_config():
        q_model = quantize(model=model, quant_config=quant_config, run_fn=run_fn, run_args=run_args)
        eval_result = evaluator.eval(q_model)
        recorder.add_new_trial(quant_config, eval_result)
        if recorder.needs_stop():
            best_qmodel = recorder.get_best_qmodel()
            return best_qmodel
    return best_qmodel

6 replies

yiliu30 Jan 16, 2024
Collaborator Author

Need your comments @xin3he @mengniwang95 @zehao-intel , thanks :).

yiliu30 Jan 16, 2024
Collaborator Author

Option 1 (React Here)

yiliu30 Jan 16, 2024
Collaborator Author

Option 2 (React Here)

yiliu30 Jan 16, 2024
Collaborator Author

After offline syncing, Option 2 was selected, and PR will be added soon.

yiliu30 Jan 17, 2024
Collaborator Author

#1528

yiliu30 · 2024-01-16T05:46:25Z

yiliu30
Jan 16, 2024
Collaborator Author

Flag indicating the end of quantization(For CI)

1 reply

yiliu30 Jan 17, 2024
Collaborator Author

#1528

yiliu30 · 2024-01-17T06:10:11Z

yiliu30
Jan 17, 2024
Collaborator Author

Workspace

Add workspace at common
We should port the workspace in 2.x into 3.x common.

2 replies

yiliu30 Jan 17, 2024
Collaborator Author

@Kaihui-intel please have a look.

Kaihui-intel Jan 19, 2024
Collaborator

#1553

yiliu30 · 2024-01-17T06:58:08Z

yiliu30
Jan 17, 2024
Collaborator Author

Enhance the `TuningConfig`

Add into tolerable_loss into TuningConfig

Currently, we only support stopping the tuning process when max_Trials is reached. Add support for stopping when the accuracy target is reached.

neural-compressor/neural_compressor/config.py

Lines 502 to 511 in 9ec22c9

    
           class AccuracyCriterion: 
        
               """Class of Accuracy Criterion. 
        
               Args: 
        
                   higher_is_better(bool, optional): This flag indicates whether the metric higher is the better. 
        
                                                     Default value is True. 
        
                   criterion:(str, optional): This flag indicates whether the metric loss is "relative" or "absolute". 
        
                                              Default value is "relative". 
        
                   tolerable_loss(float, optional): This float indicates how much metric loss we can accept. 
        
                                                    Default value is 0.01.

4 replies

yiliu30 Jan 17, 2024
Collaborator Author

Depends on #1528

yiliu30 Jan 17, 2024
Collaborator Author

To clarify, we can only add tolerable_loss as the other two args can be covered by it.

yiliu30 Jan 26, 2024
Collaborator Author

class TuningConfig:
    """Base Class for Tuning Criterion.
    def __init__(self, config_set=None, timeout=0, max_trials=100, sampler: Sampler = None, tolerable_loss = 0.01) -> None:
        """Init a TuneCriterion object."""
        self.config_set = config_set
        self.timeout = timeout
        self.max_trials = max_trials
        self.sampler = sampler
        self.tolerable_loss = tolerable_loss  # <-- New added

Kaihui-intel Jan 26, 2024
Collaborator

#1579

yiliu30 · 2024-01-18T00:18:58Z

yiliu30
Jan 18, 2024
Collaborator Author

Default tuning space

Enable the default_tuning_space

1 reply

yiliu30 Jan 24, 2024
Collaborator Author

#1562

yiliu30 · 2024-01-18T00:30:18Z

yiliu30
Jan 18, 2024
Collaborator Author

Tasks List https://github.com/orgs/intel/projects/54/views/1?filterQuery=

0 replies

yiliu30 · 2024-01-18T08:57:29Z

yiliu30
Jan 18, 2024
Collaborator Author

Divide the `params_list` into model-level, op-type-level, op-level

Need a solution

0 replies

yiliu30 · 2024-01-24T03:55:32Z

yiliu30
Jan 24, 2024
Collaborator Author

Idea about simplifying the CI for common components.

# torch/3x/torch/test_common.py
Tests for common components.
Owner(s): ["module: common & auto-tune"]

These tests aim to assess the fundamental functionalities of common components and enhance code coverage. 
Currently, there are three replicas in each framework test folder. We may organize them into individual folders
like 'test/3x/common' and update the CI scripts to include them in each framework's CI.

The folder structure:
.
├── 3x
│   ├── common  # <---- New added
│   ├── onnxrt
│   ├── tensorflow
│   └── torch

For each fwk CI:

onnxrt_included_folder:
    ├── 3x
    │   ├── common
    │   ├── onnxrt
    
tensorflow_included_folder:
    ├── 3x
    │   ├── common
    │   ├── tensorflow


torch_included_folder:
    ├── 3x
    │   ├── common
    │   ├── torch

3 replies

yiliu30 Jan 24, 2024
Collaborator Author

@chensuyue @XuehaoSun Please have a look. Thanks :)

chensuyue Jan 24, 2024
Maintainer

I can update the CI test logical when the new structure submitted.

yiliu30 Jan 25, 2024
Collaborator Author

#1575

yiliu30 · 2024-01-26T07:20:22Z

yiliu30
Jan 26, 2024
Collaborator Author

Folder structure of INC 3.X

!!! Avoid creating a folder for just a single file !!!

├── fwk_name
│   ├── __init__.py
│   ├── quantization
│   │   ├── algorithm_entry.py
│   │   ├── autotune.py
│   │   ├── config.py
│   │   ├── __init__.py
│   │   └── quantize.py
│   ├── algorithms
│   │   ├── __init__.py
│   │   ├── smooth_quant
│   │   │   ├── __init__.py
│   │   │   ├── smooth_quant.py
│   │   │   └── utility.py
│   │   ├── static_quant
│   │   │   ├── __init__.py
│   │   │   ├── static_quant.py
│   │   │   └── utility.py
│   │   └── weight_only
│   │       ├── gptq.py
│   │       ├── __init__.py
│   │       └── rtn.py
│   └── utils
│       ├── constants.py
│       ├── __init__.py
│       └── utility.py
└── __init__.py

# * Note some code snippets
# neural_compressor/fwk_name/quantization/algorithm_entry.py
@register_algo(RTN)
def rtn_algo_entry()
    from neural_compressor.fwk_name.algorithms import rtn
    ...

@register_algo(SMOOTH_QUANT)
def smooth_quant_entry():
    from neural_compressor.fwk_name.algorithms import smooth_quant
    ...

1 reply

yiliu30 Jan 26, 2024
Collaborator Author

@ftian1 @mengniwang95 @xin3he @zehao-intel

Kaihui-intel · 2024-01-26T08:08:48Z

Kaihui-intel
Jan 26, 2024
Collaborator

Enhance autotune UT to replace checking logs

0 replies

yiliu30 · 2024-01-30T10:55:20Z

yiliu30
Jan 30, 2024
Collaborator Author

Separate Smooth and Quant from smooth quantization [WIP]

# Option 1 (Currently implementation)
from neural_compressor.torch.quantization import SmoothQuantConfig, quantize
sq_config = SmoothQuantConfig(alpha=0.5)
q_model = quantize(model=float_model, quant_config=sq_config)


# Option 2 (Recommended for v2.5)
# usage 1
from neural_compressor.torch.quantization import SmoothConfig, StaticQuantConfig, quantize
sq_config = SmoothConfig(alpha=0.5)
static_config = StaticQuantConfig(w_sym=False, w_algo="minmax", white_list=["linear1"])
q_model = quantize(model=float_model, quant_config=[sq_config, static_config])

# usage 2
from neural_compressor.torch.quantization import quantize, get_default_smooth_quant_config
q_model = quantize(model=float_model, quant_config=get_default_smooth_quant_config())


# Option 3 (Recommended for further)
from neural_compressor.torch import SmoothConfig, optimize
sq_config = SmoothConfig(alpha=0.5)
# optimize: float model -> float model
optimized_model = optimize(model=float_model, optimize_config=composite_config)

from neural_compressor.torch import StaticQuantConfig, quantize
# quantize: float model -> quantized model
static_config = StaticQuantConfig(w_sym=False, w_algo="minmax", white_list=["linear1"])
q_model = quantize(model=optimized_model, quant_config=static_config)

0 replies

yiliu30 · 2024-02-02T05:47:56Z

yiliu30
Feb 2, 2024
Collaborator Author

Unify and extend `register_algo`

Unify the register_algo
Clean code, import from common

def register_algo(name: str):

DEFAULT_BACKEND = "default_backend"
def register_algo(name: str , backend: str =DEFAULT_BACKEND):
    ...

0 replies

yiliu30 · 2024-02-28T08:51:44Z

yiliu30
Feb 28, 2024
Collaborator Author

0 replies

INC3 common module design collections #1445

yiliu30 Dec 5, 2023 Collaborator

INC3 autotune Module Design

Goal

Design Overview

The autotune API

Usage Examples

Advantage Topics [WIP]

Replies: 17 comments · 23 replies

yiliu30 Dec 26, 2023 Collaborator Author

yiliu30 Dec 26, 2023 Collaborator Author

Task List

yiliu30 Jan 12, 2024 Collaborator Author

About the tuning space and tuning order.

Case 1. Use the default tuning space and tuning order

Case 2. The user specifies the tuning space and tuning order

yiliu30 Jan 12, 2024 Collaborator Author

yiliu30 Jan 12, 2024 Collaborator Author

yiliu30 Jan 15, 2024 Collaborator Author

yiliu30 Jan 15, 2024 Collaborator Author

ConfigRegistry

yiliu30 Jan 16, 2024 Collaborator Author

Kaihui-intel Jan 16, 2024 Collaborator

yiliu30 Jan 15, 2024 Collaborator Author

About the implementation details of Tuning

yiliu30 Jan 16, 2024 Collaborator Author

yiliu30 Jan 16, 2024 Collaborator Author

yiliu30 Jan 16, 2024 Collaborator Author

yiliu30 Jan 16, 2024 Collaborator Author

yiliu30 Jan 17, 2024 Collaborator Author

yiliu30 Jan 16, 2024 Collaborator Author

yiliu30 Jan 17, 2024 Collaborator Author

yiliu30 Jan 17, 2024 Collaborator Author

Workspace

yiliu30 Jan 17, 2024 Collaborator Author

Kaihui-intel Jan 19, 2024 Collaborator

yiliu30 Jan 17, 2024 Collaborator Author

Enhance the TuningConfig

yiliu30 Jan 17, 2024 Collaborator Author

yiliu30 Jan 17, 2024 Collaborator Author

yiliu30 Jan 26, 2024 Collaborator Author

Kaihui-intel Jan 26, 2024 Collaborator

yiliu30 Jan 18, 2024 Collaborator Author

Default tuning space

yiliu30 Jan 24, 2024 Collaborator Author

yiliu30 Jan 18, 2024 Collaborator Author

yiliu30 Jan 18, 2024 Collaborator Author

Divide the params_list into model-level, op-type-level, op-level

yiliu30 Jan 24, 2024 Collaborator Author

Idea about simplifying the CI for common components.

yiliu30 Jan 24, 2024 Collaborator Author

chensuyue Jan 24, 2024 Maintainer

yiliu30 Jan 25, 2024 Collaborator Author

yiliu30 Jan 26, 2024 Collaborator Author

Folder structure of INC 3.X

!!! Avoid creating a folder for just a single file !!!

yiliu30 Jan 26, 2024 Collaborator Author

Kaihui-intel Jan 26, 2024 Collaborator

Enhance autotune UT to replace checking logs

yiliu30 Jan 30, 2024 Collaborator Author

Separate Smooth and Quant from smooth quantization [WIP]

yiliu30 Feb 2, 2024 Collaborator Author

Unify and extend register_algo

yiliu30 Feb 28, 2024 Collaborator Author

yiliu30
Dec 5, 2023
Collaborator

INC3 `autotune` Module Design

The `autotune` API

Replies: 17 comments 23 replies

yiliu30
Dec 26, 2023
Collaborator Author

yiliu30
Dec 26, 2023
Collaborator Author

yiliu30
Jan 12, 2024
Collaborator Author

yiliu30 Jan 12, 2024
Collaborator Author

yiliu30 Jan 12, 2024
Collaborator Author

yiliu30 Jan 15, 2024
Collaborator Author

yiliu30
Jan 15, 2024
Collaborator Author

yiliu30 Jan 16, 2024
Collaborator Author

Kaihui-intel Jan 16, 2024
Collaborator

yiliu30
Jan 15, 2024
Collaborator Author

About the implementation details of `Tuning`

yiliu30 Jan 16, 2024
Collaborator Author

yiliu30 Jan 16, 2024
Collaborator Author

yiliu30 Jan 16, 2024
Collaborator Author

yiliu30 Jan 16, 2024
Collaborator Author

yiliu30 Jan 17, 2024
Collaborator Author

yiliu30
Jan 16, 2024
Collaborator Author

yiliu30 Jan 17, 2024
Collaborator Author

yiliu30
Jan 17, 2024
Collaborator Author

yiliu30 Jan 17, 2024
Collaborator Author

Kaihui-intel Jan 19, 2024
Collaborator

yiliu30
Jan 17, 2024
Collaborator Author

Enhance the `TuningConfig`

yiliu30 Jan 17, 2024
Collaborator Author

yiliu30 Jan 17, 2024
Collaborator Author

yiliu30 Jan 26, 2024
Collaborator Author

Kaihui-intel Jan 26, 2024
Collaborator

yiliu30
Jan 18, 2024
Collaborator Author

yiliu30 Jan 24, 2024
Collaborator Author

yiliu30
Jan 18, 2024
Collaborator Author

yiliu30
Jan 18, 2024
Collaborator Author

Divide the `params_list` into model-level, op-type-level, op-level

yiliu30
Jan 24, 2024
Collaborator Author

yiliu30 Jan 24, 2024
Collaborator Author

chensuyue Jan 24, 2024
Maintainer

yiliu30 Jan 25, 2024
Collaborator Author

yiliu30
Jan 26, 2024
Collaborator Author

yiliu30 Jan 26, 2024
Collaborator Author

Kaihui-intel
Jan 26, 2024
Collaborator

yiliu30
Jan 30, 2024
Collaborator Author

yiliu30
Feb 2, 2024
Collaborator Author

Unify and extend `register_algo`

yiliu30
Feb 28, 2024
Collaborator Author