Developer Documentation

Read the following material as you learn how to use Neural Compressor.

Get Started

  • Transform explains how to use Neural Compressor's built-in data processing and how to develop a custom data processing method.
  • Dataset explains how to use Neural Compressor's built-in datasets and how to develop a custom dataset.
  • Metrics explains how to use Neural Compressor's built-in metrics and how to develop a custom metric.
  • UX is a web-based system used to simplify Neural Compressor usage.
  • Intel oneAPI AI Analytics Toolkit Get Started Guide explains the AI Kit components, installation and configuration guides, and instructions for building and running sample apps.
  • AI and Analytics Samples includes code samples for Intel oneAPI libraries.
.. toctree::
    :maxdepth: 1
    :hidden:

    transform.md
    dataset.md
    metric.md
    ux.md
    Intel oneAPI AI Analytics Toolkit Get Started Guide <https://software.intel.com/content/www/us/en/develop/documentation/get-started-with-ai-linux/top.html>
    AI and Analytics Samples <https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics>


Deep Dive

  • Quantization is a process that enables inference and training by performing computations in low-precision data types, such as fixed-point integers. Neural Compressor supports Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). Note that Dynamic Quantization currently has limited support.
  • Pruning provides a common method for introducing sparsity in weights and activations.
  • Benchmarking explains how to use the benchmark interface of Neural Compressor.
  • Mixed precision explains how to enable mixed precision, including BF16, INT8, and FP32, on Intel platforms during tuning.
  • Graph Optimization explains how to enable graph optimization for FP32 and auto-mixed precision.
  • Model Conversion explains how to convert a TensorFlow QAT model to a quantized model that runs on Intel platforms.
  • TensorBoard provides tensor histograms and execution graphs for debugging during tuning.
.. toctree::
    :maxdepth: 1
    :hidden:

    Quantization.md
    PTQ.md
    QAT.md
    dynamic_quantization.md
    pruning.md
    benchmark.md
    mixed_precision.md
    graph_optimization.md
    model_conversion.md
    tensorboard.md


Advanced Topics

  • Adaptor is the interface between Neural Compressor and a framework. This document shows how to develop an adaptor extension, using ONNX Runtime as an example.
  • Tuning strategies automatically optimize low-precision recipes for deep learning models to meet objectives such as inference performance and memory usage within the expected accuracy criteria. This document also shows how to develop a new strategy.
.. toctree::
    :maxdepth: 1
    :hidden:

    adaptor.md
    tuning_strategies.md