Read the following material as you learn how to use Neural Compressor.
- Transform introduces how to utilize Neural Compressor's built-in data processing and how to develop a custom data processing method.
- Dataset introduces how to utilize Neural Compressor's built-in dataset and how to develop a custom dataset.
- Metrics introduces how to utilize Neural Compressor's built-in metrics and how to develop a custom metric.
- UX is a web-based system used to simplify Neural Compressor usage.
- Intel oneAPI AI Analytics Toolkit Get Started Guide explains the AI Kit components, installation and configuration guides, and instructions for building and running sample apps.
- AI and Analytics Samples includes code samples for Intel oneAPI libraries.
.. toctree::
   :maxdepth: 1
   :hidden:

   transform.md
   dataset.md
   metric.md
   ux.md
   Intel oneAPI AI Analytics Toolkit Get Started Guide <https://software.intel.com/content/www/us/en/develop/documentation/get-started-with-ai-linux/top.html>
   AI and Analytics Samples <https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics>
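Custom metrics generally follow a small update/result pattern: accumulate statistics batch by batch, then report a final value. Below is a minimal, framework-free sketch of an accuracy metric in that style; the class name and method signatures are illustrative, not Neural Compressor's exact Metric API.

```python
class AccuracyMetric:
    """Toy accuracy metric following a common update()/result()/reset() pattern.

    Illustrative sketch only, not Neural Compressor's exact Metric interface.
    """

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, preds, labels):
        # Accumulate statistics batch by batch.
        for pred, label in zip(preds, labels):
            self.correct += int(pred == label)
            self.total += 1

    def result(self):
        # Report the final metric value over everything seen so far.
        return self.correct / self.total if self.total else 0.0

    def reset(self):
        self.correct = 0
        self.total = 0


metric = AccuracyMetric()
metric.update([1, 0, 1], [1, 1, 1])  # 2 of 3 predictions correct
print(metric.result())
```

A real custom metric would be registered with the framework so the tuning loop can call it automatically during evaluation.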
- Quantization is a process that enables inference and training by performing computations at low-precision data types, such as fixed-point integers. Neural Compressor supports Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). Note that Dynamic Quantization currently has limited support.
- Pruning provides a common method for introducing sparsity in weights and activations.
- Benchmarking introduces how to utilize the benchmark interface of Neural Compressor.
- Mixed precision introduces how to enable mixed precision, combining BF16, INT8, and FP32, on Intel platforms during tuning.
- Graph Optimization introduces how to enable graph optimization for FP32 and auto-mixed precision.
- Model Conversion introduces how to convert a TensorFlow QAT model to a quantized model that runs on Intel platforms.
- TensorBoard provides tensor histograms and execution graphs for tuning debugging purposes.
.. toctree::
   :maxdepth: 1
   :hidden:

   Quantization.md
   PTQ.md
   QAT.md
   dynamic_quantization.md
   pruning.md
   benchmark.md
   mixed_precision.md
   graph_optimization.md
   model_conversion.md
   tensorboard.md
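At its core, post-training quantization maps FP32 values onto a low-precision grid using a scale derived from calibration statistics, then computes with the low-precision values. The following is a minimal, framework-free sketch of symmetric int8 quantization to illustrate the idea; it is not Neural Compressor's implementation.

```python
def quantize_int8(values):
    """Symmetric int8 quantization: scale derived from the max magnitude.

    Illustrative sketch of the PTQ idea, not Neural Compressor's implementation.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs else 1.0
    # Map each FP32 value to the nearest int8 grid point, clamped to [-128, 127].
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate FP32 values from int8 codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

The rounding step introduces a bounded error (at most half a quantization step per value), which is why accuracy-aware tuning is needed to decide which ops can safely run at low precision.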
- Adaptor is the interface between Neural Compressor and the underlying framework. The method for developing an adaptor extension is introduced, using ONNX Runtime as an example.
- Tuning strategies automatically search for low-precision recipes that achieve objectives such as inference performance and memory usage for deep learning models, while meeting expected accuracy criteria. The method for developing a new strategy is also introduced.
.. toctree::
   :maxdepth: 1
   :hidden:

   adaptor.md
   tuning_strategies.md
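Conceptually, an accuracy-driven tuning strategy iterates over candidate low-precision configurations, evaluates each one, and stops at the first configuration that stays within the allowed accuracy drop from the FP32 baseline. The toy sketch below uses a stubbed evaluation function; the function names and recipe labels are hypothetical and do not reflect Neural Compressor's strategy API.

```python
def basic_tuning(candidates, evaluate, baseline_acc, relative_loss=0.01):
    """Try candidate configs in order; return the first whose accuracy stays
    within the allowed relative drop from the FP32 baseline.

    Toy sketch of accuracy-driven tuning, not Neural Compressor's strategy API.
    """
    threshold = baseline_acc * (1 - relative_loss)
    for config in candidates:
        accuracy = evaluate(config)
        if accuracy >= threshold:
            return config, accuracy
    return None, None  # no candidate met the accuracy criterion


# Stubbed accuracy results for three hypothetical precision recipes.
fake_results = {"all_int8": 0.70, "int8_bf16_mix": 0.745, "mostly_fp32": 0.749}

best, acc = basic_tuning(
    ["all_int8", "int8_bf16_mix", "mostly_fp32"],
    evaluate=fake_results.get,
    baseline_acc=0.75,
)
```

With a 1% allowed relative loss, the threshold is 0.7425, so the first candidate is rejected and the second is accepted. A real strategy additionally orders and prunes the candidate space rather than enumerating it exhaustively.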