
Review and edit User Guide. #3734

Open
wants to merge 1 commit into base: develop

Conversation

dwelsch-esi (Contributor)

Edit all User Guide pages. Reorganize the User Guide to guide users in applying quantization processes.

Signed-off-by: Dave Welsch <[email protected]>
@@ -4,35 +4,30 @@
Quantization debugging guidelines

Why is this still called "debugging"? Can we use a more accurate term, like "further quantization" or "secondary optimization"? It might help to explicitly define the goal here, something like: "To improve model accuracy to acceptable levels after application of standard quantization techniques."

At the very least I'd recommend calling it "troubleshooting", which is a less casual, more widely used, and more general term, and therefore appropriate here.


Debugging workflow
==================

The steps are shown as a flow chart in the following figure and are described in more detail below:
The steps are shown as a flow chart in the following figure and are described in more detail below.

.. image:: ../images/debugging_guidelines_1.png

Some notes on the image:

  • This would make more efficient use of page space (at least on a laptop screen) if it were arranged horizontally instead of vertically. Ideally the text in the graphic should be big enough to read by default; right now I'm having to enlarge the picture to read the text.
  • The math symbols in the flowchart should be explained in the text, or better, eliminated in favor of step titles that match the text headings ("Quantizing weights vs activations", and so on). (Some abbreviation might be necessary.)
  • Terms in the flowchart should correspond to identifiable operations. Specifically, "fix" is a vague description that's used several times. Numbering the steps would help with this if they follow a definite sequence.
  • The boxes in the flowchart should correspond to steps in the procedure. Currently it's not clear which boxes represent which steps in the description.


1. FP32 confidence checks
------------------------
1. Confidence-checking FP32

"Sanity check" is a being phased out as a non-inclusive term in the industry: https://inclusivenaming.org/
This should be changed in graphics as well.


There's a new item in the backlog to link steps in the user guide to techniques in the feature guide. We should do that here as well.


- an exported model,
- an encodings JSON file containing quantization parameters (like **encoding min/max/scale/offset**) associated with each quantizers.
- An exported model

Is the exported model a single file or a collection of files?
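For reference, here is roughly how those artifacts get produced with ``aimet_torch`` — a minimal sketch, assuming the v1 ``QuantizationSimModel`` API; the model, calibration data, and output paths are placeholders, not from this page. In my experience ``export()`` writes a small collection of files (an exported model such as ONNX plus a ``.encodings`` JSON), which might be worth stating explicitly in the text.

```python
import torch
from aimet_torch.quantsim import QuantizationSimModel

# Tiny stand-in model; any torch.nn.Module works here.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.ReLU(),
).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Wrap the float model with quantization-simulation ops.
sim = QuantizationSimModel(model, dummy_input=dummy_input)

# Calibration callback: run representative data through the wrapped model so
# each quantizer can compute its encoding (min/max/scale/offset).
def calibrate(sim_model, _):
    with torch.no_grad():
        for _ in range(8):  # stand-in for a real calibration data loader
            sim_model(torch.randn(1, 3, 224, 224))

sim.compute_encodings(calibrate, forward_pass_callback_args=None)

# Export writes files under ./export, typically an exported model
# (e.g. my_model.onnx) plus my_model.encodings (a JSON file).
sim.export(path="./export", filename_prefix="my_model", dummy_input=dummy_input)
```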


Follow these instructions to `compile AIMET quantized model <https://app.aihub.qualcomm.com/docs/hub/compile_examples.html#compiling-models-quantized-with-aimet-to-tflite-or-qnn>`_ and then submit an inference job using selected device.
Follow the `instructions <https://app.aihub.qualcomm.com/docs/hub/compile_examples.html#compiling-models-quantized-with-aimet-to-tflite-or-qnn>`_ at the Qualcomm\ |reg| AI Hub to compile a model and submit an inference job using the selected device.

It's not clear to me whether this simulates the device or deploys the model to a physical device. What exactly happens here?
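For what it's worth, my understanding is that an AI Hub inference job runs the compiled model on a real provisioned device in Qualcomm's hosted device farm, not a simulator; it would help to say so explicitly. A rough sketch of the flow with the ``qai_hub`` Python client — the device name, model path, and input tensor name below are placeholders, not from this page:

```python
import numpy as np
import qai_hub as hub

# Compile the exported model for a specific hosted device.
# (For AIMET-quantized models, see the linked AI Hub instructions for how the
# encodings are packaged alongside the exported model.)
compile_job = hub.submit_compile_job(
    model="./export/my_model.onnx",           # placeholder path from the export step
    device=hub.Device("Samsung Galaxy S24"),  # placeholder device name
)

# Run an inference job with the compiled model on the same hosted device.
inference_job = hub.submit_inference_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S24"),
    inputs={"input": [np.random.randn(1, 3, 224, 224).astype(np.float32)]},
)
outputs = inference_job.download_output_data()
```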

4. Executing the model
----------------------

The |qnn| SDK ``qnn-net-run`` tool executes the model (represented as a serialized context binary) on the specified target.

Is running the model on a target the same as deploying it to that target?

on-target metrics. Decide which are important to your application:

- Latency
- Memory size

Are these the only on-target metrics that might be of interest?
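Other numbers often matter too (model load time, peak memory, which compute unit each op lands on), so a "for example" might be safer than a closed list. If it helps, one convenient way to collect these on hosted devices is an AI Hub profile job — a hedged sketch using the ``qai_hub`` client; the device name and model path are placeholders:

```python
import qai_hub as hub

# Compile (as in the earlier sketch), then profile on the hosted device.
compile_job = hub.submit_compile_job(
    model="./export/my_model.onnx",           # placeholder path
    device=hub.Device("Samsung Galaxy S24"),  # placeholder device name
)

# The profile job reports on-target metrics such as inference latency,
# memory use, and per-op compute-unit placement.
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S24"),
)
profile = profile_job.download_profile()  # dict of profiling results
```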

QuantSim can only quantize math operations performed by :class:`torch.nn.Module` objects, while
:class:`torch.nn.functional` calls will be incorrectly ignored. Please refer to framework specific
pages to know more about such model guidelines.
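Side note: since this nn.Module guideline is moving off this page, consider linking to wherever it lands, or keeping a one-line example. The guideline boils down to preferring module ops over ``torch.nn.functional`` calls so QuantSim has a module to attach a quantizer to — a minimal PyTorch illustration (hypothetical model names, not from the docs):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NotQuantSimFriendly(nn.Module):
    """F.relu is a functional call, so QuantSim has no module to wrap with a quantizer."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3)

    def forward(self, x):
        return F.relu(self.conv(x))

class QuantSimFriendly(nn.Module):
    """Same math, but ReLU is an nn.Module, so its output can be quantized."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))
```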
If any of the metrics are not acceptable with higher precision, begin with weights at INT8 precision and activations at INT16 precision.

We say "begin with" here, but didn't we actually begin with W16A16 in the previous step? Some more explanation might be a good idea here.

If the off-target quantized accuracy metric is not meeting expectations, you can use PTQ or QAT
techniques to improve the quantized accuracy for the desired precision. The decision between
PTQ and QAT should be based on the quantized accuracy and runtime needs.
If the off-target quantized accuracy metric does not meet expectations, use PTQ or QAT techniques to improve the quantized accuracy for the implemented precision. The decision to use PTQ or QAT should be based on your quantized accuracy and runtime needs.

Can we provide more guidance on "the decision to use PTQ or QAT"? We allude to it elsewhere, but we don't explain what accuracy and runtime criteria would dictate any particular technique.
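One possible rule of thumb we could offer: try PTQ first, since it needs only unlabeled calibration data and little compute; move to QAT when PTQ still leaves an accuracy gap and you can afford fine-tuning with labeled data. In AIMET, QAT is essentially fine-tuning ``sim.model`` with the usual training loop — a rough sketch, where ``sim`` is the QuantizationSimModel from the earlier sketch and the optimizer, loss, and data loader are placeholders:

```python
import torch

# QAT: fine-tune the quantization-simulation model directly. The quantizers
# stay in the graph, so training learns weights that tolerate quantization.
optimizer = torch.optim.SGD(sim.model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

sim.model.train()
for inputs, labels in train_loader:  # placeholder data loader
    optimizer.zero_grad()
    loss = loss_fn(sim.model(inputs), labels)
    loss.backward()
    optimizer.step()

# After fine-tuning, export the model and encodings as in the earlier sketch.
```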


Once the quantized accuracy and runtime requirements are achieved at the desired precision,
the optimized model is ready for deployment on the target runtimes.
Once the quantized accuracy and runtime requirements are achieved at the desired precision, deploy the optimized model on the target runtimes.

Can we provide explicit instructions for how to deploy the model, or a link to those instructions?
