
Review and edit User Guide. #3734

Open
wants to merge 1 commit into base: develop

Conversation

dwelsch-esi (Contributor)

Edit all User Guide pages. Reorganize the User Guide to guide users in applying quantization processes.

Signed-off-by: Dave Welsch <[email protected]>
@@ -4,35 +4,30 @@
Quantization debugging guidelines

Why is this still called "debugging"? Can we use a more accurate term, like "further quantization" or "secondary optimization"? It might help to explicitly define the goal here, something like: "To improve model accuracy to acceptable levels after application of standard quantization techniques."

At the very least I'd recommend calling it "troubleshooting", which is a less casual, more widely used, and more general term, and therefore appropriate here.


Debugging workflow
==================

The steps are shown as a flow chart in the following figure and are described in more detail below:
The steps are shown as a flow chart in the following figure and are described in more detail below.

.. image:: ../images/debugging_guidelines_1.png

Some notes on the image:

  • This would make more efficient use of page space (at least on a laptop screen) if it were arranged horizontally instead of vertically. Ideally the text in the graphic should be big enough to read by default; right now I'm having to enlarge the picture to read the text.
  • The math symbols in the flowchart should be explained in the text, or better, eliminated in favor of step titles that match the text headings ("Quantizing weights vs activations", and so on). (Some abbreviation might be necessary.)
  • Terms in the flowchart should correspond to identifiable operations. Specifically, "fix" is a vague description that's used several times. Numbering the steps would help with this if they follow a definite sequence.
  • The boxes in the flowchart should correspond to steps in the procedure. Currently it's not clear which boxes represent which steps in the description.


1. FP32 confidence checks
------------------------
1. Confidence-checking FP32

"Sanity check" is a being phased out as a non-inclusive term in the industry: https://inclusivenaming.org/
This should be changed in graphics as well.


There's a new item in the backlog to link steps in the user guide to techniques in the feature guide. We should do that here as well.


- an exported model,
- an encodings JSON file containing quantization parameters (like **encoding min/max/scale/offset**) associated with each quantizers.
- An exported model

Is the exported model a single file or a collection of files?
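For reference, here is roughly how those artifacts get produced with ``aimet_torch`` — a minimal sketch, assuming the v1 ``QuantizationSimModel`` API; the model, calibration data, and output paths are placeholders, not from this page. In my experience ``export()`` writes a small collection of files (an exported model such as ONNX plus a ``.encodings`` JSON), which might be worth stating explicitly in the text.

```python
import torch
from aimet_torch.quantsim import QuantizationSimModel

# Tiny stand-in model; any torch.nn.Module works here.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.ReLU(),
).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Wrap the float model with quantization-simulation ops.
sim = QuantizationSimModel(model, dummy_input=dummy_input)

# Calibration callback: run representative data through the wrapped model so
# each quantizer can compute its encoding (min/max/scale/offset).
def calibrate(sim_model, _):
    with torch.no_grad():
        for _ in range(8):  # stand-in for a real calibration data loader
            sim_model(torch.randn(1, 3, 224, 224))

sim.compute_encodings(calibrate, forward_pass_callback_args=None)

# Export writes files under ./export, typically an exported model
# (e.g. my_model.onnx) plus my_model.encodings (a JSON file).
sim.export(path="./export", filename_prefix="my_model", dummy_input=dummy_input)
```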


Follow these instructions to `compile AIMET quantized model <https://app.aihub.qualcomm.com/docs/hub/compile_examples.html#compiling-models-quantized-with-aimet-to-tflite-or-qnn>`_ and then submit an inference job using selected device.
Follow the `instructions <https://app.aihub.qualcomm.com/docs/hub/compile_examples.html#compiling-models-quantized-with-aimet-to-tflite-or-qnn>`_ at the Qualcomm\ |reg| AI Hub to compile a model and submit an inference job using the selected device.

It's not clear to me whether this simulates the device or deploys the model to a physical device. What exactly happens here?
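For what it's worth, my understanding is that an AI Hub inference job runs the compiled model on a real provisioned device in Qualcomm's hosted device farm, not a simulator; it would help to say so explicitly. A rough sketch of the flow with the ``qai_hub`` Python client — the device name, model path, and input tensor name below are placeholders, not from this page:

```python
import numpy as np
import qai_hub as hub

# Compile the exported model for a specific hosted device.
# (For AIMET-quantized models, see the linked AI Hub instructions for how the
# encodings are packaged alongside the exported model.)
compile_job = hub.submit_compile_job(
    model="./export/my_model.onnx",           # placeholder path from the export step
    device=hub.Device("Samsung Galaxy S24"),  # placeholder device name
)

# Run an inference job with the compiled model on the same hosted device.
inference_job = hub.submit_inference_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S24"),
    inputs={"input": [np.random.randn(1, 3, 224, 224).astype(np.float32)]},
)
outputs = inference_job.download_output_data()
```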

4. Executing the model
----------------------

The |qnn| SDK ``qnn-net-run`` tool executes the model (represented as a serialized context binary) on the specified target.

Is running the model on a target the same as deploying it to that target?

on-target metrics. Decide which are important to your application:

- Latency
- Memory size

Are these the only on-target metrics that might be of interest?
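Other numbers often matter too (model load time, peak memory, which compute unit each op lands on), so a "for example" might be safer than a closed list. If it helps, one convenient way to collect these on hosted devices is an AI Hub profile job — a hedged sketch using the ``qai_hub`` client; the device name and model path are placeholders:

```python
import qai_hub as hub

# Compile (as in the earlier sketch), then profile on the hosted device.
compile_job = hub.submit_compile_job(
    model="./export/my_model.onnx",           # placeholder path
    device=hub.Device("Samsung Galaxy S24"),  # placeholder device name
)

# The profile job reports on-target metrics such as inference latency,
# memory use, and per-op compute-unit placement.
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S24"),
)
profile = profile_job.download_profile()  # dict of profiling results
```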

QuantSim can only quantize math operations performed by :class:`torch.nn.Module` objects, while
:class:`torch.nn.functional` calls will be incorrectly ignored. Please refer to framework specific
pages to know more about such model guidelines.
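Side note: since this nn.Module guideline is moving off this page, consider linking to wherever it lands, or keeping a one-line example. The guideline boils down to preferring module ops over ``torch.nn.functional`` calls so QuantSim has a module to attach a quantizer to — a minimal PyTorch illustration (hypothetical model names, not from the docs):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NotQuantSimFriendly(nn.Module):
    """F.relu is a functional call, so QuantSim has no module to wrap with a quantizer."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3)

    def forward(self, x):
        return F.relu(self.conv(x))

class QuantSimFriendly(nn.Module):
    """Same math, but ReLU is an nn.Module, so its output can be quantized."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))
```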
If any of the metrics are not acceptable with higher precision, begin with weights at INT8 precision and activations at INT16 precision.

We say "begin with" here, but didn't we actually begin with W16A16 in the previous step? Some more explanation might be a good idea here.

If the off-target quantized accuracy metric is not meeting expectations, you can use PTQ or QAT
techniques to improve the quantized accuracy for the desired precision. The decision between
PTQ and QAT should be based on the quantized accuracy and runtime needs.
If the off-target quantized accuracy metric does not meet expectations, use PTQ or QAT techniques to improve the quantized accuracy for the implemented precision. The decision to use PTQ or QAT should be based on your quantized accuracy and runtime needs.

Can we provide more guidance on "the decision to use PTQ or QAT"? We allude to it elsewhere, but we don't explain what accuracy and runtime criteria would dictate any particular technique.
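One possible rule of thumb we could offer: try PTQ first, since it needs only unlabeled calibration data and little compute; move to QAT when PTQ still leaves an accuracy gap and you can afford fine-tuning with labeled data. In AIMET, QAT is essentially fine-tuning ``sim.model`` with the usual training loop — a rough sketch, where ``sim`` is the QuantizationSimModel from the earlier sketch and the optimizer, loss, and data loader are placeholders:

```python
import torch

# QAT: fine-tune the quantization-simulation model directly. The quantizers
# stay in the graph, so training learns weights that tolerate quantization.
optimizer = torch.optim.SGD(sim.model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

sim.model.train()
for inputs, labels in train_loader:  # placeholder data loader
    optimizer.zero_grad()
    loss = loss_fn(sim.model(inputs), labels)
    loss.backward()
    optimizer.step()

# After fine-tuning, export the model and encodings as in the earlier sketch.
```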


Once the quantized accuracy and runtime requirements are achieved at the desired precision,
the optimized model is ready for deployment on the target runtimes.
Once the quantized accuracy and runtime requirements are achieved at the desired precision, deploy the optimized model on the target runtimes.

Can we provide explicit instructions for how to deploy the model, or a link to those instructions?
