Review and edit User Guide. #3734
base: develop
Conversation
Signed-off-by: Dave Welsch <[email protected]>
@@ -4,35 +4,30 @@
Quantization debugging guidelines
Why is this still called "debugging"? Can we use a more accurate term, like "further quantization" or "secondary optimization"? It might help to explicitly define the goal here, something like: "To improve model accuracy to acceptable levels after application of standard quantization techniques."
At the very least I'd recommend calling it "troubleshooting", which is a less casual, more widely used, and more general term that is appropriate here.
Debugging workflow
==================
The steps are shown as a flow chart in the following figure and are described in more detail below:
The steps are shown as a flow chart in the following figure and are described in more detail below.
.. image:: ../images/debugging_guidelines_1.png
Some notes on the image:
- This would make more efficient use of page space (at least on a laptop screen) if it were arranged horizontally instead of vertically. Ideally the text in the graphic should be big enough to read by default; right now I'm having to enlarge the picture to read the text.
- The math symbols in the flowchart should be explained in the text, or better, eliminated in favor of step titles that match the text headings ("Quantizing weights vs activations", and so on). (Some abbreviation might be necessary.)
- Terms in the flowchart should correspond to identifiable operations. Specifically, "fix" is a vague description that's used several times. Numbering the steps would help with this if they follow a definite sequence.
- The boxes in the flowchart should correspond to steps in the procedure. Currently it's not clear which boxes represent which steps in the description.
1. FP32 confidence checks
------------------------
1. Confidence-checking FP32
"Sanity check" is a being phased out as a non-inclusive term in the industry: https://inclusivenaming.org/
This should be changed in graphics as well.
There's a new item in the backlog to link steps in the user guide to techniques in the feature guide. We should do that here as well.
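While we're linking steps to techniques, it might also help to show what an FP32 confidence check looks like in practice. A minimal sketch, assuming a plain PyTorch model and a validation loader; the accuracy threshold is a placeholder, not a value from the guide:

```python
import torch

def fp32_confidence_check(model: torch.nn.Module,
                          data_loader,
                          expected_top1: float = 0.75) -> bool:
    """Run the unmodified FP32 model on validation data and confirm that
    accuracy is close to the published baseline before any quantization."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in data_loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    top1 = correct / total
    print(f"FP32 top-1 accuracy: {top1:.4f}")
    return top1 >= expected_top1
```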
- an exported model,
- an encodings JSON file containing quantization parameters (like **encoding min/max/scale/offset**) associated with each quantizers.
- An exported model
Is the exported model a single file or a collection of files?
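If it helps to answer that in the text: as I understand it, the export step writes a small collection of files rather than a single artifact (the model itself plus the encodings JSON). A sketch of the export call, assuming the aimet_torch QuantizationSimModel API; the model, calibration callback, input shape, and paths below are placeholders:

```python
import torch
from aimet_torch.quantsim import QuantizationSimModel

def export_quantized_artifacts(model: torch.nn.Module, calibration_fn, out_dir: str = "./export"):
    """Calibrate a QuantSim wrapper and export the model plus its encodings JSON."""
    dummy_input = torch.randn(1, 3, 224, 224)  # placeholder input shape

    sim = QuantizationSimModel(model, dummy_input=dummy_input)

    # Calibrate quantizer encodings by running representative data through the model.
    sim.compute_encodings(forward_pass_callback=calibration_fn,
                          forward_pass_callback_args=None)

    # Writes a small collection of files into out_dir: the exported model itself
    # plus a model.encodings JSON holding min/max/scale/offset for each quantizer.
    sim.export(path=out_dir, filename_prefix="model", dummy_input=dummy_input)
```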
Follow these instructions to `compile AIMET quantized model <https://app.aihub.qualcomm.com/docs/hub/compile_examples.html#compiling-models-quantized-with-aimet-to-tflite-or-qnn>`_ and then submit an inference job using selected device.
Follow the `instructions <https://app.aihub.qualcomm.com/docs/hub/compile_examples.html#compiling-models-quantized-with-aimet-to-tflite-or-qnn>`_ at the Qualcomm\ |reg| AI hub to compile a model and submit an inference job using the selected device.
It's not clear to me if this simulates the device or deploys it to your device. What exactly happens here?
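For what it's worth, my understanding is that AI Hub runs the job on a real hosted device rather than a simulator, but the page should state that explicitly. A hedged sketch of the compile-then-infer round trip with the qai-hub Python client; the device name, model path, option string, and input names are assumptions to be checked against the linked docs:

```python
import qai_hub as hub

DEVICE = hub.Device("Samsung Galaxy S24 (Family)")  # placeholder device name

# Compile the exported, quantized model for the chosen hosted device.
compile_job = hub.submit_compile_job(
    model="export/model.onnx",                       # placeholder path to the exported model
    device=DEVICE,
    options="--target_runtime qnn_context_binary",   # option string is an assumption
)

# Run the compiled artifact on the same hosted device with sample inputs.
inference_job = hub.submit_inference_job(
    model=compile_job.get_target_model(),
    device=DEVICE,
    inputs={"input": [sample_batch]},                # placeholder: numpy arrays keyed by input name
)
outputs = inference_job.download_output_data()
```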
4. Executing the model
----------------------

The |qnn| SDK ``qnn-net-run`` tool executes the model (represented as serialized context binary) on the specified target.
Is running the model on a target the same as deploying it to that target?
on-target metrics. Decide which are important to your application:

- Latency
- Memory size
Are these the only on-target metrics that might be of interest?
QuantSim can only quantize math operations performed by :class:`torch.nn.Module` objects, while
:class:`torch.nn.functional` calls will be incorrectly ignored. Please refer to framework specific
pages to know more about such model guidelines.
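Since this guideline comes up repeatedly, a small before/after example might make it concrete. A sketch in plain PyTorch; the module and class names are illustrative:

```python
import torch
from torch import nn

class BadBlock(nn.Module):
    """Functional call: QuantSim cannot attach a quantizer to this ReLU."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(16, 16, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.nn.functional.relu(self.conv(x))  # invisible to QuantSim

class GoodBlock(nn.Module):
    """Module call: the ReLU is an nn.Module, so QuantSim can quantize its output."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(16, 16, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))
```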
If any of the metrics are not acceptable with higher precision, begin with weights at INT8 precision and activations at INT16 precision.
We say "begin with" here, but didn't we actually begin with W16A16 in the previous step? Some more explanation might be a good idea here.
If the off-target quantized accuracy metric is not meeting expectations, you can use PTQ or QAT
techniques to improve the quantized accuracy for the desired precision. The decision between
PTQ and QAT should be based on the quantized accuracy and runtime needs.
If the off-target quantized accuracy metric does not meet expectations, use PTQ or QAT techniques to improve the quantized accuracy for the implemented precision. The decision to use PTQ or QAT should be based on your quantized accuracy and runtime needs.
Can we provide more guidance on "the decision to use PTQ or QAT"? We allude to it elsewhere, but we don't explain what accuracy and runtime criteria would dictate any particular technique.
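Agreed. Even one concrete example of what "use PTQ" means in practice would anchor the guidance, e.g. cross-layer equalization. A sketch assuming the aimet_torch CLE API; the function and argument names are from memory and the input shape is a placeholder:

```python
import torch
from aimet_torch.cross_layer_equalization import equalize_model

def apply_cle(model: torch.nn.Module) -> torch.nn.Module:
    """Apply cross-layer equalization (a PTQ technique) in place on the FP32 model."""
    model.eval()
    # Folds batch norms and rescales weights across adjacent layers so that
    # per-tensor quantization loses less accuracy; rerun QuantSim calibration afterwards.
    equalize_model(model, input_shapes=(1, 3, 224, 224))  # placeholder input shape
    return model
```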
Once the quantized accuracy and runtime requirements are achieved at the desired precision,
the optimized model is ready for deployment on the target runtimes.
Once the quantized accuracy and runtime requirements are achieved at the desired precision, deploy the optimized model on the target runtimes.
Can we provide explicit instructions for how to deploy the model, or a link to those instructions?
Edit all User Guide pages. Reorganize User Guide to guide users in applying quantization processes.