
Pytorch bias quantisation on aarch64 - how best to proceed? #2015

Closed
christinaburge opened this issue Jul 29, 2024 · 3 comments

christinaburge commented Jul 29, 2024

Hi! Following on from issue #1864, I have been doing some further investigation, and it looks like it's relatively straightforward to add quantized bias by making these super-hacky changes to

pytorch/third_party/ideep/mkl-dnn/src/cpu/gemm_x8s8s32x_convolution_utils.cpp

    float data, b;

    if (jcp_.with_bias) {
        // the bias is not properly quantized, so this is a hack for now
        if (jcp_.bias_data_type == dnnl_s32) {
            b = io::load_float_value(dnnl_f32, bias, g * jcp_.oc + oc);
            int quant_b = static_cast<int32_t>(
                    std::round(b / scales[(g * jcp_.oc + oc) * jcp_.scale_idx_mult]));
            data_s32 += quant_b;
            // dequantize data
            data = static_cast<float>(data_s32);
            data *= scales[(g * jcp_.oc + oc) * jcp_.scale_idx_mult];
        } else {
            b = io::load_float_value(jcp_.bias_data_type, bias, g * jcp_.oc + oc);
            // dequantize data
            data = static_cast<float>(data_s32);
            data *= scales[(g * jcp_.oc + oc) * jcp_.scale_idx_mult];
            data += b;
        }
    } else {
        // dequantize data
        data = static_cast<float>(data_s32);
        data *= scales[(g * jcp_.oc + oc) * jcp_.scale_idx_mult];
    }

Does that sound sensible? Or is there a better way to proceed?

kminemur commented Jul 30, 2024

Hi @christinaburge

May we know why you need this implementation? Any background?

To use a user-defined bias data type in oneDNN, you can change the bias memory primitive and memory descriptor, using cnn_inference_int8.cpp as a reference:
// auto user_bias_memory = memory({{conv_bias_tz}, dt::f32, tag::x}, eng);
--> auto user_bias_memory = memory({{conv_bias_tz}, dt::s8, tag::x}, eng);

// auto conv_bias_md = memory::desc({conv_bias_tz}, dt::f32, tag::any);
--> auto conv_bias_md = memory::desc({conv_bias_tz}, dt::s8, tag::any);

DNNL_VERBOSE output should be:
~/project/oneDNN/examples$ ONEDNN_VERBOSE=1 ./a.out | grep bia_
onednn_verbose,primitive,exec,cpu,convolution,brg_conv_fwd:avx2_vnni,forward_training,src_u8:a:blocked:acdb::f0 wei_s8:a:blocked:AcdB24a4b::f0 bia_s8::a:blocked:a::f0 dst_u8:a:blocked:acdb::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-post-ops:eltwise_relu,alg:convolution_direct,mb8_ic256oc384_ih13oh13kh3sh1dh0ph1_iw13ow13kw3sw1dw0pw1,14.845
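
For context, here is a minimal sketch of how those two changed lines fit together. The names and dimensions follow cnn_inference_int8.cpp and the verbose line above; the wrapper function is only illustrative and not part of the example.

    #include "oneapi/dnnl/dnnl.hpp"

    using namespace dnnl;
    using tag = memory::format_tag;
    using dt = memory::data_type;

    // Hypothetical helper: prepares an s8 bias for the convolution in
    // cnn_inference_int8.cpp (384 output channels).
    void setup_int8_bias(engine &eng) {
        memory::dims conv_bias_tz = {384};

        // User bias memory, declared as s8 instead of f32.
        auto user_bias_memory = memory({{conv_bias_tz}, dt::s8, tag::x}, eng);

        // Bias memory descriptor used when creating the convolution primitive
        // descriptor; tag::any lets the library choose the layout.
        auto conv_bias_md = memory::desc({conv_bias_tz}, dt::s8, tag::any);

        // The rest follows the example: build the convolution primitive
        // descriptor with conv_bias_md and execute with user_bias_memory
        // bound to DNNL_ARG_BIAS.
        (void)user_bias_memory;
        (void)conv_bias_md;
    }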

dzarukin commented Aug 2, 2024

@christinaburge Though your observation is correct and one could add such support the way you suggested, it would not really help, for two reasons:

  1. The library would simply be undoing what the user just did, namely un-scaling a bias that the user has just scaled on their end. Performance-wise, there is no point in doing the work twice for nothing.
  2. The programming model says the bias is applied after the scales are applied, so this would contradict that definition: a quantized bias would be applied before scaling, while a non-quantized bias is applied right after (see the sketch below).

To keep the solution clean, the bias should be passed non-quantized. We haven't restricted that for some reason, but we might consider doing so in the next major version to avoid such situations. Hope it helps.
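
To illustrate the two orderings numerically, here is a small standalone sketch (not oneDNN code; all names and values are made up):

    #include <cmath>
    #include <cstdint>
    #include <cstdio>

    int main() {
        float scale = 0.05f;    // combined src * wei dequantization scale
        int32_t acc_s32 = 1234; // int32 convolution accumulator
        float bias_f32 = 0.7f;  // original, non-quantized bias

        // oneDNN programming model: bias is applied after scaling.
        float dst_model = static_cast<float>(acc_s32) * scale + bias_f32;

        // The proposed hack: quantize the bias and add it before scaling,
        // which the library then immediately un-scales again.
        int32_t bias_s32 = static_cast<int32_t>(std::round(bias_f32 / scale));
        float dst_hack = static_cast<float>(acc_s32 + bias_s32) * scale;

        // The two agree only up to the rounding of bias_f32 / scale.
        std::printf("model: %f  hack: %f\n", dst_model, dst_hack);
        return 0;
    }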

vpirogov commented Aug 2, 2024

You can find a detailed explanation of the current quantization scheme in oneDNN in the "Quantization and scaling" RFC.

vpirogov assigned dzarukin and unassigned kminemur Aug 2, 2024