Pytorch bias quantisation on aarch64 - how best to proceed? #2015
Hi! Following on from issue #1864, I have been doing some further investigation, and it looks like it's relatively straightforward to add quantized bias by making some super-hacky changes to

pytorch/third_party/ideep/mkl-dnn/src/cpu/gemm_x8s8s32x_convolution_utils.cpp

Does that sound sensible? Or is there a better way to proceed?
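The actual patch is not preserved in this thread. As a purely illustrative sketch, this is the general shape such a change could take in the int8 GEMM convolution post-processing: if the bias arrives as int32 in the quantized domain, dequantize it with scale_src * scale_wei before adding it to the rescaled accumulator. All names below are hypothetical, not the real ideep/oneDNN code:

```cpp
// Hypothetical illustration only; not the actual ideep/oneDNN source.
#include <cstddef>
#include <cstdint>

// Dequantize-and-add step for one output element of an int8 GEMM convolution.
// acc is the raw int32 accumulator; bias may be f32 (the supported path) or
// s32 (the "quantized bias" case this issue asks about).
float finish_output(int32_t acc, float scale_src, float scale_wei,
                    const void *bias, bool bias_is_s32, std::size_t oc) {
    const float acc_scale = scale_src * scale_wei;
    // Bring the int32 accumulator back to the f32 domain.
    float dst = static_cast<float>(acc) * acc_scale;
    if (bias_is_s32) {
        // Quantized bias, assumed stored as round(bias_f32 / acc_scale),
        // so it is rescaled with the same factor as the accumulator.
        dst += static_cast<float>(
                       static_cast<const int32_t *>(bias)[oc]) * acc_scale;
    } else {
        dst += static_cast<const float *>(bias)[oc];  // ordinary f32 bias
    }
    return dst;
}
```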
Comments

May we know why you need this implementation? Any background? To use a user-defined bias data type in oneDNN, change the bias memory descriptor:

// auto conv_bias_md = memory::desc({conv_bias_tz}, dt::f32, tag::any);

DNNL_VERBOSE output should be:
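To make that suggestion concrete, here is a minimal sketch, assuming the oneDNN C++ API and taking conv_bias_tz (the bias dimensions) from the snippet above, of switching the bias descriptor between the conventional f32 and a quantized s32 bias:

```cpp
#include "oneapi/dnnl/dnnl.hpp"
using namespace dnnl;
using dt = memory::data_type;
using tag = memory::format_tag;

// Sketch: the bias data type is chosen through the bias memory descriptor.
// Whether an s32 bias is actually accepted by a given convolution
// implementation depends on the oneDNN version and build.
memory::desc make_conv_bias_md(const memory::dims &conv_bias_tz,
                               bool quantized_bias) {
    return memory::desc({conv_bias_tz},
                        quantized_bias ? dt::s32 : dt::f32, tag::any);
}
```

Creating the primitive with DNNL_VERBOSE=1 set in the environment then shows which implementation and data types were actually dispatched.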
@christinaburge Though your observation is correct and one could add such support the way you suggested, it would not really help, for two reasons:

To keep the solution clean, bias should be passed non-quantized. We haven't restricted that for some reason, but we might consider doing so in the next major version to avoid such situations. Hope this helps.
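For contrast, here is a minimal sketch of the non-quantized-bias scheme described above, assuming the oneDNN v3.x C++ API (set_scales_mask): int8 source and weights, f32 bias, and the quantization scales attached as primitive attributes. The function and parameter names are mine, not from the thread:

```cpp
#include "oneapi/dnnl/dnnl.hpp"
using namespace dnnl;

// Build an int8 convolution primitive descriptor with a non-quantized
// (f32) bias; quantization is expressed via scale attributes instead.
convolution_forward::primitive_desc make_int8_conv_pd(
        const engine &eng,
        const memory::desc &src_md,   // u8/s8 activations
        const memory::desc &wei_md,   // s8 weights
        const memory::desc &dst_md,   // e.g. u8 or f32 destination
        const memory::dims &bias_dims,
        const memory::dims &strides,
        const memory::dims &padding) {
    // Bias stays f32: it is applied after the int32 accumulator is rescaled.
    auto bias_md = memory::desc(bias_dims, memory::data_type::f32,
                                memory::format_tag::x);

    primitive_attr attr;
    attr.set_scales_mask(DNNL_ARG_SRC, 0);      // per-tensor source scale
    attr.set_scales_mask(DNNL_ARG_WEIGHTS, 0);  // per-tensor weight scale

    return convolution_forward::primitive_desc(eng,
            prop_kind::forward_inference, algorithm::convolution_direct,
            src_md, wei_md, bias_md, dst_md, strides, padding, padding, attr);
}
```

At execution time the actual scale values are then supplied as f32 memories under DNNL_ARG_ATTR_SCALES | DNNL_ARG_SRC and DNNL_ARG_ATTR_SCALES | DNNL_ARG_WEIGHTS.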
You can find a detailed explanation of the current quantization scheme in oneDNN in the "Quantization and scaling" RFC.
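For reference, that scheme comes down to a simple identity (my paraphrase, not a quote from the RFC): if src_f32 ≈ scale_src · src_s8 and wei_f32 ≈ scale_wei · wei_s8, then

dst_f32 ≈ (scale_src · scale_wei) · conv_s32(src_s8, wei_s8) + bias_f32,

so the bias is most naturally applied in f32 after the int32 accumulator has been rescaled; a pre-quantized bias would have to be stored as bias_f32 / (scale_src · scale_wei), tying it to the scales chosen for the source and weights.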