Hi,

I believe the bias is not removed in the quantize() function. Removing it would seem necessary for a symmetric Q8_0 quantization of the activations. Is that not needed here?
void quantize(QuantizedTensor *qx, float *x, int n) {
    int num_groups = n / GS;
    float Q_MAX = 127.0f;

    for (int group = 0; group < num_groups; group++) {

        // find the max absolute value in the current group
        float wmax = 0.0;
        for (int i = 0; i < GS; i++) {
            float val = fabs(x[group * GS + i]);
            if (val > wmax) {
                wmax = val;
            }
        }

        // calculate and write the scaling factor
        float scale = wmax / Q_MAX;
        qx->s[group] = scale;

        // calculate and write the quantized values
        for (int i = 0; i < GS; i++) {
            float quant_value = x[group * GS + i] / scale;   // scale
            int8_t quantized = (int8_t) round(quant_value);  // round and clamp
            qx->q[group * GS + i] = quantized;
        }
    }
}
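
For reference, here is a minimal sketch of what "removing the bias" could look like, assuming the bias in question is the per-group mean of the activations: subtract the mean before computing the scale, and store it next to the scale so it can be added back during dequantization (x ≈ q * s + m per group). The names QuantizedTensorDebiased, the m field, and quantize_debiased are hypothetical and not part of the repo; a scheme like this would also require the downstream matmul to account for the stored mean, which the current Q8_0 kernels do not do.

#include <math.h>
#include <stdint.h>

#define GS 64  // group size (assumed here; the repo defines its own GS)

// Hypothetical variant of QuantizedTensor with an extra per-group mean.
// The actual struct in the repo only stores q and s.
typedef struct {
    int8_t *q;   // quantized values
    float  *s;   // per-group scale
    float  *m;   // per-group mean (the "bias" being discussed), added for illustration
} QuantizedTensorDebiased;

// Sketch only: quantize each group around its mean instead of around zero,
// so the int8 range covers the group's actual spread of values.
void quantize_debiased(QuantizedTensorDebiased *qx, float *x, int n) {
    int num_groups = n / GS;
    float Q_MAX = 127.0f;

    for (int group = 0; group < num_groups; group++) {

        // compute and store the group mean (the bias to remove)
        float mean = 0.0f;
        for (int i = 0; i < GS; i++) {
            mean += x[group * GS + i];
        }
        mean /= GS;
        qx->m[group] = mean;

        // find the max absolute deviation from the mean
        float wmax = 0.0f;
        for (int i = 0; i < GS; i++) {
            float val = fabsf(x[group * GS + i] - mean);
            if (val > wmax) {
                wmax = val;
            }
        }

        // scale maps the largest deviation to 127 (guard against an all-constant group)
        float scale = (wmax > 0.0f) ? wmax / Q_MAX : 1.0f;
        qx->s[group] = scale;

        // quantize the mean-removed values
        for (int i = 0; i < GS; i++) {
            float quant_value = (x[group * GS + i] - mean) / scale;
            qx->q[group * GS + i] = (int8_t) roundf(quant_value);
        }
    }
}

With the symmetric scheme quoted above, the reconstruction is simply x ≈ q * s, which is why no bias term is stored; the question is whether skipping the mean subtraction costs meaningful precision when the activations are not centered around zero.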