Hi,

I believe the bias is not removed in the quantize() function. Removing it would seem necessary for a symmetric Q8_0 quantization of the activations. Is that not needed here?
void quantize(QuantizedTensor *qx, float *x, int n) {
    int num_groups = n / GS;
    float Q_MAX = 127.0f;

    for (int group = 0; group < num_groups; group++) {

        // find the max absolute value in the current group
        float wmax = 0.0;
        for (int i = 0; i < GS; i++) {
            float val = fabs(x[group * GS + i]);
            if (val > wmax) {
                wmax = val;
            }
        }

        // calculate and write the scaling factor
        float scale = wmax / Q_MAX;
        qx->s[group] = scale;

        // calculate and write the quantized values
        for (int i = 0; i < GS; i++) {
            float quant_value = x[group * GS + i] / scale;   // scale
            int8_t quantized = (int8_t) round(quant_value);  // round and clamp
            qx->q[group * GS + i] = quantized;
        }
    }
}
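
For reference, here is a minimal sketch of what "removing the bias" could look like, assuming the bias in question is the per-group mean of the activations: subtract the mean before computing the scale, and store it next to the scale so it can be added back during dequantization (x ≈ q * s + m per group). The names QuantizedTensorDebiased, the m field, and quantize_debiased are hypothetical and not part of the repo; a scheme like this would also require the downstream matmul to account for the stored mean, which the current Q8_0 kernels do not do.

#include <math.h>
#include <stdint.h>

#define GS 64  // group size (assumed here; the repo defines its own GS)

// Hypothetical variant of QuantizedTensor with an extra per-group mean.
// The actual struct in the repo only stores q and s.
typedef struct {
    int8_t *q;   // quantized values
    float  *s;   // per-group scale
    float  *m;   // per-group mean (the "bias" being discussed), added for illustration
} QuantizedTensorDebiased;

// Sketch only: quantize each group around its mean instead of around zero,
// so the int8 range covers the group's actual spread of values.
void quantize_debiased(QuantizedTensorDebiased *qx, float *x, int n) {
    int num_groups = n / GS;
    float Q_MAX = 127.0f;

    for (int group = 0; group < num_groups; group++) {

        // compute and store the group mean (the bias to remove)
        float mean = 0.0f;
        for (int i = 0; i < GS; i++) {
            mean += x[group * GS + i];
        }
        mean /= GS;
        qx->m[group] = mean;

        // find the max absolute deviation from the mean
        float wmax = 0.0f;
        for (int i = 0; i < GS; i++) {
            float val = fabsf(x[group * GS + i] - mean);
            if (val > wmax) {
                wmax = val;
            }
        }

        // scale maps the largest deviation to 127 (guard against an all-constant group)
        float scale = (wmax > 0.0f) ? wmax / Q_MAX : 1.0f;
        qx->s[group] = scale;

        // quantize the mean-removed values
        for (int i = 0; i < GS; i++) {
            float quant_value = (x[group * GS + i] - mean) / scale;
            qx->q[group * GS + i] = (int8_t) roundf(quant_value);
        }
    }
}

With the symmetric scheme quoted above, the reconstruction is simply x ≈ q * s, which is why no bias term is stored; the question is whether skipping the mean subtraction costs meaningful precision when the activations are not centered around zero.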