Optimize rdbEncodeInteger using bit operations #196

tryfinally · 2020-06-13T13:56:47Z

use sign extension mov instructions into 8/16/32 bits int registers to check if value fits
into 8/16/32 bit width.

// return number of bytes needed to encode (1,2,4 or 0 if above)
int canEncodeInBytes(long long value){ 
    struct SignExtendBits{
        long long bits8: 8; 
        long long bits16: 16;
        long long bits32: 32;|
    } v;           
    v.bits8 = value;
    if ( v.bits8 == value ) return 1; 
    v.bits16 = value; 
    if ( v.bits16 == value ) return 2; 
    v.bits32 = value; 
    if ( v.bits32 == value ) return 4; 
    return 0; 
}

Compiles to compact code even with -O1 :

function(long long):                           # @function(long long)
        movsx   rcx, dil
        mov     eax, 1
        cmp     rcx, rdi
        je      .LBB8_3
        movsx   rcx, di
        mov     eax, 2
        cmp     rcx, rdi
        je      .LBB8_3
        movsxd  rcx, edi
        xor     eax, eax
        cmp     rcx, rdi
        sete    al
        shl     eax, 2
.LBB8_3:
        ret

use sign extension mov instructions into 8/16/32 bits int registers to check if value fits into 8/16/32 bit width. ``` int function(long long value){ struct SignExtendBits{ long long bits8: 8; long long bits16: 16; long long bits32: 32; } v; v.bits8 = value; if ( v.bits8 == value ) return 1; m.bits16 = value; if ( m.bits16 == value ) return 2; m.bits32 = value; if ( m.bits32 == value ) return 3; return 0; } Compiles to compact code even with -O1 : function(long long): movsx rcx, dil mov eax, 1 cmp rcx, rdi je .LBB1_3 movsx rcx, di mov eax, 2 cmp rcx, rdi je .LBB1_3 movsxd rcx, edi xor eax, eax cmp rcx, rdi sete al shl eax, 2 .LBB1_3: ret ```

JohnSully · 2020-06-13T14:39:58Z

Hi @tryfinally do you have an example that shows the perf improvement?

tryfinally · 2020-06-13T15:06:30Z

Nope, but look on the assembly. I know these are micro optimizations.
It would be good to know if you are not welcoming those.
I would agree that macro optimizations needs per data.

JohnSully · 2020-06-13T16:01:17Z

OK let me do some save/load testing to see the impact.

In general optimization only fixes should have some scenario or perf data to justify them. Sometimes perf results can be counterintuitive so its important to keep it data driven.

That said I don’t want to discourage tinkering or contributions, so this is not going to be a super high bar.

romange · 2020-09-30T09:25:23Z

63 ^ __builtin_clzll(value) is a better fit there. it's branchless and cross platform.
it returns the msb index (floor of log2) which you can divide by 8 to get byte width.

VivekSainiEQ force-pushed the unstable branch from 151f759 to 7fefc7b Compare March 19, 2021 20:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize rdbEncodeInteger using bit operations #196

Optimize rdbEncodeInteger using bit operations #196

tryfinally commented Jun 13, 2020

JohnSully commented Jun 13, 2020

tryfinally commented Jun 13, 2020

JohnSully commented Jun 13, 2020

romange commented Sep 30, 2020

Optimize rdbEncodeInteger using bit operations #196

Are you sure you want to change the base?

Optimize rdbEncodeInteger using bit operations #196

Conversation

tryfinally commented Jun 13, 2020

JohnSully commented Jun 13, 2020

tryfinally commented Jun 13, 2020

JohnSully commented Jun 13, 2020

romange commented Sep 30, 2020