Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize rdbEncodeInteger using bit operations #196

Open
wants to merge 1 commit into
base: unstable
Choose a base branch
from

Conversation

tryfinally
Copy link
Contributor

use sign extension mov instructions into 8/16/32 bits int registers to check if value fits
into 8/16/32 bit width.

// return number of bytes needed to encode (1,2,4 or 0 if above)
int canEncodeInBytes(long long value){ 
    struct SignExtendBits{
        long long bits8: 8; 
        long long bits16: 16;
        long long bits32: 32;|
    } v;           
    v.bits8 = value;
    if ( v.bits8 == value ) return 1; 
    v.bits16 = value; 
    if ( v.bits16 == value ) return 2; 
    v.bits32 = value; 
    if ( v.bits32 == value ) return 4; 
    return 0; 
} 

Compiles to compact code even with -O1 :

function(long long):                           # @function(long long)
        movsx   rcx, dil
        mov     eax, 1
        cmp     rcx, rdi
        je      .LBB8_3
        movsx   rcx, di
        mov     eax, 2
        cmp     rcx, rdi
        je      .LBB8_3
        movsxd  rcx, edi
        xor     eax, eax
        cmp     rcx, rdi
        sete    al
        shl     eax, 2
.LBB8_3:
        ret

use sign extension mov instructions into 8/16/32 bits int registers to check if value fits
into 8/16/32 bit width.
```
int function(long long value){
    struct SignExtendBits{
        long long bits8: 8;
        long long bits16: 16;
        long long bits32: 32;
    } v;

    v.bits8 = value;
    if ( v.bits8 == value ) return 1;
    m.bits16 = value;
    if ( m.bits16 == value ) return 2;
    m.bits32 = value;
    if ( m.bits32 == value ) return 3;

    return 0;
}

Compiles to compact code even with -O1 :
function(long long):
        movsx rcx, dil
        mov eax, 1
        cmp rcx, rdi
        je .LBB1_3
        movsx rcx, di
        mov eax, 2
        cmp rcx, rdi
        je .LBB1_3
        movsxd rcx, edi
        xor eax, eax
        cmp rcx, rdi
        sete al
        shl eax, 2
.LBB1_3:
        ret
```
@JohnSully
Copy link
Collaborator

Hi @tryfinally do you have an example that shows the perf improvement?

@tryfinally
Copy link
Contributor Author

Nope, but look on the assembly. I know these are micro optimizations.
It would be good to know if you are not welcoming those.
I would agree that macro optimizations needs per data.

@JohnSully
Copy link
Collaborator

OK let me do some save/load testing to see the impact.

In general optimization only fixes should have some scenario or perf data to justify them. Sometimes perf results can be counterintuitive so its important to keep it data driven.

That said I don’t want to discourage tinkering or contributions, so this is not going to be a super high bar.

@romange
Copy link

romange commented Sep 30, 2020

63 ^ __builtin_clzll(value) is a better fit there. it's branchless and cross platform.
it returns the msb index (floor of log2) which you can divide by 8 to get byte width.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants