GH-115802: JIT using the "medium" code model on x86_64-unknown-linux-gnu #130097

Open
wants to merge 2 commits into
base: main

brandtbucher (Member) commented Feb 13, 2025

This is a perfect middle-ground between the "large" code model (which we used to use) and the "small" code model (which we currently use):

  • Local data, like OPARG, is encoded directly in the instruction stream (currently it's loaded indirectly).
  • Extern data, like &_PyEval_BinaryOps, is encoded directly in the instruction stream (currently it's loaded indirectly).
  • Local jumps, like _JIT_ERROR_TARGET, use 32-bit jumps (currently they use "relaxable" 64-bit indirect jumps).
  • Extern jumps, like _Py_Dealloc, use "relaxable" 64-bit indirect jumps (same as today).

This only works on one platform (x86_64-unknown-linux-gnu), but it's an important one. It looks to be 0.5%-1% faster on benchmarks, with a very slight (~0.15%) memory savings from having to JIT less auxiliary data for storing addresses.
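
To make the "indirect" versus "direct" distinction in the list above concrete, here is a minimal sketch in C. All `demo_*` names are hypothetical and exist only for illustration; they are not the JIT's actual patching functions, and the sketch assumes the patched 4-byte field is the last part of its instruction (so a RIP-relative displacement is measured from the end of the field).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 64-bit absolute value at `location`. */
static void
demo_patch_64(unsigned char *location, uint64_t value)
{
    assert(location != NULL);
    memcpy(location, &value, sizeof(value));
}

/* Hypothetical helper: store the signed 32-bit distance from `location`
 * to `target`. Returns 0 on success, -1 if the distance does not fit. */
static int
demo_patch_32r(unsigned char *location, uintptr_t target)
{
    int64_t delta = (int64_t)target - (int64_t)(uintptr_t)location;
    if (delta < INT32_MIN || delta > INT32_MAX) {
        return -1;
    }
    int32_t rel = (int32_t)delta;
    memcpy(location, &rel, sizeof(rel));
    return 0;
}

/* Indirect style: the value lives in a separate 8-byte data slot, and the
 * instruction only holds a 32-bit displacement to that slot. The -4 addend
 * accounts for the displacement being relative to the end of the field. */
static int
demo_emit_indirect(unsigned char *field, unsigned char *slot, uint64_t value)
{
    demo_patch_64(slot, value);
    return demo_patch_32r(field, (uintptr_t)slot - 4);
}

/* What the CPU would effectively read for the indirect style. */
static uint64_t
demo_load_indirect(const unsigned char *field)
{
    int32_t rel;
    uint64_t value;
    memcpy(&rel, field, sizeof(rel));
    memcpy(&value, field + 4 + rel, sizeof(value));
    return value;
}

/* Direct style: the value is the instruction's own 64-bit immediate,
 * so no data slot and no extra memory load are needed. */
static uint64_t
demo_load_direct(const unsigned char *field)
{
    uint64_t value;
    memcpy(&value, field, sizeof(value));
    return value;
}
```

In this sketch the indirect style costs an extra 8-byte slot and a memory load per value, while the direct style spends those 8 bytes inside the instruction itself.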

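Here's the before-and-after of _LOAD_SMALL_INT. The first listing is the current ("small" code model) output, which loads values through the data section; the second is the "medium" code model output, which encodes them directly in the instruction stream: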

void
emit__LOAD_SMALL_INT(
    unsigned char *code, unsigned char *data, _PyExecutorObject *executor,
    const _PyUOpInstruction *instruction, jit_state *state)
{
    // 
    // _LOAD_SMALL_INT.o:     file format elf64-x86-64
    // 
    // Disassembly of section .text:
    // 
    // 0000000000000000 <_JIT_ENTRY>:
    // 0: 0f b7 05 00 00 00 00          movzwl  (%rip), %eax            # 0x7 <_JIT_ENTRY+0x7>
    // 0000000000000003:  R_X86_64_GOTPCREL    _JIT_OPARG-0x4
    // 7: c1 e0 05                      shll    $0x5, %eax
    // a: 48 8b 0d 00 00 00 00          movq    (%rip), %rcx            # 0x11 <_JIT_ENTRY+0x11>
    // 000000000000000d:  R_X86_64_REX_GOTPCRELX       _PyRuntime-0x4
    // 11: 48 01 c8                      addq    %rcx, %rax
    // 14: 48 05 f8 36 00 00             addq    $0x36f8, %rax           # imm = 0x36F8
    // 1a: 49 89 45 00                   movq    %rax, (%r13)
    // 1e: 49 83 c5 08                   addq    $0x8, %r13
    // 22: ff 25 00 00 00 00             jmpq    *(%rip)                 # 0x28 <_JIT_ENTRY+0x28>
    // 0000000000000024:  R_X86_64_GOTPCRELX   _JIT_CONTINUE-0x4
    const unsigned char code_body[34] = {
        0x0f, 0xb7, 0x05, 0x00, 0x00, 0x00, 0x00, 0xc1,
        0xe0, 0x05, 0x48, 0x8b, 0x0d, 0x00, 0x00, 0x00,
        0x00, 0x48, 0x01, 0xc8, 0x48, 0x05, 0xf8, 0x36,
        0x00, 0x00, 0x49, 0x89, 0x45, 0x00, 0x49, 0x83,
        0xc5, 0x08,
    };
    // 0: OPARG
    // 8: &_PyRuntime+0x0
    patch_64(data + 0x0, instruction->oparg);
    patch_64(data + 0x8, (uintptr_t)&_PyRuntime);
    memcpy(code, code_body, sizeof(code_body));
    patch_32r(code + 0x3, (uintptr_t)data + -0x4);
    patch_x86_64_32rx(code + 0xd, (uintptr_t)data + 0x4);
}
void
emit__LOAD_SMALL_INT(
    unsigned char *code, unsigned char *data, _PyExecutorObject *executor,
    const _PyUOpInstruction *instruction, jit_state *state)
{
    // 
    // _LOAD_SMALL_INT.o:     file format elf64-x86-64
    // 
    // Disassembly of section .text:
    // 
    // 0000000000000000 <_JIT_ENTRY>:
    // 0: 48 b8 00 00 00 00 00 00 00 00 movabsq $0x0, %rax
    // 0000000000000002:  R_X86_64_64  _JIT_OPARG
    // a: 0f b7 c0                      movzwl  %ax, %eax
    // d: c1 e0 05                      shll    $0x5, %eax
    // 10: 48 b9 00 00 00 00 00 00 00 00 movabsq $0x0, %rcx
    // 0000000000000012:  R_X86_64_64  _PyRuntime+0x36f8
    // 1a: 48 01 c1                      addq    %rax, %rcx
    // 1d: 49 89 4d 00                   movq    %rcx, (%r13)
    // 21: 49 83 c5 08                   addq    $0x8, %r13
    // 25: e9 00 00 00 00                jmp     0x2a <_JIT_ENTRY+0x2a>
    // 0000000000000026:  R_X86_64_PLT32       _JIT_CONTINUE-0x4
    const unsigned char code_body[37] = {
        0x48, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x0f, 0xb7, 0xc0, 0xc1, 0xe0, 0x05,
        0x48, 0xb9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x48, 0x01, 0xc1, 0x49, 0x89, 0x4d,
        0x00, 0x49, 0x83, 0xc5, 0x08,
    };
    memcpy(code, code_body, sizeof(code_body));
    patch_64(code + 0x2, instruction->oparg);
    patch_64(code + 0x12, (uintptr_t)&_PyRuntime + 0x36f8);
}
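
As a rough check of the second listing, the sketch below copies its 37-byte template, patches the two 64-bit immediates the same way the listing does, and reads them back. The `demo_*` names and the placeholder runtime address are assumptions for illustration (the real code patches in `&_PyRuntime`), and the footprint constants assume the only auxiliary data in the first listing is the two 8-byte slots it patches.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    BEFORE_CODE_SIZE = 34,  /* code_body size in the first listing */
    BEFORE_DATA_SIZE = 16,  /* two 8-byte slots: OPARG and &_PyRuntime */
    AFTER_CODE_SIZE = 37,   /* code_body size in the second listing */
    AFTER_DATA_SIZE = 0,    /* nothing is patched into `data` */
};

/* Template bytes copied from the second listing. */
static const unsigned char demo_code_body[AFTER_CODE_SIZE] = {
    0x48, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x0f, 0xb7, 0xc0, 0xc1, 0xe0, 0x05,
    0x48, 0xb9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x48, 0x01, 0xc1, 0x49, 0x89, 0x4d,
    0x00, 0x49, 0x83, 0xc5, 0x08,
};

/* Copy the template, then fill both movabsq immediates in place.
 * `runtime_addr` is a stand-in for (uintptr_t)&_PyRuntime. */
static void
demo_emit(unsigned char *code, uint16_t oparg, uint64_t runtime_addr)
{
    assert(code != NULL);
    uint64_t oparg64 = oparg;
    uint64_t target = runtime_addr + 0x36f8;
    memcpy(code, demo_code_body, sizeof(demo_code_body));
    memcpy(code + 0x2, &oparg64, sizeof(oparg64));
    memcpy(code + 0x12, &target, sizeof(target));
}

/* Read back a 64-bit immediate from the patched code. */
static uint64_t
demo_read_imm64(const unsigned char *code, size_t offset)
{
    uint64_t value;
    memcpy(&value, code + offset, sizeof(value));
    return value;
}

/* The value the patched instructions compute, per the disassembly:
 * zero-extend the 16-bit oparg, shift left by 5, add the patched base. */
static uint64_t
demo_result(uint16_t oparg, uint64_t runtime_addr)
{
    return (runtime_addr + 0x36f8) + ((uint32_t)oparg << 5);
}
```

Under those assumptions the combined footprint for this one instruction drops from 34 + 16 = 50 bytes to 37 + 0 = 37 bytes, even though the code itself grows by 3 bytes.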

@brandtbucher brandtbucher added performance Performance or resource usage interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-JIT labels Feb 13, 2025
@brandtbucher brandtbucher self-assigned this Feb 13, 2025
Labels
awaiting core review interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage topic-JIT