Improve fast-ZPP performance #126

nielsdos · 2025-04-21T13:38:42Z

No description provided.

…de bloat The goal of this PR is to reduce some of the code bloat induced by fast-ZPP. Reduced code bloat results in fewer cache misses (and better DSB coverage), and fewer instructions executed. If we take a look at a simple function: ```c PHP_FUNCTION(twice) { zend_long foo; ZEND_PARSE_PARAMETERS_START(1, 1) Z_PARAM_LONG(foo) ZEND_PARSE_PARAMETERS_END(); RETURN_LONG(foo * 2); } ``` We obtain the following assembly on x86-64 in the non-cold blocks: ```s <+0>: push %r12 <+2>: push %rbp <+3>: push %rbx <+4>: sub $0x10,%rsp <+8>: mov %fs:0x28,%r12 <+17>: mov %r12,0x8(%rsp) <+22>: mov 0x2c(%rdi),%r12d <+26>: cmp $0x1,%r12d <+30>: jne 0x22beaf <zif_twice-3212257> <+36>: cmpb $0x4,0x58(%rdi) <+40>: mov %rsi,%rbp <+43>: jne 0x53c2f0 <zif_twice+96> <+45>: mov 0x50(%rdi),%rax <+49>: add %rax,%rax <+52>: movl $0x4,0x8(%rbp) <+59>: mov %rax,0x0(%rbp) <+63>: mov 0x8(%rsp),%rax <+68>: sub %fs:0x28,%rax <+77>: jne 0x53c312 <zif_twice+130> <+79>: add $0x10,%rsp <+83>: pop %rbx <+84>: pop %rbp <+85>: pop %r12 <+87>: ret <+88>: nopl 0x0(%rax,%rax,1) <+96>: lea 0x50(%rdi),%rbx <+100>: mov %rsp,%rsi <+103>: mov $0x1,%edx <+108>: mov %rbx,%rdi <+111>: call 0x620240 <zend_parse_arg_long_slow> <+116>: test %al,%al <+118>: je 0x22be96 <zif_twice.cold> <+124>: mov (%rsp),%rax <+128>: jmp 0x53c2c1 <zif_twice+49> <+130>: call 0x201050 <__stack_chk_fail@plt> ``` Notice how we get the stack protector overhead in this function and also have to reload the parsed value on the slow path. This happens because the parsed value is returned via a pointer. If instead we were to return struct with a value pair (similar to optional in C++ / Option in Rust), then the values are returned via registers. This means that we no longer have stack protector overhead and we also don't need to reload a value, resulting in better register usage. This is the resulting assembly for the sample function after this patch: ```s <+0>: push %r12 <+2>: push %rbp <+3>: push %rbx <+4>: mov 0x2c(%rdi),%r12d <+8>: cmp $0x1,%r12d <+12>: jne 0x22d482 <zif_twice-3205454> <+18>: cmpb $0x4,0x58(%rdi) <+22>: mov %rsi,%rbp <+25>: jne 0x53be08 <zif_twice+56> <+27>: mov 0x50(%rdi),%rax <+31>: add %rax,%rax <+34>: movl $0x4,0x8(%rbp) <+41>: mov %rax,0x0(%rbp) <+45>: pop %rbx <+46>: pop %rbp <+47>: pop %r12 <+49>: ret <+50>: nopw 0x0(%rax,%rax,1) <+56>: lea 0x50(%rdi),%rbx <+60>: mov $0x1,%esi <+65>: mov %rbx,%rdi <+68>: call 0x61e7b0 <zend_parse_arg_long_slow> <+73>: test %dl,%dl <+75>: je 0x22d46a <zif_twice.cold> <+81>: jmp 0x53bdef <zif_twice+31> ``` The following uses the default benchmark programs we use in CI. Each program is ran on php-cgi with the appropriate `-T` argument, then repeated 15 times. It shows a small performance improvement on Symfony both with and without JIT, and a small improvement on WordPress with JIT. For WordPress, the difference is small as my CPU is bottlenecked on some other stuff as well. | Test | Old Mean | Old Stddev | New Mean | New Stddev | |---------------------------------|----------|------------|----------|------------| | Symfony, no JIT (-T10,50) | 0.5324 | 0.0050 | 0.5272 | 0.0042 | | Symfony, tracing JIT (-T10,50) | 0.5301 | 0.0029 | 0.5264 | 0.0036 | | WordPress, no JIT (-T5,25) | 0.7408 | 0.0049 | 0.7404 | 0.0048 | | WordPress, tracing JIT (-T5,25) | 0.6814 | 0.0052 | 0.6770 | 0.0055 | I was not able to measure any meaningful difference for our micro benchmarks `Zend/bench.php` and `Zend/micro_bench.php`. The Valgrind instruction counts also show a decrease: -0.19% on Symfony without JIT, and -0.14% on WordPress without JIT (see CI).

nielsdos · 2025-04-26T18:33:19Z

Now open on php-src

nielsdos force-pushed the zend-parse-arg-improve-1 branch 2 times, most recently from 51c64a4 to 9806445 Compare April 23, 2025 15:56

nielsdos changed the title ~~wip~~ Improve fast-ZPP performance Apr 23, 2025

nielsdos force-pushed the zend-parse-arg-improve-1 branch 3 times, most recently from 12ad48c to b3dec4a Compare April 26, 2025 17:31

nielsdos closed this Apr 26, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve fast-ZPP performance #126

Improve fast-ZPP performance #126

Uh oh!

nielsdos commented Apr 21, 2025

Uh oh!

nielsdos commented Apr 26, 2025

Uh oh!

Uh oh!

Improve fast-ZPP performance #126

Improve fast-ZPP performance #126

Uh oh!

Conversation

nielsdos commented Apr 21, 2025

Uh oh!

nielsdos commented Apr 26, 2025

Uh oh!

Uh oh!