Skip to content

tracing JIT floating point register clobbering on Windows and ARM64 #18136

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
nono303 opened this issue Mar 24, 2025 · 27 comments
Closed

tracing JIT floating point register clobbering on Windows and ARM64 #18136

nono303 opened this issue Mar 24, 2025 · 27 comments

Comments

@nono303
Copy link
Contributor

nono303 commented Mar 24, 2025

Hi,
I'm facing a strange bug on a "simple" distance calculation (addition in a loop) with PHP cli & opcache JIT enabled.
with opcache.enable_cli = 1 and opcache.jit >= 1252 (optimization level between 2 and 5) at some time in the loop the addition (+=) give erratic results.

  • Only appears when using the phpGPX components.
    I've extracted the distance calculation function for standalone & callback tests with same data without any issue.
    So, might be related to namespace, object type or whatever...
  • The bug doesn't affect PHP cgi: tested with the same script, same data same php.ini... configuration without any issue.
  • I'm running on Windows, self-compiled PHP 8.4.5 TS vs17 x64

Here are "ready to test" files to reproduce it (like on linux)
⏩ Results:

opcache.enable_cli:0
  standalone: 9063.663059466
  phpGPX    : 9063.6630584286

opcache.enable_cli:1    opcache.jit:disable
  standalone: 9063.663059466
  phpGPX    : 9063.6630584286
opcache.enable_cli:1    opcache.jit:1251
  standalone: 9063.663059466
  phpGPX    : 9063.6630584286
opcache.enable_cli:1    opcache.jit:1252
  standalone: 9063.663059466
  phpGPX    : 1.369121694286
opcache.enable_cli:1    opcache.jit:1253
  standalone: 9063.663059466
  phpGPX    : 1.7499852968769
opcache.enable_cli:1    opcache.jit:1254
  standalone: 9063.663059466
  phpGPX    : 1.7499852968769
opcache.enable_cli:1    opcache.jit:1255
  standalone: 9063.663059466
  phpGPX    : 1.7499852968769

Adding a debug line to DistanceCalculator.php

diff --git "a/src/phpGPX/Helpers/DistanceCalculator.php" "b/src/phpGPX/Helpers/DistanceCalculator.php"
index 5a2a51b..d2df82e 100644
--- "a/src/phpGPX/Helpers/DistanceCalculator.php"
+++ "b/src/phpGPX/Helpers/DistanceCalculator.php"
@@ -77,6 +77,8 @@ class DistanceCalculator
 			else {
 				$distance += $curPoint->difference;
 				$lastConsideredPoint = $curPoint;
+				if($p >= 120 && $p <= 130)
+					echo "it:".$p." last:".$ld." cur:".$distance." add:".$curPoint->difference.PHP_EOL;
 			}
 
 			$curPoint->distance = $distance;

results, failing @ iteration 128:

opcache.enable_cli:1 opcache.jit:1254
it:120 last:565.12951886607 cur:568.91636386652 add:3.7868450004519
it:121 last:568.91636386652 cur:574.28152730666 add:5.3651634401381
it:122 last:574.28152730666 cur:578.88597767002 add:4.604450363368
it:123 last:578.88597767002 cur:583.43520379413 add:4.5492261241002
it:124 last:583.43520379413 cur:587.74462875461 add:4.3094249604809
it:125 last:587.74462875461 cur:594.67011966358 add:6.9254909089751
it:126 last:594.67011966358 cur:598.5481926842 add:3.8780730206198
it:127 last:598.5481926842 cur:601.6696032882 add:3.1214106039962
it:128 last:0.51993194404748 cur:2.8028430036349 add:2.2829110595874
it:129 last:0.51993220497079 cur:5.3492195083835 add:4.8292873034127
it:130 last:0.51993241619443 cur:4.6919392250549 add:4.1720068088605

opcache.enable_cli:0
it:120 last:565.12951886607 cur:568.91636386652 add:3.7868450004519
it:121 last:568.91636386652 cur:574.28152730666 add:5.3651634401381
it:122 last:574.28152730666 cur:578.88597767002 add:4.604450363368
it:123 last:578.88597767002 cur:583.43520379413 add:4.5492261241002
it:124 last:583.43520379413 cur:587.74462875461 add:4.3094249604809
it:125 last:587.74462875461 cur:594.67011966358 add:6.9254909089751
it:126 last:594.67011966358 cur:598.5481926842 add:3.8780730206198
it:127 last:598.5481926842 cur:601.6696032882 add:3.1214106039962
it:128 last:601.6696032882 cur:603.95251434778 add:2.2829110595874
it:129 last:603.95251434778 cur:608.7818016512 add:4.8292873034127
it:130 last:608.7818016512 cur:612.95380846006 add:4.1720068088605

Available to perform further tests and provide details if needed

PHP Version

PHP 8.4.5 TS vs17 x64

Operating System

Windows 11 x64

@nielsdos
Copy link
Member

I get an assertion failure on 8.4.5 on Windows+JIT (when built in debug mode):
Assertion failed: _blocks[use] == b || use_insn->op == IR_PARAM, file ext\opcache\jit\ir\ir_gcm.c, line 948
On PHP-8.4 git HEAD I no longer get the assertion failure but I do get the wrong result on Windows+JIT.
I'll have a look a bit later

@nielsdos
Copy link
Member

nielsdos commented Mar 24, 2025

Reduced to a standalone reproducer:

<?php
namespace Foo;

function diff($point1, $point2)
{
	$a = deg2rad($point1); // Prefixing these with \ also makes the issue go away
	$b = deg2rad($point2);
	return $a - $b;
}

function getRawDistance()
{
	$distance = 0;
	for ($p = 0; $p < 200; $p++) {
		// Needs to be a namespaced call_user_func call to reproduce the issue (i.e. adding \ at front makes the issue go away)
		$distance += call_user_func('Foo\diff', 0, $p);
	}

	return $distance;
}

var_dump(getRawDistance());

@nielsdos
Copy link
Member

nielsdos commented Mar 24, 2025

The above reproducer outputs float(-3.473205211468716) but it should output float(-347.3205211468715). Can only reproduce it on Windows, not on Linux. Run with -d opcache.jit=tracing. Not sure where to start looking and I don't have a capstone package on Windows to look at the disassembly... cc @dstogov
EDIT: it's as if instead of addition to $distance, it assigns to $distance. The result of $distance is always equal to the result of the last call to Foo\diff.

@nono303
Copy link
Contributor Author

nono303 commented Apr 4, 2025

Hi,
Another opcache JIT optimization >=2 issue resulting a core dump
Always the same context (Windows 11 x64 + PHP 8.4.5 TS vs17 x64 self compiled) but with php-cgi
Unhandled exception at 0x00007FFD48540727 (php8ts.dll) in php-cgi.exe_250402_164329.dmp: 0xC0000005: Access violation reading location 0x000001F33605F0B8.

const _zend_array * is null in the Inline Frame call stack

[Inline Frame] php8ts.dll!zend_hash_find_bucket(const _zend_array *) Line 769
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\Zend\zend_hash.c(769)
[Inline Frame] php8ts.dll!zend_hash_find(const _zend_array *) Line 2675
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\Zend\zend_hash.c(2675)
[Inline Frame] php8ts.dll!zend_hash_find_ex(const _zend_array *) Line 192
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\Zend\zend_hash.h(192)
[Inline Frame] php8ts.dll!zend_fetch_dimension_address_inner(_zend_array * ht, const _zval_struct * dim, int) Line 2712
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\Zend\zend_execute.c(2712)
php8ts.dll!ZEND_FETCH_DIM_R_SPEC_CV_CONST_HANDLER(_zend_execute_data * execute_data) Line 42849
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\Zend\zend_vm_execute.h(42849)
php8ts.dll!execute_ex(_zend_execute_data * ex) Line 58584
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\Zend\zend_vm_execute.h(58584)
php8ts.dll!zend_execute(_zend_op_array * op_array, _zval_struct * return_value) Line 64238
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\Zend\zend_vm_execute.h(64238)
php8ts.dll!zend_execute_script(int type, _zval_struct * retval, _zend_file_handle * file_handle) Line 1935
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\Zend\zend.c(1935)
php8ts.dll!php_execute_script_ex(_zend_file_handle * primary_file, _zval_struct * retval) Line 2575
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\main\main.c(2575)
php-cgi.exe!main(int argc, char * * argv) Line 2562
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\sapi\cgi\cgi_main.c(2562)
[Inline Frame] php-cgi.exe!invoke_main() Line 78
	at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl(78)
php-cgi.exe!__scrt_common_main_seh() Line 288
	at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl(288)
kernel32.dll!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()

Image

What kind of details would be usefull to understand this one? (the script is quite complex...)

@nielsdos
Copy link
Member

nielsdos commented Apr 5, 2025

I found the root cause, and your last comment is likely related to the same root cause.

The problem is due to (global) register allocation. getRawDistance() stores $distance in %xmm6 and diff() also uses %xmm6. %xmm6 is a preserved register, but the prologue of diff() does not save this to the stack frame. ctx->used_preserved_regs does not even contain the floating point register. So the value of $distance (%xmm6) in getRawDistance() is simply overwritten by the code in diff(). I found that we enter this branch:

https://github.com/dstogov/ir/blob/8d17022fb61ebfed9f6be81a8182ea31202697ed/ir_ra.c#L4099-L4100

The reason this doesn't happen on Linux is because on Linux all XMM registers are volatile, while on Windows only xmm0-xmm5 are volatile while xmm6-xmm15 are preserved.

@nono303 As a workaround, try setting the R component of opcache.jit to 1. e.g. instead of 1252 pick 1152. Does that also workaround the bug reported in your last comment? If that doesn't work, try picking 0.

@nono303
Copy link
Contributor Author

nono303 commented Apr 5, 2025

Many Thx @nielsdos for your time on it!

✅ I confirm that changing R component of opcache.jit from 2 to 1 (whatever is O between 2 and 5) fix the issue as a workaround for getRawDistance() and also for my second complex script

> php dist.php
opcache.enable_cli:1    opcache.jit:1255
  function:    9063.663059466
  callback:    9063.663059466
  phpGPX full: 1.7499852968769
  phpGPX mini: 1.7499852906557

> php dist.php
opcache.enable_cli:1    opcache.jit:1155
  function:    9063.663059466
  callback:    9063.663059466
  phpGPX full: 9063.6630584286
  phpGPX mini: 9061.4300164687

⏩ If you need some quick (like with patch on php-src before making a PR) tests on Windows, don’t hesitate as I can easily compile binary and launch different tests

@nielsdos
Copy link
Member

nielsdos commented Apr 5, 2025

Thanks for testing. Unfortunately, I will have difficulties making a patch for this as I'm not too familiar with the backend component of IR and it would take me some time learning that. I also think that if this means we diverge from the fixed stack frame then we get issues with the unwinding code that was added in #17095
So this is a bit above my knowledge level for now and would be best if @dstogov could take a look please.

@dstogov
Copy link
Member

dstogov commented Apr 7, 2025

@nono303 thanks for the report. @nielsdos thanks for the investigation. You already found the reason and I hope it shouldn't be a big problem to fix this now. I'll take care.

@dstogov
Copy link
Member

dstogov commented Apr 7, 2025

The fair fix is simple. Could you check that it fixes your problem (I wasn't able to reproduce the failure with reproduction case. Only the assertion).

diff --git a/ext/opcache/jit/ir/ir_ra.c b/ext/opcache/jit/ir/ir_ra.c
index 0c0e8dec3b4..fa8f639c0c3 100644
--- a/ext/opcache/jit/ir/ir_ra.c
+++ b/ext/opcache/jit/ir/ir_ra.c
@@ -4097,7 +4097,7 @@ static void assign_regs(ir_ctx *ctx)
 		if (IR_REGSET_DIFFERENCE(IR_REGSET_INTERSECTION(used_regs, IR_REGSET_PRESERVED),
 			ctx->used_preserved_regs)) {
 			// TODO: Preserved reg and fixed frame conflict ???
-			// IR_ASSERT(0 && "Preserved reg and fixed frame conflict");
+			IR_ASSERT(0 && "Preserved reg and fixed frame conflict");
 		}
 	} else {
 		ctx->used_preserved_regs = IR_REGSET_UNION((ir_regset)ctx->fixed_save_regset,
diff --git a/ext/opcache/jit/zend_jit_ir.c b/ext/opcache/jit/zend_jit_ir.c
index 4d006c19d34..ff0abd8e1a0 100644
--- a/ext/opcache/jit/zend_jit_ir.c
+++ b/ext/opcache/jit/zend_jit_ir.c
@@ -33,12 +33,10 @@
 # define ZREG_IP             15 /* IR_REG_R15 */
 # define ZREG_FIRST_FPR      16
 # if defined(_WIN64)
-#  define IR_REGSET_PRESERVED ((1<<3) | (1<<5) | (1<<6) | (1<<7) | (1<<12) | (1<<13) | (1<<14) | (1<<15))
-/*
+//#  define IR_REGSET_PRESERVED ((1<<3) | (1<<5) | (1<<6) | (1<<7) | (1<<12) | (1<<13) | (1<<14) | (1<<15))
 #  define IR_REGSET_PRESERVED ((1<<3) | (1<<5) | (1<<6) | (1<<7) | (1<<12) | (1<<13) | (1<<14) | (1<<15) | \
                                (1<<(16+6)) | (1<<(16+7)) | (1<<(16+8)) | (1<<(16+9)) | (1<<(16+10)) | \
                                (1<<(16+11)) | (1<<(16+12)) | (1<<(16+13)) | (1<<(16+14)) | (1<<(16+15)))
-*/
 #  define IR_SHADOW_ARGS     32
 # else
 #  define IR_REGSET_PRESERVED ((1<<3) | (1<<5) | (1<<12) | (1<<13) | (1<<14) | (1<<15)) /* all preserved registers */
@@ -2696,7 +2694,8 @@ static void zend_jit_init_ctx(zend_jit_ctx *jit, uint32_t flags)
 			jit->ctx.flags |= IR_USE_FRAME_POINTER;
 			jit->ctx.fixed_stack_frame_size = sizeof(void*) * 16; /* 10 saved registers and 6 spill slots (8 bytes) */
 #elif defined(_WIN64)
-			jit->ctx.fixed_stack_frame_size = sizeof(void*) * 11; /* 8 saved registers and 3 spill slots (8 bytes) */
+//			jit->ctx.fixed_stack_frame_size = sizeof(void*) * 11; /* 8 saved registers and 3 spill slots (8 bytes) */
+			jit->ctx.fixed_stack_frame_size = sizeof(void*) * 21; /* 18 saved registers and 3 spill slots (8 bytes) */
 #elif defined(IR_TARGET_X86_64)
 			jit->ctx.fixed_stack_frame_size = sizeof(void*) * 9;  /* 6 saved registers and 3 spill slots (8 bytes) */
 #else /* IR_TARGET_x86 */

The drawback of the fix, that it significantly increases code of each prologue and epilogue. This is the reason it wasn't applied in the first place.

After all, I think that it may make sense to avoid "fixed stack frame" for CALL VM JIT, but this would require inlining of most the JIT stubs. Probably, this is the right way, but it's not a simple fix.

@nielsdos may be you have other ideas?

@dstogov
Copy link
Member

dstogov commented Apr 7, 2025

@arnaud-lb please also think about this problem

@arnaud-lb
Copy link
Member

If I understand correctly, fixed_stack_frame_size causes JIT'ed code to allocate a fixed amount of stack in prologue and to always save fixed_save_regset. This implies that epilogues are always the same, which allow us to jump to stubs, as they will restore registers and stack pointers correctly before they return.

If that's the case, I think we could achieve a similar result by storing a bitset of saved registers on the top of the stack frame, so that stubs know the size of the frame and which registers they have to restore. Stub epilogues would be slower, but normal functions will usually have smaller epilogue/prologue.

An alternative is to not emit an epilogue in stubs, and to call them:

call foo_stub
ret

NB: I don't understand the difference between CALL and HYBRID in this context.

@nielsdos
Copy link
Member

nielsdos commented Apr 7, 2025

@dstogov That works and the patch looks right: now xmm6-xmm15 are preserved. I had tried something like this (but more hacked) but didn't pursue it because of the overhead.

@dstogov
Copy link
Member

dstogov commented Apr 7, 2025

An alternative is to not emit an epilogue in stubs, and to call them:

This won't work at least for linking between traces. (prologue in root trace, then few branches to side traces, then epilogue and return).

NB: I don't understand the difference between CALL and HYBRID in this context.

HYBRID handlers are called through C computed goto, inside the execute_ex() stack frame, and "return" through indirect jmp. They require "fixed stack frame" and may avoid saving/restoring preserved registers.

@nono303
Copy link
Contributor Author

nono303 commented Apr 7, 2025

Hi @dstogov
✅ I can confirm that your patch applied on 8.4.6rc1 fix the issue on Windows!

>php dist.php
opcache.enable_cli:1    opcache.jit:1255
  function:    9063.663059466
  callback:    9063.663059466
  phpGPX full: 1.7499852968769
  phpGPX mini: 1.7499852906557

...

    # apply php-src_pr18136.patch
Checking patch ext/opcache/jit/ir/ir_ra.c...
Checking patch ext/opcache/jit/zend_jit_ir.c...
Applied patch ext/opcache/jit/ir/ir_ra.c cleanly.
Applied patch ext/opcache/jit/zend_jit_ir.c cleanly.

...

>php dist.php
opcache.enable_cli:1    opcache.jit:1255
  function:    9063.663059466
  callback:    9063.663059466
  phpGPX full: 9063.6630584286
  phpGPX mini: 9061.4300164687

I'll check if i see some regression on other use-cases

@dstogov
Copy link
Member

dstogov commented Apr 14, 2025

I though about this problem a bit more, and I think it may be reduced...
Zend VM shouldn't use XMM6-XMM15, so clobbering them is not a big problem.

The problems arises when JIT-ed code that uses XMM6-XMM15 calls some engine function, that calls interpreter, that calls the other JIT-ed code that modifies XMM6-XMM15. So, a wrapper for zif_call_user_func() could solve the problem for this fix.
Of course this is not a complete and not a completely fair solution.

In long terms I would think about using the single HYBRID VM for all targets that supports JIT.

@arnaud-lb
Copy link
Member

HYBRID handlers are called through C computed goto, inside the execute_ex() stack frame, and "return" through indirect jmp. They require "fixed stack frame" and may avoid saving/restoring preserved registers.

Ok, I was missing that HYBRID JIT does not preserve registers at all. r14/r15 are shared by execute_ex and JIT'ed code so don't need to be preserved (but must be preserved when reentering execute_ex, so they are preserved by execute_ex). rbx/r12/r13 are considered cloberred when calling opcode handlers in execute_ex due to HYBRID_JIT_GUARD.

The problems arises when JIT-ed code that uses XMM6-XMM15 calls some engine function, that calls interpreter, that calls the other JIT-ed code that modifies XMM6-XMM15. So, a wrapper for zif_call_user_func() could solve the problem for this fix.
Of course this is not a complete and not a completely fair solution.

I think this can also happen in the following scenario: Internal function that uses XMM6-15 calls JIT'ed code that modifies XMM6-15. We may need to save XMM6-15 when entering/leaving execute_ex.

@dstogov
Copy link
Member

dstogov commented Apr 14, 2025

I think this can also happen in the following scenario: Internal function that uses XMM6-15 calls JIT'ed code that modifies XMM6-15. We may need to save XMM6-15 when entering/leaving execute_ex.

You are right.

It seems you have found an extremely simple and efficient solution! :)
Just tell C compiler to save all the preserved registers for execute_ex().
I hope MSVC allows this.

@nono303
Copy link
Contributor Author

nono303 commented Apr 14, 2025

I hope MSVC allows this.

I'll be able to test a patch on x64 MSVC 14.2 (vs16) & 14.3 (vs17)

@nielsdos
Copy link
Member

nielsdos commented Apr 14, 2025

It's an interesting idea. Unfortunately MSVC does not support inline assembly on x64, so the usual trick to set the registers in the clobber list does not work unfortunately; maybe there's another simple way to let MSVC do this.

EDIT: I'll think about how to use an external assembler file to do this thing.

@nielsdos
Copy link
Member

I've worked on patching this, and I had two ideas (only for x64 for now so far):

  1. Save/restore XMM registers via an external assembly routine. This works (https://gist.github.com/nielsdos/a3b0b66373e839019f437ecdd940a007#file-working-patch) but may be unsafe depending on what the compiler does exactly
  2. Rename execute_ex to execute_ex_real on Windows and use an assembly routine to save/restore the xmm registers on the stack. This should be safer, but for some reason I can't get this to work and I don't understand why (https://gist.github.com/nielsdos/a3b0b66373e839019f437ecdd940a007#file-notworking-patch)

@nono303
Copy link
Contributor Author

nono303 commented Apr 14, 2025

Thx @nielsdos!
Testing on vs17 x64 (release TS, AVX2)

  • Applying file-notworking-patch I obtained execute_ex already defined in zend_execute compilation error:
...
    # apply php-src_pr18136-2nw.patch
Checking patch Zend/asm/save_xmm_x86_64_ms_masm.asm...
Checking patch Zend/zend_vm_execute.skl...
Checking patch win32/build/config.w32...
Applied patch Zend/asm/save_xmm_x86_64_ms_masm.asm cleanly.
Applied patch Zend/zend_vm_execute.skl cleanly.
Applied patch win32/build/config.w32 cleanly.
...
 Assembling: Zend\asm\jump_x86_64_ms_pe_masm.asm
 Assembling: Zend\asm\make_x86_64_ms_pe_masm.asm
 Assembling: Zend\asm\save_xmm_x86_64_ms_masm.asm
save_xmm_x86_64_ms_masm.obj : error LNK2005: execute_ex already defined in zend_execute.obj
   Creating library C:\sdk\src\php-sdk\phpmaster\vs17\x64\build\Release_TS\php8ts.lib and object C:\sdk\src\php-sdk\phpmaster\vs17\x64\build\Release_TS\php8ts.exp
save_xmm_x86_64_ms_masm.obj : error LNK2001: unresolved external symbol execute_ex_real: return code '0x460'
Stop.
...
...
    # apply php-src_pr18136-2w.patch
Checking patch Zend/asm/save_xmm_x86_64_ms_masm.asm...
Checking patch Zend/zend_vm_gen.php...
Checking patch win32/build/config.w32...
Applied patch Zend/asm/save_xmm_x86_64_ms_masm.asm cleanly.
Applied patch Zend/zend_vm_gen.php cleanly.
Applied patch win32/build/config.w32 cleanly.
...
 Assembling: Zend\asm\jump_x86_64_ms_pe_masm.asm
 Assembling: Zend\asm\make_x86_64_ms_pe_masm.asm
 Assembling: Zend\asm\save_xmm_x86_64_ms_masm.asm
   Creating library C:\sdk\src\php-sdk\phpmaster\vs17\x64\build\Release_TS\php8ts.lib and object C:\sdk\src\php-sdk\phpmaster\vs17\x64\build\Release_TS\php8ts.exp
Generating code
Finished generating code
...
>php dist.php
opcache.enable_cli:1    opcache.jit:1255
  function:    9063.663059466
  callback:    9063.663059466
  phpGPX full: 1.7499852968769
  phpGPX mini: 1.7499852906557

@nielsdos
Copy link
Member

@nono303 Did you regenerate the VM? Try rerunning Zend/zend_vm_gen.php and rerun nmake.

@nono303
Copy link
Contributor Author

nono303 commented Apr 14, 2025

@nono303 Did you regenerate the VM? Try rerunning Zend/zend_vm_gen.php and rerun nmake.

Nop... Better when regenerating the VM ;)
file-working-patch Compile & fix issue

...
    # apply php-src_pr18136-2w.patch
Checking patch Zend/asm/save_xmm_x86_64_ms_masm.asm...
Checking patch Zend/zend_vm_gen.php...
Checking patch win32/build/config.w32...
Applied patch Zend/asm/save_xmm_x86_64_ms_masm.asm cleanly.
Applied patch Zend/zend_vm_gen.php cleanly.
Applied patch win32/build/config.w32 cleanly.
zend_vm_opcodes.h generated successfully.
zend_vm_opcodes.c generated successfully.
zend_vm_execute.h generated successfully.
...
>php dist.php
opcache.enable_cli:1    opcache.jit:1255
  function:    9063.663059466
  callback:    9063.663059466
  phpGPX full: 9063.6630584286
  phpGPX mini: 9061.4300164687

🔴 file-notworking-patch Compile and don't fix issue with 'new' results

...
    # apply php-src_pr18136-2nw.patch
Checking patch Zend/asm/save_xmm_x86_64_ms_masm.asm...
Checking patch Zend/zend_vm_execute.skl...
Checking patch win32/build/config.w32...
Applied patch Zend/asm/save_xmm_x86_64_ms_masm.asm cleanly.
Applied patch Zend/zend_vm_execute.skl cleanly.
Applied patch win32/build/config.w32 cleanly.
zend_vm_opcodes.h generated successfully.
zend_vm_opcodes.c generated successfully.
zend_vm_execute.h generated successfully.
...
>php dist.php
opcache.enable_cli:1    opcache.jit:1255
  function:    9063.663059466
  callback:    9063.663059466
  phpGPX full: 1.2301194959665
  phpGPX mini: 1.2301194897453

@nielsdos
Copy link
Member

@nono303 Thanks a lot for testing. I finally figured out what was wrong in my other patch (notworking.patch). Turns out I needed to reserve shadow space. I'll try to make a PR soon and will ping you for testing it if you'd still like to do that.

@nielsdos nielsdos changed the title Bug: opcache JIT optimization >=2 [Windows cli] tracing JIT XMM clobbering on Windows Apr 18, 2025
nielsdos pushed a commit to nielsdos/php-src that referenced this issue Apr 18, 2025
On win64, xmm6-xmm15 are preserved registers, but the prologues and
epilogues of JITted code don't handle these. The issue occurs when
calling into the JIT code again via an internal handler
(like call_user_func). Therefore, we want to save/restore xmm registers
upon entering/leaving execute_ex. Since MSVC x64 does not support inline
assembly, we create an assembly wrapper around the real execute_ex
function.
The alternative is to always save/restore these xmm registers into the
fixed call frame, but this causes unnecessary overhead.
@nielsdos
Copy link
Member

nielsdos commented Apr 18, 2025

@nono303 When possible, could you please check out the patch from #18352 ? You will again need to regenerate the VM most likely. You'll also need to rerun buildconf+configure.

@nielsdos nielsdos changed the title tracing JIT XMM clobbering on Windows tracing JIT floating point register clobbering on Windows and ARM64 Apr 19, 2025
nielsdos pushed a commit to nielsdos/php-src that referenced this issue Apr 19, 2025
…ndows and ARM64

On win64, xmm6-xmm15 are preserved registers, but the prologues and
epilogues of JITted code don't handle these. The issue occurs when
calling into the JIT code again via an internal handler
(like call_user_func). Therefore, we want to save/restore xmm registers
upon entering/leaving execute_ex. Since MSVC x64 does not support inline
assembly, we create an assembly wrapper around the real execute_ex
function.
The alternative is to always save/restore these xmm registers into the
fixed call frame, but this causes unnecessary overhead.
The same issue occurs for ARM64 platforms for floating point register
8 to 15. However, there we can use inline asm to fix this.
nielsdos added a commit that referenced this issue Apr 21, 2025
* PHP-8.4:
  Fix GH-18136: tracing JIT floating point register clobbering on Windows and ARM64
@nono303
Copy link
Contributor Author

nono303 commented Apr 22, 2025

@nono303 When possible, could you please check out the patch from #18352 ? You will again need to regenerate the VM most likely. You'll also need to rerun buildconf+configure.

Hi @nielsdos,
(bank holiday weekend, less responsiveness)
https://github.com/php/php-src/pull/18352.patch sounds all good for me!

...
    # apply php-src_pr18352.patch
Checking patch Zend/asm/save_xmm_x86_64_ms_masm.asm...
Checking patch Zend/zend_vm_execute.h...
Hunk #1 succeeded at 55025 (offset -12 lines).
Checking patch Zend/zend_vm_execute.skl...
Checking patch ext/opcache/tests/jit/gh18136.phpt...
Checking patch win32/build/config.w32...
Applied patch Zend/asm/save_xmm_x86_64_ms_masm.asm cleanly.
Applied patch Zend/zend_vm_execute.h cleanly.
Applied patch Zend/zend_vm_execute.skl cleanly.
Applied patch ext/opcache/tests/jit/gh18136.phpt cleanly.
Applied patch win32/build/config.w32 cleanly.
zend_vm_opcodes.h generated successfully.
zend_vm_opcodes.c generated successfully.
zend_vm_execute.h generated successfully.
...
Build architecture: x64 
Visual C++:         14.44.35109.1
cscript /nologo /e:jscript win32\build\buildconf.js
Rebuilding configure.js
...
>php dist.php
opcache.enable_cli:1    opcache.jit:1255
  function:    9063.663059466
  callback:    9063.663059466
  phpGPX full: 9063.6630584286
  phpGPX mini: 9061.4300164687

@nielsdos
Copy link
Member

Thanks for checking!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants