-
Notifications
You must be signed in to change notification settings - Fork 7.8k
zend_execute: Streamline typechecks in zend_check_type_slow()
if an object is given
#18277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…re` to a `callable` parameter With `Closure`s existing since 5.3 and first class callables since 8.1, nowadays the most common type of callable likely is a `Closure`. Type checking a `Closure` argument for a `callable` parameter however does quite a bit of work, when a simple CE check would suffice. We thus do this. For: <?php function closure_(\Closure $f) { $f('abc'); } function callable_(callable $f) { $f('abc'); } $c = strrev(...); for ($i = 0; $i < 10000000; $i++) { closure_($c); } with the call in the loop appropriately replaced and running on a Intel(R) Core(TM) i7-1365U, I get the following timings: Benchmark 1: /tmp/bench/before callable.php Time (mean ± σ): 368.9 ms ± 5.2 ms [User: 365.0 ms, System: 3.6 ms] Range (min … max): 359.8 ms … 374.5 ms 10 runs Benchmark 2: /tmp/bench/before closure.php Time (mean ± σ): 283.3 ms ± 6.0 ms [User: 279.6 ms, System: 3.4 ms] Range (min … max): 274.1 ms … 293.2 ms 10 runs Benchmark 3: /tmp/bench/after callable.php Time (mean ± σ): 279.9 ms ± 10.1 ms [User: 276.3 ms, System: 3.4 ms] Range (min … max): 269.6 ms … 301.5 ms 10 runs Benchmark 4: /tmp/bench/after closure.php Time (mean ± σ): 283.4 ms ± 2.3 ms [User: 279.5 ms, System: 3.6 ms] Range (min … max): 279.7 ms … 286.6 ms 10 runs Summary /tmp/bench/after callable.php ran 1.01 ± 0.04 times faster than /tmp/bench/before closure.php 1.01 ± 0.04 times faster than /tmp/bench/after closure.php 1.32 ± 0.05 times faster than /tmp/bench/before callable.php The “standard” `array_find()` micro-benchmark of: <?php $array = range(1, 10000); $result = 0; for ($i = 0; $i < 5000; $i++) { $result += array_find($array, static function ($item) { return $item === 5000; }); } var_dump($result); Results in: $ hyperfine -L version before,after '/tmp/bench/{version} native.php' Benchmark 1: /tmp/bench/before native.php Time (mean ± σ): 637.3 ms ± 6.4 ms [User: 632.4 ms, System: 3.9 ms] Range (min … max): 626.8 ms … 644.9 ms 10 runs Benchmark 2: /tmp/bench/after native.php Time (mean ± σ): 572.0 ms ± 3.8 ms [User: 567.8 ms, System: 3.8 ms] Range (min … max): 567.0 ms … 580.1 ms 10 runs Summary /tmp/bench/after native.php ran 1.11 ± 0.01 times faster than /tmp/bench/before native.php see php#18157 (comment)
Zend/zend_execute.c
Outdated
(type_mask & MAY_BE_CALLABLE) | ||
&& ( | ||
/* Fast check for closures. */ | ||
EXPECTED(Z_OBJCE_P(arg) == zend_ce_closure) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think arg
is guaranteed to be an object at this point. Maybe this can be combined with the Z_TYPE_P(arg) == IS_OBJECT)
above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you can combine it with the one above as a type declaration that only contains built-in types won't be complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to something like:
if (Z_TYPE_P(arg) == IS_OBJECT) {
if ((type_mask & MAY_BE_CALLABLE) && EXPECTED(Z_OBJCE_P(arg) == zend_ce_closure)) {
return true;
} else if (ZEND_TYPE_IS_COMPLEX(*type)) {
...
}
}
Not sure if that's better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is reasonable to have if(EXPECTED(Z_TYPE_P(arg) == IS_OBJECT) {}
first, as to get to this point you need to cross if (EXPECTED(ZEND_TYPE_CONTAINS_CODE(*type, Z_TYPE_P(arg)))) {
from zend_check_type()
.
Meaning that we are either dealing with an object, a callable, or a scalar type that needs to be coerced which depends on strict_type.
The nice benefit of this approach is that it would mean that the call to zend_value_instanceof_static()
could be moved into that new if block and remove the check for if (Z_TYPE_P(zv) != IS_OBJECT) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. The may also still check for ZEND_TYPE_IS_COMPLEX(*type)
first (before type_mask & MAY_BE_CALLABLE
) to avoid any degradation of complex types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted according to Gina's suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is static really more common than callable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is static really more common than callable?
I'm seeing static
used quite a bit for return types. The closure
check is probably cheaper, though. No strong opinion here. Can reorder if desired (ideally after someone else verified the numbers, because I don't really trust the results on my CPU).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same. My benchmarks are usually all over the place... I have yet to find a setting for me that works well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we can have the static type check after the quick Closure check, especially as we could now transform static
to the class name for final classes.
… object is given For the callable.php vs closure.php benchmark: $ taskset -c 0 hyperfine -L file callable.php,closure.php -L version before,after '/tmp/bench/{version} {file}' Benchmark 1: /tmp/bench/before callable.php Time (mean ± σ): 351.1 ms ± 2.8 ms [User: 348.3 ms, System: 2.0 ms] Range (min … max): 349.3 ms … 359.0 ms 10 runs Benchmark 2: /tmp/bench/before closure.php Time (mean ± σ): 274.6 ms ± 2.4 ms [User: 270.9 ms, System: 2.9 ms] Range (min … max): 273.3 ms … 281.5 ms 10 runs Benchmark 3: /tmp/bench/after callable.php Time (mean ± σ): 272.4 ms ± 0.5 ms [User: 270.3 ms, System: 2.0 ms] Range (min … max): 271.6 ms … 273.3 ms 10 runs Benchmark 4: /tmp/bench/after closure.php Time (mean ± σ): 277.4 ms ± 2.2 ms [User: 274.3 ms, System: 2.4 ms] Range (min … max): 275.7 ms … 283.3 ms 10 runs Summary /tmp/bench/after callable.php ran 1.01 ± 0.01 times faster than /tmp/bench/before closure.php 1.02 ± 0.01 times faster than /tmp/bench/after closure.php 1.29 ± 0.01 times faster than /tmp/bench/before callable.php For the array_find benchmark: Benchmark 1: /tmp/bench/before native.php Time (mean ± σ): 627.6 ms ± 7.1 ms [User: 622.5 ms, System: 2.8 ms] Range (min … max): 622.1 ms … 641.4 ms 10 runs Benchmark 2: /tmp/bench/after native.php Time (mean ± σ): 598.0 ms ± 5.5 ms [User: 594.4 ms, System: 2.7 ms] Range (min … max): 589.9 ms … 604.9 ms 10 runs Summary /tmp/bench/after native.php ran 1.05 ± 0.02 times faster than /tmp/bench/before native.php For a test with scalar types: <?php function func(string $f): string { return strrev($f); } for ($i = 0; $i < 10000000; $i++) { func('abc'); } the timings are: Benchmark 1: /tmp/bench/before scalar.php Time (mean ± σ): 212.5 ms ± 1.7 ms [User: 209.8 ms, System: 2.2 ms] Range (min … max): 211.4 ms … 218.1 ms 14 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark 2: /tmp/bench/after scalar.php Time (mean ± σ): 200.9 ms ± 2.0 ms [User: 198.0 ms, System: 2.4 ms] Range (min … max): 199.7 ms … 207.2 ms 14 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Summary /tmp/bench/after scalar.php ran 1.06 ± 0.01 times faster than /tmp/bench/before scalar.php And a union type using only scalars: <?php function func(string|int $f): string { return strrev($f); } for ($i = 0; $i < 10000000; $i++) { func('abc'); } results in: Benchmark 1: /tmp/bench/before union.php Time (mean ± σ): 212.8 ms ± 1.8 ms [User: 210.0 ms, System: 2.2 ms] Range (min … max): 212.0 ms … 219.0 ms 14 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark 2: /tmp/bench/after union.php Time (mean ± σ): 200.1 ms ± 0.4 ms [User: 197.7 ms, System: 2.3 ms] Range (min … max): 199.4 ms … 200.8 ms 14 runs Summary /tmp/bench/after union.php ran 1.06 ± 0.01 times faster than /tmp/bench/before union.php Union types with objects are the only thing tested that were more or less the same: <?php class A { public function get() { return 'abc'; } } class B { } function func(A|B $f): string { return strrev($f->get()); } for ($i = 0; $i < 10000000; $i++) { func(new A()); } results in: Benchmark 1: /tmp/bench/before union_obj.php Time (mean ± σ): 700.6 ms ± 10.4 ms [User: 696.5 ms, System: 3.9 ms] Range (min … max): 688.6 ms … 717.8 ms 10 runs Benchmark 2: /tmp/bench/after union_obj.php Time (mean ± σ): 706.7 ms ± 15.0 ms [User: 702.5 ms, System: 3.8 ms] Range (min … max): 688.9 ms … 725.8 ms 10 runs Summary /tmp/bench/before union_obj.php ran 1.01 ± 0.03 times faster than /tmp/bench/after union_obj.php
Closure
to a callable
parameterzend_check_type_slow()
if an object is given
Quoting from the follow-up commit: For the callable.php vs closure.php benchmark:
For the array_find benchmark:
For a test with scalar types:
the timings are:
And a union type using only scalars:
results in:
Union types with objects are the only thing tested that were more or less the
results in:
I don't really trust my CPU to provide reliable benchmark results, though. So if someone wants to test this themselves, please do. I've created a Gist with the PHP files at: https://gist.github.com/TimWolla/d8ffbe9b9668d4d05805982ae591dcba |
With the exception of callable.php, which was significantly faster on the new version, most other benchmark seem to have slightly regressed. This result is consistent, I tried each benchmark multiple times. |
A run on an i7-1185G7:
So more or less status quo, and a slight degradation for union.php. diff --git a/Zend/zend_API.c b/Zend/zend_API.c
index e0006e7d727..c8a506d8366 100644
--- a/Zend/zend_API.c
+++ b/Zend/zend_API.c
@@ -4293,6 +4293,10 @@ ZEND_API bool zend_is_callable_at_frame(
ZEND_API bool zend_is_callable_ex(zval *callable, zend_object *object, uint32_t check_flags, zend_string **callable_name, zend_fcall_info_cache *fcc, char **error) /* {{{ */
{
+ if (Z_TYPE_P(callable) == IS_OBJECT && Z_OBJCE_P(callable) == zend_ce_closure) {
+ return true;
+ }
+
/* Determine callability at the first parent user frame. */
zend_execute_data *frame = EG(current_execute_data);
while (frame && (!frame->func || !ZEND_USER_CODE(frame->func->type))) {
This only affects the callable bench and has no influence on the others. Sure the win isn't as great for type checking as this PR had, but this may be a pragmatic solution?
|
I can confirm that Niels' patch significantly improves performance for
However, that might not mean much for my CPU because it also significantly improves
So, take that with a grain of salt. I hate my CPU. 😋 |
#justCodeLayoutThings
But as long as the others don't slow down I take it as a win 😎 EDIT: ofc my patch is not "complete" as we need to fetch the callable name in some cases too, but not in the type check case. But if this seems like a good plan to Tim as well, I can make it a formal patch with a PR. |
Yes, the callable name was the reason I did not touch that function, but
No strong opinion here, please feel free to propose something, given that I can't reliably benchmark this with the hardware I have available. Alternatively for 8e216a2 (the initial commit in this PR), the |
The path from zend_check_type_slow doesn't request a callable name anyway so that doesn't matter. |
All sorts of complicated things going on here, function seems very sensitive to changes. The best improvement for My suggestion only influences the A "in the middle" solution may be to declare a helper function in the same file as diff --git a/Zend/zend_execute.c b/Zend/zend_execute.c
index 89fbcf2cbd7..58be57596e3 100644
--- a/Zend/zend_execute.c
+++ b/Zend/zend_execute.c
@@ -1140,6 +1140,14 @@ static bool zend_check_intersection_type_from_list(
return true;
}
+static zend_never_inline bool zend_is_callable_idk_what_to_call_this(zval *callable, uint32_t check_flags)
+{
+ if (EXPECTED(Z_TYPE_P(callable) == IS_OBJECT && Z_OBJCE_P(callable) == zend_ce_closure)) {
+ return true;
+ }
+ return zend_is_callable_ex(callable, NULL, check_flags, NULL, NULL, NULL);
+}
+
static zend_always_inline bool zend_check_type_slow(
const zend_type *type, zval *arg, const zend_reference *ref,
bool is_return_type, bool is_internal)
@@ -1178,7 +1186,7 @@ static zend_always_inline bool zend_check_type_slow(
const uint32_t type_mask = ZEND_TYPE_FULL_MASK(*type);
if ((type_mask & MAY_BE_CALLABLE) &&
- zend_is_callable(arg, is_internal ? IS_CALLABLE_SUPPRESS_DEPRECATIONS : 0, NULL)) {
+ zend_is_callable_idk_what_to_call_this(arg, is_internal ? IS_CALLABLE_SUPPRESS_DEPRECATIONS : 0)) {
return 1;
}
if ((type_mask & MAY_BE_STATIC) && zend_value_instanceof_static(arg)) {
This yields a 25% improvement on |
With
Closure
s existing since 5.3 and first class callables since 8.1, nowadays the most common type of callable likely is aClosure
. Type checking aClosure
argument for acallable
parameter however does quite a bit of work, when a simple CE check would suffice. We thus do this.For:
with the call in the loop appropriately replaced and running on a Intel(R) Core(TM) i7-1365U, I get the following timings:
The “standard”
array_find()
micro-benchmark of:Results in:
see #18157 (comment)