Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement acyclic object tracking #17130

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

iluuu1994
Copy link
Member

Acyclic objects don't need to be added to the GC. We know an object is acyclic if all properties are typed as acyclic, and the object does not have any dynamic properties. This is possible to track at runtime with minimal overhead.

Fixes GH-17127

Acyclic objects don't need to be added to the GC. We know an object is acyclic
if all properties are typed as acyclic, and the object does not have any dynamic
properties. This is possible to track at runtime with minimal overhead.

Fixes phpGH-17127
@iluuu1994 iluuu1994 force-pushed the acyclic-object-tracking branch from 72c581b to f6e49bb Compare December 12, 2024 18:12
@dktapps
Copy link
Contributor

dktapps commented Dec 12, 2024

I like the idea, but a few things to mention:

  • I think it'd probably be better to consider internal classes as cyclic by default. Having them opt-in to being maybe cyclic seems like it would cause leaks from third party extensions which miss this change. Not sure if there's some other way this could be inferred, like custom get_properties or get_gc handlers. Haven't thought about this too much.
  • UnitEnum and BackedEnum can likely be considered acyclic. I didn't see these touched in the diff, but I verified on my own that these are also affected by this funny GC business even though they can't be destroyed under any conditions. (https://3v4l.org/OgFFQ)

@iluuu1994
Copy link
Member Author

I think it'd probably be better to consider internal classes as cyclic by default.

Yeah, I had the same thought. Could be considered still.

Not sure if there's some other way this could be inferred, like custom get_properties or get_gc handlers.

Unfortunately, object handlers are set directly by extensions after calling register_class_xyz, which makes this not feasible.

UnitEnum and BackedEnum can likely be considered acyclic.

They are with this PR, given that all properties are acyclic.

enum E {
    case C;
}

var_dump(new ReflectionEnum(E::class)->mayBeCyclic());

bool(false)

@dktapps
Copy link
Contributor

dktapps commented Dec 12, 2024

@iluuu1994 Out of curiosity (not related to the PR): I noticed you didn't touch the GC at all, which suggests to me that the functionality to respect this flag is already present.

If I were to, say, make an extension which allows setting GC_NOT_COLLECTABLE on certain objects, would that work within currently supported versions of PHP? I'm looking for a way to get around the performance impacts described in #17131.

@iluuu1994
Copy link
Member Author

@dktapps Yes, you are correct. We may also think about an attribute where users can explicitly opt classes into #[Acyclic] to prevent them from being added to the GC for performance-critical code. E.g. Composer. The recent idea for modules (grouped compilation of related files) might also help to detect acyclic object graphs across files.

@dktapps
Copy link
Contributor

dktapps commented Dec 12, 2024

Removing acyclic objects from the root buffer turned out about 20 ns per object in my testing. But the impact of complex long-lived objects landing in the GC root buffer was orders of magnitude larger.

I think we'd probably get a lot more mileage out of #17131, although ignoring acyclic objects will definitely help too.

@iluuu1994
Copy link
Member Author

There may be some more benefits to get out of this flag. E.g. we may skip over acyclic objects in the GC itself, which I currently have not done. So, even if some root object is added to the GC, it may at least skip over the references to acyclic objects it holds.

Note that our GC implicitly implements something similar as generations. After every GC run, the buffer is empty. Objects are only added to the buffer when they are released, i.e. they are stored in some new variable and then unset. This effectively does the same thing, the object is only considered for cycle collection when it has been worked with.

@dktapps
Copy link
Contributor

dktapps commented Dec 12, 2024

Note that our GC implicitly implements something similar as generations. After every GC run, the buffer is empty. Objects are only added to the buffer when they are released, i.e. they are stored in some new variable and then unset. This effectively does the same thing, the object is only considered for cycle collection when it has been worked with.

An object that survived collection will still end up in the root buffer over and over again & waste CPU time if it's frequently ref'd & unref'd (basically, objects are always treated as if they're generation 0). I've found this to cause major performance impacts in my own code.

@bwoebi
Copy link
Member

bwoebi commented Dec 13, 2024

Do I understand it right that this cyclic-check is pretty trivial and avoids using knowledge of already declared types or such (i.e. if the type specified by the typed property is known to not be cyclic, the object will also not be cyclic)?
Doing that would require actually prohibiting dynamic properties on objects first, right? And then only be useful with final classes.
Thus will only match objects containing exclusively scalar types.

@bwoebi
Copy link
Member

bwoebi commented Dec 13, 2024

On a more technical note - can we simply assume acyclic for internal classes if their declared properties are all scalar and it does not have a get_gc handler? Rather than explicitly specifying cyclic (which, given the opt-out nature, is prone to be missed).

@iluuu1994
Copy link
Member Author

iluuu1994 commented Dec 13, 2024

@bwoebi

Do I understand it right that this cyclic-check is pretty trivial and avoids using knowledge of already declared types or such (i.e. if the type specified by the typed property is known to not be cyclic, the object will also not be cyclic)?

Yes, this is pretty much it. Edit: I just re-read your sentence and missed the word "avoids", now I'm lost what you mean. :)

Doing that would require actually prohibiting dynamic properties on objects first, right?

No. This PR adds the GC_NOT_COLLECTABLE flag to the object on instantiation if the class is inferred to be acyclic. Once the property table is computed, the flag is removed again. This can likely be improved by moving the removal of the flag to a more appropriate place, given that they can be computed for reasons other than dynamic properties.

And then only be useful with final classes.

Why would final classes matter? This is operating on specific instances of one class, if this class is inferred to be acyclic, the only way for it not to be is to add dynamic properties, regardless of any potential subclasses.

On a more technical note - can we simply assume acyclic for internal classes if their declared properties are all scalar and it does not have a get_gc handler?

As mentioned above, this was my first thought. However, since object handlers are set by extensions after calling register_class_xyz, there is no obvious hook for this. We could loop all internal classes after MINIT I suppose. We may also miss some internal classes that actually acyclic even though they set get_gc, e.g. ext-date and ext-ffi.

If get_gc is customized, assume cyclic classes for now.
Don't rely on the cycle collector for objects released in destructor.
@dktapps
Copy link
Contributor

dktapps commented Dec 13, 2024

There may be some more benefits to get out of this flag. E.g. we may skip over acyclic objects in the GC itself, which I currently have not done. So, even if some root object is added to the GC, it may at least skip over the references to acyclic objects it holds.

Yep. Without having GC pay attention to this flag, this change mostly only helps ephemeral objects in local variables and tmpvars.

I made an extension to allow me to explicitly mark objects as not collectable (to try and workaround both #17131 and #17127), but it was mostly unsuccessful because indirectly referenced object still seem to get their dependencies scanned by the GC whether they are marked as GC_NOT_COLLECTABLE or not.

I think the GC really needs to be modified to account for this, otherwise this improvement will be largely ineffective.

Copy link
Member

@arnaud-lb arnaud-lb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this. This is probably the most beneficial optimization we could implement on the current GC.

@@ -70,6 +70,8 @@ ZEND_API HashTable *rebuild_object_properties_internal(zend_object *zobj) /* {{{
zend_class_entry *ce = zobj->ce;
int i;

GC_TYPE_INFO(zobj) &= ~(GC_NOT_COLLECTABLE << GC_FLAGS_SHIFT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would break badly if this changed during GC, if the GC used this to decide when to traverse nodes. I suggest to add an assertion here to check that GC is not running.

Comment on lines +2440 to +2441
/* Mark classes with custom get_gc handler as potentially cyclic, even if
* their properties don't indicate so. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also mark classes with custom get_properties handler as potentially cyclic, as this is called by zend_std_get_gc.

Otherwise this looks right to me, as objects with std get_gc and get_properties can not expose anything other than standard properties to the GC.

Lazy objects need to take care of updating GC_NOT_COLLECTABLE: Remove GC_NOT_COLLECTABLE in zend_object_make_lazy() when the initializer is cyclic or may have a ref to the object itself, and add it again in zend_lazy_object_init().

Comment on lines +708 to +709
/* FIXME: Potentially infer ZEND_ACC_MAY_BE_CYCLIC during construction of
* closure? static closures not binding by references can't be cyclic. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 this looks worth it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid my comment was incorrect.

class Test {
    public \Closure $c;
}

$test = new Test();
$c = static function () use ($test) {}; // acyclic
$test->c = $c; // now it's cyclic
unset($test);
gc_collect_cycles();
unset($c); // leaks

This is only ok if $test itself is acyclic. As mentioned previously, this check would currently be unsound because the object may become cyclic at a later point in time by adding dynamic properties. However, once we're on PHP 9, it should be ok!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth mentioning: readonly classes don't permit dynamic properties. So readonly classes could benefit from this optimization immediately without waiting for PHP 9.0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, your test case doesn't show any dynamic property usage @iluuu1994, wouldn't that still be an issue in non-readonly cases even in PHP 9.0?

Copy link
Member Author

@iluuu1994 iluuu1994 Dec 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth mentioning: readonly classes don't permit dynamic properties. So readonly classes could benefit from this optimization immediately without waiting for PHP 9.0.

I was referring to:

This is only ok if $test itself is acyclic. As mentioned previously, this check would currently be unsound

I.e. it's safe to assume Closure is acyclic only if it captures values that aren't themselves acyclic. If Test may point to cyclic values, as Closure itself, then it won't be marked as acyclic. But today, even if Test were inferrably acyclic, this is not guaranteed to remain this way due to dynamic properties. Hence, we need to be more careful about relations between objects.

Actually, your test case doesn't show any dynamic property usage @iluuu1994, wouldn't that still be an issue in non-readonly cases even in PHP 9.0?

Yes, but it's not a priority since it's a rare case and unlikely to make a big difference.

@@ -432,6 +432,8 @@ public function getNamespaceName(): string {}
public function getShortName(): string {}

public function getAttributes(?string $name = null, int $flags = 0): array {}

public function mayBeCyclic(): bool {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for exposing this. This may be beneficial for understanding the GC behaviour as PHP user.

@bwoebi
Copy link
Member

bwoebi commented Dec 15, 2024

Do I understand it right that this cyclic-check is pretty trivial and avoids using knowledge of already declared types or such (i.e. if the type specified by the typed property is known to not be cyclic, the object will also not be cyclic)?

Yes, this is pretty much it. Edit: I just re-read your sentence and missed the word "avoids", now I'm lost what you mean. :)

I mean that you're not actually looking at the type of referenced properties.
Like final class Acyc { /* is acyclic */ } class Foo { public Acyc $prop; } - Foo is going to be marked cyclic (due to the check only looking "is scalar type or not"), even if it cannot be as Acyc is acyclic.

@iluuu1994
Copy link
Member Author

Ah ok. Yes, atm we're not looking at class types, only scalar types (excluding arrays) are considered acyclic. This would be possible, but only in PHP 9, because then dynamic properties cannot be added to objects at runtime anymore, which can implicitly make some other object it references a root. It's also going to be ineffective without cross-file analysis.

@nielsdos
Copy link
Member

but only in PHP 9, because then dynamic properties cannot be added to objects at runtime anymore

I see this statement from time to time, but IIUC it will still be allowed with AllowDynamicProperties right?

@iluuu1994
Copy link
Member Author

I see this statement from time to time, but IIUC it will still be allowed with AllowDynamicProperties right?

Yes, but if we know about the class at comp-time, we know whether this attribute is present too.

* Actually skip non-cyclic objects in GC.
* Add MAY_BE_CYCLIC to more internal classes
* Fix dynamic property creation trigger

And also breaking phpGH-10932 for now. :) I will try to fix this later.
iluuu1994 added a commit that referenced this pull request Dec 18, 2024
Adding strings to the worklist is useless, because they never contribute to
cycles. The assembly size on x86_64 does not change. This significantly improves
performance in this synthetic benchmark by 33%.

    function test($a) {}

    $a = new stdClass();
    $a->self = $a;
    $a->prop1 = str_repeat('a', 10);
    $a->prop2 = str_repeat('a', 10);
    $a->prop3 = str_repeat('a', 10);
    $a->prop4 = str_repeat('a', 10);
    $a->prop5 = str_repeat('a', 10);
    $a->prop6 = str_repeat('a', 10);
    $a->prop7 = str_repeat('a', 10);
    $a->prop8 = str_repeat('a', 10);
    $a->prop9 = str_repeat('a', 10);
    $a->prop10 = str_repeat('a', 10);

    for ($i = 0; $i < 10_000_000; $i++) {
        test($a);
        gc_collect_cycles();
    }

This requires adding IS_TYPE_COLLECTABLE to IS_REFERENCE_EX to ensure these
values continue to be pushed onto the stack. Luckily, IS_TYPE_COLLECTABLE is
currently only used in gc_check_possible_root(), where the checked value cannot
be a reference.

Note that this changes the output of gc_collect_cycles(). Non-cyclic, refcounted
values no longer count towards the total reported values collected.

Also, there is some obvious overlap with GH-17130. This change should be good
nonetheless, especially if we can remove the GC_COLLECTABLE(Z_COUNTED_P(zv))
condition in PHP 9 and rely on Z_COLLECTABLE_P() exclusively, given we can
assume an object doesn't become cyclic at runtime anymore.

Closes GH-17194
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Objects which can't possibly be GC roots take up slots in GC root buffer
5 participants