-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Atomic option locking #967
Conversation
Note that we still check is_locked() because, although the option lock implementation has been updated in such a way that we can rely on just calling set(), we do not know that that is the active implementation.
@barryhughes I may be doing something wrong, but the queue runner doesn't seem to be doing anything, either in trunk or on this branch. I've got |
Hmm 🤔 I'm not immediately sure, but I wonder if it could be a local issue of some kind (local loopback prohibited?) ... here's something I tried:
I'll share the creds I used for the sandbox site and, if you login, you will see 51 failed actions from my test (they failed because of course there is no callback function, but that's completely expected) and you can re-test there if you like (or do so in a fresh sandbox environment)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, got it working on JN. Must be something to do with the wp-env setup.
It looks good to me, but I'd be curious if @brandonpayton has any thoughts as well, since he opened the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great to see. Thank you very much for working on it, @barryhughes! 🙇
✅ The main thing is that the UPDATE depends upon the previous known value to avoid the race.
I left one comment because it seems like the changes could be simpler, but it's possible I'm missing something.
Thanks again!
$existing_lock_value = $this->get_or_create_lock( $lock_type ); | ||
$expiration_timestamp = $this->get_expiration_from( $existing_lock_value ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we could not just get expiration and, depending on the result, INSERT or UPDATE?
It seems unnecessary to potentially INSERT a lock option value in order to conditionally UPDATE against that value later on. Currently, this requires three queries:
- INSERT lock if none exists
- SELECT lock value
- UPDATE if old value is expected value
If we conditionally INSERT or UPDATE, it would require two queries:
- SELECT lock value if any
- INSERT or UPDATE depending on whether there is an existing and expired lock value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely ... a little unsure why I did things this way. In any case, mind taking a further peek (especially at 895e8d0)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely ... a little unsure why I did things this way.
I've had similar experiences :)
In any case, mind taking a further peek (especially at 895e8d0)?
Sure thing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tested, but the latest INSERT/UPDATE logic looks good to me.
Having a race in a scheduler undermines its purpose for being. It's great it will be fixed. Thanks again for fixing it!
At least as far back as WP 5.2 (the oldest version we support), the option_name column is defined as unique: therefore we can rely on MySQL to avoid duplicates. Of course, if the schema has been modified and the unique constraint removed, that would be problematic -- but it seems reasonable for us to assume that isn't the case.
@brandonpayton thanks for the reviews so far ... my apologies but could I get one more check? Specifically on b215f72. Being mindful of recent conversations about the inability to rely on the I realize this might be problematic if that constraint has been removed, but I'm assuming that generally is not the case. Checking in, though, in case you have any thoughts on this—particularly in relation to the platform(s) you work on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @barryhughes, it's funny you ended up making that change because, in my previous review, I almost asked why we needed the WHERE clause given that MySQL would enforce option name uniqueness. But one thing the previous query did was to avoid the INSERT attempt resulting in a MySQL error, and I decided not to say anything because that seemed fine too. :)
Either way works for me.
IMO it is totally reasonable to rely upon options having unique names in the wp_options table. If someone removes that constraint, they have change the underlying and widely assumed truth of the WP DB, and WP would probably have bigger problems than broken action scheduler locking.
Thanks!
Yep, that came to mind here—but I think it's a reasonable trade-off. Thanks again for checking on this one 👍 |
Updates the Option Lock implementation to reduce the potential for race conditions.
Previously, and with reference to
ActionScheduler_QueueRunner::maybe_dispatch_async_request()
, we had this pattern:The problem is that another process may obtain the lock immediately after we enter the if {} block, but before we set the lock. As noted in the linked issue, this isn't just theoretical: some cases of this happening have been detected in the wild. This change attempts to address the problem in the following ways:
$lock->set()
, it will only successfully claim the lock and return true if another process did not get there first. Therefore, we now only try to dispatch the async queue runner request if calling$lock->set()
returns true.$lock->is_locked()
because, even though this should now be superfluous when the updated Option Lock implementation is in use, we don't actually know that that is the active implementation.Closes #793.
Testing instructions
Code review and some free-style testing (such as single-stepping through the locking process with XDebug) may be the best way to test and examine the changes to the locking system, but I will anyway outline a fairly manual method of testing here. These testing instructions do not try to simulate the sort of highly concurrent conditions that would lead to a race condition, they simply ensure we haven't obviously broken the way async queue running works.
First, we should disable any methods of triggering the queue runner except for the async queue runner:
define( 'DISABLE_WP_CRON', true );
to your wp-config.php file.You may also wish to manually generate some scheduled actions using this one-liner:
That should give us 50 scheduled actions that are 'past due'.
Before we go any further I want to highlight that, by default, the locking system is used in such a way that a new queue runner is spawned no more frequently than once every 60 seconds.
Visit Tools ▸ Scheduled Actions ... you should see at least 50 actions are past-due (there may be more, depending on how many were already present in your test site ... and, if you do not see any, it's possible a queue runner just finished its work and so you may need to re-run the above snippet a second time). Also, there should be an indication as to when the next queue runner will start:
This should be enough to assure us that:
Changelog