Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add failed jobs support #103

Merged
merged 2 commits into from
Nov 4, 2022
Merged

Conversation

sheldonreiff
Copy link
Contributor

This adds support for storing and manually requeuing failed jobs (defined as jobs that have exceeded maxAttempts). This would lay some groundwork for #11.

This is a somewhat significant set of changes so I'm very open to any feedback.

Many thanks to @amayer5125 for his suggestions and review.

@sheldonreiff sheldonreiff force-pushed the failed-jobs branch 7 times, most recently from cef5041 to dbe3b8a Compare October 19, 2022 22:35
Copy link
Member

@markstory markstory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work 👏 Thank you for putting this together. I left a few comments on naming conventions and documentation improvements.

*/
public function change()
{
$table = $this->table('failed_jobs');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$table = $this->table('failed_jobs');
$table = $this->table('queue_failed_jobs');

Namespacing the table seems prudent when you want to use a name this generic.

@@ -55,13 +55,23 @@ The following configuration should be present in the config array of your **conf

// The amount of time in milliseconds to sleep if no jobs are currently available. default: 10000
'receiveTimeout' => 10000,

// Whether to store failed jobs in the failed_jobs table. default: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Whether to store failed jobs in the failed_jobs table. default: false
// Whether to store failed jobs in the queue_failed_jobs table. default: false

]
],
// ...

The ``Queue`` config key can contain one or more queue configurations. Each of
these is used for interacting with a different queuing backend.

If ``storeFailedJobs`` is set to ``true``, make sure to run the plugin migrations to create the ``failed_jobs`` table.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If ``storeFailedJobs`` is set to ``true``, make sure to run the plugin migrations to create the ``failed_jobs`` table.
If ``storeFailedJobs`` is set to ``true``, make sure to run the plugin migrations to create the ``queue_failed_jobs`` table.

``maxAttempts`` is configured on the job class or via a command line argument, a
job will be considered failed if an exception is thrown while processing the job
and there are no remaining attempts. The job will then be added to the
``failed_jobs`` table and can be requeued manually.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
``failed_jobs`` table and can be requeued manually.
``queue_failed_jobs`` table and can be requeued manually.

@@ -266,6 +276,53 @@ This shell can take a few different options:
- Max Runtime
- Runtime: Time since the worker started, the worker will finish when Runtime is over Max Runtime value

Failed Jobs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call this a 'dead letter queue' as that is another term that is used for this behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if that will get confusing because some queue backends implement their own dead letter queues (for example, sqs, rabbitt mq). And it seems in those examples a dead letter queue is defined more broadly where here only jobs that are requeued too many times will be failed/dead lettered. Also, when a job is failed, it's not going to be queued on the original queue infrastructure, but a database table instead. I think this does fall within the concept of a dead letter queue though. Let me know what you think.

Copy link
Member

@markstory markstory Oct 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree that the terminology would overlap with native message brokers. Using 'failed jobs' is a good term for the abstraction we're trying to provide. I think we should mention the similarities with dead letter queues when we explain the behavior.

Let's keep 'failed jobs' until we hear feedback otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it as failed jobs naming and added some clarification about the similarities/differences in the documentation.


if ($args->hasOption('class')) {
$jobsToDelete->where([
$failedJobsTable->aliasField('class') => $args->getOption('class'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need to manually alias all of these fields. Using where(['class' => $args->getOption('class')]) should be enough.

return;
}

if (!$args->getOption('assume-yes')) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually call these kinds of parameters --force

$parser->addOption('config', [
'help' => 'Requeue jobs by the config used to queue the job.',
]);
$parser->addOption('assume-yes', [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be force as well.


if ($args->hasOption('config')) {
$jobsToRequeue->where([
$failedJobsTable->aliasField('config') => $args->getOption('config'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same feedback here about not needing to use aliasField().

]
);

$failedJobsTable->deleteOrFail($failedJob);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll only be able to do at-least once delivery with this. We should mention that in the usage docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean because if deleteOrFail fails, the job would remain in the failed_jobs table and could be requeued, right? I can note that in the docs. I suppose the only solution to that would be to remove the job from the database, queue it, then insert the record again if there was an exception when queueing. But then a failed job could be "dropped" if it could not be inserted again, which seems worse.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is the scenario. I agree that the solution is overly complicated and that we can document the delivery guarantees we currently offer. If it is a problem we can always add complexity and improve the delivery guarantees in the future.

]
);

$failedJobsTable->deleteOrFail($failedJob);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is the scenario. I agree that the solution is overly complicated and that we can document the delivery guarantees we currently offer. If it is a problem we can always add complexity and improve the delivery guarantees in the future.

['exception' => $exception, 'logger' => $context->getLogger()]
);

EventManager::instance()->dispatch($event);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this extension use EventDispatcherTrait instead of firing the event on the global event manager directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can actually. The listener can be attached when instantiating the extension in WorkerCommand. That's the only place it's instantiated so there isn't really any need for a global event. I'll make that change.

@sheldonreiff sheldonreiff force-pushed the failed-jobs branch 2 times, most recently from c4c8360 to fe82f67 Compare November 1, 2022 21:55
composer.json Outdated
@@ -20,6 +20,7 @@
"require": {
"php": ">=7.2.0",
"cakephp/cakephp": "^4.1",
"cakephp/migrations": "^3.1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this an optional dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it optional and added a note in the docs to composer require it if using failed jobs.

phpstan v1.9.1 returns an error with message "Dead catch - Cake\ORM\Exception\PersistenceFailedException is never thrown in the try block.". this error was not returned on v1.8.11.
@sheldonreiff
Copy link
Contributor Author

The second commit I added was to address phpstan failing. I was getting a phpstan error Dead catch - Cake\ORM\Exception\PersistenceFailedException is never thrown in the try block.. It seems version 1.9.1 introduced a change that's causing this (previous builds were using 1.8.11). I don't see anything wrong with the phpdocs.

@markstory markstory merged commit 906bc11 into cakephp:master Nov 4, 2022
@markstory
Copy link
Member

Thank you 🎉

@sheldonreiff sheldonreiff deleted the failed-jobs branch November 4, 2022 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants