Retry failed task placements before giving up #161

Open
jakepruitt opened this issue Aug 31, 2017 · 3 comments

Comments


jakepruitt commented Aug 31, 2017

We should consider retrying failed task placements a few seconds after the initial failure, to confirm that the failure is in fact a product of limited resources and not a result of sub-second race conditions in the scheduler.

cc/ @brendanmcfarland @rclark
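
A minimal sketch of the proposal above, not Watchbot's actual code: retry a placement a few seconds after it fails, and give up only if the failure persists. The names `place_with_retry` and `place_task` and the default attempt count and delay are assumptions made for illustration. `place_task` stands in for whatever call asks the scheduler to place a task, and is assumed to return a list of failure reasons (empty when the task was placed).

```python
import time
from typing import Callable, List


def place_with_retry(
    place_task: Callable[[], List[str]],
    attempts: int = 3,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call place_task until it reports no failures or attempts run out.

    place_task is a hypothetical callable that returns a list of failure
    reasons; an empty list means the task was placed.
    Returns the failure reasons from the last attempt.
    """
    failures: List[str] = []
    for attempt in range(1, attempts + 1):
        failures = place_task()
        if not failures:
            # Placed successfully, possibly after a transient failure.
            return []
        if attempt < attempts:
            # Wait briefly so a sub-second race in the scheduler can clear.
            sleep(delay_seconds)
    # Still failing after every attempt: more likely a real resource limit.
    return failures
```

If the list returned is still non-empty, the caller can treat the failure as a genuine capacity problem rather than a race.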

Contributor

rclark commented Aug 31, 2017

Watchbot's SQS-based try and retry system kinda sorta does this already. Is there an advantage to making a failed placement a special case and not just letting the usual retry + backoff routines handle it?

@jakepruitt
Author

@rclark I don't think we want failed task placements to wind things up in the dead letter queue. Failed task placements represent a structural limitation of the scheduler, and should be retried as close to the scheduler as possible (ideally inside of the scheduler, per chat with David Myers). These failures don't represent chronic failures of a particular payload, which is what the dead letter queue should be signaling.
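
To illustrate the concern, here is a simplified model of an SQS redrive policy, in which a message moves to the dead letter queue once its receive count exceeds a maximum. It is a sketch under stated assumptions, not Watchbot's implementation: `Message`, `receive`, `run`, the outcome labels, and the limit of 3 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    receive_count: int = 0


def receive(message: Message, max_receive_count: int) -> str:
    """Simulate one receive under a redrive policy.

    Returns "dead-letter" once the receive count exceeds
    max_receive_count, otherwise "deliver".
    """
    message.receive_count += 1
    if message.receive_count > max_receive_count:
        return "dead-letter"
    return "deliver"


def run(outcomes, max_receive_count=3):
    """Feed a sequence of attempt outcomes through the simulated queue.

    Each outcome is "placement-failed", "job-failed" or "ok"; any
    failure returns the message to the queue for another receive.
    """
    message = Message(body="job")
    for outcome in outcomes:
        if receive(message, max_receive_count) == "dead-letter":
            return "dead-letter"
        if outcome == "ok":
            return "completed"
    return "in-flight"
```

In this model, `run(["placement-failed"] * 3 + ["ok"])` returns `"dead-letter"` even though the payload itself would have succeeded, because every placement failure consumes one receive.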

Contributor

rclark commented Oct 3, 2017

The dead letter queue isn't supposed to represent chronically malformed or rejected payloads -- the idea is that SQS should never ever drop your job until it has been completed successfully. If the scheduler can't place a task for some number of attempts, then yeah -- there's some other limitation at play, but we definitely don't want the application to lose track of the work that it was supposed to get done.


2 participants