We should consider retrying failed task placements a few seconds after the initial failure, to confirm that the failure really is a product of limited resources and not a sub-second race condition in the scheduler.
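As a rough sketch of what that could look like (this assumes the AWS SDK's RunTask call; the cluster and task definition names are placeholders, not Watchbot's actual configuration):

```typescript
// Hypothetical sketch: retry a task placement a couple of times, a few seconds apart,
// when the failure reason points at resource contention rather than a bad task.
import { ECSClient, RunTaskCommand } from "@aws-sdk/client-ecs";

const ecs = new ECSClient({});
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function placeTask(cluster: string, taskDefinition: string): Promise<boolean> {
  const attemptDelaysMs = [0, 5000, 15000]; // initial attempt + two retries a few seconds apart

  for (const wait of attemptDelaysMs) {
    if (wait > 0) await delay(wait);

    const result = await ecs.send(new RunTaskCommand({ cluster, taskDefinition }));

    // RunTask reports placement problems via the `failures` array rather than throwing.
    const failures = result.failures ?? [];
    if (failures.length === 0) return true;

    // Reasons like "RESOURCE:MEMORY" or "RESOURCE:CPU" suggest the cluster is simply full
    // (or we hit a sub-second race in the scheduler); anything else gets surfaced immediately.
    const resourceConstrained = failures.every((f) => f.reason?.startsWith("RESOURCE"));
    if (!resourceConstrained) {
      throw new Error(`Task placement failed: ${failures.map((f) => f.reason).join(", ")}`);
    }
  }

  return false; // still resource-constrained after retries; let the caller decide what to do
}
```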
Watchbot's SQS-based try and retry system kinda sorta does this already. Is there an advantage to making a failed placement a special case and not just letting the usual retry + backoff routines handle it?
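For context, the "usual retry + backoff" is roughly the pattern below. This is not Watchbot's actual code, just a generic SQS consumer sketch where a failed message's visibility timeout grows with each receive:

```typescript
// Generic sketch of SQS-level retry + backoff: when processing fails, extend the
// message's visibility timeout so it reappears on the queue later, with the delay
// growing on each receive attempt.
import { SQSClient, ChangeMessageVisibilityCommand, Message } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});

async function backOff(queueUrl: string, message: Message) {
  // ApproximateReceiveCount is only present if the message was received with that
  // attribute requested; default to 1 if it is missing.
  const receives = Number(message.Attributes?.ApproximateReceiveCount ?? "1");
  const timeout = Math.min(2 ** receives * 30, 12 * 60 * 60); // cap at SQS's 12-hour maximum

  await sqs.send(
    new ChangeMessageVisibilityCommand({
      QueueUrl: queueUrl,
      ReceiptHandle: message.ReceiptHandle,
      VisibilityTimeout: timeout,
    })
  );
}
```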
@rclark I don't think we want failed task placements to wind up in the dead letter queue. Failed task placements represent a structural limitation of the scheduler, and should be retried as close to the scheduler as possible (ideally inside the scheduler, per chat with David Myers). These failures don't represent chronic failures of a particular payload, which is what the dead letter queue should be signaling.
The dead letter queue isn't supposed to represent chronically malformed or rejected payloads -- the idea is that SQS should never ever drop your job until it has been completed successfully. If the scheduler can't place a task for some number of attempts, then yeah -- there's some other limitation at play, but we definitely don't want the application to lose track of the work that it was supposed to get done.
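For reference, a message only lands in the dead letter queue after it has been received more times than the redrive policy's maxReceiveCount without being deleted. Retries handled inside the scheduler never consume one of those receives, so the work stays on the primary queue. A rough sketch of that wiring (the queue URL and ARN below are placeholders, not Watchbot's):

```typescript
// Hypothetical sketch of the redrive wiring under discussion: after maxReceiveCount
// failed processing attempts the message is moved aside to the dead letter queue,
// not dropped, so the application never silently loses track of work.
import { SQSClient, SetQueueAttributesCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});

async function attachDeadLetterQueue(queueUrl: string, deadLetterQueueArn: string) {
  await sqs.send(
    new SetQueueAttributesCommand({
      QueueUrl: queueUrl,
      Attributes: {
        RedrivePolicy: JSON.stringify({
          deadLetterTargetArn: deadLetterQueueArn,
          maxReceiveCount: "10",
        }),
      },
    })
  );
}
```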
cc/ @brendanmcfarland @rclark