Add exponential back-off with jitter in SQS Batch Processor in case of temporary errors #22

Open
aagrawal207 opened this issue Jun 19, 2021 · 3 comments
Labels
Batch Batch processing utility enhancement New feature or request Java Proposed Community submitted Python

Comments

@aagrawal207

Is your feature request related to a problem? Please describe.
We use the SQS batch processing utility from Powertools. When a message fails, it stays in the queue until it has been received enough times to be moved to the DLQ. The gap between processing attempts for a failed message is equal to or greater than the queue's configured visibilityTimeout. We can override this visibilityTimeout for individual messages when they fail, during exception handling.
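As a rough illustration of the override described above, here is a minimal sketch, not part of Powertools. It assumes a boto3 SQS client is passed in, that `record` is one entry from the Lambda SQS event (which carries `receiptHandle` and `attributes.ApproximateReceiveCount`), and that the delay simply doubles per receive. The function name and the `base_seconds` default are hypothetical.

```python
from typing import Any, Dict

# SQS does not allow a visibility timeout above 12 hours.
SQS_MAX_VISIBILITY_TIMEOUT = 43_200  # seconds


def extend_visibility_on_failure(
    sqs_client: Any,
    queue_url: str,
    record: Dict[str, Any],
    base_seconds: int = 30,  # assumed starting delay, not from the issue
) -> int:
    """Push back the next delivery of a failed message (hypothetical helper).

    The delay doubles with each receive: base, 2*base, 4*base, ...
    """
    receive_count = int(record["attributes"]["ApproximateReceiveCount"])
    timeout = min(
        SQS_MAX_VISIBILITY_TIMEOUT,
        base_seconds * 2 ** (receive_count - 1),
    )
    # boto3 SQS client call; sqs_client is expected to be boto3.client("sqs").
    sqs_client.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=record["receiptHandle"],
        VisibilityTimeout=timeout,
    )
    return timeout
```

The message still counts toward the queue's `maxReceiveCount`, so it would eventually reach the DLQ as before; only the spacing between attempts changes.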

Describe the solution you'd like
I would like the SQS Batch Processor library to provide this functionality by default so that we don't need to write it ourselves. This also ties in nicely with another feature request of mine: #21.

Describe alternatives you've considered
Writing my own utility code using the references I found on the internet like this: https://ivan-site.com/2018/06/exponential-backoff-in-sqs/.

Additional context
This is a pretty common use case.

@pankajagrawal16 pankajagrawal16 transferred this issue from aws-powertools/powertools-lambda-java Jul 6, 2021
@pankajagrawal16 pankajagrawal16 added Batch Batch processing utility enhancement New feature or request Java Proposed Community submitted Python labels Jul 6, 2021

pankajagrawal16 commented Jul 6, 2021

Hi @as2d3, thanks for opening the issue. It does sound like an interesting use case to support in the batch processing utility.

@heitorlessa Thoughts?

heitorlessa (Contributor) commented Jul 6, 2021

Do I understand correctly that you'll want to set a different wait time per individual message that failed? e.g. set jitter on different failed messages

Then differentiate between temporal vs permanent failures to do this differently?

When you say "it's a pretty common use case", could you expand on that with examples?

At first glance, I see why you'd want this, but I'd wait for more customers to ask for it. We don't fully control the Poller, messages could be delivered more than once, and SQS could throttle depending on how many messages are being set with a different visibility timeout. I'd have more questions about this approach once I have a clearer head.

I'd also want us to have a chat with the Lambda team, since this is the responsibility of the SQS Consumer (Lambda Poller). Like Kinesis, they could accept additional context in the response that could technically do what this utility does too.

@pankajagrawal16 let's find some time to sync next week over this and have a word with the Lambda team.

Thanks a lot for raising it!!!


MartinMitro commented Apr 25, 2023

Hey @heitorlessa,

I can add an example: we are calling a third-party service that is often down, sometimes for a few hours. It does not make sense to call it every minute while it is down; instead, we want to exponentially increase the message's visibility timeout, up to a configured maximum (let's say half an hour).
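The capped schedule described here could be sketched roughly as follows. This is an illustration only: the 60-second base is an assumption, the 1800-second cap mirrors the "half an hour" above, and the jitter variant used is "equal jitter" (half the delay fixed, half random), chosen so that a retry is never scheduled immediately. The function name is hypothetical.

```python
import random


def backoff_with_jitter(
    attempt: int,
    base_seconds: int = 60,    # assumed first delay
    cap_seconds: int = 1800,   # "half an hour" maximum
    rng=random,                # injectable for deterministic tests
) -> int:
    """Return a visibility timeout in seconds for the given attempt (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Exponential growth, clamped to the configured maximum.
    ceiling = min(cap_seconds, base_seconds * 2 ** (attempt - 1))
    # Equal jitter: keep half of the delay, randomise the other half.
    half = ceiling // 2
    return half + rng.randint(0, ceiling - half)


if __name__ == "__main__":
    for attempt in range(1, 8):
        print(attempt, backoff_with_jitter(attempt))
```

With these defaults the ceiling reaches the cap at the sixth attempt, so later retries land between 15 and 30 minutes apart; the jitter spreads out messages that failed in the same batch so they do not all return at once.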

> Then differentiate between temporal vs permanent failures to do this differently?

Yes, we still want to differentiate between permanent failures (remove from queue) and temporal failures (retry with back-off).
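One way to express that split, sketched under the assumption that the record handler signals the failure type through its exception class. The `PermanentFailure` and `TransientFailure` classes, the `Outcome` enum, and `classify` are all hypothetical names for illustration, not an existing Powertools API.

```python
from enum import Enum
from typing import Any, Callable, Dict


class Outcome(Enum):
    SUCCESS = "success"          # processed; delete from the queue
    DISCARD = "discard"          # permanent failure; remove from the queue
    RETRY_LATER = "retry_later"  # temporary failure; retry with back-off


class PermanentFailure(Exception):
    """Raised by the handler when retrying cannot help (e.g. invalid payload)."""


class TransientFailure(Exception):
    """Raised by the handler when a retry may succeed (e.g. dependency is down)."""


def classify(handler: Callable[[Dict[str, Any]], None], record: Dict[str, Any]) -> Outcome:
    """Run the handler on one record and map the result to an outcome.

    Any other exception type is left to propagate to the caller.
    """
    try:
        handler(record)
    except PermanentFailure:
        return Outcome.DISCARD
    except TransientFailure:
        return Outcome.RETRY_LATER
    return Outcome.SUCCESS
```

A caller would then delete the message for `SUCCESS` and `DISCARD`, and change its visibility timeout for `RETRY_LATER`.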
