Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: [SQS Batch Processing] Ability to move Message from Batch to DLQ when certain types of exception occurs #29

Open
pankajagrawal16 opened this issue Jul 27, 2021 · 1 comment
Labels
all_runtimes Changes that should be applied to all runtimes Batch Batch processing utility Pending/Triage Pending triage RFC

Comments

@pankajagrawal16
Copy link

pankajagrawal16 commented Jul 27, 2021

Key information

Summary

During batch processing of SQS messages, there can be messages in the batch which fails processing because of reasons for which user will not want them to retry it but move those to a DLQ associated with SQS queue or delete it entirely. Example might be where a message is failing business validation, and it won't make sense to let it retry until it expires, rather we can simply move such message directly to a DLQ.

We could enhance SQL Batch processing to accept a list of Exception/Errors. If those exceptions occur during message processing via SqsMessageHandler for Java or via record_handler in python, utility can take care of moving such message to DLQ directly or delete it entirely based on a config param, instead of moving them back to queue.

Motivation

This is a fairly common use-case. This will take away all the custom logic that user need to build themselves and let user focus on writing business logic of processing the SQS message instead.

Proposal

During batch processing of SQS messages, there can be messages in the batch which fails processing because of reasons for which user will not want them to retry it but move those to a DLQ associated with SQS queue or delete it entirely. Example might be where a message is failing business validation, and it won't make sense to let it retry until it expires, rather we can simply move such message directly to a DLQ.

We could enhance SQL Batch processing to accept a list of Exception/Errors. If those exceptions occur during message processing via SqsMessageHandler for Java or via record_handler in python, utility can take care of moving such message to DLQ directly or delete it entirely based on a config param, instead of moving them back to queue.

So basically, accepting a list of exceptions/errors and a new flag if such a message should be deleted or moved to DLQ in the api contract or the annotation/decorator. Default could be to move to DLQ is one exists for the SQS queue.

If this feature should be available in other runtimes (e.g. Python), how would this look like to ensure consistency?

For Python version, since it supports similar batch processing utility with similar UX via decorator and APIs, same capability could be added to it as well.

User Experience

How would customers use it?

For Java:

public class PartialBatchPartialFailureHandler implements RequestHandler<SQSEvent, String> {
    @Override
    @SqsBatch(value = InnerMessageHandler.class, nonRetryableExceptions = {IllegalStateException.class})
    public String handleRequest(final SQSEvent sqsEvent,
                                final Context context) {
        return "Success";
    }

    private class InnerMessageHandler implements SqsMessageHandler<Object> {

        @Override
        public String process(SQSMessage message) {
            if ("some business logic validation".equals("false")) {
                throw new IllegalStateException("2e1424d4-f796-459a-8184-9c92662be6da");
            }
            
            return "Success";
        }
    }
}
public class PartialBatchPartialFailureHandler implements RequestHandler<SQSEvent, String> {
    @Override
    @SqsBatch(value = InnerMessageHandler.class, nonRetryableExceptions = {IllegalStateException.class}, deleteNonRetryableMessageFromQueue = true)
    public String handleRequest(final SQSEvent sqsEvent,
                                final Context context) {
        return "Success";
    }

    private class InnerMessageHandler implements SqsMessageHandler<Object> {

        @Override
        public String process(SQSMessage message) {
            if ("some business logic validation".equals("false")) {
                throw new IllegalStateException("2e1424d4-f796-459a-8184-9c92662be6da");
            }
            
            return "Success";
        }
    }
}

Similar support can be made to higher level api supported by utility:

public class PartialBatchPartialFailureHandler implements RequestHandler<SQSEvent, Object> {
    @Override
    public Object handleRequest(final SQSEvent sqsEvent,
                                final Context context) {

        return SqsUtils.batchProcessor(sqsEvent, message -> {
                    if ("some business logic validation".equals("false")) {
                        throw new IllegalStateException("2e1424d4-f796-459a-8184-9c92662be6da");
                    }

                    return "Success";
                },
                IllegalStateException.class);
    }

So in above examples, if IllegalStateException is thrown from handler while processing the message, then that message will be automatically moved to a DLQ or be deleted based on flag value of deleteNonRetryableMessageFromQueue. By default, it will attempt to move it to DLQ if one exists.

Any configuration or corner cases you'd expect?
NA

Demonstration of before and after on how the experience will be better

Refer summary above. Today all the logic of deciding to move DQL or Deleting such message has to be done by users writing alot of custom code around it.

Drawbacks

Increases complexity of the utilty and more code to maintain?

No, since we already depend on SQS client today. Its just additional functionality within utility.

Rationale and alternatives

  • What other designs have been considered? Why not them?
  • What is the impact of not doing this? Customer are forced to write and maintian such custom logic themselves.

Unresolved questions

Optional, stash area for topics that need further development e.g. TBD

@pankajagrawal16
Copy link
Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
all_runtimes Changes that should be applied to all runtimes Batch Batch processing utility Pending/Triage Pending triage RFC
Projects
None yet
Development

No branches or pull requests

2 participants