Vector acks messages sent by S3 source even when delivery failed #19711

Closed
tanushri-sundar opened this issue Jan 25, 2024 · 5 comments
Labels
domain: delivery Anything related to delivering events within Vector such as end-to-end acknowledgements sink: gcp_pubsub Anything `gcp_pubsub` sink related type: bug A code related bug.

Comments

@tanushri-sundar
Contributor

tanushri-sundar commented Jan 25, 2024

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I am sending messages from S3 to GCP PubSub using Vector.

I have observed that messages rejected by the PubSub sink with non-retryable errors (e.g., 400, 403) are dropped.

2024-01-18T21:35:42.649078Z ERROR vector_common::internal_event::component_events_dropped: Internal log [Events dropped] is being suppressed to avoid flooding.

The SQS messages for the dropped events are deleted from the queue, even though the queue has a DLQ configured.

I would expect the SQS message not to be deleted, per Vector's end-to-end acknowledgement guarantee.


Expected behavior: Vector should delete the SQS message for an S3 object only if delivery to the sink is successful. I have traced this issue back to the S3 source code here:

match result {
    BatchStatus::Delivered => Ok(()),
    BatchStatus::Errored => Err(ProcessingError::ErrorAcknowledgement),
    BatchStatus::Rejected => {
        // Sinks are responsible for emitting ComponentEventsDropped.
        // Failed events cannot be retried, so continue to delete the SQS source message.
        Ok(())
    }
}

The BatchStatus::Rejected case should return Err(ProcessingError::ErrorAcknowledgement) instead of Ok(()).

This would allow failed messages to be routed to a DLQ, since the user can configure a DLQ on their SQS queue and set custom values for maximum retries and visibility timeout.

This fix could potentially be configuration-based, with a parameter such as strict_ack = true preventing Vector's overeager deletion.
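
To make the proposal concrete, here is a minimal, self-contained sketch of how the acknowledgement mapping might look with such a flag. It is not Vector's actual code: the BatchStatus and ProcessingError enums below are simplified stand-ins for the real types, and both the ack_to_result helper and the strict_ack parameter are hypothetical names used only for illustration.

```rust
/// Simplified stand-in for Vector's `BatchStatus` (assumption: only the
/// three variants shown in the snippet above matter here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BatchStatus {
    Delivered,
    Errored,
    Rejected,
}

/// Simplified stand-in for the S3 source's `ProcessingError`.
#[derive(Debug, PartialEq, Eq)]
pub enum ProcessingError {
    ErrorAcknowledgement,
}

/// Hypothetical helper: maps the acknowledgement status of a batch to the
/// result that decides whether the SQS message is deleted.
///
/// `Ok(())` means "delete the SQS message"; `Err(..)` means "leave it in the
/// queue" so that SQS can redeliver it and eventually move it to a DLQ.
/// `strict_ack` is an illustrative flag, not an existing Vector option.
pub fn ack_to_result(status: BatchStatus, strict_ack: bool) -> Result<(), ProcessingError> {
    match status {
        BatchStatus::Delivered => Ok(()),
        BatchStatus::Errored => Err(ProcessingError::ErrorAcknowledgement),
        // Proposed behavior: with the flag set, a rejected batch is treated
        // like an errored one, so the message is not deleted.
        BatchStatus::Rejected if strict_ack => Err(ProcessingError::ErrorAcknowledgement),
        // Existing behavior: the rejected batch is dropped and the SQS
        // message is deleted anyway.
        BatchStatus::Rejected => Ok(()),
    }
}
```

With strict_ack = false this keeps the current behavior, so existing deployments would be unaffected; with strict_ack = true a rejected batch leaves the SQS message in place, and the queue's own redrive policy decides when it goes to the DLQ.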

Configuration

No response

Version

0.33.0

Debug Output

No response

Example Data

No response

Additional Context

No response

References

#14899
#14708
#10870

@tanushri-sundar tanushri-sundar added the type: bug A code related bug. label Jan 25, 2024
@jszwedko
Member

This is one of the issues intended to be addressed by #14708.

@jszwedko jszwedko added sink: gcp_pubsub Anything `gcp_pubsub` sink related domain: delivery Anything related to delivering events within Vector such as end-to-end acknowledgements labels Jan 25, 2024
@tanushri-sundar
Contributor Author

tanushri-sundar commented Jan 25, 2024

@jszwedko I agree, this could definitely be solved by the RFC #14708.

A smaller-scope fix for this particular source would be very helpful, and it could potentially be a low-touch code change. Since there isn't a timeline for implementing the RFC, starting with this fix could unblock some users more quickly.

@jszwedko
Member

Yeah, after reflecting a bit I agree that this seems like a safe incremental change to make even in the context of that RFC. I'd be happy to see a PR for it if you like.

@tanushri-sundar
Contributor Author

@jszwedko That sounds great! I've opened PR #19748.

@jszwedko
Member

jszwedko commented Feb 1, 2024

Closed by #19748

@jszwedko jszwedko closed this as completed Feb 1, 2024