Replies: 3 comments 1 reply
-
Can you confirm what HTTP status code Elasticsearch returns when returning a "429" response? We do have some logic that looks for "429"s in the response body, but only if the overall HTTP response is reported as successful (2xx): see `vector/src/sinks/elasticsearch/retry.rs`, lines 123 to 143 at commit `acea5ae`. Also, can you share your sink configuration?
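For context, the check being referenced amounts to roughly the following. This is a simplified sketch, not the actual `retry.rs` code (the real logic differs in structure and types); it assumes a `serde_json` dependency:

```rust
use serde_json::Value;

/// Simplified sketch (not Vector's actual retry logic): decide whether
/// an Elasticsearch bulk response should be retried. The bulk API can
/// return HTTP 200 while individual items failed with a 429, so the
/// body has to be inspected even on a "successful" response, e.g.:
/// {"errors": true, "items": [{"index": {"status": 429, ...}}, ...]}
fn should_retry(http_status: u16, body: &str) -> bool {
    // Non-2xx statuses take their own path; a top-level 429 is retriable.
    if !(200..300).contains(&http_status) {
        return http_status == 429;
    }
    // On 2xx, only dig into the body if the bulk response flags errors.
    let Ok(parsed) = serde_json::from_str::<Value>(body) else {
        return false;
    };
    if parsed["errors"] != Value::Bool(true) {
        return false;
    }
    // Each item is an object keyed by its operation ("index", "create",
    // ...) whose value carries the per-item status code.
    parsed["items"]
        .as_array()
        .map(|items| {
            items.iter().any(|item| {
                item.as_object()
                    .and_then(|ops| ops.values().next())
                    .and_then(|op| op["status"].as_u64())
                    == Some(429)
            })
        })
        .unwrap_or(false)
}

fn main() {
    // The case under discussion: HTTP 200 carrying an item-level 429.
    let body = r#"{"errors":true,"items":[{"index":{"status":429}}]}"#;
    assert!(should_retry(200, body));
}
```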
-
Hey @jszwedko. Thanks for getting back on this. Yes, the status code on these is 200. See below for the elasticsearch sink config:
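A representative sketch with placeholder names, endpoints, and index (not the actual values from this deployment):

```yaml
# Representative sketch only: the sink name, inputs, endpoints, and
# index below are placeholders, not the real values from this setup.
sinks:
  elasticsearch_out:
    type: elasticsearch
    inputs: ["my_pipeline"]            # placeholder
    endpoints:
      - "https://es-node-1:9200"       # placeholder hosts
      - "https://es-node-2:9200"
    mode: bulk
    bulk:
      index: "logs-%Y.%m.%d"           # placeholder index pattern
    request_retry_partial: true        # mentioned in a later comment
    request:
      concurrency: adaptive            # ARC
```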
-
Hey @jszwedko. So these are possibly two different issues(?) and I'm sorry for conflating them.
Here is an extract from the logs related to the elasticsearch sink around that time: [screenshot]. I can try to get them to you in text format, but it isn't easy because of this issue: [link]. You can also see several connection refused errors; this is because one of the hosts specified in the [...] until we again get to another period where the requests vastly outnumber responses (this also coincided with the buffers filling up): [screenshot]. As for [...]. Let me know if you'd like to see anything else, and thanks again for taking a look.
-
vector-repro.zip
See the attached docker config to reproduce this, but essentially I have a pipeline with the `demo_logs` source and the `elasticsearch` sink. I'm intentionally setting the write thread pool queue size to 0 in elasticsearch (opensearch) so that 429 errors are returned, or at least so that 429s are returned in the JSON response body, which is how elasticsearch reports 429s.

At this point, given that I have set `request_retry_partial: true`, I'm expecting to see the `vector_buffer_events` metric go up, but it remains at 0. However, if I then stop the opensearch container so that vector can no longer connect, `vector_buffer_events` does start to go up. Why is this?

Additionally, when our elasticsearch cluster is under load and starts returning 429 errors, we're seeing `rate(vector_http_client_requests_sent_total[5m])` for the `elasticsearch` sink continue to rise to huge numbers (see charts below), which isn't what I would expect given that the cluster is under load. From what I can tell, this is then causing bulk indexing tasks to pile up on the cluster, which is something ARC (adaptive request concurrency) should be helping to avoid.
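For reference, a minimal sketch of the pipeline just described; the attached vector-repro.zip is authoritative, and the names, endpoint, and layout below are assumptions:

```yaml
# Sketch only: component names and the endpoint are assumptions, not
# taken from the attached repro. On the opensearch side, the repro sets
# the write thread pool queue size to 0 (thread_pool.write.queue_size: 0)
# so that bulk items come back as per-item 429s inside HTTP 200 responses.
sources:
  demo:
    type: demo_logs
    format: json

sinks:
  es:
    type: elasticsearch
    inputs: ["demo"]
    endpoints: ["http://opensearch:9200"]  # assumed container hostname
    mode: bulk
    request_retry_partial: true            # retry item-level 429s
    request:
      concurrency: adaptive                # ARC
```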