Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rw2.0): reject remote write 2.0 based on content type #10423

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

krajorama
Copy link
Contributor

@krajorama krajorama commented Jan 13, 2025

What this PR does

Reject remote write 2.0 request properly with HTTP status 415, based on RW2.0 spec.

The current solution returns 2xx , but doesn't actually ingest the samples.

Prometheus does detect this
prometheus-1 | time=2025-01-13T13:01:35.028Z level=ERROR source=queue_manager.go:1670 msg="non-recoverable error" component=remote remote_name=150c10
url=http://mimir-1:8001/api/v1/push failedSampleCount=2000
failedHistogramCount=0 failedExemplarCount=0
err="sent v2 request with 2000 samples, 0 histograms and 0 exemplars;
got 2xx, but PRW 2.0 response header statistics indicate 0 samples,
0 histograms and 0 exemplars were accepted; assumining failure e.g.
the target only supports PRW 1.0 prometheus.WriteRequest, but does not
check the Content-Type header correctly"

But we can do better and also start working towards RW2.0 support.

Which issue(s) this PR fixes or relates to

Related to #9072

Checklist

  • [N/A] Tests updated. Tested manually, we are going to replace this with proper support.
  • [N/A] Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • [N/A] about-versioning.md updated with experimental features.

@krajorama
Copy link
Contributor Author

With this patch, Prometheus gets:

prometheus-1  | time=2025-01-13T13:06:03.430Z level=ERROR source=queue_manager.go:1670
 msg="non-recoverable error" component=remote remote_name=150c10 url=http://mimir-1:8001/api/v1/push failedSampleCount=2000 failedHistogramCount=0 failedExemplarCount=0
err="server returned HTTP status 415 Unsupported Media Type: remote-write v2 is not supported\n"

The current solution returns 2xx , but doesn't actually ingest the samples.

Prometheus does detect this
prometheus-1  | time=2025-01-13T13:01:35.028Z level=ERROR
source=queue_manager.go:1670 msg="non-recoverable error"
component=remote remote_name=150c10
 url=http://mimir-1:8001/api/v1/push failedSampleCount=2000
failedHistogramCount=0 failedExemplarCount=0
err="sent v2 request with 2000 samples, 0 histograms and 0 exemplars;
 got 2xx, but PRW 2.0 response header statistics indicate 0 samples,
 0 histograms and 0 exemplars were accepted; assumining failure e.g.
 the target only supports PRW 1.0 prometheus.WriteRequest, but does not
 check the Content-Type header correctly"

But we can do better and also start working towards RW2.0 support.

Signed-off-by: György Krajcsovits <[email protected]>
@krajorama krajorama force-pushed the feat/remote-write-two branch from 70ef976 to 5dd0ff1 Compare January 13, 2025 13:13
@krajorama krajorama marked this pull request as ready for review January 13, 2025 13:13
@krajorama krajorama requested a review from a team as a code owner January 13, 2025 13:13
Copy link
Member

@jesusvazquez jesusvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -29,6 +29,7 @@
* [BUGFIX] MQE: Fix deriv with histograms #10383
* [BUGFIX] PromQL: Fix <aggr_over_time> functions with histograms https://github.com/prometheus/prometheus/pull/15711 #10400
* [BUGFIX] MQE: Fix <aggr_over_time> functions with histograms #10400
* [BUGFIX] Distributor: return HTTP status 415 Unsupported medica type instead of success for Remote Write 2.0 until we support it. #10423
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* [BUGFIX] Distributor: return HTTP status 415 Unsupported medica type instead of success for Remote Write 2.0 until we support it. #10423
* [BUGFIX] Distributor: return HTTP status 415 Unsupported Media Type instead of success for Remote Write 2.0 until we support it. #10423

Not sure if we want lower case though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good catch, @NickAnge . According to the Google Developer Documentation Style Guide, it should be capitalized and monospace, i.e.: 415 Unsupported Media Type

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should follow the same format for the 200 Success status for the sake of consistency.

@@ -29,6 +29,7 @@
* [BUGFIX] MQE: Fix deriv with histograms #10383
* [BUGFIX] PromQL: Fix <aggr_over_time> functions with histograms https://github.com/prometheus/prometheus/pull/15711 #10400
* [BUGFIX] MQE: Fix <aggr_over_time> functions with histograms #10400
* [BUGFIX] Distributor: return HTTP status 415 Unsupported medica type instead of success for Remote Write 2.0 until we support it. #10423
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good catch, @NickAnge . According to the Google Developer Documentation Style Guide, it should be capitalized and monospace, i.e.: 415 Unsupported Media Type

@@ -29,6 +29,7 @@
* [BUGFIX] MQE: Fix deriv with histograms #10383
* [BUGFIX] PromQL: Fix <aggr_over_time> functions with histograms https://github.com/prometheus/prometheus/pull/15711 #10400
* [BUGFIX] MQE: Fix <aggr_over_time> functions with histograms #10400
* [BUGFIX] Distributor: return HTTP status 415 Unsupported medica type instead of success for Remote Write 2.0 until we support it. #10423
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should follow the same format for the 200 Success status for the sake of consistency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants