Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

path label in Bridge metrics can contain very different values and that makes it hard to work with the metrics #921

Open
scholzj opened this issue Aug 4, 2024 · 7 comments
Assignees
Labels

Comments

@scholzj
Copy link
Member

scholzj commented Aug 4, 2024

The path label in the Bridge metrics seems to contain very different values. That breaks dashbords and makes it hard to work with the metrics. For example, for the poll metrics, you can get the metrics like this:

$ curl -s localhost:8080/metrics | grep strimzi_bridge_http_server_requests_total | grep GET | grep consumers
strimzi_bridge_http_server_requests_total{code="200",method="GET",path="/consumers/my-group5/instances/my-consumer5/records?timeout=100",} 432.0
strimzi_bridge_http_server_requests_total{code="200",method="GET",path="http://my-bridge-bridge-service:8080/consumers/my-group4/instances/my-consumer4/records?timeout=100",} 1035.0

As you can notice:

  • The first metric contains the path parameters (?timeout=100)
  • The second metric containers the full address (http://my-bridge-bridge-service:8080) as well as the path parameters (?timeout=100)

Both of these breaks the Strimzi Bridge Grafana Dashabord that expects the path to be always only /consumers/<NAME>/instances/<NAME>/records. So the dashboard does not show this metric. I noticed it at the Poll chart but this likely impacts also other metrics / charts.

@scholzj
Copy link
Member Author

scholzj commented Aug 4, 2024

Note: The two metrics above were produced by our own examples -> the first one is from the plain Java example and the other from the Vert.x example. So those can be used to reproduce this.

@ppatierno ppatierno self-assigned this Aug 5, 2024
@ppatierno
Copy link
Member

I had a little bit of investigation and it seems that this metrics part is not working as expected since ... 0.20.0 release (using Vert.x 4.1.0) :-o.
The last working for me providing the right path tag is with 0.19.0 (using Vert.x 3.9.3).
I went through the Vert.x micrometer implementation and more specifically the part which provides the HTTP server request ones (because these are not metrics generated by us) in the VertxHttpServerMetrics class and currently they are using the request.uri() for the path tag:

https://github.com/vert-x3/vertx-micrometer-metrics/blob/4.5.9/src/main/java/io/vertx/micrometer/impl/VertxHttpServerMetrics.java#L79

Up to the latest 3.9.x (so 3.9.16) they were using the request.path() instead:

https://github.com/vert-x3/vertx-micrometer-metrics/blob/3.9.16/src/main/java/io/vertx/micrometer/impl/VertxHttpServerMetrics.java#L66

This explain the breaking point between bridge 0.19.0 and 0.20.0.
So I guess they had a reason for making the switch even if the right path meaning in HTTP terms should be request.path() (just the request path, no server host, no query string) and not request.uri() (which contains everything up to the query string).
I am going to ask on the Vert.x micrometer GitHub repo about this. Meanwhile checking if there is a way to "override" the tag on our side.

@ppatierno
Copy link
Member

ppatierno commented Aug 6, 2024

So other than the solution I have in my current open PR I have two more:

  • changing the Grafana to adapt the new path value and override
  • using a MeterFilter (as pointed by the Vert.x guy) in the bridge to override the path value

The two solutions are pretty similar but done at two different levels.
The Grafana one for example means changing from:

sum(rate(strimzi_bridge_http_server_requests_total{container=~"^.+-bridge",method="GET",path=~"/consumers/.+/instances/.+/records"}[5m])) by (path)

to:

sum(label_replace(rate(strimzi_bridge_http_server_requests_total{container=~"^.+-bridge",method="GET",path=~".*/consumers/.+/instances/.+/records.*"}[5m]), "resource_path", "$1", "path", ".*(/consumers/.+/instances/.+/records).*")) by (resource_path)

It's actually doing a label_replace so catching the overall new path (notice the added .* to catch with host and query string) but then replacing with the value of the group specified in the last parameter so just getting what is in the parenthesis here .*(/consumers/.+/instances/.+/records).*.
Of course, it has to be applied to all the other metrics on the dashboard.

The MeterFilter solution is about doing actually almost the same but at the bridge level code.
Following a snippet to show how it should work:

Pattern pattern = Pattern.compile(".*(/consumers/.+/instances/.+/records).*");
                if (this.metricsReporter.getMeterRegistry() != null) {
                    // SOME OTHER UNRELATED CODE
                    this.metricsReporter.getMeterRegistry().config().meterFilter(
                            MeterFilter.replaceTagValues(Label.HTTP_PATH.toString(), path -> {
                                Matcher m = pattern.matcher(path);
                                if (m.matches()) {
                                    return m.group(1);
                                }
                                return path;
                            }, "")
                    );
                }

So we need anyway a pattern for each metric we expose on the dashboard, and than we are matching the pattern and getting the group (as for the Grafana solution).

I am open to discuss what we prefer.
Anyway the discussion with Vert.x folks is still open.
any thoughts @strimzi/maintainers ?
Personally I am for changing the bridge code by using a MeterFilter to leave the dashboards simpler.

@ppatierno
Copy link
Member

After some offline discussion with @scholzj we think that it could be better to leave the bridge as it is while waiting for an answer from Vert.x folks on the issue I raised on the Vert.x micrometer repo. If no fix from their side or something coming later with a new release, we can fix the issue at dashboard level.

@ppatierno
Copy link
Member

Triaged on 03.10.2024: Paolo is going to ping Vert.x folks again on the issue he raised. If no answer in 1-2 weeks, we are going to fix this on our side at dashboard level.

@ppatierno
Copy link
Member

Vert.x folks replied on the issue I opened and they scoped the fix for 4.5.11 release.
When it is released we should come back and check that metrics are correct and dashboard finally works again on our side.

@ppatierno
Copy link
Member

Triaged on 17/10/2024: we'll wait for Vert.x 4.5.11 release taking the fix and check that everything is finally fine on our metrics and dashboard side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants