Low cardinality metrics issues #7719
Comments
First, I propose that we don't make low cardinality the default behavior in 1.14. Instead, we use 1.14 to implement the following in low cardinality:
Addressing another point, low cardinality should be fine with adding path=/dapr/config or path=/healthz and any other Dapr-specific paths without an identifier.
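A minimal sketch of this idea in Go: fixed, identifier-free Dapr paths keep their path label in low cardinality mode, while every other path stays hidden. The names daprPaths, pathLabel, request, seriesKey, and countRequests are hypothetical, as is the use of an empty string to represent a hidden path; none of this is taken from the Dapr code base. With such a label, 404 responses on /dapr/config land in their own series instead of being counted against the application.

```go
package main

import "fmt"

// daprPaths is a hypothetical allowlist of fixed Dapr paths that contain no identifier.
var daprPaths = map[string]bool{
	"/dapr/config": true,
	"/healthz":     true,
}

// pathLabel returns the value of the path label in low cardinality mode.
// Allowlisted paths are kept; any other path is hidden (empty label, an assumption).
func pathLabel(path string) string {
	if daprPaths[path] {
		return path
	}
	return ""
}

// request is a simplified record of one HTTP call.
type request struct {
	path   string
	status int
}

// seriesKey identifies one time series by its path and status labels.
type seriesKey struct {
	path   string
	status int
}

// countRequests aggregates requests into series, the way a counter metric would.
func countRequests(reqs []request) map[seriesKey]int {
	counts := make(map[seriesKey]int)
	for _, r := range reqs {
		counts[seriesKey{pathLabel(r.path), r.status}]++
	}
	return counts
}

func main() {
	counts := countRequests([]request{
		{"/dapr/config", 404}, // internal call, not part of the application
		{"/orders/42", 200},
		{"/orders/43", 200},
	})
	fmt.Println("dapr/config 404s:", counts[seriesKey{"/dapr/config", 404}])
	fmt.Println("application 404s:", counts[seriesKey{"", 404}])
	fmt.Println("application 200s:", counts[seriesKey{"", 200}])
}
```

Because the allowlist contains only constant paths, it adds a fixed number of series and does not reintroduce the cardinality problem.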
I think there is a better solution than the custom regex. Dapr (in both low and high cardinality) would apply a series of built-in regexes to remove identifiers from the URL paths. If the path remains unmodified, low cardinality defaults to hiding the path while high cardinality defaults to showing the path in metrics. If custom regex and ID removal are both enabled, the built-in ID removal is only applied if no custom regex matches. Examples:
In summary:
Ideally, we converge into low cardinality with automatic ID removal being the default behavior.
Do you mean converge into
I mean that eventually (not in 1.14) we will have low cardinality with automatic ID removal for HTTP paths as the default behavior. I think high cardinality will be opt-in if not removed.
I think Artur is saying that low cardinality will be the default because, with the built-in identifier regexes, only the paths in which no identifier is matched will be shown as the "low cardinality" metric we know today.
The usage of regexes is going to be a problem for both performance and usability reasons.
I think there's some additional clarification that was provided, based on the context from the issue and PR: #6919 #6723 #6581
I understand that there are users who do want to see higher cardinality for HTTP metrics. This is why we maintained a configuration option
Following up on the feedback above, we would like to propose a few adjustments to the initial proposal, and we will work on the items:
Regarding point (1), I made some comments on the proposal by @nelson-parente that will give users and platforms a way to keep the existing behavior (if they so want). On point (2), regarding gRPC<->HTTP for service invocation, Dapr maintainers moved away from trying to match the HTTP and gRPC protocols a while ago, making it a non-issue. One consequence of that decision is that Dapr does not convert HTTP to gRPC or vice versa when invoking services, meaning the requestor's app must invoke with the same protocol that the receiving app listens on (sidecar to sidecar is always over gRPC, but that is a "tunnel" for the original call and does not need to adhere to the app's protocol).
This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 67 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue, help wanted or triaged/resolved. Thank you for your contributions.
Intro
With low cardinality metrics enabled, many metrics lose their value, and some become completely unusable because they carry no useful information beyond the application ID.
Monitoring systems and dashboards that previously relied on high cardinality settings are broken, and no suitable replacement seems to be available with low cardinality enabled.
The value of a low cardinality setting is understandable, and use cases differ between users; however, an option for a middle ground would be great.
Please consider the scenarios below, where low cardinality might render metrics unusable.
Test method
I investigated by building a few simple applications that send and receive HTTP requests using 2 different methods, based on the examples provided in:
https://docs.dapr.io/developing-applications/building-blocks/service-invocation/howto-invoke-discover-services/#invoke-the-service (plain HTTP calls)
https://github.com/dapr/go-sdk/tree/main/examples/service/serving/http (SDK server)
https://github.com/dapr/go-sdk/tree/main/examples/service/client (SDK client)
Metrics checked
Application:
service-invocation-http-client
Metrics:
dapr_http_server_latency_ (_sum, _count, _bucket)
High cardinality:
Low cardinality:
Metrics:
dapr_http_server_request_count
High cardinality:
Low cardinality:
Issue: With low cardinality, calls to all paths are merged under a single method name, which hides the real path being called, even though each path is likely to have a different latency.
The same applies to detecting issues on specific paths: for example, POST calls can produce errors that stay hidden, with no way to know which path produces the error responses.
With paths missing from the latency metrics, monitoring cannot track which method call is slow, fast, or failing, and all percentiles become useless as well.
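To make the effect on percentiles concrete, here is a small Go sketch that computes nearest-rank percentiles over raw samples, first merged and then per path. It is a simplification: the real Dapr metrics are histograms with buckets rather than raw samples, and the 10/90 split between slow and fast calls is an assumed workload, chosen only for illustration. The helper names percentile and repeat are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (1..100) of samples,
// in milliseconds. It returns 0 for an empty slice.
func percentile(samples []int, p int) int {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]int(nil), samples...) // copy so the input is not reordered
	sort.Ints(sorted)
	rank := (p*len(sorted) + 99) / 100 // integer ceil(p/100 * n)
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// repeat returns a slice holding n copies of v.
func repeat(v, n int) []int {
	out := make([]int, n)
	for i := range out {
		out[i] = v
	}
	return out
}

func main() {
	slow := repeat(100, 10) // assumed: 10 calls to a path taking 100ms
	fast := repeat(5, 90)   // assumed: 90 calls to a path taking 5ms
	merged := append(append([]int(nil), slow...), fast...)

	// Without a path label, only the merged view exists.
	fmt.Println("merged p50:", percentile(merged, 50), "p99:", percentile(merged, 99))
	// With a path label, each path can be inspected on its own.
	fmt.Println("slow path p50:", percentile(slow, 50))
	fmt.Println("fast path p99:", percentile(fast, 99))
}
```

In the merged view the median reflects only the fast path and the 99th percentile only the slow one, so neither number describes either path accurately.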
Example:
/put-service-invocation-plain takes 100ms
/get-service-invocation-plain takes 5ms
Applications:
service-invocation-http-server, service-invocation-http-client
Metrics:
dapr_http_server_request_bytes_ (_sum, _count, _bucket)
High cardinality:
Low cardinality:
Applications:
service-invocation-http-server, service-invocation-http-client
Metrics:
dapr_http_server_response_ (_sum, _count, _bucket)
High cardinality:
Low cardinality:
Issue: The metrics record no labels other than the current app_id, for both the plain HTTP client and server. The calls cannot be differentiated at all: only the current application ID is present, while the source/destination app ID and the path/method being called are missing.
Applications:
service-invocation-http-server
Metrics:
dapr_http_client_roundtrip_latency_ (_sum, _count, _bucket)
High cardinality:
Low cardinality:
Issue: With low cardinality, all calls are again merged into one. This case is even worse: the only labels left are app_id (the name of the current application) and status. This makes the metrics unusable, because specific calls cannot be distinguished or troubleshot properly.
In the example samples above, the 404 status calls come from internal calls to the path dapr/config, which is not part of the application. In low cardinality mode, these 404 errors are attributed to the application and might raise false alarms in monitoring systems, making them think something is wrong with the actual application.
Applications:
service-invocation-http-server
Metrics:
dapr_http_client_sent_bytes_ (_sum, _count, _bucket)
High cardinality:
Low cardinality:
Issue: The same problem can be observed here: there is no way of knowing which path is being called, so bytes sent cannot be measured per path. In addition, internal dapr/config traffic is mixed into the application metric, so the metric value is now incorrect.
Note: Using go-sdk for both client and server service invocations produces the same results, except that the path value is a full Dapr path call with version etc.
Additional Files
Attaching dumps from port 9090 of all 8 applications used in the testing.
plain-client-high.txt
plain-client-low.txt
plain-server-high.txt
plain-server-low.txt
sdk-client-high.txt
sdk-client-low.txt
sdk-server-high.txt
sdk-server-low.txt