-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dotnet8 metrics in otel format #1
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Manually copying over the active comment threads from our word doc discussions. I used my judgement on which discussions were resolved and which were still active, but folks should feel free to add anything they think is relevant.
docs/dotnet/dotnet-http-metrics.md
Outdated
<!-- semconv metric.dotnet.http.client.connections.usage(metric_table) --> | ||
| Name | Instrument Type | Unit (UCUM) | Description | | ||
| -------- | --------------- | ----------- | -------------- | | ||
| `http.client.connections.usage` | UpDownCounter | `{connection}` | Number of outbound HTTP connections that are currently active or idle on the client [1] | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Porting active comment threads from our word doc:
Sam Spencer
I am assuming this is a counter - it feels really strange with usage in the name.
Also with the dimensions, it feels like we are trying to do logging via metrics. As a connection is used it will rapidly flip between active and idle states. Are we getting too dimension happy?
July 18, 2023 at 10:02 AM
Sam Spencer
I'd prefer to have 2 different counters, one for connections, the other for idle connections.
July 18, 2023 at 10:38 AM
Noah Falk
I'll see if we get any feedback from OTel folks. This 'usage' convention is something their naming guidelines recommended and you can see very similar looking prior art for a connection pool in the database metrics: semantic-conventions/docs/database/database-metrics.md at e531541025992b68177a68b87628c5dc75c4f7d9 · open-telemetry/semantic-conventions (github.com)
The *.usage convention is documented here:
semantic-conventions/docs/general/metrics.md at main · open-telemetry/semantic-conventions · GitHub
July 18, 2023 at 2:44 PM
Miha Zupan
I second Sam's dislike of "usage". Would that force us to essentially log twice as much stuff (-1 idle, +1 active, request, -1 active, +1 idle)?
July 19, 2023 at 5:44 AM
Sam Spencer
Proposal: remove the state flag and usage from the name. Tracking idle connections shouldn't be that applicable, and is probably better done from logs.
July 19, 2023 at 10:50 AM
Miha Zupan
I personally don't mind having the idle connections info in metrics, I'd just prefer it be a separate counter.
July 19, 2023 at 12:39 PM
Noah Falk
@Liudmila Molkova - Do you have any feedback on this one? Personally I agree with Sam and Miha that I have an aethestic preference for multiple metrics rather than one 'usage' metric, but I did it this way because it appeared to be OTel's desired pattern and I'd prioritize consistency over my aesthetic preference. However if OTel conventions don't have a strong preference and either form is acceptable I'll split it again.
@miha Zupan - Ah I see now. The description of the current-connections made it sound like it was mutually exclusive with idle connections, but reviewing the code I see it includes both idle and non-idle connections. So yes, this scheme would have double the number of calls to instrument.Add() whenever a connection transitions between idle and non-idle. I'm assuming these transitions don't occur that often (<1K transitions/sec?) in which case perf overheads should be effectively zero. A call to Add() should be double digit ns.
July 19, 2023 at 11:44 PM
Liudmila Molkova
Problems with multiple counters is that if we ever need to support more states, we'll need to add new metrics.
Also, users will less less metric names and can discover available attributes and values as they go.
With multiple metrics, user will need to learn name of each of them. We also need to pay attention to keep attributes, implementation, and documentation in sync.
I think our aethestic preferences are not as important comparing to users ones. One metric allows them to split it and visualize it however they want Two different metric names don't let them do it.
(my preference is strongly on less instruments because of the consistency and flexibility).
July 21, 2023 at 6:11 PM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the usage aspect will not be that useful in practice. Its going to generate a lot of noise.
With the connection duration counter below, I'd suggest removing this counter altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, it seems the usage here represents a number of open connections - the connection.duration
histogram does not provide the number of active/idle connections, it's only reported after the connection is closed, we can derive connection creation and close rate from it.
I.e. if we don't introduce it now:
- we won't have a way to find how many connections are open at each moment
- it seems common and useful to measure the number of open connections
- we can still add it in .NET 9 if we realize it's useful
If we add it now, we can rename it to http.client.open_connections
.
I believe knowing the number of idle connections from metrics is important - alerting on logs is a questionable and expensive thing to do. Retrieving logs when dealing with incidents in production is slow and unreliable - logs can be files on a file system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW I think usage is used incorrectly here - usage assumes there is a limit value reported and the sum of all state dimensions is equal to the limit. This is not the case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this PR with http.client.open_connections
- PTAL. We should also consider prefixing connection.state
with something - I assume different connections can have different set of states and we should not mix them - e.g. there is no idle or active states here - https://learn.microsoft.com/dotnet/api/system.data.connectionstate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy not to use it :) Any of these seems fine to me:
http.client.open_connections
http.client.connections.count
http.client.connections
If I was picking my preference I'd probably go with the last one but its splitting hairs for me. I think its very reasonable to still have a state attribute that marks connections as idle or not.
@samsp-msft and @antonfirsov can of course say otherwise, but for me (and I am guessing for them), the aesthetic dislike really came from "usage" not having a clear and intuitive meaning. If I told someone "the value of the connection.usage metric is 26" they will probably give me weird confused looks. If I tell them "the value of the open_connections metric is 26" they probably have an intuitive understanding what that means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we had a chat with otel semconv WG members today and thinking about the following choice:
*.connection_pool.usage
+*.conneciton_pool.limit
- which tells e.g. that there are 5 active and 25 idle connections in the connection pool of size 30
OR
*.connections.count
, oropen_connection.count
if there is no pool or limit is not known
I guess the dislike from usage
came from not having a limit reported and not talking about the connection pool. To connection pool usage seems intuitive.
Let me know what your thoughts are and what would be a preferred option here
docs/dotnet/dotnet-http-metrics.md
Outdated
| `_OTHER` | Any HTTP method that the instrumentation has no prior knowledge of. | | ||
<!-- endsemconv --> | ||
|
||
### Metric: `http.client.failed_requests` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Porting active comment threads from the word doc:
Anton Firszov
We just implemented a feature that reports various error reasons using an enum in our exceptions:https://github.com/dotnet/runtime/blob/c430570a01c103bc7f117be573f37d8ce8a129b8/src/libraries/System.Net.Http/src/System/Net/Http/HttpRequestError.cs#L9Would it make sense to include this as an attribute?
July 18, 2023 at 5:36 PM
Noah Falk
That seems reasonable to me. @Liudmila Molkova any thoughts on that one?
July 19, 2023 at 3:31 AM
Liudmila Molkova
Added dotnet.http.request.error in https://github.com/lmolkova/semantic-conventions/blob/dotnet8-metrics-proposed-diff/model/metrics/dotnet-common.yaml#L24
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I commented in more detail on PR #2. In short I think it is good for HttpClient but I am not in favor of extending it outside that.
**[2]:** **TODO: why not network.transport?** | ||
<!-- endsemconv --> | ||
|
||
## Metric: `signalr_http_transport.active_connections` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Porting active comment threads from the word doc:
Trask Stalnaker
I think existing otel sem conv conventions probably lean towards this being signalr_http_transport.connections.count (or possibly just signalr_http_transport.connections, see open-telemetry#50 https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/metrics.md#use-count-instead-of-pluralization
Otel does have db.client.connections.usage, but I think(?) this should be db.client.connections.count, and I just opened open-telemetry#201
Comparing this to otel's http.server.active_requests is also reasonable, but I think the above modeling is probably going to be more consistent long-term.
July 21, 2023 at 8:36 AM
Trask Stalnaker
I'm having second thoughts about this one already 🤔
July 21, 2023 at 8:45 AM
Trask Stalnaker
Opened open-telemetry#202
July 21, 2023 at 9:18 AM
Noah Falk
Yeah, I modeled it originally on http.server.active_requests, but I'd be just fine with current_connections. connections.count and connections are a little more ambiguous because there are two different counts that might refer to, the count of all connections ever made or the count of connections currently open.
July 21, 2023 at 5:36 PM
docs/dotnet/dotnet-http-metrics.md
Outdated
<!-- semconv metric.dotnet.http.client.connections.usage(metric_table) --> | ||
| Name | Instrument Type | Unit (UCUM) | Description | | ||
| -------- | --------------- | ----------- | -------------- | | ||
| `http.client.connections.usage` | UpDownCounter | `{connection}` | Number of outbound HTTP connections that are currently active or idle on the client [1] | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the usage aspect will not be that useful in practice. Its going to generate a lot of noise.
With the connection duration counter below, I'd suggest removing this counter altogether.
I reviewed all the changes you were proposing in #2. I think there are still a number of conversations that need to be responded to here too and that is in my todo list for today. |
7873394
to
313dbaf
Compare
0fd3372
to
f85bfa8
Compare
…ry#324) Signed-off-by: ChrsMark <[email protected]> Co-authored-by: Joao Grassi <[email protected]> Co-authored-by: Joao Grassi <[email protected]> Co-authored-by: Josh Suereth <[email protected]>
docs/dotnet/dotnet-dns-metrics.md
Outdated
<!-- toc --> | ||
|
||
- [DNS metrics](#dns-metrics) | ||
* [Metric: `dns.lookups.duration`](#metric-dnslookupsduration) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix pluralization :)
docs/dotnet/dotnet-dns-metrics.md
Outdated
|
||
## DNS metrics | ||
|
||
### Metric: `dns.lookups.duration` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
### Metric: `dns.lookups.duration` | |
### Metric: `dns.lookup.duration` |
d777d26
to
fc5ba85
Compare
…cally in all semconv (open-telemetry#359) Co-authored-by: Armin Ruech <[email protected]>
Co-authored-by: Reiley Yang <[email protected]>
Co-authored-by: Armin Ruech <[email protected]>
Co-authored-by: Joao Grassi <[email protected]>
…pen-telemetry#203) Co-authored-by: Alexander Wert <[email protected]>
…pen-telemetry#364) Co-authored-by: Alexander Wert <[email protected]>
…-telemetry#208) Signed-off-by: Alexander Wert <[email protected]> Co-authored-by: Joao Grassi <[email protected]> Co-authored-by: Armin Ruech <[email protected]>
…h `network.(peer|local).(address|port)`. (open-telemetry#342) Co-authored-by: Liudmila Molkova <[email protected]>
…y#367) Co-authored-by: Joao Grassi <[email protected]>
Co-authored-by: Armin Ruech <[email protected]>
fc5ba85
to
bae9a8d
Compare
closing this one in favor of open-telemetry#283 |
No description provided.