-
Notifications
You must be signed in to change notification settings - Fork 780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sdk] Metrics refactor #5131
[sdk] Metrics refactor #5131
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #5131 +/- ##
==========================================
+ Coverage 83.38% 84.39% +1.01%
==========================================
Files 297 276 -21
Lines 12531 11724 -807
==========================================
- Hits 10449 9895 -554
+ Misses 2082 1829 -253
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Amazing! Will need some time to review, but I assume this won't be merged until after 1.7 stable is out, right? |
Not something we have discussed yet but I think that would be reasonable. |
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Changes
AggregationType
has been replaced with a more flexible[Flags]
enum and it is no longer stored onMetricPoint
which reduces the overall memory footprint of the SDK.int
used for locking on histogram buckets & exponential buckets in favor of the one on the optional components which also slightly lowers the overall memory footprint of the SDK.Overview
The current MetricPoint implementation has a lot spaghetti code 🍝 which has been copy/pasted almost entirely into different methods which add tweaks for exemplars, min/max recording, lock differences, etc. There are also a lot of artificial concepts like "SingleStream" mode and downright whacky things like metric super lists (
List<List<Metric>>
).What this PR seeks to do is rein in all of that code while maintaining or improving the performance.
How this is done is by utilizing the JIT/compiler to do the specialization work for us. In .NET if you declare some generic thing and you feed it a value type/struct the JIT will generate a specialized implementation for it. During that specialization the JIT is able to elide/remove code when it determines something will be
false
forever. For native AOT this is done at compile time (there is no JIT).What's the catch, there's always a catch?!
The trade-off here is we need value type/structs which capture all the possible permutations. These need to be statically reachable so they are preserved during AOT. There are a lot of "behaviors" through the SDK so there are a lot of permutations which means there are a lot of structs! Supporting AOT made this its own complexity.
Benchmarks
Note: The delta benchmarks seem to be speeding up a lot. That is kind of a lie. They are speeding up, yes, but that is only because I fixed the code. Today we are doing some things for the "metric point reclaim" feature even when it isn't turned on. Most of the speed up is coming from that fix.
Examples of JIT output
Note: To capture this I set the
DOTNET_JitDisasm
envvar to*RecordMeasurement*
and then ran the "MetricsBenchmarks" suite.A quick refresher. The JIT has a two-tier process. The first tier (Tier0) is the first compile which is performed the first time some code is run. It is meant to be fast and only does minor optimizations to code. It injects some instrumentation which is used to inform later optimizations (this is known as dynamic PGO). If some code is executed enough it gets re-compiled at the second tier (Tier1). This is where every optimization in the book is thrown at the code including the data from the instrumentation.
Lots of inlining going on. No calls on the hot path. The key bit is...
I'm not an expert in x64 assembly but that looks to me like the inlined versions of the code which actually does the modifications to
MetricPoint
. More or less this code:That is what was left after everything got elided/removed from these methods:
Merge requirement checklist
CHANGELOG.md
files updated for non-trivial changes