
[sdk] Metrics refactor #5131

Closed
wants to merge 19 commits

Conversation

CodeBlanch
Member

@CodeBlanch CodeBlanch commented Dec 6, 2023

Changes

  • Refactor metrics to remove duplicated code!
  • AggregationType has been replaced with a more flexible [Flags] enum, and it is no longer stored on MetricPoint, which reduces the overall memory footprint of the SDK.
  • Removed the int used for locking on histogram buckets & exponential buckets in favor of the one on the optional components, which also slightly lowers the overall memory footprint of the SDK.
  • Fixed some bugs, like exemplars being processed where they shouldn't be, likely due to copy/paste errors.
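To picture the first bullet, a [Flags]-style behavior enum looks something like this. The member names and values below are purely illustrative, not the PR's actual definitions:

```csharp
using System;

// Hypothetical sketch of a [Flags] enum replacing AggregationType.
// Each bit describes one independent "behavior" a metric point can have,
// so combinations compose instead of requiring one enum member per permutation.
[Flags]
internal enum MetricBehaviors
{
    None = 0,
    Counter = 1 << 0,
    Gauge = 1 << 1,
    Histogram = 1 << 2,
    DeltaTemporality = 1 << 3,
    CumulativeTemporality = 1 << 4,
    RecordMinMax = 1 << 5,
    Exemplars = 1 << 6,
}
```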

Overview

The current MetricPoint implementation has a lot of spaghetti code 🍝 which has been copy/pasted almost entirely into different methods that add tweaks for exemplars, min/max recording, lock differences, etc. There are also a lot of artificial concepts like "SingleStream" mode and downright wacky things like metric super lists (List<List<Metric>>).

What this PR seeks to do is rein in all of that code while maintaining or improving the performance.

How this is done is by utilizing the JIT/compiler to do the specialization work for us. In .NET, if you declare something generic and feed it a value type/struct as the type argument, the JIT will generate a specialized implementation for it. During that specialization the JIT is able to elide/remove code when it determines something will always be false. For Native AOT this is done at compile time (there is no JIT).
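The specialization trick can be sketched like this (the names here are illustrative only, not the PR's actual types):

```csharp
using System.Runtime.CompilerServices;

// Behaviors are expressed as empty structs implementing a marker interface.
// Because TBehavior is a struct, the JIT emits a separate specialized body of
// RecordMeasurement per behavior, where default(TBehavior).LockRequired is a
// JIT-time constant and the dead branch is removed entirely.
public interface ILockBehavior
{
    bool LockRequired { get; }
}

public struct LockFreeBehavior : ILockBehavior
{
    public bool LockRequired => false;
}

public struct LockingBehavior : ILockBehavior
{
    public bool LockRequired => true;
}

public static class Recorder
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static string RecordMeasurement<TBehavior>(long value)
        where TBehavior : struct, ILockBehavior
    {
        if (default(TBehavior).LockRequired)
        {
            // Only present in the LockingBehavior specialization.
            return $"locked add of {value}";
        }

        // Only present in the LockFreeBehavior specialization.
        return $"interlocked add of {value}";
    }
}
```

So `Recorder.RecordMeasurement<LockFreeBehavior>(1)` compiles down to just the lock-free path, with no runtime branch on the behavior.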

What's the catch? There's always a catch!

“There are no solutions. There are only trade-offs.” ― Thomas Sowell

The trade-off here is we need value types/structs which capture all the possible permutations. These need to be statically reachable so they are preserved during AOT compilation. There are a lot of "behaviors" throughout the SDK, so there are a lot of permutations, which means there are a lot of structs! Supporting AOT added its own complexity.

Benchmarks

  • New code:
| Method | AggregationTemporality | Mean | Error | StdDev | Allocated |
|---|---|---:|---:|---:|---:|
| CounterHotPath | Cumulative | 9.602 ns | 0.1498 ns | 0.1402 ns | - |
| CounterWith1LabelsHotPath | Cumulative | 36.595 ns | 0.5499 ns | 0.5144 ns | - |
| CounterWith3LabelsHotPath | Cumulative | 74.093 ns | 0.6904 ns | 0.5765 ns | - |
| CounterWith5LabelsHotPath | Cumulative | 124.899 ns | 1.9911 ns | 1.7651 ns | - |
| CounterWith6LabelsHotPath | Cumulative | 144.470 ns | 2.1833 ns | 2.0423 ns | - |
| CounterWith7LabelsHotPath | Cumulative | 162.113 ns | 1.5812 ns | 1.4017 ns | - |
| CounterHotPath | Delta | 9.934 ns | 0.0919 ns | 0.0860 ns | - |
| CounterWith1LabelsHotPath | Delta | 37.793 ns | 0.6613 ns | 0.6186 ns | - |
| CounterWith3LabelsHotPath | Delta | 76.518 ns | 1.4816 ns | 1.5853 ns | - |
| CounterWith5LabelsHotPath | Delta | 135.717 ns | 2.4324 ns | 2.2753 ns | - |
| CounterWith6LabelsHotPath | Delta | 148.603 ns | 2.6981 ns | 2.5238 ns | - |
| CounterWith7LabelsHotPath | Delta | 163.807 ns | 2.4840 ns | 2.3235 ns | - |
  • Old code:
| Method | AggregationTemporality | Mean | Error | StdDev | Allocated |
|---|---|---:|---:|---:|---:|
| CounterHotPath | Cumulative | 10.04 ns | 0.108 ns | 0.101 ns | - |
| CounterWith1LabelsHotPath | Cumulative | 37.30 ns | 0.522 ns | 0.488 ns | - |
| CounterWith3LabelsHotPath | Cumulative | 75.42 ns | 0.612 ns | 0.542 ns | - |
| CounterWith5LabelsHotPath | Cumulative | 125.83 ns | 1.349 ns | 1.262 ns | - |
| CounterWith6LabelsHotPath | Cumulative | 143.11 ns | 2.734 ns | 2.558 ns | - |
| CounterWith7LabelsHotPath | Cumulative | 166.01 ns | 2.178 ns | 2.037 ns | - |
| CounterHotPath | Delta | 14.36 ns | 0.269 ns | 0.252 ns | - |
| CounterWith1LabelsHotPath | Delta | 42.44 ns | 0.227 ns | 0.177 ns | - |
| CounterWith3LabelsHotPath | Delta | 79.54 ns | 1.037 ns | 0.970 ns | - |
| CounterWith5LabelsHotPath | Delta | 127.64 ns | 1.274 ns | 1.192 ns | - |
| CounterWith6LabelsHotPath | Delta | 144.53 ns | 1.225 ns | 1.146 ns | - |
| CounterWith7LabelsHotPath | Delta | 175.06 ns | 3.456 ns | 3.549 ns | - |

Note: The Delta benchmarks seem to be speeding up a lot. That is kind of a lie. They are speeding up, yes, but only because I fixed the code: today we do some work for the "metric point reclaim" feature even when it isn't turned on, and most of the speedup comes from that fix.

Examples of JIT output

Note: To capture this I set the DOTNET_JitDisasm environment variable to *RecordMeasurement* and then ran the "MetricsBenchmarks" suite.
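For anyone wanting to reproduce the capture, the setup is roughly this (the project path and filter below are illustrative, not taken from this PR):

```shell
# Ask the JIT to dump disassembly for any method matching the pattern.
export DOTNET_JitDisasm='*RecordMeasurement*'

# Run the benchmarks; Tier0 and Tier1 listings are written to stdout as
# methods are (re)compiled.
dotnet run -c Release --project test/Benchmarks -- --filter '*MetricsBenchmarks*'
```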

A quick refresher: the JIT uses a tiered process. The first tier (Tier0) is the initial compilation, performed the first time some code runs. It is meant to be fast and only applies minor optimizations. It also injects some instrumentation used to inform later optimizations (this is known as dynamic PGO). If some code is executed enough it gets re-compiled at the second tier (Tier1), where every optimization in the book is thrown at the code, including the data from the instrumentation.

  • Tier1 assembly for CumulativeTemporality + DeltaCounterLong:
; Total bytes of code 302
; Assembly listing for method OpenTelemetry.Metrics.MetricMeasurementHandler`2[OpenTelemetry.Metrics.MetricAggregatorBehaviorDefinitions+CumulativeTemporality,OpenTelemetry.Metrics.MetricPointBehaviorDefinitions+DeltaCounterLong]:RecordMeasurement[long](OpenTelemetry.Metrics.AggregatorStore,long,System.ReadOnlySpan`1[System.Collections.Generic.KeyValuePair`2[System.String,System.Object]]):this (Tier1)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; Tier1 code
; optimized code
; optimized using Dynamic PGO
; rbp based frame
; fully interruptible
; with Dynamic PGO: edge weights are valid, and fgCalledCount is 408192
; 6 inlinees with PGO data; 5 single block inlinees; 4 inlinees without PGO data
G_M000_IG01:                ;; offset=0x0000
       push     rbp
       push     r14
       push     rdi
       push     rsi
       push     rbx
       sub      rsp, 48
       lea      rbp, [rsp+0x50]
       mov      qword ptr [rbp-0x30], rsp
       mov      gword ptr [rbp+0x18], rdx
       mov      rcx, rdx
       mov      rbx, r8
 
G_M000_IG02:                ;; offset=0x001D
       mov      rsi, bword ptr [r9]
       mov      edi, dword ptr [r9+0x08]
       cmp      byte  ptr [rcx], cl
       test     edi, edi
       jne      SHORT G_M000_IG07
       call     [OpenTelemetry.Metrics.AggregatorStore:InitializeZeroTagPointIfNotInitialized():this]
       xor      r14d, r14d
 
G_M000_IG03:                ;; offset=0x0033
       test     r14d, r14d
       jl       SHORT G_M000_IG06
 
G_M000_IG04:                ;; offset=0x0038
       mov      rcx, gword ptr [rbp+0x18]
       mov      rax, gword ptr [rcx+0x58]
       cmp      r14d, dword ptr [rax+0x08]
       jae      SHORT G_M000_IG05
       mov      edx, r14d
       shl      rdx, 6
       lea      rax, bword ptr [rax+rdx+0x10]
       cmp      byte  ptr [rax], al
       lea      rdx, bword ptr [rax+0x20]
       lock     
       add      qword ptr [rdx], rbx
       mov      dword ptr [rax+0x1C], 1
       jmp      G_M000_IG15
 
G_M000_IG05:                ;; offset=0x0068
       call     CORINFO_HELP_RNGCHKFAIL
 
G_M000_IG06:                ;; offset=0x006D
       mov      rcx, gword ptr [rbp+0x18]
       call     [OpenTelemetry.Metrics.AggregatorStore:DropMeasurementDefault():this]
       jmp      G_M000_IG15
 
G_M000_IG07:                ;; offset=0x007C
       mov      ecx, 2
       call     CORINFO_HELP_GETSHARED_GCTHREADSTATIC_BASE_NOCTOR_OPTIMIZED
       mov      rax, gword ptr [rax]
       test     rax, rax
       jne      SHORT G_M000_IG08
       mov      rcx, 0x7FFF2E9F2B88
       call     CORINFO_HELP_NEWSFAST
       mov      r14, rax
       mov      rcx, r14
       call     [OpenTelemetry.Metrics.ThreadStaticStorage:.ctor():this]
       mov      ecx, 2
       call     CORINFO_HELP_GETSHARED_GCTHREADSTATIC_BASE_NOCTOR_OPTIMIZED
       mov      rcx, rax
       mov      rdx, r14
       call     CORINFO_HELP_ASSIGN_REF
       mov      rax, r14
 
G_M000_IG08:                ;; offset=0x00C1
       mov      edx, edi
       cmp      byte  ptr [rax], al
       cmp      edx, 8
       jg       SHORT G_M000_IG09
       mov      rcx, gword ptr [rax+0x08]
       lea      eax, [rdx-0x01]
       cmp      eax, dword ptr [rcx+0x08]
       jae      SHORT G_M000_IG05
       dec      edx
       mov      rdx, gword ptr [rcx+8*rdx+0x10]
       mov      r14, gword ptr [rdx+0x08]
       jmp      SHORT G_M000_IG10
 
G_M000_IG09:                ;; offset=0x00E3
       movsxd   rdx, edx
       mov      rcx, 0x7FFF2E68F620
       call     CORINFO_HELP_NEWARR_1_VC
       mov      r14, rax
 
G_M000_IG10:                ;; offset=0x00F8
       mov      r8, r14
       test     r8, r8
       jne      SHORT G_M000_IG11
       xor      rcx, rcx
       xor      edx, edx
       jmp      SHORT G_M000_IG12
 
G_M000_IG11:                ;; offset=0x0106
       lea      rcx, bword ptr [r8+0x10]
       mov      edx, dword ptr [r8+0x08]
 
G_M000_IG12:                ;; offset=0x010E
       cmp      edi, edx
       ja       SHORT G_M000_IG13
       mov      r8d, edi
       shl      r8, 4
       mov      rdx, rsi
       call     [System.Buffer:BulkMoveWithWriteBarrier(byref,byref,ulong)]
       jmp      SHORT G_M000_IG14
 
G_M000_IG13:                ;; offset=0x0124
       call     [System.ThrowHelper:ThrowArgumentException_DestinationTooShort()]
       int3     
 
G_M000_IG14:                ;; offset=0x012B
       mov      rcx, gword ptr [rbp+0x18]
       mov      rax, gword ptr [rcx+0x70]
       mov      rdx, r14
       mov      r8d, edi
       mov      rcx, gword ptr [rax+0x08]
       call     [rax+0x18]System.Func`3[System.__Canon,int,int]:Invoke(System.__Canon,int):int:this
       mov      r14d, eax
       jmp      G_M000_IG03
 
G_M000_IG15:                ;; offset=0x0148
       add      rsp, 48
       pop      rbx
       pop      rsi
       pop      rdi
       pop      r14
       pop      rbp
       ret      
 
G_M000_IG16:                ;; offset=0x0153
       push     rbp
       push     r14
       push     rdi
       push     rsi
       push     rbx
       sub      rsp, 48
       mov      rbp, qword ptr [rcx+0x20]
       mov      qword ptr [rsp+0x20], rbp
       lea      rbp, [rbp+0x50]
 
G_M000_IG17:                ;; offset=0x016A
       mov      rcx, gword ptr [rbp+0x18]
       cmp      dword ptr [rcx], ecx
       call     [OpenTelemetry.Metrics.AggregatorStore:DropMeasurementUnhandledException():this]
       lea      rax, G_M000_IG15
 
G_M000_IG18:                ;; offset=0x017D
       add      rsp, 48
       pop      rbx
       pop      rsi
       pop      rdi
       pop      r14
       pop      rbp
       ret

Lots of inlining going on. No calls on the hot path. The key bit is...

G_M000_IG04:                ;; offset=0x0038
       mov      rcx, gword ptr [rbp+0x18]
       mov      rax, gword ptr [rcx+0x58]
       cmp      r14d, dword ptr [rax+0x08]
       jae      SHORT G_M000_IG05
       mov      edx, r14d
       shl      rdx, 6
       lea      rax, bword ptr [rax+rdx+0x10]
       cmp      byte  ptr [rax], al
       lea      rdx, bword ptr [rax+0x20]
       lock     
       add      qword ptr [rdx], rbx
       mov      dword ptr [rax+0x1C], 1
       jmp      G_M000_IG15

I'm not an expert in x64 assembly, but that looks to me like the inlined version of the code which actually performs the modifications to MetricPoint. More or less this code:

Interlocked.Add(ref metricPoint.RunningValue.AsLong, value);

metricPoint.MetricPointStatus = MetricPointStatus.CollectPending;

That is what was left after everything got elided/removed from these methods:

    private static void RecoredMeasurementOnLongCounterOrGaugeMetricPoint(ref MetricPoint metricPoint, long value)
    {
        if (LockRequired)
        {
            if (IsDeltaAggregation)
            {
                unchecked
                {
                    metricPoint.RunningValue.AsLong += value;
                }
            }
            else
            {
                Debug.Assert(IsCumulativeAggregation, "IsCumulativeAggregation was false");

                metricPoint.RunningValue.AsLong = value;
            }
        }
        else if (IsDeltaAggregation)
        {
            Interlocked.Add(ref metricPoint.RunningValue.AsLong, value);
        }
        else
        {
            Debug.Assert(IsCumulativeAggregation, "IsCumulativeAggregation was false");

            Interlocked.Exchange(ref metricPoint.RunningValue.AsLong, value);
        }
    }
Which compiled down to:

; Total bytes of code 392
; Assembly listing for method OpenTelemetry.Metrics.MetricMeasurementHandler`2[OpenTelemetry.Metrics.MetricAggregatorBehaviorDefinitions+CumulativeTemporality,OpenTelemetry.Metrics.MetricPointBehaviorDefinitions+DeltaCounterLong]:RecordMeasurementOnMetricPoint[long](OpenTelemetry.Metrics.AggregatorStore,byref,long,System.ReadOnlySpan`1[System.Collections.Generic.KeyValuePair`2[System.String,System.Object]]):this (Tier1)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; Tier1 code
; optimized code
; optimized using Dynamic PGO
; rsp based frame
; partially interruptible
; with Dynamic PGO: edge weights are valid, and fgCalledCount is 417408
; 1 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       cmp      byte  ptr [r8], r8b
       add      r8, 32
       lock     
       add      qword ptr [r8], r9
 
G_M000_IG03:                ;; offset=0x000B
       ret 
And this method:

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static void CompleteRecordMeasurement(ref MetricPoint metricPoint)
    {
        metricPoint.MetricPointStatus = MetricPointStatus.CollectPending;

        if (IsDeltaTemporality && ReclaimMetricPoints)
        {
            Interlocked.Decrement(ref metricPoint.ReferenceCount);
        }
    }
Which compiled down to:

; Total bytes of code 12
; Assembly listing for method OpenTelemetry.Metrics.MetricMeasurementHandler`2[OpenTelemetry.Metrics.MetricAggregatorBehaviorDefinitions+CumulativeTemporality,OpenTelemetry.Metrics.MetricPointBehaviorDefinitions+DeltaCounterLong]:CompleteRecordMeasurement(byref) (Tier1)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; Tier1 code
; optimized code
; optimized using Dynamic PGO
; rsp based frame
; partially interruptible
; with Dynamic PGO: edge weights are valid, and fgCalledCount is 382976
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
G_M000_IG01:                ;; offset=0x0000
 
G_M000_IG02:                ;; offset=0x0000
       mov      dword ptr [rcx+0x1C], 1
 
G_M000_IG03:                ;; offset=0x0007
       ret  

Merge requirement checklist

  • CONTRIBUTING guidelines followed (nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes

@CodeBlanch CodeBlanch requested a review from a team December 6, 2023 01:26

codecov bot commented Dec 6, 2023

Codecov Report

Attention: 55 lines in your changes are missing coverage. Please review.

Comparison: base (6250307) 83.38% vs. head (c4e9a0f) 84.39%.
Report is 25 commits behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #5131      +/-   ##
==========================================
+ Coverage   83.38%   84.39%   +1.01%     
==========================================
  Files         297      276      -21     
  Lines       12531    11724     -807     
==========================================
- Hits        10449     9895     -554     
+ Misses       2082     1829     -253     
| Flag | Coverage Δ |
|---|---|
| unittests | ? |
| unittests-Instrumentation-Experimental | 25.79% <0.00%> (?) |
| unittests-Instrumentation-Stable | 25.77% <0.00%> (?) |
| unittests-Solution-Experimental | 84.38% <91.37%> (?) |
| unittests-Solution-Stable | 84.36% <91.37%> (?) |

Flags with carried forward coverage won't be shown.

| Files | Coverage Δ |
|---|---|
| ...ing/Implementation/OpenTelemetryMetricsListener.cs | 90.90% <100.00%> (+0.43%) ⬆️ |
| ...lemetry/Metrics/Base2ExponentialBucketHistogram.cs | 100.00% <ø> (ø) |
| .../OpenTelemetry/Metrics/CompositeMetricReaderExt.cs | 100.00% <100.00%> (+27.08%) ⬆️ |
| src/OpenTelemetry/Metrics/HistogramBuckets.cs | 100.00% <ø> (ø) |
| ...mentHandling/MetricAggregatorBehaviorsAttribute.cs | 100.00% <100.00%> (ø) |
| ...surementHandling/MetricMeasurementHandlerHelper.cs | 100.00% <100.00%> (ø) |
| ...asurementHandling/MetricPointBehaviorsAttribute.cs | 100.00% <100.00%> (ø) |
| ...Telemetry/Metrics/MetricPointOptionalComponents.cs | 86.66% <100.00%> (+8.88%) ⬆️ |
| src/OpenTelemetry/Metrics/MetricReaderExt.cs | 91.08% <100.00%> (+1.08%) ⬆️ |
| src/OpenTelemetry/Metrics/MetricPoint.cs | 88.37% <96.29%> (+19.90%) ⬆️ |
... and 5 more

... and 32 files with indirect coverage changes

@cijothomas
Member

Amazing! Will need some time to review, but I assume this won't be merged until after 1.7 stable is out, right?

@CodeBlanch
Member Author

@cijothomas

I assume this won't be merged until after 1.7 stable is out, right?

Not something we have discussed yet but I think that would be reasonable.


This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day.

@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Dec 15, 2023
@github-actions github-actions bot removed the Stale Issues and pull requests which have been flagged for closing due to inactivity label Dec 16, 2023

@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Dec 23, 2023

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Dec 31, 2023
@CodeBlanch CodeBlanch reopened this Jan 8, 2024
@github-actions github-actions bot removed the Stale Issues and pull requests which have been flagged for closing due to inactivity label Jan 9, 2024

@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Jan 16, 2024
@alanwest alanwest removed the Stale Issues and pull requests which have been flagged for closing due to inactivity label Jan 16, 2024

@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Jan 24, 2024

github-actions bot commented Feb 1, 2024

Closed as inactive. Feel free to reopen if this PR is still being worked on.
