Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust Cardinality Limit to Accommodate Internal Reserves #5382

Conversation

rajkumar-rangaraj
Copy link
Contributor

@rajkumar-rangaraj rajkumar-rangaraj commented Feb 23, 2024

Discussion #5349 (comment)

Changes

Currently, From the cardinality limit, we internally reserve one spot for zero tags and another spot for overflow if the emit overflow experimental flag is enabled. A few customers have raised concerns that they receive 1 or 2 dimensions less than the specified cardinality limit. For example, consider a scenario where a user sets the cardinality limit to 10. If the emit overflow experimental flag is enabled, we currently set aside 2 of those spots—one for zero tags and one for overflow cases. This means the user effectively gets to use only 8 of their 10 slots for their dimensions.

This proposed change aims to address concerns regarding the effective reduction of the cardinality limit available to users. The key concept here is that the cardinality limit should apply exclusively to metrics that include dimensions, treating metrics with zero dimensions and overflow metrics as special cases that are exempt from the user's specified limit. Currently, the emit overflow attribute functionality is behind an experimental flag, but it is planned for incorporation into the stable SDK. To simplify future adjustments and to provide users with the full extent of their specified limits, we propose adding two additional spots on top of the cardinality limit.

Note: From the user's perspective, it is important to understand that the cardinality limit they set is intended to apply only to metrics with dimensions. This ensures users can fully utilize their specified limit for dimensioned metrics, without having to factor in special cases into their limit calculations.

Example:

Imagine a user sets a cardinality limit of 10, expecting to track metrics across up to 10 unique dimension combinations. Under the proposed change, to ensure the user's limit is fully applicable to dimensioned metrics, we would internally adjust the limit to 12. This adjustment accounts for one spot reserved for metrics with zero dimensions and another for overflow metrics, ensuring that the user has a full quota of 10 slots available for their dimensioned metrics. This way, regardless of any special cases, the user effectively retains the full capacity of their specified cardinality limit for metrics that include dimensions.

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • [?] Appropriate CHANGELOG.md files updated for non-trivial changes

@rajkumar-rangaraj rajkumar-rangaraj requested a review from a team February 23, 2024 01:25
Copy link

codecov bot commented Feb 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.65%. Comparing base (6250307) to head (8f7e799).
Report is 116 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5382      +/-   ##
==========================================
+ Coverage   83.38%   84.65%   +1.27%     
==========================================
  Files         297      281      -16     
  Lines       12531    12098     -433     
==========================================
- Hits        10449    10242     -207     
+ Misses       2082     1856     -226     
Flag Coverage Δ
unittests ?
unittests-Solution-Experimental 84.64% <100.00%> (?)
unittests-Solution-Stable 84.39% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
src/OpenTelemetry/Metrics/AggregatorStore.cs 85.02% <100.00%> (+4.63%) ⬆️
src/OpenTelemetry/Metrics/MetricReaderExt.cs 91.26% <100.00%> (+1.26%) ⬆️
...OpenTelemetry/Metrics/MetricStreamConfiguration.cs 88.46% <ø> (+13.46%) ⬆️

... and 54 files with indirect coverage changes

@reyang
Copy link
Member

reyang commented Feb 23, 2024

I feel the key here is less about dealing with off-by-one (or off-by-two), it is more about user experience/education/expectation. @rajkumar-rangaraj could you update

/// Gets or sets a positive integer value defining the maximum number of
/// data points allowed for the metric managed by the view.
/// </summary>
/// <remarks>
/// <para><b>WARNING</b>: This is an experimental API which might change or
/// be removed in the future. Use at your own risk.</para>
/// <para>Spec reference: <see
/// href="https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#cardinality-limits">Cardinality
/// limits</see>.</para>
/// Note: If not set the default MeterProvider cardinality limit of 2000
/// will apply.
/// </remarks>
and the doc in this PR so we can align on what exactly do we want the users to understand/do before making the code change?

@rajkumar-rangaraj
Copy link
Contributor Author

the doc in this PR so we can align on what exactly do we want the users to understand/do before making the code change?

Could you please review the updated PR description? If you agree with the changes, I will proceed to update the code comments.

@reyang
Copy link
Member

reyang commented Feb 27, 2024

the doc in this PR so we can align on what exactly do we want the users to understand/do before making the code change?

Could you please review the updated PR description? If you agree with the changes, I will proceed to update the code comments.

The direction looks good to me 👍

@cijothomas
Copy link
Member

@rajkumar-rangaraj
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#overflow-attribute

The SDK MUST create an Aggregator with the overflow attribute set prior to reaching the cardinality limit and use it to aggregate events for which the correct Aggregator could not be created. The maximum number of distinct, non-overflow attributes is one less than the limit, as a result.

If we were to stick with the spec wording (not yet stable, still..), then we should count the overflow attribute as contributing to the overall limit. Maybe just do the zero tag case in this PR, and wait for the spec to finalize.

@utpilla
Copy link
Contributor

utpilla commented Feb 27, 2024

If we were to stick with the spec wording (not yet stable, still..), then we should count the overflow attribute as contributing to the overall limit. Maybe just do the zero tag case in this PR, and wait for the spec to finalize.

SDKs SHOULD support being configured with a cardinality limit. The number of unique combinations of attributes is called cardinality.

If we were to stick with spec wording, then we should not treat zero tag as a special case either. That should also contribute to the overall limit.

@cijothomas
Copy link
Member

If we were to stick with the spec wording (not yet stable, still..), then we should count the overflow attribute as contributing to the overall limit. Maybe just do the zero tag case in this PR, and wait for the spec to finalize.

SDKs SHOULD support being configured with a cardinality limit. The number of unique combinations of attributes is called cardinality.

If we were to stick with spec wording, then we should not treat zero tag as a special case either. That should also contribute to the overall limit.

Spec is not saying anything about zero tags, but spec explicitly talks about overflow contributing to the overall limit.

@utpilla
Copy link
Contributor

utpilla commented Feb 27, 2024

Spec is not saying anything about zero tags, but spec explicitly talks about overflow contributing to the overall limit.

Spec says "The number of unique combinations of attributes is called cardinality."

A measurement with zero tags qualifies as a unique combination.

@cijothomas
Copy link
Member

Spec is not saying anything about zero tags, but spec explicitly talks about overflow contributing to the overall limit.

Spec says "The number of unique combinations of attributes is called cardinality."

A measurement with zero tags qualifies as a unique combination.

Got it. Our only option is then to break the spec compliance and/or influence the spec to soften the wording there, to allows us to special case. After all, what we are doing is aligned with the OTel mission.

@Kielek
Copy link
Contributor

Kielek commented Feb 28, 2024

Spec is not saying anything about zero tags, but spec explicitly talks about overflow contributing to the overall limit.

Spec says "The number of unique combinations of attributes is called cardinality."
A measurement with zero tags qualifies as a unique combination.

Got it. Our only option is then to break the spec compliance and/or influence the spec to soften the wording there, to allows us to special case. After all, what we are doing is aligned with the OTel mission.

+1 to adjust specification. We should avoid any differences where possible. Especially if the part of the specification is still unstable and can be easily modified.

@cijothomas
Copy link
Member

Spec is not saying anything about zero tags, but spec explicitly talks about overflow contributing to the overall limit.

Spec says "The number of unique combinations of attributes is called cardinality."
A measurement with zero tags qualifies as a unique combination.

Got it. Our only option is then to break the spec compliance and/or influence the spec to soften the wording there, to allows us to special case. After all, what we are doing is aligned with the OTel mission.

open-telemetry/opentelemetry-specification#3904 (comment) should give us what we need.

@CodeBlanch
Copy link
Member

@rajkumar-rangaraj

Hey FYI we have this block of code currently:

if (isEmitOverflowAttributeKeySet)
{
// We need at least two metric points. One is reserved for zero tags and the other one for overflow attribute
if (cardinalityLimit > 1)
{
this.emitOverflowAttribute = true;
}
}

If we are going to auto-expand whatever the user chooses for cardinalityLimit you can probably adjust that to just...

-        if (isEmitOverflowAttributeKeySet)
-        {
-            // We need at least two metric points. One is reserved for zero tags and the other one for overflow attribute
-            if (cardinalityLimit > 1)
-            {
-                this.emitOverflowAttribute = true;
-            }
-        }
+        this.emitOverflowAttribute = isEmitOverflowAttributeKeySet;

@@ -1429,7 +1431,7 @@ int MetricPointCount()
}

meterProvider.ForceFlush(MaxTimeToAllowForFlush);
Assert.Equal(MeterProviderBuilderSdk.DefaultCardinalityLimit, MetricPointCount());
Assert.Equal(MeterProviderBuilderSdk.DefaultCardinalityLimit + additionalReserve, MetricPointCount());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should verify that we don't allow the user to exceed the cardinality cap. The right thing to test here now would be to check that count of metric points exported (excluding zero tags and overflow attribute) is equal to MeterProviderBuilderSdk.DefaultCardinalityLimit. We probably need to update MetricPointCount() method to return the count of unreserved MetricPoints instead of all the MetricPoints that were exported in that particular Collect cycle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overflow is guarded with an experimental flag. In cases where the overflow experimental attribute is not set, the value will be the cardinality limit + 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you have now updated MetricPointCount() method definition to account for that.

@rajkumar-rangaraj
Copy link
Contributor Author

Thanks @utpilla for the very detailed review. I appreciate the time you spent on the quality review. I’ve sent an update, please take a look when you get a chance.

@utpilla utpilla merged commit b90c3b2 into open-telemetry:main Mar 7, 2024
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants