Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sdk-logs] Keep LogRecord.State & LogRecord.Attributes in sync if either is updated by a log processor #5169

Merged
merged 5 commits into from
Dec 15, 2023

Conversation

CodeBlanch
Copy link
Member

@CodeBlanch CodeBlanch commented Dec 13, 2023

Relates to #4609

Changes

  • Updates the LogRecord class so that if Attributes \ StateValues or State is set directly they will be kept in sync.

Details

We're still discovering some code out in the wild that isn't handling State \ Attributes \ StateValues very well. Example:

        public override void OnEnd(LogRecord data)
        {
            if (this.IsLogToBeProcessed(data))
            {
                if (data.State is IReadOnlyList<KeyValuePair<string, object>> listOfKvp)
                {
                    data.State = this.AddFields(data, listOfKvp);
                }
            }
        }

That code is flawed in that it ignores Attributes \ StateValues (which may be set instead of State) BUT, if users have ParseStateValues=false AND they are using an exporter which is preferring State over Attributes \ StateValues, it will work fine.

The problem is since 1.5.0+ things have changed:

  • Attributes \ StateValues will be set from the initial state.
  • A lot of exporters were updated to prefer Attributes \ StateValues over State.

The result of those changes is the code above sets a new State but Attributes \ StateValues remains set to what was parsed/resolved from the initial State value. The initial data is exported and these enrichment processors become broken.

To smooth this out what this PR does is keep everything in sync automatically.

Merge requirement checklist

  • CONTRIBUTING guidelines followed (nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes

@CodeBlanch CodeBlanch requested a review from a team December 13, 2023 21:03
Copy link

codecov bot commented Dec 13, 2023

Codecov Report

Merging #5169 (b3a3977) into main (7eeae5b) will increase coverage by 0.08%.
The diff coverage is 94.73%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5169      +/-   ##
==========================================
+ Coverage   83.37%   83.45%   +0.08%     
==========================================
  Files         297      297              
  Lines       12386    12398      +12     
==========================================
+ Hits        10327    10347      +20     
+ Misses       2059     2051       -8     
Flag Coverage Δ
unittests 83.45% <94.73%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
.../OpenTelemetry/Logs/ILogger/OpenTelemetryLogger.cs 86.79% <100.00%> (ø)
src/OpenTelemetry/Logs/LoggerSdk.cs 100.00% <100.00%> (ø)
src/OpenTelemetry/Logs/LogRecord.cs 69.73% <94.11%> (+2.36%) ⬆️

... and 7 files with indirect coverage changes

Copy link
Contributor

@Yun-Ting Yun-Ting left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using AttributeData as a underlying cache to keep State and Attributes in sync LGTM as an intermediate solution before deprecating State / ParseStateValues.
When would the deprecation be happening 😀?

@reyang
Copy link
Member

reyang commented Dec 14, 2023

We're still discovering some code out in the wild that isn't handling State \ Attributes \ StateValues very well.

Are there real issues reported by the user? How important it is?

@reyang
Copy link
Member

reyang commented Dec 14, 2023

To smooth this out what this PR does is keep everything in sync automatically.

How much cost would this introduce?

reyang
reyang previously approved these changes Dec 14, 2023
Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with this change. However, I'm concerned about the slippery slope situation where we keep adding patches which make things more complex / less performant - how much confidence do you have @CodeBlanch?

@CodeBlanch
Copy link
Member Author

How much cost would this introduce?

Very little! For users NOT doing any modification to LogRecord there isn't any cost at all. For users doing modification:

  • Setting State to some IReadOnlyList they will basically pay for the cost of logRecord.Attributes = (IReadOnlyList<KeyValuePair<string, object>>)state. A few extra CPU cycles.

  • Setting State to some IEnumerable they will pay to copy the KVPs to the pool storage and then logRecord.Attributes = poolStorage. More CPU cost to do that but no allocations (once the pool is warmed up).

The pattern I have been recommending previously to component authors was to set both State & Attributes \ StateValues in their processors:

    data.State = data.StateValues = this.AddFields(data, listOfKvp);
    // or
    data.State = data.Attributes = this.AddFields(data, listOfKvp);

I didn't want those users to have to pay now for additional logic/work being performed so I have a reference check in the State setter to short-circuit for this case:

    public IReadOnlyList<KeyValuePair<string, object?>>? Attributes
    {
        set
        {
            // ...

            if (this.ILoggerData.State is not null)
            {
                this.ILoggerData.State = value;
            }

            this.AttributeData = value;
         }
    }

    public object? State
    {
        set
        {
            if (ReferenceEquals(this.ILoggerData.State, value))
            {
                return;
            }

            // ...
         }
    }

Are there real issues reported by the user? How important it is?

This came to me through an issue reported in one of our internal systems. IMO it is important because it is very very difficult to troubleshoot. This particular user was up and running happily. He then updated GenevaExporter to 1.6.0 and noticed some of his data went missing. After trying a bunch of things he became frustrated and decided to downgrade. A few different engineers looked at it before me and no one was able to figure it out. A lot of pieces, a lot of confusion. The enrichment processor in play causing the pain is owned by some other team. My motivation in trying to address it via SDK is really just to eliminate support burden.

I'm okay with this change. However, I'm concerned about the slippery slope situation where we keep adding patches which make things more complex / less performant - how much confidence do you have @CodeBlanch?

I actually really like the change and I'm slightly annoyed at myself that I didn't think of it initially. Logically Attributes are the result of processing State so it seems very reasonable expectation to me that if I change the State that Attributes should reflect that.

I'm not too worried about the perf here (see above). The complexity concern is valid but I do think this smooths out the overall complexity we have with State + Attributes \ StateValues. Today you have to really understand the implementation to do it correctly. I feel like with this change it should "just work" without having to have the deep understanding.

When would the deprecation be happening 😀?

Presumably 2.0 but that may never happen.

@reyang reyang dismissed their stale review December 14, 2023 20:01

Changelog needs to be updated.

@reyang
Copy link
Member

reyang commented Dec 14, 2023

This came to me through an issue reported in one of our internal systems. IMO it is important because it is very very difficult to troubleshoot. This particular user was up and running happily. He then updated GenevaExporter to 1.6.0 and noticed some of his data went missing. After trying a bunch of things he became frustrated and decided to downgrade. A few different engineers looked at it before me and no one was able to figure it out. A lot of pieces, a lot of confusion. The enrichment processor in play causing the pain is owned by some other team. My motivation in trying to address it via SDK is really just to eliminate support burden.

Based on the context here, I think changelog needs to be updated calling out the breaking change. In addition, the buggy version needs to be unlisted/deprecated.

I'm okay with this change. However, I'm concerned about the slippery slope situation where we keep adding patches which make things more complex / less performant - how much confidence do you have @CodeBlanch?

I actually really like the change and I'm slightly annoyed at myself that I didn't think of it initially. Logically Attributes are the result of processing State so it seems very reasonable expectation to me that if I change the State that Attributes should reflect that.

What's the gap here? How to avoid it in the future?

@@ -2,6 +2,10 @@

## Unreleased

* `LogRecord.Attributes` and `LogRecord.StateValues` will now automatically
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this is bit unclear from the user perspective, consider borrow some wording from https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/main/src/OpenTelemetry.Exporter.Geneva/CHANGELOG.md#140-beta4.

Copy link
Member

@alanwest alanwest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Logically Attributes are the result of processing State so it seems very reasonable expectation to me that if I change the State that Attributes should reflect that.

It seems fair to consider this a bug. I agree that the changelog should more clearly indicate that this is a bug fix that changes behavior.

Based on the context here, I think changelog needs to be updated calling out the breaking change. In addition, the buggy version needs to be unlisted/deprecated.

@reyang are you suggesting unlisting versions 1.5+ of the SDK? With few exceptions (crashing or security related), we normally do not unlist versions for bug fixes. Or do you consider this change a bigger deal than a regular bug fix?

@reyang
Copy link
Member

reyang commented Dec 14, 2023

@reyang are you suggesting unlisting versions 1.5+ of the SDK? With few exceptions (crashing or security related), we normally do not unlist versions for bug fixes. Or do you consider this change a bigger deal than a regular bug fix?

I was asking if this is important or not, and we should communicate in the changelog/listing/unlisting in the proper way which reflects the actual importance (from the users' perspective). So far, I got mixed answers:

Are there real issues reported by the user? How important it is?

This came to me through an issue reported in one of our internal systems. IMO it is important because it is very very difficult to troubleshoot.

@reyang
Copy link
Member

reyang commented Dec 14, 2023

@reyang are you suggesting unlisting versions 1.5+ of the SDK? With few exceptions (crashing or security related), we normally do not unlist versions for bug fixes. Or do you consider this change a bigger deal than a regular bug fix?

I was asking if this is important or not, and we should communicate in the changelog/listing/unlisting in the proper way which reflects the actual importance (from the users' perspective). So far, I got mixed answers:

Are there real issues reported by the user? How important it is?

This came to me through an issue reported in one of our internal systems. IMO it is important because it is very very difficult to troubleshoot.

And the current changelog "LogRecord.Attributes and LogRecord.StateValues will now automatically update if LogRecord.State is set inside a processor." sounds like a feature instead of bug fix (if the user bumped version and see things breaking, it'll be hard for them to correlate while reading through the changelog).

This one sounds better IMO https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry/CHANGELOG.md#151 ("fixed a breaking change...").

@alanwest
Copy link
Member

I was asking if this is important or not, and we should communicate in the changelog/listing/unlisting in the proper way which reflects the actual importance (from the users' perspective).

Gotcha. In this case, my opinion is that it does not warrant an unlisting.

This one sounds better IMO https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry/CHANGELOG.md#151 ("fixed a breaking change...").

Agree the changelog should be written in a way that makes this look like a bug fix.. I like the improved wording @CodeBlanch just pushed up.

@CodeBlanch
Copy link
Member Author

CodeBlanch commented Dec 15, 2023

I pushed an update to the CHANGELOG wording. Same one @alanwest mentioned above looks like we were posting at the same time.

I don't think this is important to the point we should backport or deprecate/unlist packages. In ~6 months I've only seen one issue about it. But it was something I was going to bring up on the next SIG for discussion.

@reyang
Copy link
Member

reyang commented Dec 15, 2023

I pushed an update to the CHANGELOG wording.

Thanks! The updated wording looks great to me!

In ~6 months I've only seen one issue about it.

Now it sounds like it is not important at all (only affecting very few users, and the impact is less severe) - makes sense to not pushing for hotfix / new release / unlisting / deprecation (in a nutshell, just sit on it until this change got picked by the next release train).

@utpilla utpilla merged commit af83eb1 into open-telemetry:main Dec 15, 2023
78 checks passed
@CodeBlanch CodeBlanch deleted the logrecord-state-processor branch December 15, 2023 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants