-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minimize telemetry surface area #2844
Conversation
@@ -239,26 +238,6 @@ public void ExtensionWarningEvent(string hubName, string functionName, string in | |||
} | |||
} | |||
|
|||
public void ProcessingOutOfProcPayload( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this wasn't used
Just realized a bunch of logging unit tests broke, which is good, but it means I need to update them. Still, I'd appreciate a quick pass on the proposed refactoring in the meantime. Thanks |
Tests are passing. I'd appreciate a review here, @cgillum. Thanks! |
return sanitizedPayload; | ||
} | ||
|
||
private string SanitizeException(Exception? exception, out string iloggerExceptionString, bool isReplay = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The design of this API is a bit confusing. It's not clear what the difference is supposed to be between the output (return value) and the out
variable. Rather than having a named out
parameter and an unnamed return value, it would be better from a code understandability perspective to have two named out
parameters and just return void
. Having this kind of usage clarity is especially important in this case because making a mistake in interpreting the outputs of this method could cause us to accidentally leak sensitive information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - I have refactored the implementation to use two out
variables.
public void FunctionFailed( | ||
string hubName, | ||
string functionName, | ||
string instanceId, | ||
string reason, | ||
string sanitizedReason, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that sanitizedReason
is going to ETW while reason
is going to ILogger
, but don't we send both of these payloads to Kusto, ETW to DurableFunctionsEvents and ILogger to FunctionsLogs? I know that's not the case for the DTFx tracing code, but I thought the WebJobs ILogger
logs went to FunctionsLogs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Throughout this PR, we only sanitize the logs to DurableFunctionsEvents
and not the ILogger-powered logs to FunctionsLogs
. This is because the ILogger logs are also sent to the user's Application Insights instance, and I was worried about changing the logging behavior to a user-facing component.
So I'm choosing to delay our decision to sanitize the FunctionsLogs for now so that we can unblock sanitizing the DF Kusto table. Does that seem reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks for clarifying. It would be good to confirm what the plan is for sanitizing FunctionsLogs
just to make sure that the work done in this PR isn't redundant or need to be reversed. Happy to chat about this offline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss this offline. But in general I think the clean up done here is unlikely to need to be reversed. If anything, I think it's more likely that it will need to be expanded to include FunctionsLogs as well. I'm just opting to merge a minimal improvement for now.
@@ -95,7 +95,7 @@ public async Task CallOrchestratorAsync(DispatchMiddlewareContext dispatchContex | |||
this.Options.HubName, | |||
functionName.Name, | |||
instance.InstanceId, | |||
isReplaying ? "(replay)" : this.extension.GetIntputOutputTrace(startEvent.Input), | |||
startEvent.Input, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't tell by looking at the diffs, but are we using GetInputOutputTrace
at all anymore in the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the time this of writing this comment, there was a single leftover use, but that was an accident. Starting from this commit (43f63fc) you can see the method has been removed.
So yes - this PR intends to remove this method altogether. Instead of sanitizing the logs at each calling site of our EndToEndTraceHelper, I opted to centralizing the sanitization in the EndtoEndTraceHelper itself. I think that should help minimize the chance of sanitization errors.
…ithub.com/Azure/azure-functions-durable-extension into dajusto/remove-potentially-sensitive-logs
…e-extension into dajusto/remove-potentially-sensitive-logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Under certain conditions (user opting in to tracing raw inputs and outputs), our telemetry may capture sensitive information. For example, through logs of exceptions, inputs, output, etc. T
This PR aims to remove the sensitive information by implementing these guidelines:
(1) User-provided inputs and outputs are never part of telemetry as-is. This includes the "reason" label for operations like terminate or rewind, as well as the simpler cases of inputs and outputs to orchestrators and entities.
(2) When application-level exceptions are logged, we only log a sanitized version of them that includes the exception type, and the exception stack trace. The exception message is absent, as it may contain sensitive information.
And that's it. To this end, I've refactored many methods in the
EndToEndTracer
class, as well as their callers. In most cases, the refactoring is simply to avoid logging sensitive parameters via ETW, but still logging them to the user's Application Insights.The more complicated refactorings are all around logging exceptions. In the easy case - I simply refactored the string parameter containing the exception string into an actual
Exception
type, that I can use to create a sanitized exception string containing only the exception type and stack trace. Other cases where the exception object wasn't available in the immediate caller are handled on a case-by-case basis.Finally - some diffs in this PR are minor improvements like adding nullable analysis and removing build warnings.
resolves N/A
Pull request checklist
pending_docs.md
release_notes.md
/src/Worker.Extensions.DurableTask/AssemblyInfo.cs