Minimize telemetry surface area #2844

davidmrdavid · 2024-06-04T00:51:48Z

Under certain conditions (when the user opts in to tracing raw inputs and outputs), our telemetry may capture sensitive information, for example through logs of exceptions, inputs, and outputs.

This PR aims to remove the sensitive information by implementing these guidelines:

(1) User-provided inputs and outputs are never part of telemetry as-is. This includes the "reason" label for operations like terminate or rewind, as well as the simpler cases of inputs and outputs to orchestrators and entities.

(2) When application-level exceptions are logged, we only log a sanitized version that includes the exception type and stack trace. The exception message is omitted, as it may contain sensitive information.
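As a rough illustration of the two guidelines, the sketch below shows what the sanitization primitives might look like. It is not the code from this PR: the `TelemetrySanitizer` class, its method names, and the `(Redacted N characters)` placeholder format are assumptions made for illustration, and inner exceptions are not walked.

```csharp
#nullable enable
using System;

// Hypothetical helpers illustrating guidelines (1) and (2); the names and the
// redaction format are assumptions, not the extension's actual API.
public static class TelemetrySanitizer
{
    // Guideline (1): never emit a user payload as-is. Only its length survives,
    // which can still help diagnose oversized inputs and outputs.
    public static string RedactPayload(string? payload)
    {
        int length = payload?.Length ?? 0;
        return $"(Redacted {length} characters)";
    }

    // Guideline (2): keep the exception type and stack trace, drop the message.
    // Inner exceptions are ignored in this sketch.
    public static string SanitizeException(Exception? exception)
    {
        if (exception == null)
        {
            return string.Empty;
        }

        string typeName = exception.GetType().FullName ?? exception.GetType().Name;
        // StackTrace is null for an exception that was never thrown.
        string stackTrace = exception.StackTrace ?? "(no stack trace)";
        return $"{typeName}{Environment.NewLine}{stackTrace}";
    }
}
```

For example, `TelemetrySanitizer.RedactPayload("hello")` yields `(Redacted 5 characters)`, and sanitizing an `InvalidOperationException("secret-token")` yields its type name and stack trace without `secret-token`.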

And that's it. To this end, I've refactored many methods in the EndToEndTraceHelper class, as well as their callers. In most cases, the refactoring simply stops logging sensitive parameters via ETW while still logging them to the user's Application Insights.

The more complicated refactorings all involve logging exceptions. In the easy case, I replaced the string parameter holding the formatted exception with an actual Exception parameter, which I can use to create a sanitized exception string containing only the exception type and stack trace. Cases where the exception object wasn't available in the immediate caller are handled case by case.

Finally, some diffs in this PR are minor improvements, such as adding nullable analysis and removing build warnings.

resolves N/A

Pull request checklist

davidmrdavid · 2024-06-04T01:12:17Z

src/WebJobs.Extensions.DurableTask/EndToEndTraceHelper.cs

@@ -239,26 +238,6 @@ public void ExtensionWarningEvent(string hubName, string functionName, string in
            }
        }

-        public void ProcessingOutOfProcPayload(


This wasn't used.

src/WebJobs.Extensions.DurableTask/Listener/TaskEntityShim.cs

davidmrdavid · 2024-06-11T20:47:34Z

Just realized a bunch of logging unit tests broke, which is good, but it means I need to update them. Still, I'd appreciate a quick pass on the proposed refactoring in the meantime. Thanks!

davidmrdavid · 2024-06-13T01:41:36Z

Tests are passing. I'd appreciate a review here, @cgillum. Thanks!

cgillum · 2024-06-13T16:52:48Z

src/WebJobs.Extensions.DurableTask/EndToEndTraceHelper.cs

+            return sanitizedPayload;
+        }
+
+        private string SanitizeException(Exception? exception, out string iloggerExceptionString, bool isReplay = false)


The design of this API is a bit confusing. It's not clear what the difference is supposed to be between the output (return value) and the out variable. Rather than having a named out parameter and an unnamed return value, it would be better from a code understandability perspective to have two named out parameters and just return void. Having this kind of usage clarity is especially important in this case because making a mistake in interpreting the outputs of this method could cause us to accidentally leak sensitive information.

Good point - I have refactored the implementation to use two out variables.
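A minimal sketch of the two-out-parameter shape suggested in this review, assuming the same split the PR describes (full detail for ILogger, type plus stack trace for ETW). The class name, the parameter names, and the use of `Exception.ToString()` for the ILogger string are assumptions, not the signature that was merged.

```csharp
#nullable enable
using System;

// Hypothetical sketch; not the extension's actual SanitizeException signature.
public static class ExceptionTraceSplitter
{
    public static void SanitizeException(
        Exception? exception,
        out string iLoggerExceptionString,   // full detail, for ILogger / the user's Application Insights
        out string sanitizedExceptionString) // type + stack trace only, for ETW
    {
        if (exception == null)
        {
            iLoggerExceptionString = string.Empty;
            sanitizedExceptionString = string.Empty;
            return;
        }

        // ToString() includes the message, so this string must never reach ETW.
        iLoggerExceptionString = exception.ToString();

        string typeName = exception.GetType().FullName ?? exception.GetType().Name;
        sanitizedExceptionString = $"{typeName}{Environment.NewLine}{exception.StackTrace}";
    }
}
```

Because both outputs are named at the call site (`out string iLoggerExceptionString, out string sanitizedExceptionString`), routing the wrong string to ETW is easier to spot in review than with an unnamed return value.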

cgillum · 2024-06-13T16:55:50Z

src/WebJobs.Extensions.DurableTask/EndToEndTraceHelper.cs

        public void FunctionFailed(
            string hubName,
            string functionName,
            string instanceId,
            string reason,
+            string sanitizedReason,


I noticed that sanitizedReason goes to ETW while reason goes to ILogger, but don't we send both of these payloads to Kusto (ETW to DurableFunctionsEvents and ILogger to FunctionsLogs)? I know that's not the case for the DTFx tracing code, but I thought the WebJobs ILogger logs went to FunctionsLogs.

You're right. Throughout this PR, we only sanitize the logs going to DurableFunctionsEvents, not the ILogger-powered logs going to FunctionsLogs. This is because the ILogger logs are also sent to the user's Application Insights instance, and I was wary of changing the logging behavior of a user-facing component.

So for now, I'm deferring the decision on sanitizing FunctionsLogs so that we can unblock sanitizing the DF Kusto table. Does that seem reasonable?

OK, thanks for clarifying. It would be good to confirm the plan for sanitizing FunctionsLogs, just to make sure that the work done in this PR isn't redundant and won't need to be reversed. Happy to chat about this offline.

Let's discuss this offline. In general, though, I think the cleanup done here is unlikely to need to be reversed. If anything, it's more likely that it will need to be expanded to include FunctionsLogs as well. I'm just opting to merge a minimal improvement for now.

cgillum · 2024-06-13T16:58:09Z

src/WebJobs.Extensions.DurableTask/OutOfProcMiddleware.cs

@@ -95,7 +95,7 @@ public async Task CallOrchestratorAsync(DispatchMiddlewareContext dispatchContex
                this.Options.HubName,
                functionName.Name,
                instance.InstanceId,
-                isReplaying ? "(replay)" : this.extension.GetIntputOutputTrace(startEvent.Input),
+                startEvent.Input,


I can't tell by looking at the diffs, but are we using GetInputOutputTrace at all anymore in the code?

At the time of writing this comment, there was a single leftover use, but that was an accident. As of commit 43f63fc, the method has been removed.

So yes, this PR removes this method altogether. Instead of sanitizing the logs at each call site of our EndToEndTraceHelper, I opted to centralize the sanitization in the EndToEndTraceHelper itself. I think that should help minimize the chance of sanitization errors.
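To illustrate this design choice, here is a minimal sketch of a trace helper that receives raw values from callers and performs all sanitization internally, so a call site such as `CallOrchestratorAsync` can simply pass `startEvent.Input`. The `TraceHelperSketch` class, the `Action<string>` sinks standing in for ETW and ILogger, the message formats, and the `(Redacted N characters)` placeholder are all assumptions for illustration, not the extension's code.

```csharp
#nullable enable
using System;

// Hypothetical sketch of centralized sanitization: callers hand over raw values,
// and only this class decides what each sink is allowed to see.
public sealed class TraceHelperSketch
{
    private readonly Action<string> etwSink;     // stands in for ETW (internal telemetry)
    private readonly Action<string> iLoggerSink; // stands in for ILogger (user's Application Insights)

    public TraceHelperSketch(Action<string> etwSink, Action<string> iLoggerSink)
    {
        this.etwSink = etwSink;
        this.iLoggerSink = iLoggerSink;
    }

    public void FunctionStarting(string hubName, string functionName, string instanceId, string? input)
    {
        // The raw input goes to ILogger only; ETW receives a redacted placeholder.
        string sanitizedInput = $"(Redacted {input?.Length ?? 0} characters)";
        this.etwSink($"{hubName}/{functionName} ({instanceId}) starting. Input: {sanitizedInput}");
        this.iLoggerSink($"{hubName}/{functionName} ({instanceId}) starting. Input: {input}");
    }

    public void FunctionFailed(string hubName, string functionName, string instanceId, Exception exception)
    {
        // ETW gets type + stack trace; ILogger keeps the full exception text.
        string sanitizedReason = $"{exception.GetType().FullName}{Environment.NewLine}{exception.StackTrace}";
        this.etwSink($"{hubName}/{functionName} ({instanceId}) failed: {sanitizedReason}");
        this.iLoggerSink($"{hubName}/{functionName} ({instanceId}) failed: {exception}");
    }
}
```

With this shape, a caller cannot accidentally send unsanitized data to ETW, because it never chooses which string goes where.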

src/WebJobs.Extensions.DurableTask/EndToEndTraceHelper.cs

src/WebJobs.Extensions.DurableTask/Listener/TaskEntityShim.cs

src/WebJobs.Extensions.DurableTask/EndToEndTraceHelper.cs


test/FunctionsV2/EndToEndTraceHelperTests.cs


bachuv

LGTM!

first draft (94a1f2c)

davidmrdavid changed the title ~~[WIP] Removing potentially sensitive logs from telemetry~~ Remove potentially sensitive logs from telemetry Jun 4, 2024

davidmrdavid changed the title ~~Remove potentially sensitive logs from telemetry~~ Minimize telemetry surface area Jun 4, 2024

davidmrdavid marked this pull request as ready for review June 4, 2024 01:11

davidmrdavid commented Jun 4, 2024


src/WebJobs.Extensions.DurableTask/Listener/TaskEntityShim.cs

davidmrdavid added 14 commits June 10, 2024 13:08

remove unecessary diff (a56cf6d)
refactor (ffdd81c)
refactor (17dceeb)
Refactor (ba345d1)
refactor (77ebaa6)
remove unecessary diff (e1e865c)
add nullable checks (6bd3a33)
add nullable checks in E2ETraceHelper (95bcd87)
remove whitespace (dbd62d2)
add nullability checks (1ea40e3)
clean up (a43292c)
refactorings (e4fab7a)
refactorings (39f400e)
add comments in csproj (80f6a09)

davidmrdavid added 7 commits June 12, 2024 17:40

quick test (d3103fe)
remove nullable assignment (1fe4e67)
small edit (caf746e)
minor refactor (10ac5a9)
remove line (900b1a2)
remove line (032f566)
null string handling (f31cad9)

davidmrdavid requested a review from cgillum June 13, 2024 01:41

cgillum reviewed Jun 13, 2024


apply feedback (43f63fc)

davidmrdavid requested a review from cgillum June 19, 2024 23:24

Merge branch 'dev' into dajusto/remove-potentially-sensitive-logs (f30f622)

bachuv reviewed Jun 27, 2024


davidmrdavid added 7 commits July 2, 2024 17:03

remove replay controls (b1ec95d)
Merge branch 'dajusto/remove-potentially-sensitive-logs' of https://github.com/Azure/azure-functions-durable-extension into dajusto/remove-potentially-sensitive-logs (de2e41c)
add unit tests (73fed6e)
Add unit tests (e16b912)
clean up tests (8ed5d30)
pass tests (5d15568)
increase timeout (778cecd)

bachuv reviewed Jul 8, 2024


test/FunctionsV2/EndToEndTraceHelperTests.cs

davidmrdavid added 2 commits July 9, 2024 09:03

pass linter (bac3b33)
Merge branch 'dev' of https://github.com/Azure/azure-functions-durable-extension into dajusto/remove-potentially-sensitive-logs (2af6b47)

bachuv approved these changes Jul 9, 2024


davidmrdavid merged commit c88b62c into dev Jul 9, 2024
12 checks passed

davidmrdavid deleted the dajusto/remove-potentially-sensitive-logs branch July 9, 2024 18:42

bachuv pushed a commit that referenced this pull request Jul 9, 2024

Minimize telemetry surface area (#2844) (6dc67b9)


