[DOCS-6870] Add OP SDS transform guide #20962

Merged
merged 9 commits into from
Dec 14, 2023
@@ -0,0 +1,70 @@
---
title: Sensitive Data Scanner Transform
kind: documentation
disable_toc: false
further_reading:
- link: "/observability_pipelines/setup/"
tag: "Documentation"
text: "Set up Observability Pipelines"
- link: "/observability_pipelines/working_with_data/"
tag: "Documentation"
text: "Working with data in Observability Pipelines"
---

{{< callout url="https://docs.google.com/forms/d/e/1FAIpQLSfnNnV823zAgOCowCYuXJE5cDtRqIipKsYcNpaOo1LKpGfppA/viewform" btn_hidden="false" header="Request Access!">}}
The <code>sensitive_data_scanner</code> transform is in private beta.
{{< /callout >}}

## Overview

Sensitive data, such as credit card numbers, bank routing numbers, and API keys, are often exposed unintentionally in your logs, which can expose your organization to financial and privacy risks. Use the Observability Pipelines `sensitive_data_scanner` transform to identify, tag, and optionally redact or hash sensitive information before routing data to different destinations. You can use out-of-the-box scanning rules to detect common patterns such as email addresses, credit card numbers, API keys, authorization tokens, and more. Or, create custom scanning rules using regex patterns to match sensitive information.

Check notice on line 20 in content/en/observability_pipelines/guide/sensitive_data_scanner_transform.md

GitHub Actions / vale

[vale] content/en/observability_pipelines/guide/sensitive_data_scanner_transform.md#L20

[Datadog.sentencelength] Try to keep your sentence length to 25 words or fewer.

## Set up the `sensitive_data_scanner` transform

1. Navigate to [Observability Pipelines][1].
1. Click on your pipeline.
1. Click **Edit draft**.
1. Click **+ Add Component**.
1. Select the **Transforms** tab.
1. Click the **Sensitive Data Scanner** tile.
1. Enter a name for the component.
1. Select one or more inputs for the transform.
1. Click **Add a New Item** to add a scanning rule, which determines what sensitive information to match within the data.
1. Enter a name for the rule.
1. In the **Define action on match** section, select the action you want to take for the matched information. Redaction, partial redaction, and hashing are all irreversible actions.
    - If you are redacting the information, specify the text to replace the matched data.
    - If you are partially redacting the information, specify the number of characters you want to redact and which part of the matched data to redact.
1. In the **Pattern** section:
    - To create a custom scanning rule:
        a. Select **Custom** in the **type** dropdown.
        b. In the **Define regex** field, enter the regex pattern to check against the data.
    - To use an out-of-the-box scanning rule:
        a. Select **Library** in the **type** dropdown.
        b. Select the scanning rule you want to use in the **Name** dropdown.
1. In the **Scan entire event or portion of it** section:
    a. Select if you want to scan the **Entire Event** or **Specific Attributes** in the **Target** dropdown.
        - If you are scanning the entire event, you can optionally exclude specific attributes from getting scanned.
        - If you are scanning specific attributes, specify which attributes you want to scan.
1. Optionally, add one or more tags to associate with the matched events.
1. If you want to add another rule, click **Add a New Item** and follow steps 10 to 14.
1. Click **Save**.

**Note**: Any rules that you add or update only affect data coming into Observability Pipelines after the rule was defined.
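
The three match actions from step 11 can be sketched in Python as follows. This is an illustration only, not the transform's implementation: the card-number regex, the `*` mask character, the SHA-256 hash, and every function name are assumptions made for the example. Python's `re` module is also not PCRE2, so treat the pattern as a stand-in for a custom rule.

```python
import hashlib
import re

# Hypothetical custom rule: a 16-digit, card-like number with optional separators.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def redact(text: str, pattern: re.Pattern, replacement: str = "[REDACTED]") -> str:
    """Replace every match with fixed placeholder text."""
    return pattern.sub(replacement, text)


def partially_redact(text: str, pattern: re.Pattern, count: int,
                     from_start: bool = True) -> str:
    """Mask `count` characters at the start or the end of every match."""
    def mask(match: re.Match) -> str:
        value = match.group(0)
        n = min(count, len(value))
        if from_start:
            return "*" * n + value[n:]
        return value[:len(value) - n] + "*" * n
    return pattern.sub(mask, text)


def hash_matches(text: str, pattern: re.Pattern) -> str:
    """Replace every match with a hex digest (SHA-256 is an assumption here)."""
    def digest(match: re.Match) -> str:
        return hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()
    return pattern.sub(digest, text)


log_line = "card=4111 1111 1111 1111 user=bob"
print(redact(log_line, CARD_PATTERN))
print(partially_redact(log_line, CARD_PATTERN, count=4, from_start=False))
print(hash_matches(log_line, CARD_PATTERN))
```

In all three cases the original value cannot be recovered from the output, which is why the guide describes these actions as irreversible.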

The `sensitive_data_scanner` transform supports Perl Compatible RegEx (PCRE), but the following patterns are not supported:
Contributor

@fuchsnj fuchsnj Dec 13, 2023

Suggested change
The `sensitive_data_scanner` transform supports Perl Compatible RegEx (PCRE), but the following patterns are not supported:
The `sensitive_data_scanner` transform supports Perl Compatible RegEx (PCRE2), but the following patterns are not supported:

The supported syntax is a subset of PCRE2. I have compiled an exhaustive list of exactly what is supported if that helps here at all: https://datadoghq.atlassian.net/wiki/spaces/SDS/pages/3221292288/SDS+Regex+Syntax

Just note that the syntax listed above is what the Core SDS library supports. OPW is currently using a different implementation (which is NOT PCRE or PCRE2 compliant) but it will be switched out very soon (likely next week).

Contributor Author

Thanks @fuchsnj, good to know! I'd like to publish this doc this week, do you think it makes sense to remove this information for now, and then add it back when OPW has switched to be PCRE2 compliant?

Also, thanks for the confluence link. At some point, it seems we should have public documentation on what regex syntax is supported? Is it usually so specific what syntax is available? Or can we point them to the regex101.com site you mention?

Contributor

> I'd like to publish this doc this week, do you think it makes sense to remove this information for now, and then add it back when OPW has switched to be PCRE2 compliant?

It's probably fine to keep it as PCRE2 for now. The differences are fairly subtle and not something users are likely to notice.

> At some point, it seems we should have public documentation on what regex syntax is supported?

That confluence doc is mostly for internal documentation now, and to help write any public documentation whenever that happens. I don't expect you to include the whole thing here. I expect in the future we will probably have more detailed public docs about the regex syntax in the SDS docs that you could link to here.

Contributor Author

Perfect, thanks for the info!

- Backreferences and capturing sub-expressions (lookarounds)
- Arbitrary zero-width assertions
- Subroutine references and recursive patterns
- Conditional patterns
- Backtracking control verbs
- The `\C` "single-byte" directive (which breaks UTF-8 sequences)
- The `\R` newline match
- The `\K` start of match reset directive
- Callouts and embedded code
- Atomic grouping and possessive quantifiers
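
Before pasting a custom regex into the **Define regex** field, it can help to check it for the constructs listed above. The following is a rough heuristic sketch, not a Datadog tool and not a full regex parser: the signature table and the names `UNSUPPORTED_SIGNATURES` and `find_unsupported` are hypothetical, and the check only scans the pattern text for tell-tale syntax. It can miss constructs or flag harmless ones, so a clean result does not guarantee the transform accepts the pattern.

```python
import re

# Heuristic signatures for syntax the guide lists as unsupported (assumed, not exhaustive).
UNSUPPORTED_SIGNATURES = {
    "lookaround": r"(?<!\\)\(\?<?[=!]",
    "backreference or subroutine call": r"\\[1-9]|\\[kg][<{'\d-]|\(\?P=",
    "recursion or subroutine reference": r"(?<!\\)\(\?(?:R\)|[+-]?\d+\)|&|P>)",
    "conditional pattern": r"(?<!\\)\(\?\(",
    "backtracking control verb": r"(?<!\\)\(\*(?:ACCEPT|COMMIT|FAIL|F|MARK|PRUNE|SKIP|THEN)",
    "callout": r"(?<!\\)\(\?C",
    "atomic group": r"(?<!\\)\(\?>",
    "possessive quantifier": r"(?<!\\)[*+?}]\+",
    "\\C, \\R, or \\K escape": r"\\[CRK]",
}


def _strip_literals(pattern: str) -> str:
    """Blank out escaped backslashes and character classes to cut false positives."""
    pattern = pattern.replace("\\\\", "__")
    return re.sub(r"\[\^?\]?(?:\\.|[^\]\\])*\]", "[]", pattern)


def find_unsupported(pattern: str) -> list[str]:
    """Return the names of the unsupported constructs that appear to be present."""
    cleaned = _strip_literals(pattern)
    return [name for name, signature in UNSUPPORTED_SIGNATURES.items()
            if re.search(signature, cleaned)]


print(find_unsupported(r"\b\d{3}-\d{2}-\d{4}\b"))  # plain pattern
print(find_unsupported(r"(?<=token=)\w+"))         # uses a lookbehind
```

A pattern flagged for a lookbehind, such as `(?<=token=)\w+`, can often be rewritten to match the prefix directly (`token=\w+`) and then rely on partial redaction to keep the prefix readable.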

## Further reading

{{< partial name="whats-next/whats-next.html" >}}

[1]: https://app.datadoghq.com/observability-pipelines