Data Loss Prevention API

From the Cloud Data Loss Prevention documentation, "Cloud DLP helps you better understand and manage sensitive data. It provides fast, scalable classification and redaction for sensitive data elements like credit card numbers, names, social security numbers, US and selected international identifier numbers, phone numbers, and GCP credentials. Cloud DLP classifies this data using more than 90 predefined detectors to identify patterns, formats, and checksums, and even understands contextual clues. You can optionally redact data as well, using techniques like masking, secure hashing, tokenization, bucketing, and format-preserving encryption."

In this project, the DLP API is configured as a logging filter for the paymentservice microservice of the microservices-demo application. Because all application logs are sent to Stackdriver Logging, this filter removes sensitive data from log events before they reach Stackdriver.

This is accomplished by customizing the fluentd configuration so that paymentservice application logs are not sent directly to Stackdriver but are first submitted to the DLP API for redaction. The redacted result is then written to Stackdriver Logging.

Log Redaction via the DLP API

Specifically, after a purchase is completed in the microservices demo web application, a log event such as this is generated by paymentservice. Note that the unredacted (demo) credit card number is included in the log event:

{"severity":"info","time":1555345379891,"message":"PaymentService#Charge invoked with request {\"amount\":{\"currency_code\":\"USD\",\"units\":\"41\",\"nanos\":180000000},\"credit_card\":{\"credit_card_number\":\"4432-8015-6152-0454\",\"credit_card_cvv\":672,\"credit_card_expiration_year\":2020,\"credit_card_expiration_month\":1}}","pid":1,"hostname":"paymentservice-799fb9bdd-9sqdt","name":"paymentservice-server","v":1}

Once sent to the DLP API, this is what is returned and logged:

{
...
 severity:  "INFO"
 textPayload:  "{"severity":"info","time":1555345379891,"message":"PaymentService#Charge invoked with request {\"amount\":{\"currency_code\":\"USD\",\"units\":\"41\",\"nanos\":180000000},\"credit_card\":{\"credit_card_number\":\"[CREDIT_CARD_NUMBER]\",\"credit_card_cvv\":672,\"credit_card_expiration_year\":2020,\"credit_card_expiration_month\":1}}","pid":1,"hostname":"paymentservice-799fb9bdd-9sqdt","name":"paymentservice-server","v":1}
"
 timestamp:  "2019-04-15T16:22:59.891425283Z"
}
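To make the transformation concrete, here is a minimal local sketch in Python of what the replacement does to the message. It is not the DLP API and not how this project redacts logs: the regular expression, the Luhn check, and the names `luhn_valid` and `redact_card_numbers` are assumptions chosen for illustration, loosely mirroring the pattern-and-checksum detection described in the DLP documentation.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def redact_card_numbers(text: str) -> str:
    """Replace Luhn-valid card-like numbers with the infoType name in brackets."""
    def replace(match):
        digits = re.sub(r"\D", "", match.group())
        return "[CREDIT_CARD_NUMBER]" if luhn_valid(digits) else match.group()

    return CARD_PATTERN.sub(replace, text)


# Shortened fragment of the paymentservice message shown above.
sample = '"credit_card_number":"4432-8015-6152-0454","credit_card_cvv":672'
print(redact_card_numbers(sample))
# prints: "credit_card_number":"[CREDIT_CARD_NUMBER]","credit_card_cvv":672
```

The checksum step matters because other long digit runs, such as the 13-digit `time` value in the log event, also match the pattern. That particular value fails the Luhn check and is left untouched, but a pattern-plus-checksum approach like this one can still produce false positives that a managed detector with contextual clues is better placed to avoid.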

This is the DeidentifyTemplate created as described in the README:

{
  "deidentifyConfig": {
    "infoTypeTransformations": {
      "transformations": [
        {
          "infoTypes": [
            {
              "name": "CREDIT_CARD_NUMBER"
            }
          ],
          "primitiveTransformation": {
            "replaceWithInfoTypeConfig": {}
          }
        }
      ]
    }
  }
}

The API takes a DeidentifyTemplate, which includes a list of infoTypes. From the InfoType documentation: "...name is either a name of your choosing when creating a CustomInfoType, or one of the names listed at https://cloud.google.com/dlp/docs/infotypes-reference when specifying a built-in type". The infoTypes reference includes a predefined detector, CREDIT_CARD_NUMBER, that matches credit card numbers globally.
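As a rough illustration of how a log line and this configuration fit together, the sketch below builds the JSON body for a `projects.content.deidentify` REST call using only the Python standard library. It sends nothing. The helper name `build_deidentify_request` and the inlined `inspectConfig` are assumptions for illustration; a request that relies on a stored DeidentifyTemplate would reference it by name through the `deidentifyTemplateName` field instead of inlining `deidentifyConfig`.

```python
import json


def build_deidentify_request(log_line: str, info_type: str = "CREDIT_CARD_NUMBER") -> dict:
    """Build a content.deidentify request body (hypothetical helper; sends nothing)."""
    info_types = [{"name": info_type}]
    return {
        # The text to scan and transform.
        "item": {"value": log_line},
        # Which infoTypes to look for.
        "inspectConfig": {"infoTypes": info_types},
        # How to transform each finding: replace it with its infoType name.
        "deidentifyConfig": {
            "infoTypeTransformations": {
                "transformations": [
                    {
                        "infoTypes": info_types,
                        "primitiveTransformation": {"replaceWithInfoTypeConfig": {}},
                    }
                ]
            }
        },
    }


body = build_deidentify_request("card 4432-8015-6152-0454 charged")
print(json.dumps(body, indent=2))
```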

Scaling Considerations

It's important to note that, as configured, the in-scope cluster's fluentd makes a DLP API call for every log event emitted by the paymentservice, not only for those that contain credit card data. That can be seen in the fluentd configuration file here. With this configuration, there are two limiting considerations to take into account. First, there are DLP API limits, published at https://cloud.google.com/dlp/limits, which default to 600 requests/minute. Second, the cost scales with the number of requests (DLP Pricing). Both should be weighed before deploying this style of architecture at scale.
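To put the request limit in perspective, here is a small back-of-the-envelope calculation in Python, assuming one DLP call per log event and the 600 requests/minute default quoted above. The function name `quota_headroom` and the example log rates are hypothetical.

```python
def quota_headroom(log_events_per_minute: float, quota_per_minute: float = 600.0) -> float:
    """Return the fraction of the per-minute quota left (negative means over quota)."""
    return 1.0 - log_events_per_minute / quota_per_minute


# 600 requests/minute is a sustained average of 10 log events per second.
print(600 / 60)             # 10.0
print(quota_headroom(150))  # 0.75, a quarter of the quota is used
print(quota_headroom(900))  # -0.5, 50% over the quota
```

Under these assumptions, a paymentservice that logs more than about 10 events per second on average would exceed the default limit, regardless of how few of those events contain card data.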