feat: CDI-2183 Add databricks-cluster-log-permissions module #532

Merged (3 commits) on Oct 31, 2023
61 changes: 61 additions & 0 deletions databricks-cluster-log-permissions/README.md
@@ -0,0 +1,61 @@
# databricks-cluster-log-permissions

Creates a standard instance profile and IAM policy that allow Databricks clusters to write their cluster logs to a destination S3 bucket, and can attach the same write permissions to a given list of existing instance profile roles.

<!-- START -->
## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 1.3.0 |
| <a name="requirement_databricks"></a> [databricks](#requirement\_databricks) | n/a |

## Providers

| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | n/a |
| <a name="provider_aws.czi-logs"></a> [aws.czi-logs](#provider\_aws.czi-logs) | n/a |
| <a name="provider_databricks"></a> [databricks](#provider\_databricks) | n/a |

## Modules

No modules.

## Resources

| Name | Type |
|------|------|
| [aws_iam_instance_profile.cluster_log_cluster](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_instance_profile) | resource |
| [aws_iam_instance_profile.cluster_log_cluster_rw](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_instance_profile) | resource |
| [aws_iam_policy.cluster_log_bucket_read_access](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource |
| [aws_iam_policy.cluster_log_bucket_write_access](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource |
| [aws_iam_role.cluster_log_cluster_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource |
| [aws_iam_role.cluster_log_rw_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource |
| [aws_iam_role_policy_attachment.additional_write_access_attachment](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource |
| [aws_iam_role_policy_attachment.read_access_attachment](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource |
| [aws_iam_role_policy_attachment.write_access_attachment_default_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource |
| [aws_iam_role_policy_attachment.write_access_attachment_rw_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource |
| [aws_kms_grant.additional_bucket_kms_encryption_key_grant](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kms_grant) | resource |
| [aws_kms_grant.bucket_kms_encryption_key_grant_default](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kms_grant) | resource |
| [aws_kms_grant.bucket_kms_encryption_key_grant_rw](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kms_grant) | resource |
| [databricks_instance_profile.cluster_log_cluster](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/instance_profile) | resource |
| [databricks_instance_profile.cluster_log_cluster_rw](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/instance_profile) | resource |
| [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
| [aws_iam_policy_document.assume_role_for_cluster_log_cluster](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.cluster_log_bucket_read_access](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.cluster_log_bucket_write_access](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_add_reader"></a> [add\_reader](#input\_add\_reader) | Flag to add reader role for logs - should only be invoked for the ie workspace | `bool` | `false` | no |
| <a name="input_bucket_kms_encryption_key_arn"></a> [bucket\_kms\_encryption\_key\_arn](#input\_bucket\_kms\_encryption\_key\_arn) | ARN for KMS key used to encrypt bucket for cluster logs | `string` | n/a | yes |
| <a name="input_env"></a> [env](#input\_env) | Environment name | `string` | n/a | yes |
| <a name="input_existing_role_names"></a> [existing\_role\_names](#input\_existing\_role\_names) | List of other existing instance policy roles on the workspace for which to add cluster log write permissions | `list(string)` | `[]` | no |

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_default_logging_role_arn"></a> [default\_logging\_role\_arn](#output\_default\_logging\_role\_arn) | ARN of the AWS IAM role created for default logs access |
| <a name="output_rw_logging_role_arn"></a> [rw\_logging\_role\_arn](#output\_rw\_logging\_role\_arn) | ARN of the AWS IAM role created for read and write logs access |
| <a name="output_rw_logging_role_instance_profile_arn"></a> [rw\_logging\_role\_instance\_profile\_arn](#output\_rw\_logging\_role\_instance\_profile\_arn) | ARN of the AWS instance profile created for read and write logs access |
<!-- END -->
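
## Usage

A minimal, hypothetical call to this module; every value below (bucket name, account IDs, role name, environment names, source path) is a placeholder rather than a value taken from this PR:

```hcl
module "databricks_cluster_log_permissions" {
  source = "../databricks-cluster-log-permissions" # placeholder path

  env               = "prod" # placeholder environment names
  global_reader_env = "ie"
  add_reader        = false

  databricks_logs_bucket_name   = "<databricks-logs-bucket>"
  bucket_kms_encryption_key_arn = "arn:aws:kms:<region>:<logs-account-id>:key/<key-id>"

  destination_account_id               = "<logs-account-id>"
  destination_account_region           = "<region>"
  destination_account_assume_role_name = "<assume-role-name>"

  # other instance profile roles in the workspace that should also be able to write cluster logs
  existing_role_names = []
}
```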
200 changes: 200 additions & 0 deletions databricks-cluster-log-permissions/main.tf
@@ -0,0 +1,200 @@
# - Creates a standard instance profile and IAM policy that allow clusters to write cluster logs to a destination S3 bucket
# - For a given list of existing instance profile roles, also attaches the write policy so those clusters can write cluster logs too

###
locals {
  default_role_name    = "cluster_log_cluster_role" # standard role for clusters - allows both writing and reading cluster logs for only the same workspace
  read_write_role_name = "cluster_log_rw_role"      # special role - allows both writing and reading cluster logs for all workspaces
  path                 = "/databricks/"

  # hacky way to validate if this workspace/cluster should have read permissions
  # tflint-ignore: terraform_unused_declarations
  validate_add_reader = (var.add_reader == true && var.env != var.global_reader_env) ? tobool("add_reader is not supported for this environment") : true

  databricks_bucket_cluster_log_prefix = "cluster-logs"

  # kms grants - all roles can read and write
  read_write_operations = ["Encrypt", "GenerateDataKey", "Decrypt"]
}

data "aws_iam_policy_document" "assume_role_for_cluster_log_cluster" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]
    principals {
      identifiers = ["ec2.amazonaws.com"]
      type        = "Service"
    }
  }
}

resource "aws_iam_role" "cluster_log_cluster_role" {
  name               = local.default_role_name
  path               = local.path
  description        = "Role for cluster to write to cluster log bucket"
  assume_role_policy = data.aws_iam_policy_document.assume_role_for_cluster_log_cluster.json
}

resource "aws_iam_role" "cluster_log_rw_role" {
  count = var.add_reader == true ? 1 : 0

  name               = local.read_write_role_name
  path               = local.path
  description        = "Role for cluster to read from and write to cluster log bucket"
  assume_role_policy = data.aws_iam_policy_document.assume_role_for_cluster_log_cluster.json
}

###
## write and limited read access
data "aws_iam_policy_document" "cluster_log_bucket_write_access" {
  statement {
    sid = "ReadWriteClusterLogs"
    actions = [
      "s3:PutObject",
      "s3:PutObjectAcl",
      "s3:GetObject",
      "s3:ListBucket",
      "s3:GetBucketLocation"
    ]

    resources = [
      "arn:aws:s3:::${var.databricks_logs_bucket_name}/${local.databricks_bucket_cluster_log_prefix}/*",
      "arn:aws:s3:::${var.databricks_logs_bucket_name}"
    ]
  }
  statement {
    sid = "ReadWriteEncryptedClusterLogs"
    actions = [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:GenerateDataKey",
    ]

    resources = [
      var.bucket_kms_encryption_key_arn
    ]
  }
}

resource "aws_iam_policy" "cluster_log_bucket_write_access" {
  name   = "cluster_log_bucket_write_access_policy"
  path   = local.path
  policy = data.aws_iam_policy_document.cluster_log_bucket_write_access.json
}

resource "aws_iam_role_policy_attachment" "write_access_attachment_default_role" {
  policy_arn = aws_iam_policy.cluster_log_bucket_write_access.arn
  role       = local.default_role_name
}

resource "aws_iam_role_policy_attachment" "write_access_attachment_rw_role" {
  count = var.add_reader == true ? 1 : 0

  policy_arn = aws_iam_policy.cluster_log_bucket_write_access.arn
  role       = local.read_write_role_name
}

## non-standard global-read access

data "aws_iam_policy_document" "cluster_log_bucket_read_access" {
  count = var.add_reader == true ? 1 : 0

  statement {
    sid = "ReadAllClusterLogs"
    actions = [
      "s3:GetObject",
      "s3:GetObjectVersion"
    ]

    resources = [
      "arn:aws:s3:::${var.databricks_logs_bucket_name}/*",
      "arn:aws:s3:::${var.databricks_logs_bucket_name}"
    ]
  }
}

resource "aws_iam_policy" "cluster_log_bucket_read_access" {
  count = var.add_reader == true ? 1 : 0

  name   = "cluster_log_bucket_read_access_policy"
  path   = local.path
  policy = data.aws_iam_policy_document.cluster_log_bucket_read_access[0].json
}

resource "aws_iam_role_policy_attachment" "read_access_attachment" {
  count = var.add_reader == true ? 1 : 0

  policy_arn = aws_iam_policy.cluster_log_bucket_read_access[0].arn
  role       = local.read_write_role_name
}

## kms access

data "aws_caller_identity" "current" {
  provider = aws
}

resource "aws_kms_grant" "bucket_kms_encryption_key_grant_default" {
  provider = aws.logs_destination

  name              = "cluster-log-kms-grant-${data.aws_caller_identity.current.account_id}-write"
  key_id            = var.bucket_kms_encryption_key_arn
  grantee_principal = aws_iam_role.cluster_log_cluster_role.arn
  operations        = local.read_write_operations
}

resource "aws_kms_grant" "bucket_kms_encryption_key_grant_rw" {
  count    = var.add_reader == true ? 1 : 0
  provider = aws.logs_destination

  name              = "cluster-log-kms-grant-${data.aws_caller_identity.current.account_id}-read-write"
  key_id            = var.bucket_kms_encryption_key_arn
  grantee_principal = aws_iam_role.cluster_log_rw_role[0].arn
  operations        = local.read_write_operations
}

## standard instance profile(s)

resource "aws_iam_instance_profile" "cluster_log_cluster" {
  name = "cluster-log-cluster-instance-profile"
  path = local.path
  role = aws_iam_role.cluster_log_cluster_role.name
}

resource "databricks_instance_profile" "cluster_log_cluster" {
  depends_on           = [aws_iam_instance_profile.cluster_log_cluster]
  instance_profile_arn = aws_iam_instance_profile.cluster_log_cluster.arn
}

resource "aws_iam_instance_profile" "cluster_log_cluster_rw" {
  count = var.add_reader == true ? 1 : 0

  name = "cluster-log-rw-instance-profile"
  path = local.path
  role = aws_iam_role.cluster_log_rw_role[0].name
}

resource "databricks_instance_profile" "cluster_log_cluster_rw" {
  count = var.add_reader == true ? 1 : 0

  depends_on           = [aws_iam_instance_profile.cluster_log_cluster_rw]
  instance_profile_arn = aws_iam_instance_profile.cluster_log_cluster_rw[0].arn
}

## attach policies to given list of existing instance profiles

resource "aws_iam_role_policy_attachment" "additional_write_access_attachment" {
  for_each = toset(var.existing_role_names)

  policy_arn = aws_iam_policy.cluster_log_bucket_write_access.arn
  role       = each.value
}

resource "aws_kms_grant" "additional_bucket_kms_encryption_key_grant" {
  for_each = toset(var.existing_role_names)
  provider = aws.logs_destination

  name              = "cluster-log-kms-grant-${data.aws_caller_identity.current.account_id}"
  key_id            = var.bucket_kms_encryption_key_arn
  grantee_principal = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/databricks/${each.value}"
  operations        = local.read_write_operations
}
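
For context, a hedged sketch of how a cluster would ship its logs through the default instance profile created above; the Spark version, node type, region, bucket, and account ID are placeholders, and the instance profile ARN simply follows the name and path this module uses:

```hcl
resource "databricks_cluster" "log_writer" {
  cluster_name            = "log-writer"
  spark_version           = "13.3.x-scala2.12" # placeholder
  node_type_id            = "i3.xlarge"        # placeholder
  num_workers             = 1
  autotermination_minutes = 30

  aws_attributes {
    # Placeholder ARN matching "cluster-log-cluster-instance-profile" under the "/databricks/" path
    instance_profile_arn = "arn:aws:iam::<account-id>:instance-profile/databricks/cluster-log-cluster-instance-profile"
  }

  cluster_log_conf {
    s3 {
      destination       = "s3://<databricks-logs-bucket>/cluster-logs"
      region            = "<region>"
      enable_encryption = true
    }
  }
}
```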
14 changes: 14 additions & 0 deletions databricks-cluster-log-permissions/outputs.tf
@@ -0,0 +1,14 @@
output "default_logging_role_arn" {
description = "ARN of the AWS IAM role created for default logs access"
value = aws_iam_role.cluster_log_cluster_role.arn
}

output "rw_logging_role_arn" {
description = "ARN of the AWS IAM role created for read and write logs access"
value = one(aws_iam_role.cluster_log_rw_role[*].arn)
}

output "rw_logging_role_instance_profile_arn" {
description = "ARN of the AWS instance profile created for read and write logs access"
value = one(aws_iam_instance_profile.cluster_log_cluster_rw[*].arn)
}
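
A hedged sketch of consuming these outputs: a cluster policy that pins clusters to the read/write instance profile (only meaningful when add_reader = true, since the output is null otherwise); the module address below is hypothetical:

```hcl
resource "databricks_cluster_policy" "rw_log_clusters" {
  name = "rw-cluster-log-clusters"

  # Pin the instance profile so clusters created under this policy can both
  # read and write cluster logs. The module address is a placeholder.
  definition = jsonencode({
    "aws_attributes.instance_profile_arn" = {
      type  = "fixed"
      value = module.databricks_cluster_log_permissions.rw_logging_role_instance_profile_arn
    }
  })
}
```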
10 changes: 10 additions & 0 deletions databricks-cluster-log-permissions/providers.tf
@@ -0,0 +1,10 @@
provider "aws" {
alias = "logs_destination"
region = var.destination_account_region

assume_role {
role_arn = "arn:aws:iam::${var.destination_account_id}:role/${var.destination_account_assume_role_name}"
}

allowed_account_ids = [var.destination_account_id]
}
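
The assume_role block above only works if the role in the destination account trusts the calling account; a minimal sketch of that trust policy, managed outside this module and with a placeholder account ID, might look like:

```hcl
# Hypothetical trust policy for the destination-account role named by
# var.destination_account_assume_role_name; without a statement like this,
# the aliased provider cannot assume the role.
data "aws_iam_policy_document" "trust_source_account" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::<source-account-id>:root"]
    }
  }
}
```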
46 changes: 46 additions & 0 deletions databricks-cluster-log-permissions/variables.tf
@@ -0,0 +1,46 @@
variable "env" {
description = "Environment name"
type = string
}

variable "add_reader" {
description = "Flag to add reader role for logs - should only be invoked for the ie workspace"
type = bool
default = false
}

variable "bucket_kms_encryption_key_arn" {
description = "ARN for KMS key used to encrypt bucket for cluster logs"
type = string
}

variable "existing_role_names" {
description = "List of other existing instance policy roles on the workspace for which to add cluster log write permissions"
type = list(string)
default = []
}

variable "databricks_logs_bucket_name" {
description = "Name of the bucket to store cluster logs"
type = string
}

variable "global_reader_env" {
description = "Name of env to grant global logs reader access to"
type = string
}

variable "destination_account_id" {
description = "Account ID for the logs destination AWS account"
type = string
}

variable "destination_account_region" {
description = "Region for the logs destination AWS account"
type = string
}

variable "destination_account_assume_role_name" {
description = "Role name to assume in the logs destination AWS account"
type = string
}
8 changes: 8 additions & 0 deletions databricks-cluster-log-permissions/versions.tf
@@ -0,0 +1,8 @@
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
  required_version = ">= 1.3.0"
}