parent: Infrastructure Components
title: AWS Glue-Job
nav_exclude: false

AWS Glue-Job

source = "git::https://github.com/slalom-ggp/dataops-infra/tree/main/components/aws/glue-job?ref=main"

Overview

Glue is AWS's fully managed extract, transform, and load (ETL) service. A Glue job can be used to run ETL Python scripts.

Requirements

No requirements.

Providers

The following providers are used by this module:

  • aws

Required Inputs

The following input variables are required:

name_prefix

Description: Standard name_prefix module input. (Prefix counts towards 64-character max length for certain resource types.)

Type: string

environment

Description: Standard environment module input.

Type:

object({
  vpc_id          = string
  aws_region      = string
  public_subnets  = list(string)
  private_subnets = list(string)
})

resource_tags

Description: Standard resource_tags module input.

Type: map(string)

s3_script_bucket_name

Description: S3 script bucket for Glue transformation job.

Type: string

s3_source_bucket_name

Description: S3 source bucket for Glue transformation job.

Type: string

s3_destination_bucket_name

Description: S3 destination bucket for Glue transformation job.

Type: string
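
As a rough illustration of how the required inputs fit together, a module block might look like the sketch below. The module label `glue_job` and every ID, bucket name, prefix, and tag value are hypothetical placeholders; the `source` string is copied as-is from this README.

```hcl
# Minimal sketch: all values below are hypothetical placeholders.
module "glue_job" {
  # Copied verbatim from the source line at the top of this README.
  source = "git::https://github.com/slalom-ggp/dataops-infra/tree/main/components/aws/glue-job?ref=main"

  # Counts towards the 64-character name limit of some resource types.
  name_prefix = "demo-etl-"

  # Must match the object type documented for `environment`.
  environment = {
    vpc_id          = "vpc-0123456789abcdef0"
    aws_region      = "us-east-1"
    public_subnets  = ["subnet-0aaaaaaaaaaaaaaa1"]
    private_subnets = ["subnet-0bbbbbbbbbbbbbbb2"]
  }

  resource_tags = {
    project = "demo-etl"
  }

  s3_script_bucket_name      = "demo-etl-scripts"
  s3_source_bucket_name      = "demo-etl-source"
  s3_destination_bucket_name = "demo-etl-destination"
}
```

Both script path inputs default to null, so a working configuration would presumably also set one of the optional script path inputs described under Optional Inputs.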

Optional Inputs

The following input variables are optional (have default values):

local_script_path

Description: Optional. If provided, the local script will automatically be uploaded to the remote bucket path. If not provided, s3_script_path will be used instead.

Type: string

Default: null

s3_script_path

Description: Ignored if local_script_path is provided. Otherwise, the file at this path will be used for the Glue script.

Type: string

Default: null
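
The relationship between the two script inputs can be sketched as follows, assuming a hypothetical script file `glue/transform.py` kept next to the calling configuration. All other values are placeholders, and the `source` string is copied from this README.

```hcl
# Sketch only: the script location and all names are hypothetical.
module "glue_job" {
  source = "git::https://github.com/slalom-ggp/dataops-infra/tree/main/components/aws/glue-job?ref=main"

  name_prefix = "demo-etl-"
  environment = {
    vpc_id          = "vpc-0123456789abcdef0"
    aws_region      = "us-east-1"
    public_subnets  = ["subnet-0aaaaaaaaaaaaaaa1"]
    private_subnets = ["subnet-0bbbbbbbbbbbbbbb2"]
  }
  resource_tags = { project = "demo-etl" }

  s3_script_bucket_name      = "demo-etl-scripts"
  s3_source_bucket_name      = "demo-etl-source"
  s3_destination_bucket_name = "demo-etl-destination"

  # Local file that the module uploads to the script bucket.
  # Because this is set, s3_script_path is ignored and left at its default.
  local_script_path = "${path.module}/glue/transform.py"
}
```

To run a script that already exists in the bucket, leave `local_script_path` unset and set `s3_script_path` instead.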

with_spark

Description: Set to true (the default) for a standard PySpark Glue job, or false for a Python shell job.

Type: bool

Default: true

num_workers

Description: Min 2. The number of worker nodes to dedicate to each instance of the job.

Type: number

Default: 2

max_instances

Description: The maximum number of simultaneous executions.

Type: number

Default: 10
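
A sketch of how the sizing inputs might be combined for a PySpark job is shown below. The specific numbers are illustrative assumptions rather than recommendations, and all names and IDs are hypothetical placeholders.

```hcl
# Sketch only: sizing values are illustrative, names are hypothetical.
module "glue_job" {
  source = "git::https://github.com/slalom-ggp/dataops-infra/tree/main/components/aws/glue-job?ref=main"

  name_prefix = "demo-etl-"
  environment = {
    vpc_id          = "vpc-0123456789abcdef0"
    aws_region      = "us-east-1"
    public_subnets  = ["subnet-0aaaaaaaaaaaaaaa1"]
    private_subnets = ["subnet-0bbbbbbbbbbbbbbb2"]
  }
  resource_tags = { project = "demo-etl" }

  s3_script_bucket_name      = "demo-etl-scripts"
  s3_source_bucket_name      = "demo-etl-source"
  s3_destination_bucket_name = "demo-etl-destination"
  local_script_path          = "${path.module}/glue/transform.py"

  with_spark    = true # standard PySpark job (the default)
  num_workers   = 4    # worker nodes per job instance; minimum is 2
  max_instances = 2    # at most two simultaneous executions
}
```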

default_arguments

Description: The map of default arguments for this job. You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

Example:

default_arguments = {
  "--DATA_BUCKET_NAME" = "my-bucket"
}

For additional information, see:

Type: map(string)

Default: {}

Outputs

The following outputs are exported:

glue_job_name

Description: The name of the Glue job.

summary

Description: Summary of Glue resources created.
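
The exported values can be read from the calling configuration through the module's label. The sketch below assumes the module was declared as `module "glue_job"`, which is a hypothetical label, and uses placeholder values for all inputs; the output names `glue_job_name` and `summary` are the ones documented above.

```hcl
# Sketch only: module label and input values are hypothetical.
module "glue_job" {
  source = "git::https://github.com/slalom-ggp/dataops-infra/tree/main/components/aws/glue-job?ref=main"

  name_prefix = "demo-etl-"
  environment = {
    vpc_id          = "vpc-0123456789abcdef0"
    aws_region      = "us-east-1"
    public_subnets  = ["subnet-0aaaaaaaaaaaaaaa1"]
    private_subnets = ["subnet-0bbbbbbbbbbbbbbb2"]
  }
  resource_tags = { project = "demo-etl" }

  s3_script_bucket_name      = "demo-etl-scripts"
  s3_source_bucket_name      = "demo-etl-source"
  s3_destination_bucket_name = "demo-etl-destination"
  local_script_path          = "${path.module}/glue/transform.py"
}

# Re-export the job name, for example to reference it from a scheduler.
output "glue_job_name" {
  value = module.glue_job.glue_job_name
}

# Print the module's summary of created Glue resources after apply.
output "glue_summary" {
  value = module.glue_job.summary
}
```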


Source Files

Source code for this module is available using the links below.


NOTE: This documentation was auto-generated using terraform-docs and s-infra from slalom.dataops. Please do not attempt to manually update this file.