Ontotext-AD/terraform-aws-graphdb

GraphDB AWS Terraform Module

This Terraform module allows you to provision a GraphDB cluster within a Virtual Private Cloud (VPC). The module provides a flexible way to configure the cluster and the associated VPC components. It implements the GraphDB reference architecture. Check the official documentation for more details.

About GraphDB

Ontotext GraphDB is a highly efficient, scalable and robust graph database with RDF and SPARQL support. With excellent enterprise features, integration with external search applications, compatibility with industry standards, and both community and commercial support, GraphDB is the preferred database choice of both small independent developers and big enterprises.

Features

The module provides the building blocks for configuring, deploying, and provisioning a highly available GraphDB cluster across multiple availability zones using an EC2 Auto Scaling group. Key features of the module include:

  • EC2 Autoscaling Group
  • Network Load Balancer
  • NAT Gateway for outbound connections
  • Route53 Private Hosted Zone for internal GraphDB cluster communication
  • IAM Policies and roles
  • VPC
  • Monitoring
  • Backup
  • and many more

Versioning

The Terraform module follows the Semantic Versioning 2.0.0 rules and has a release lifecycle separate from the GraphDB versions. The next table shows the version compatibility between GraphDB and the Terraform module.

GraphDB Terraform    GraphDB
Version 1.x.x        Version 10.6.x
Version 1.2.x        Version 10.7.x
Version 1.3.x        Version 10.8.x

You can track the particular version updates of GraphDB in the changelog.
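
As a rough sketch of how the compatibility table maps to a module call, the block below pins the module to the 1.3.x series and sets the matching GraphDB version. The constraint string and the prefix and region values are illustrative assumptions, not recommendations from the module authors.

```hcl
module "graphdb" {
  source = "Ontotext-AD/graphdb/aws"
  # "~> 1.3.0" permits patch releases within the 1.3.x series only
  version = "~> 1.3.0"

  # Required inputs (example values)
  resource_name_prefix = "graphdb"
  aws_region           = "us-east-1"

  # Module 1.3.x pairs with GraphDB 10.8.x according to the table above
  graphdb_version = "10.8.0"
}
```

Pinning both versions together keeps a later terraform init -upgrade from moving the module to a series that targets a different GraphDB release.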

Prerequisites

Before you begin using this Terraform module, ensure you meet the following prerequisites:

Inputs

Name Description Type Default Required
common_tags (Optional) Map of common tags for all taggable AWS resources. map(string) {} no
resource_name_prefix Resource name prefix used for tagging and naming AWS resources string n/a yes
aws_region AWS region to deploy resources into string n/a yes
override_owner_id Override the default owner ID used for the AMI images string null no
deploy_backup Deploy backup module bool true no
backup_schedule Cron expression for the backup job. string "0 0 * * *" no
backup_retention_count Number of backups to keep. number 7 no
backup_enable_bucket_replication Enable or disable S3 bucket replication bool false no
lb_internal Whether the load balancer will be internal or public bool false no
lb_deregistration_delay Amount of time, in seconds, for the GraphDB LB target group to wait before changing the state of a deregistering target from draining to unused. string 300 no
lb_health_check_path The endpoint to check for GraphDB's health status. string "/rest/cluster/node/status" no
lb_health_check_interval (Optional) Interval in seconds for checking the target group healthcheck. Defaults to 10. number 10 no
lb_tls_certificate_arn ARN of the TLS certificate, imported in ACM, which will be used for the TLS listener on the load balancer. string "" no
lb_tls_policy TLS security policy on the listener. string "ELBSecurityPolicy-TLS13-1-2-2021-06" no
allowed_inbound_cidrs_lb (Optional) List of CIDR blocks from which to permit inbound traffic to the load balancer list(string) null no
allowed_inbound_cidrs_ssh (Optional) List of CIDR blocks to permit for SSH to GraphDB nodes list(string) null no
ec2_instance_type EC2 instance type string "r6i.2xlarge" no
ec2_key_name (Optional) key pair to use for SSH access to instance string null no
graphdb_node_count Number of GraphDB nodes to deploy in ASG number 3 no
vpc_dns_hostnames Enable or disable DNS hostnames support for the VPC bool true no
vpc_id Specify the VPC ID if you want to use existing VPC. If left empty it will create a new VPC string "" no
vpc_public_subnet_ids Define the Subnet IDs for the public subnets that are deployed within the specified VPC in the vpc_id variable list(string) [] no
vpc_private_subnet_ids Define the Subnet IDs for the private subnets that are deployed within the specified VPC in the vpc_id variable list(string) [] no
vpc_private_subnet_cidrs CIDR blocks for private subnets list(string) [ "10.0.0.0/19", "10.0.32.0/19", "10.0.64.0/19" ] no
vpc_public_subnet_cidrs CIDR blocks for public subnets list(string) [ "10.0.128.0/20", "10.0.144.0/20", "10.0.160.0/20" ] no
vpc_cidr_block CIDR block for VPC string "10.0.0.0/16" no
vpc_dns_support Enable or disable the support of the DNS service bool true no
single_nat_gateway Enable or disable the option to have single NAT Gateway. bool false no
enable_nat_gateway Enable or disable the creation of the NAT Gateway bool true no
vpc_endpoint_service_accept_connection_requests (Required) Whether or not VPC endpoint connection requests to the service must be accepted by the service owner - true or false. bool true no
vpc_endpoint_service_allowed_principals (Optional) The ARNs of one or more principals allowed to discover the endpoint service. list(string) null no
vpc_enable_flow_logs Enable or disable VPC Flow logs bool false no
vpc_flow_logs_lifecycle_rule_status Define status of the S3 lifecycle rule. Possible options are enabled or disabled. string "Disabled" no
vpc_flow_logs_expiration_days Define the days after which the VPC flow logs should be deleted number 7 no
lb_enable_private_access Enable or disable the private access via PrivateLink to the GraphDB Cluster bool false no
ami_id (Optional) User-provided AMI ID to use with GraphDB instances. If you provide this value, please ensure it will work with the default userdata script (assumes latest version of Ubuntu LTS). Otherwise, please provide your own userdata script using the user_supplied_userdata_path variable. string null no
graphdb_version GraphDB version string "10.8.0" no
device_name The device to which EBS volumes for the GraphDB data directory will be mapped. string "/dev/sdf" no
ebs_volume_type Type of the EBS volumes, used by the GraphDB nodes. string "gp3" no
ebs_volume_size The size of the EBS volumes, used by the GraphDB nodes. number 500 no
ebs_volume_throughput Throughput for the EBS volumes, used by the GraphDB nodes. number 250 no
ebs_volume_iops IOPS for the EBS volumes, used by the GraphDB nodes. number 8000 no
ebs_default_kms_key KMS key used for ebs volume encryption. string "alias/aws/ebs" no
prevent_resource_deletion Defines if applicable resources should be protected from deletion or not bool true no
graphdb_license_path Local path to a file, containing a GraphDB Enterprise license. string null no
graphdb_admin_password Password for the 'admin' user in GraphDB. string null no
graphdb_cluster_token Cluster token used for authenticating the communication between the nodes. string null no
route53_zone_dns_name DNS name for the private hosted zone in Route 53 string "graphdb.cluster" no
graphdb_external_dns External domain name where GraphDB will be accessed string "" no
deploy_monitoring Enable or disable toggle for monitoring bool false no
monitoring_route53_measure_latency Enable or disable route53 function to measure latency bool false no
monitoring_actions_enabled Enable or disable actions on alarms bool false no
monitoring_sns_topic_endpoint Define an SNS endpoint which will be receiving the alerts via email string null no
monitoring_sns_protocol Define an SNS protocol that you will use to receive alerts. Possible options are: Email, Email-JSON, HTTP, HTTPS. string "email" no
monitoring_enable_detailed_instance_monitoring If true, the launched EC2 instance will have detailed monitoring enabled bool false no
monitoring_endpoint_auto_confirms Enable or disable endpoint auto confirm subscription to the sns topic bool false no
monitoring_log_group_retention_in_days Log group retention in days number 30 no
monitoring_route53_health_check_aws_region Define the region in which you want the monitoring to be deployed. It is used to define where the Route53 Availability Check will be deployed, since if it is not specified it will deploy the check in us-east-1 and if you deploy in different region it will not find the dimensions. string "us-east-1" no
monitoring_route53_availability_http_port Define the HTTP port for the Route53 availability check number 80 no
monitoring_route53_availability_https_port Define the HTTPS port for the Route53 availability check number 443 no
graphdb_properties_path Path to a local file containing GraphDB properties (graphdb.properties) that would be appended to the default in the VM. string null no
graphdb_java_options GraphDB options to pass to GraphDB with GRAPHDB_JAVA_OPTS environment variable. string null no
deploy_logging_module Enable or disable logging module bool false no
logging_enable_bucket_replication Enable or disable S3 bucket replication bool false no
s3_enable_access_logs Enable or disable access logs bool false no
s3_access_logs_lifecycle_rule_status Define status of the S3 lifecycle rule. Possible options are enabled or disabled. string "Disabled" no
s3_access_logs_expiration_days Define the days after which the S3 access logs should be deleted. number 30 no
s3_expired_object_delete_marker Indicates whether Amazon S3 will remove a delete marker with no noncurrent versions. If set to true, the delete marker will be expired; if set to false the policy takes no action. bool true no
s3_mfa_delete Enable MFA delete for either Change the versioning state of your bucket or Permanently delete an object version. Default is false. This cannot be used to toggle this setting but is available to allow managed buckets to reflect the state in AWS string "Disabled" no
s3_versioning_enabled Enable versioning. Once you version-enable a bucket, it can never return to an unversioned state. You can, however, suspend versioning on that bucket. string "Enabled" no
s3_abort_multipart_upload Specifies the number of days after initiating a multipart upload when the multipart upload must be completed. number 7 no
s3_enable_replication_rule Enable or disable S3 bucket replication string "Disabled" no
lb_access_logs_lifecycle_rule_status Define status of the S3 lifecycle rule. Possible options are enabled or disabled. string "Disabled" no
lb_enable_access_logs Enable or disable access logs for the NLB bool false no
lb_access_logs_expiration_days Define the days after which the LB access logs should be deleted. number 14 no
bucket_replication_destination_region Define in which Region should the bucket be replicated string null no
asg_enable_instance_refresh Enables instance refresh for the GraphDB Auto scaling group. A refresh is started when any of the following Auto Scaling Group properties change: launch_configuration, launch_template, mixed_instances_policy bool false no
asg_instance_refresh_checkpoint_delay Number of seconds to wait after a checkpoint. number 3600 no
graphdb_enable_userdata_scripts_on_reboot (Experimental) Modifies cloud-config to always run user data scripts on EC2 boot bool false no
create_s3_kms_key Enable creation of KMS key for S3 bucket encryption bool false no
s3_kms_key_admin_arn ARN of the role or user granted administrative access to the S3 KMS key. string "" no
s3_key_rotation_enabled Specifies whether key rotation is enabled. bool true no
s3_kms_default_key Define default S3 KMS key string "alias/aws/s3" no
s3_cmk_alias The alias for the CMK key. string "alias/graphdb-s3-cmk-key" no
s3_kms_key_enabled Specifies whether the key is enabled. bool true no
s3_key_specification Specification of the Key. string "SYMMETRIC_DEFAULT" no
s3_key_deletion_window_in_days The waiting period, specified in number of days for AWS to delete the KMS key(Between 7 and 30). number 30 no
s3_cmk_description Description for the KMS Key string "KMS key for S3 bucket encryption." no
s3_external_kms_key_arn Externally provided KMS CMK string "" no
parameter_store_cmk_alias The alias for the CMK key. string "alias/graphdb-param-cmk-key" no
parameter_store_key_admin_arn ARN of the key administrator role for Parameter Store string "" no
parameter_store_key_tags A map of tags to assign to the resources. map(string) {} no
parameter_store_key_rotation_enabled Specifies whether key rotation is enabled. bool true no
parameter_store_default_key Define default key for parameter store if no KMS key is used string "alias/aws/ssm" no
parameter_store_key_enabled Specifies whether the key is enabled. bool true no
parameter_store_key_spec Specification of the Key. string "SYMMETRIC_DEFAULT" no
parameter_store_key_deletion_window_in_days The waiting period, specified in number of days for AWS to delete the KMS key(Between 7 and 30). number 30 no
parameter_store_cmk_description Description for the KMS Key string "KMS key for Parameter Store bucket encryption." no
create_parameter_store_kms_key Enable creation of KMS key for Parameter Store encryption bool false no
parameter_store_external_kms_key Externally provided KMS CMK string "" no
ebs_key_admin_arn ARN of the key administrator role for Parameter Store string "" no
ebs_key_tags A map of tags to assign to the resources. map(string) {} no
ebs_key_rotation_enabled Specifies whether key rotation is enabled. bool true no
default_ebs_cmk_alias The alias for the default Managed key. string "alias/aws/ebs" no
ebs_cmk_alias Define custom alias for the CMK Key string "alias/graphdb-cmk-ebs-key" no
ebs_key_spec Specification of the Key. string "SYMMETRIC_DEFAULT" no
ebs_key_deletion_window_in_days The waiting period, specified in number of days for AWS to delete the KMS key(Between 7 and 30). number 30 no
ebs_cmk_description Description for the KMS Key string "KMS key for S3 bucket encryption." no
ebs_external_kms_key Externally provided KMS CMK string "" no
ebs_key_enabled Enable or disable toggle for ebs volume encryption. bool true no
create_ebs_kms_key Creates KMS key for the EBS volumes bool false no
create_sns_kms_key Enable Customer managed keys for encryption. If set to false it will use AWS managed key. bool false no
sns_cmk_description Description for the KMS key for the encryption of SNS string "KMS CMK Key to encrypt SNS topics" no
sns_key_admin_arn ARN of the role or user granted administrative access to the SNS KMS key. string "" no
deletion_window_in_days The waiting period, specified in number of days for AWS to delete the KMS key(Between 7 and 30). number 30 no
sns_external_kms_key ARN of the external KMS key that will be used for encryption of SNS topics string "" no
sns_cmk_key_alias The alias for the SNS CMK key. string "alias/graphdb-sns-cmk-key-alias" no
sns_default_kms_key ARN of the default KMS key that will be used for encryption of SNS topics string "alias/aws/sns" no
sns_key_spec Specification of the Key. string "SYMMETRIC_DEFAULT" no
sns_key_enabled Specifies whether the key is enabled. bool true no
sns_rotation_enabled Specifies whether key rotation is enabled. bool true no

Usage

Important Variables (Inputs)

The following are the important variables you should configure when using this module:

  • aws_region: The region in which GraphDB will be deployed.
  • ec2_instance_type: The instance type for the GraphDB cluster nodes. This should match your performance and cost requirements.
  • graphdb_node_count: The number of instances in the cluster. Recommended is 3, 5 or 7 to have consensus according to the Raft algorithm. For a single node deployment set the value to 1.
  • graphdb_license_path: The path to the GraphDB license file.
  • graphdb_admin_password: Sets a password of your choice for the admin user. If nothing is specified, a password is autogenerated and stored in the SSM Parameter Store. Note that the stored value is Base64 encoded.
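
One possible way to read back an autogenerated admin password is sketched below, assuming the AWS provider is already configured. The parameter name is a placeholder, since this document does not state the name the module uses; look it up in the SSM Parameter Store before trying this.

```hcl
# Placeholder name (assumption): replace it with the actual parameter name
# found in the SSM Parameter Store of the deployment region.
data "aws_ssm_parameter" "graphdb_admin_password" {
  name            = "<name-of-the-admin-password-parameter>"
  with_decryption = true
}

# The stored value is Base64 encoded, so decode it before use.
output "graphdb_admin_password" {
  value     = base64decode(data.aws_ssm_parameter.graphdb_admin_password.value)
  sensitive = true
}
```

After terraform apply, running terraform output -raw graphdb_admin_password prints the decoded value. Keep in mind that the decoded password also ends up in the Terraform state.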

To use this module, follow these steps:

  1. Copy the following block into your Terraform configuration and set the variables:

    module "graphdb" {
      source  = "Ontotext-AD/graphdb/aws"
      version = "~> 1.0"
    
      resource_name_prefix     = "graphdb"
      aws_region               = "us-east-1"
      ec2_instance_type        = "m5.xlarge"
      graphdb_license_path     = "path-to-graphdb-license"
      allowed_inbound_cidrs_lb = ["0.0.0.0/0"]
    }
  2. Initialize the module and its required providers with:

    terraform init
  3. Before deploying, make sure to inspect the plan output with:

    terraform plan
  4. After a careful review of the output plan, deploy with:

    terraform apply

Once deployed, you should be able to access the environment at the generated FQDN, which is printed in the output at the end of the run.

Examples

In this section you will find examples regarding customizing your GraphDB deployment.

GraphDB Configurations

There are several ways to customize the GraphDB properties.

  1. Using a Custom GraphDB Properties File:

    You can specify a custom GraphDB properties file using the graphdb_properties_path variable. For example:

    graphdb_properties_path = "<path_to_custom_graphdb_properties_file>"
  2. Setting Java Options with graphdb_java_options:

    Another option is to set Java options using the graphdb_java_options variable. For instance, if you want to print the command line flags, use:

    graphdb_java_options = "-XX:+PrintCommandLineFlags"

Note: The options mentioned above will be appended to the ones set in the user data script.

Customize GraphDB Version

graphdb_version = "10.8.0"

Purge Protection

Resources that support purge protection have it enabled by default. You can override the default configuration with the following variable:

prevent_resource_deletion = false

Backup

To enable deployment of the backup module, you need to enable the following flag:

deploy_backup = true
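
The backup job can be tuned further with the schedule and retention inputs listed in the Inputs table. The values below are illustrative assumptions only:

```hcl
# Deploy the backup module (true is already the default)
deploy_backup = true

# Cron expression for the backup job: every day at 02:30 (example value)
backup_schedule = "30 2 * * *"

# Keep the 14 most recent backups instead of the default 7 (example value)
backup_retention_count = 14
```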

Monitoring

To enable deployment of the monitoring module, you need to enable the following flag:

deploy_monitoring = true
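
A slightly fuller sketch that also wires up alert delivery, using only variables from the Inputs table. The email address and region are placeholders, and which of these settings a given deployment needs is an assumption:

```hcl
deploy_monitoring = true

# Enable the actions attached to the alarms
monitoring_actions_enabled = true

# Deliver alerts by email to a placeholder address
monitoring_sns_protocol       = "email"
monitoring_sns_topic_endpoint = "alerts@example.com"

# Region used for the Route53 availability check (example value)
monitoring_route53_health_check_aws_region = "us-east-1"
```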

Providing a TLS certificate

# Example ARN
lb_tls_certificate_arn = "arn:aws:acm:us-east-1:123456789012:certificate/12345678-1234-1234-1234-123456789012"

Private Deployment

To ensure access to GraphDB exclusively through a private network, you must set the following variables to true:

# Enable creation of a private service endpoint
lb_enable_private_access = true
# Enable private access to the Network Load Balancer and disable public access
lb_internal = true

By configuring these variables accordingly you enforce GraphDB accessibility solely via a private network, enhancing security and control over network traffic.
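
When private access is enabled, the endpoint service inputs from the Inputs table can restrict who may connect. The following is a sketch with a made-up account ARN:

```hcl
lb_enable_private_access = true
lb_internal              = true

# Principals allowed to discover the endpoint service (example ARN)
vpc_endpoint_service_allowed_principals = ["arn:aws:iam::123456789012:root"]

# Require the service owner to accept each endpoint connection request
vpc_endpoint_service_accept_connection_requests = true
```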

Logging

To enable logging, first set the deploy_logging_module variable to true.

There are several logging features that can be enabled with the following variables:

EBS Volume Configurations

This Terraform module creates EBS volumes and mounts them to EC2 instances to store data. You can modify the default settings by changing the values of the following variables:

ebs_volume_size                            = 1024
ebs_volume_iops                            = 10000
ebs_volume_throughput                      = 500

S3 Access Logs

To enable the S3 bucket access logs for the backup bucket, set the following variables:

deploy_logging_module = true
s3_access_logs_lifecycle_rule_status = "Enabled"
s3_enable_access_logs = true

Load Balancer Access Logs

To enable the load balancer access logs, set the following variables:

deploy_logging_module = true
lb_access_logs_lifecycle_rule_status = "Enabled"
lb_enable_access_logs = true

VPC Flow Logs

To enable the VPC Flow logs, set the following variables:

deploy_logging_module = true
vpc_enable_flow_logs = true
vpc_flow_logs_lifecycle_rule_status = "Enabled"
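
The three log types can be combined in one variables file, together with the expiration inputs from the Inputs table. The retention periods below are illustrative assumptions:

```hcl
deploy_logging_module = true

# S3 access logs for the backup bucket, deleted after 90 days
s3_enable_access_logs                = true
s3_access_logs_lifecycle_rule_status = "Enabled"
s3_access_logs_expiration_days       = 90

# Load balancer access logs, deleted after 30 days
lb_enable_access_logs                = true
lb_access_logs_lifecycle_rule_status = "Enabled"
lb_access_logs_expiration_days       = 30

# VPC flow logs, deleted after 14 days
vpc_enable_flow_logs                = true
vpc_flow_logs_lifecycle_rule_status = "Enabled"
vpc_flow_logs_expiration_days       = 14
```

Note that the lifecycle rule status variables are strings ("Enabled" or "Disabled"), while the enable flags are booleans.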

KMS Encryption using Customer Master Keys

Parameter Store encryption

You can encrypt parameters stored in AWS Systems Manager Parameter Store using KMS CMKs. This ensures that sensitive data, such as configuration secrets, are securely encrypted at rest.

Keys

To use a CMK, set the create_parameter_store_kms_key variable to true. This will generate a new KMS Key.

If create_parameter_store_kms_key is set to false, the encryption will be disabled.

You can also provide your own key using the parameter_store_external_kms_key variable.

create_parameter_store_kms_key   = true
parameter_store_external_kms_key = "arn:aws:kms:us-east-1:123456789012:key/your-external-key-arn"

Key Admin

You can designate a Key admin by setting the parameter_store_key_admin_arn variable, or you can use the current AWS account by leaving this parameter empty.

parameter_store_key_admin_arn = "arn:aws:iam::123456789012:role/KeyAdminRole"

EBS encryption

You can enhance the security of EBS volumes by using KMS CMKs to encrypt data at rest. This provides an additional layer of protection for data stored on EBS volumes attached to EC2 instances.

Keys

To enable CMK, set create_ebs_kms_key to true. This will create a new KMS Key.

If create_ebs_kms_key is set to false, the default AWS key encryption will be used.

You can also provide your own key using the ebs_external_kms_key variable.

create_ebs_kms_key = true
ebs_external_kms_key = "arn:aws:kms:us-east-1:123456789012:key/your-external-key-arn"

Key Admin

You can designate a Key admin by setting the ebs_key_admin_arn variable, or you can use the current AWS account by leaving this parameter empty.

ebs_key_admin_arn = "arn:aws:iam::123456789012:role/KeyAdminRole"

S3 encryption

You can secure S3 bucket objects by encrypting them with KMS CMKs, ensuring data at rest is protected. This safeguards the integrity and confidentiality of data stored in S3 buckets.

Keys

To use CMK, set create_s3_kms_key to true. This will create a new KMS Key.

If create_s3_kms_key is set to false, the default AWS key alias/aws/s3 will be used.

You can also provide your own key using the s3_external_kms_key_arn variable.

create_s3_kms_key = true
s3_external_kms_key_arn = "arn:aws:kms:us-east-1:123456789012:key/your-external-key-arn"

Key Admin

You can designate a Key admin by setting the s3_kms_key_admin_arn variable, or you can use the current AWS account by leaving this parameter empty.

s3_kms_key_admin_arn = "arn:aws:iam::123456789012:role/KeyAdminRole"

Replication

You can enable replication for S3 buckets by setting the following variables:

logging_enable_bucket_replication = true
s3_enable_replication_rule = "Enabled"
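
The Inputs table also lists a backup bucket replication flag and a destination region. A sketch that sets them alongside the variables above follows; the region is an example, and whether a given deployment requires all four settings is an assumption:

```hcl
# Replicate the logging and backup buckets
logging_enable_bucket_replication = true
backup_enable_bucket_replication  = true
s3_enable_replication_rule        = "Enabled"

# Region that receives the replicated objects (example value)
bucket_replication_destination_region = "us-west-2"
```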

Deploying in an existing VPC

If you have an existing VPC in your account, you can use it to deploy the GraphDB cluster.

Just specify values for the following variables:

vpc_id = "vpc-12345678"
vpc_public_subnet_ids = ["subnet-123456","subnet-234567","subnet-345678"]
vpc_private_subnet_ids = ["subnet-456789","subnet-567891","subnet-678912"]
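
Instead of hard-coding subnet IDs, they can be looked up with the AWS provider's aws_subnets data source. This is a sketch only: the Tier tag and its values are hypothetical and must match however your subnets are actually tagged.

```hcl
locals {
  existing_vpc_id = "vpc-12345678" # example ID
}

# Look up subnets in the existing VPC by a hypothetical "Tier" tag
data "aws_subnets" "public" {
  filter {
    name   = "vpc-id"
    values = [local.existing_vpc_id]
  }
  tags = {
    Tier = "public"
  }
}

data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [local.existing_vpc_id]
  }
  tags = {
    Tier = "private"
  }
}

module "graphdb" {
  source  = "Ontotext-AD/graphdb/aws"
  version = "~> 1.0"

  resource_name_prefix = "graphdb"
  aws_region           = "us-east-1"
  graphdb_license_path = "path-to-graphdb-license"

  vpc_id                 = local.existing_vpc_id
  vpc_public_subnet_ids  = data.aws_subnets.public.ids
  vpc_private_subnet_ids = data.aws_subnets.private.ids
}
```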

Single Node Deployment

This Terraform module can deploy a single instance of GraphDB. To do this, set graphdb_node_count to 1, and the rest will be handled automatically.

Important: While it is possible to scale from a single node to a cluster deployment (e.g., from 1 node to 3 nodes), it is not recommended. Synchronizing the repository across all nodes can be time-consuming and may cause scripts to time out.
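
A minimal sketch of a single node deployment, reusing the module call from the Usage section with example values:

```hcl
module "graphdb" {
  source  = "Ontotext-AD/graphdb/aws"
  version = "~> 1.0"

  resource_name_prefix = "graphdb-single" # example prefix
  aws_region           = "us-east-1"
  graphdb_license_path = "path-to-graphdb-license"

  # A single GraphDB instance instead of the default 3-node cluster
  graphdb_node_count = 1
}
```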

Updating configurations on an active deployment

Updating Configurations

When faced with scenarios such as an expired license, or the need to modify the graphdb.properties file or other GraphDB-related configurations, apply the changes via terraform apply and then do one of the following:

  • Manually terminate instances one by one, beginning with the follower nodes and concluding with the leader node as the last instance to be terminated.
  • Scale in the number of instances in the Auto Scaling group to zero and then scale back out to the original number of nodes.
  • Set the graphdb_enable_userdata_scripts_on_reboot variable to true. This ensures that user data scripts are executed on each reboot, allowing you to update the configuration of each node. The reboot option would essentially achieve the same outcome as the termination and replacement approach, but it is still experimental.

Please note that scaling in to zero and back out results in more downtime than the other two options.

Each of these options will trigger the user data script to run again, updating files and properties overrides with the new values. Please note that changing the graphdb_admin_password via terraform apply will not update the password in GraphDB. Support for this will be introduced in the future.
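
For the reboot-based option, the variables file might look like the sketch below. The file paths are placeholders:

```hcl
# Point to the renewed license and the updated properties file (placeholder paths)
graphdb_license_path    = "/path/to/renewed/graphdb.license"
graphdb_properties_path = "/path/to/updated/graphdb.properties"

# Experimental: re-run the user data scripts on every EC2 boot
graphdb_enable_userdata_scripts_on_reboot = true
```

After terraform apply, each node would pick up the new files on its next reboot.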

Upgrading GraphDB Version

To automatically update the GraphDB version with terraform apply, you could set asg_enable_instance_refresh to true in your tfvars file. This configuration will enable instance refresh for the ASG and will replace your already running instances with new ones, one at a time.

By default, the instance refresh process waits for one hour before updating the next instance. This delay allows GraphDB time to sync with other nodes. You can adjust this delay by changing the asg_instance_refresh_checkpoint_delay value. If there are many writes to the cluster, consider increasing this delay.

Note that any changes to GraphDB configurations will be applied during the instance refresh process, except for the graphdb_admin_password. Support for updating the admin password will be introduced in a future release.
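
A sketch of a version upgrade with instance refresh enabled. The two-hour checkpoint delay is an illustrative assumption for a write-heavy cluster, not a tested recommendation:

```hcl
# Target GraphDB version for the upgrade
graphdb_version = "10.8.0"

# Replace running instances one at a time on terraform apply
asg_enable_instance_refresh = true

# Wait 7200 seconds (2 hours) after each checkpoint instead of the default 3600
asg_instance_refresh_checkpoint_delay = 7200
```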

⚠️ WARNING

Enabling asg_enable_instance_refresh while scaling out the GraphDB cluster may lead to data replication issues or a broken cluster configuration. Existing instances may still undergo the refresh process and move to a different Availability Zone, and, depending on the data size, new nodes may fail to join the cluster because of the instance refresh.

We strongly recommend disabling asg_enable_instance_refresh when scaling out the cluster.

To work around this issue, you can manually set "Scale-in protection" on the existing nodes, scale out the cluster, and then remove the "Scale-in protection". However, any configuration changes will not be applied to the old instances, which could cause them to drift apart.

Local Development

Instead of using the module dependency, you can create a local variables file named terraform.tfvars and provide configuration overrides there. Here's an example of a terraform.tfvars file:

terraform.tfvars

aws_region = "us-east-1"

resource_name_prefix = "my-prefix"

graphdb_license_path = "/path/to/your/license.license"

ec2_instance_type = "c5a.2xlarge"

allowed_inbound_cidrs_lb = ["0.0.0.0/0"]

Release History

All notable changes between versions are tracked and documented in CHANGELOG.md.

Contributing

Check out the contributors guide CONTRIBUTING.md.

License

This code is released under the Apache 2.0 License. See LICENSE for more details.