This deployment type is intended for greenfield/POV/lab purposes. It deploys a fully functioning sandbox environment in a new Resource Group/VNet with test workload VMs. The full set of resources provisioned is listed below; effectively, this creates all network infrastructure dependencies for an Azure environment. It includes everything from the "Base" deployment type: 1 new Resource Group; 1 VNet with 1 public subnet and 1 private/workload subnet; 1 CentOS server workload in the private subnet; 1 Bastion Host in the public subnet with an assigned Public IP; and a locally generated key pair .pem file for SSH access.
Additionally, depending on the configuration, it creates 1 or more Flexible Orchestration Virtual Machine Scale Sets (VMSS) and scaling policies for Cloud Connector in the private subnet(s); 1 Function App for the VMSS; a Standard Azure Load Balancer; and workload private subnet UDR routing to the Load Balancer Frontend IP.
If the run_manual_sync variable is true (the default), the bash script scripts/manual_sync.sh is invoked to perform the Function App manual sync (more information in the Caveats section). In that case, it is advised that you run from a macOS or Linux workstation with the following tools installed: bash, curl, and jq.
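Before running the deployment with run_manual_sync enabled, you can run a quick preflight check to confirm those tools are on your PATH. This is a minimal sketch. The function name `check_prereqs` is hypothetical and is not part of the repo's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of this repo): reports which of the
# given tools cannot be found on PATH.
check_prereqs() {
  local missing="" tool
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to a command
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="${missing:+$missing }$tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing: $missing"
    return 1
  fi
  echo "all required tools found"
}
```

Usage: `check_prereqs bash curl jq`. A non-zero return status means at least one tool still needs to be installed.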
- WSL2 DNS bug: If you are running these Azure Terraform deployments from a Windows WSL2 instance (e.g., Ubuntu) and receive an error similar to "dial tcp: lookup management.azure.com on 172.21.240.1:53: cannot unmarshal DNS message", refer to microsoft/WSL#5420 (comment) for a WSL2 resolv.conf fix.
- Function App Manual Sync: When the Function App used for managing Cloud Connectors in the Scale Set is first created, Azure requires a "Manual Sync" operation. This can be done through an API call or simply by navigating to the Function App in the Azure Portal and letting the page load. This action tells the Function App to load the zip file from the Storage Account and start running the Functions. We have attempted to automate this Manual Sync call through Terraform by triggering scripts/manual_sync.sh from a provisioner in the Function App Terraform module. If this attempt fails during terraform apply, an output message (shown below) is written to testbed.txt and printed to the screen at the end of the deployment; the steps listed in the message can be used to remediate the issue. This is a one-time action at Function App creation.
**IMPORTANT (ONLY APPLICABLE FOR INITIAL CREATE OF FUNCTION APP)**
Based on the recorded output, the manual sync to start your Azure Function App failed. To perform this manual sync perform one of the following steps:
1. Navigate to the Azure Function App /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Web/sites/<function-app> on the Azure Portal. The loading of the Function App page triggers the manual sync and will start your Function App.
2. Attempt to rerun the manual_sync.sh script manually using the following command (path to file is based on root of the repo):
../../modules/terraform-zscc-function-app-azure/manual_sync.sh <subscription-id> <resource-group> <function-app>
**IMPORTANT (ONLY APPLICABLE FOR INITIAL CREATE OF FUNCTION APP)**
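The Function App Manual Sync caveat notes that the sync can also be requested with an API call. The sketch below builds such a request against the Azure Resource Manager "Sync Function Triggers" operation and sends it with `az rest`. It makes several assumptions: that this operation is an acceptable substitute, that the api-version shown is appropriate, and that it matches what manual_sync.sh does internally (not confirmed). The helper names and the `DRY_RUN` switch are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repo) for requesting the Function App
# manual sync via the ARM "Sync Function Triggers" operation.
# The default api-version below is an assumption.
sync_triggers_url() {
  local subscription_id="$1" resource_group="$2" function_app="$3"
  local api_version="${4:-2022-03-01}"
  printf 'https://management.azure.com/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Web/sites/%s/syncfunctiontriggers?api-version=%s\n' \
    "$subscription_id" "$resource_group" "$function_app" "$api_version"
}

# Prints the az rest call by default; set DRY_RUN=0 to actually send it.
# az rest attaches a bearer token for management.azure.com URLs.
request_manual_sync() {
  local url
  url="$(sync_triggers_url "$1" "$2" "$3")"
  set -- az rest --method post --url "$url"
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}
```

Usage: `DRY_RUN=0 request_manual_sync <subscription-id> <resource-group> <function-app>`. It takes the same three arguments as manual_sync.sh. Navigating to the Function App in the Portal remains the simplest fallback.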
- Security Stack will be deployed into its own Resource Group.
- Based on zonal needs, a VMSS will be created in each configured zone.
- An Azure Internal Load Balancer (ILB) is deployed on top of all the Scale Sets and is used as the entry point for the Security Stack.
- A NAT Gateway will be deployed in each configured zone with a dedicated IP associated with it; this IP is used for outbound traffic from the Cloud Connectors.
It is recommended that this security stack be deployed into its own VNet (Security VNet) and that Workload VNets be peered with it. Once the security stack is deployed, route tables in the Workload VNets should have a User Defined Route (UDR) steering traffic to the ILB in front of the Cloud Connectors.
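For a peered Workload VNet, the UDR described above could be added with the Azure CLI. This minimal sketch makes the following assumptions: a route table already exists and is associated with the workload subnet; the route name `to-cc-ilb` and the default `0.0.0.0/0` prefix are illustrative; and `DRY_RUN` is a hypothetical switch that prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: adds a UDR to an existing Workload VNet route table that
# sends traffic to the Cloud Connector ILB frontend IP. Route name and default
# prefix are illustrative assumptions.
udr_to_cc_ilb() {
  local rg="$1" route_table="$2" ilb_ip="$3" prefix="${4:-0.0.0.0/0}"
  set -- az network route-table route create \
    --resource-group "$rg" \
    --route-table-name "$route_table" \
    --name to-cc-ilb \
    --address-prefix "$prefix" \
    --next-hop-type VirtualAppliance \
    --next-hop-ip-address "$ilb_ip"
  # Print the command by default; set DRY_RUN=0 to create the route.
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}
```

The `VirtualAppliance` next-hop type is the usual way to point a UDR at a load balancer frontend IP that fronts network appliances.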
The Azure Function App will contain two Azure Functions.
- Health Monitoring Function - Uses the custom metrics published by each CC to determine whether any unhealthy CCs need to be replaced. If a CC is found to be unhealthy, the function terminates the instance and replaces it with a new one. This function runs every minute.
- Resource Sync Function - Ensures the VMs advertised in your Cloud Connector Group on the Zscaler Cloud Connector Portal match what exists in your Azure Scale Set. If it finds a CC that exists in the Cloud Connector Group but not in the Azure Scale Set, it cleans up that instance from the Cloud Connector Group so the two entities stay in sync. This function runs every 30 minutes.
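The Resource Sync comparison is essentially a set difference. The sketch below illustrates it under an assumption: both sides have been exported to plain text files with one identifier (for example, a VM name) per line. How the actual function obtains and matches these identifiers is not described here. The function name `stale_connectors` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the Resource Sync comparison: print every
# Cloud Connector listed in the Cloud Connector Group export but not in the
# Scale Set export. Each input file holds one identifier per line.
stale_connectors() {
  local group_file="$1" vmss_file="$2"
  # -F fixed strings, -x whole-line match, -v invert: keep group entries
  # that match no line of the Scale Set list.
  sort -u "$group_file" | grep -Fxv -f "$vmss_file"
}
```

Each printed entry corresponds to a Cloud Connector Group member that the function would clean up.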
From the examples directory, run the zsec bash script, which walks you through all required inputs.
- ./zsec up
- enter "greenfield"
- enter "base_cc_vmss"
- follow the remainder of the authentication and configuration input prompts.
- script will detect client operating system and download/run a specific version of terraform in a temporary bin directory
- inputs will be validated and terraform init/apply will automatically execute.
- verify all resources that will be created/modified and enter "yes" to confirm
Modify/populate any required variable input values in the base_cc_vmss/terraform.tfvars file and save.
From base_cc_vmss directory execute:
- terraform init
- terraform apply
From the examples directory, run the zsec bash script, which walks you through all required inputs.
- ./zsec destroy
From base_cc_vmss directory execute:
- terraform destroy
This solution includes two entities that perform Azure operations: the Cloud Connectors and the Azure Function App. Both need a Managed Identity associated with them. For the Azure Function App to perform the operations described above, it needs an expanded permission set that the Cloud Connector does not necessarily require. To enforce proper RBAC, two Managed Identities can be used: one for the Cloud Connector with the reduced permission set, and one for the Function App with the expanded permission set.
Set the following variables:
cc_vm_managed_identity_name = <cc-managed-identity-name>
cc_vm_managed_identity_rg = <cc-managed-identity-resource-group>
function_app_managed_identity_name = <function-app-managed-identity-name>
function_app_managed_identity_rg = <function-app-managed-identity-resource-group>
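Before applying, you may want to confirm that the identities named above exist, so that a typo in a name or resource group surfaces early. This minimal sketch uses `az identity show`. The helper names and the `DRY_RUN` switch are hypothetical, and the Function App pair is optional.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers: print (or, with DRY_RUN=0, run) an
# "az identity show" lookup for each Managed Identity referenced in tfvars.
show_identity() {
  local name="$1" rg="$2"
  set -- az identity show --name "$name" --resource-group "$rg" --query id --output tsv
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

check_identities() {
  # Arguments: cc_name cc_rg [function_app_name function_app_rg]
  show_identity "$1" "$2"
  # The Function App identity is optional; skip it when not provided.
  if [ -n "${3:-}" ] && [ -n "${4:-}" ]; then
    show_identity "$3" "$4"
  fi
}
```

If the identities live in a different subscription (managed_identity_subscription_id), you can append the Azure CLI's global `--subscription` argument to the `az identity show` call.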
Configure the following options:
Cloud Connector User Managed Identity Information:
Is the Managed Identity in the same Subscription ID? [yes/no]: yes
Managed Identity is in the same Subscription
Enter Managed Identity Name: <cc-managed-identity-name>
Enter Managed Identity Resource Group: <cc-managed-identity-resource-group>
Function App User Managed Identity Information:
Assign the same User Managed Identity (<cc-managed-identity-name>) to Function App? [yes/no]: no
Enter Function App designated Managed Identity Name: <function-app-managed-identity-name>
Enter Function App designated Managed Identity Resource Group: <function-app-managed-identity-resource-group>
- Enables you to redefine the minimum number of Cloud Connectors in the Scale Set for specific time periods.
- Should be used if you have predictable traffic patterns (e.g., 9am-5pm, Monday-Friday).
Set the following variables:
scheduled_scaling_enabled = true
scheduled_scaling_vmss_min_ccs = 3
scheduled_scaling_days_of_week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
scheduled_scaling_start_time_hour = 7
scheduled_scaling_start_time_min = 30
scheduled_scaling_end_time_hour = 18
scheduled_scaling_end_time_min = 30
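To make the example values concrete, the sketch below checks whether a given day and time falls inside the configured window and reports which minimum would apply. It makes three assumptions: the window is treated as [start, end); times are local to scheduled_scaling_timezone; and outside the window vmss_min_ccs applies. How the module maps these variables onto Azure autoscale profiles, and how Azure treats the exact boundaries, are not described here. All function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the scheduled scaling window from the example
# variables above. Times are local to scheduled_scaling_timezone.
SCHED_DAYS="Monday Tuesday Wednesday Thursday Friday"
SCHED_START_HOUR=7
SCHED_START_MIN=30
SCHED_END_HOUR=18
SCHED_END_MIN=30
SCHED_MIN_CCS=3   # scheduled_scaling_vmss_min_ccs
VMSS_MIN_CCS=2    # vmss_min_ccs (default)

# Strip one leading zero so "08" is not parsed as octal by $(( )).
to_int() {
  local v="${1#0}"
  echo "${v:-0}"
}

# Succeeds if DAY HOUR MIN falls inside [start, end) on a configured day.
in_schedule() {
  local day="$1" now start end
  case " $SCHED_DAYS " in
    *" $day "*) ;;
    *) return 1 ;;
  esac
  now=$(( $(to_int "$2") * 60 + $(to_int "$3") ))
  start=$(( SCHED_START_HOUR * 60 + SCHED_START_MIN ))
  end=$(( SCHED_END_HOUR * 60 + SCHED_END_MIN ))
  [ "$now" -ge "$start" ] && [ "$now" -lt "$end" ]
}

# Prints the minimum CC count that would apply at DAY HOUR MIN.
effective_min_ccs() {
  if in_schedule "$1" "$2" "$3"; then
    echo "$SCHED_MIN_CCS"
  else
    echo "$VMSS_MIN_CCS"
  fi
}
```

For example, `effective_min_ccs Monday 12 0` reports the scheduled minimum of 3, while any time on Saturday or Sunday falls back to 2.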
Configure the following options:
Do you want to enable scheduled scaling on the VMSS? [yes/no]: yes
Enter the minimum amount of scheduled Cloud Connectors in VMSS? [Default=2]: 3
Apply Scheduled Scaling Policy on Sunday? [yes/no]: no
Not configuring Sunday on Scheduled Scaling configuration.
Apply Scheduled Scaling Policy on Monday? [yes/no]: yes
Adding Monday on Scheduled Scaling configuration.
Apply Scheduled Scaling Policy on Tuesday? [yes/no]: yes
Adding Tuesday on Scheduled Scaling configuration.
Apply Scheduled Scaling Policy on Wednesday? [yes/no]: yes
Adding Wednesday on Scheduled Scaling configuration.
Apply Scheduled Scaling Policy on Thursday? [yes/no]: yes
Adding Thursday on Scheduled Scaling configuration.
Apply Scheduled Scaling Policy on Friday? [yes/no]: yes
Adding Friday on Scheduled Scaling configuration.
Apply Scheduled Scaling Policy on Saturday? [yes/no]: no
Not configuring Saturday on Scheduled Scaling configuration.
Configuring the following days on the Scheduled Scaling Policy: Monday Tuesday Wednesday Thursday Friday
Enter the start time hour for the scheduled scaling configuration? [Default=9]: 7
Enter the start time min for the scheduled scaling configuration? [Default=0]: 30
Enter the end time hour for the scheduled scaling configuration? [Default=17]: 18
Enter the end time min for the scheduled scaling configuration? [Default=30]: 30
Cloud Connector Health metrics are published every 1 minute by the Cloud Connector and are managed by Application Insights. One easy way to view the metrics is to navigate to one of the running instances: Resource Group -> Scale Set -> Instances (tab on left) -> select instance -> Metrics (tab on left). Next create a metric query where:
- Scope = vm-name
- Metric Namespace = zscaler/cloudconnectors
- Metric = cloud_connector_aggr_health
- Aggregation = average
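The same metric can likely be pulled from the command line with `az monitor metrics list`. This is a sketch, and it assumes the custom namespace is queryable this way with the resource ID of the Cloud Connector VM (or of the Scale Set, for smedge_metrics). The helper name and `DRY_RUN` switch are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: builds an Azure CLI query for a Cloud Connector custom
# metric. Prints the command by default; set DRY_RUN=0 to run it.
cc_metric_query() {
  local resource_id="$1" metric="$2"
  set -- az monitor metrics list \
    --resource "$resource_id" \
    --namespace zscaler/cloudconnectors \
    --metrics "$metric" \
    --aggregation Average \
    --interval 1m
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}
```

Usage: `cc_metric_query "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/virtualMachines/<vm-name>" cloud_connector_aggr_health`.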
Cloud Connectors in a Scale Set publish scaling metrics to the Scale Set resource once a minute. These scaling metrics include smedge_cpu_utilization, smedge_mem_utilization, smedge_bytes_in and smedge_bytes_out. The scaling rules in the Scale Set scaling configuration will look at the smedge_cpu_utilization and compare it to the defined threshold.
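As a rough mental model of those scaling rules, the sketch below compares an average smedge_cpu_utilization value against scale_out_threshold (70) and scale_in_threshold (50), bounded by vmss_min_ccs and vmss_max_ccs (defaults from the Inputs table). The strict comparisons, the one-instance step, and integer-only percentages are assumptions. Azure autoscale itself also applies evaluation windows and cooldowns not modeled here. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of threshold-based scaling on average
# smedge_cpu_utilization. Defaults mirror scale_out_threshold,
# scale_in_threshold, vmss_min_ccs and vmss_max_ccs from the Inputs table.
# One-instance steps and strict comparisons are assumptions.
scale_decision() {
  local cpu="$1" current="$2"
  local out_t="${SCALE_OUT_THRESHOLD:-70}" in_t="${SCALE_IN_THRESHOLD:-50}"
  local min="${VMSS_MIN_CCS:-2}" max="${VMSS_MAX_CCS:-16}"
  if [ "$cpu" -gt "$out_t" ] && [ "$current" -lt "$max" ]; then
    echo "scale-out to $((current + 1))"
  elif [ "$cpu" -lt "$in_t" ] && [ "$current" -gt "$min" ]; then
    echo "scale-in to $((current - 1))"
  else
    echo "no change ($current)"
  fi
}
```

The gap between the two thresholds keeps the Scale Set from oscillating when CPU hovers around a single value.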
To view these metrics navigate to the Scale Set you are interested in: Resource Group -> select Scale Set -> Metrics (tab on left). Next create a metrics query where:
- Scope = scale-set-name
- Metric Namespace = zscaler/cloudconnectors
- Metric = smedge_metrics
- Aggregation = average
Lastly, create a filter where:
There are a couple of approaches for viewing logs from a Function inside a Function App.
To view recent invocations, navigate to the function you are interested in: Resource Group -> select Function App -> select Function (shown on overview page) -> Invocations
To view real-time logs from a function as it executes, navigate to the function you are interested in: Resource Group -> select Function App -> select Function (shown on overview page) -> Logs
A more complex but more powerful approach is to use Application Insights, which lets you run queries to view specific log messages, executions, timeframes, etc. The basic example below shows logs from the Health Monitor function where it found no instances that need to be terminated. Filtering on a specific message in the query lets you refine your search instead of manually going through each invocation or continuously watching the real-time log stream.
Navigate to: Resource Group -> Application Insights -> Logs (tab on left) and use the following query:
union traces
| union exceptions
| where timestamp > ago(1d)
| where customDimensions['Category'] == 'Function.healthMonitor.User' or customDimensions['Category'] == 'Function.healthMonitor'
| where message contains "No instances to terminate on this iteration."
| order by timestamp asc
| project
timestamp,
message = iff(message != '', message, iff(innermostMessage != '', innermostMessage, customDimensions.['prop__{OriginalFormat}']))
Each Cloud Connector broadcasts its health to the Azure Application Insights instance in the Resource Group (to view these metrics, refer to Debugging Tips->Viewing Cloud Connector Health Metrics). The health reflects the data plane's health and correlates to the active/inactive state you will see in the Cloud Connector Group on the Zscaler Connector Portal. This health is evaluated by a process in the Cloud Connector, and a value is published to this metric every minute: 0 indicates unhealthy and 100 indicates healthy. An instance should be replaced in either of two scenarios:
- The Cloud Connector reports unhealthy 5 times in a row. This indicates the Cloud Connector is down and should be replaced.
- The Cloud Connector reports unhealthy 7 out of 10 times. This indicates the Cloud Connector is flapping and should be replaced.
The Health Monitoring Function in the Function App will perform this evaluation every 1 minute and will determine if any instances should be replaced. When an instance is replaced, it will be terminated and the Health Monitoring Function will ensure a new one is brought up to replace it.
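The two replacement rules can be expressed as a small evaluation over the most recent health samples. This sketch makes two interpretive assumptions. First, samples are passed oldest first, one per minute. Second, "7 out of 10" means at least 7 unhealthy values among the 10 most recent samples. It is not the Health Monitoring Function's actual implementation, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the two replacement rules described above.
# Arguments: cloud_connector_aggr_health samples, oldest first, one per minute
# (0 = unhealthy, 100 = healthy). "7 out of 10" is read as at least 7
# unhealthy samples among the 10 most recent ones (an assumption).
evaluate_health() {
  local n=$# i=0 v consecutive=0 unhealthy_recent=0
  for v in "$@"; do
    i=$((i + 1))
    if [ "$v" -eq 0 ]; then
      consecutive=$((consecutive + 1))   # length of the current unhealthy run
      if [ "$i" -gt $((n - 10)) ]; then
        unhealthy_recent=$((unhealthy_recent + 1))   # within last 10 samples
      fi
    else
      consecutive=0
    fi
  done
  if [ "$consecutive" -ge 5 ]; then
    echo "replace (down)"
  elif [ "$unhealthy_recent" -ge 7 ]; then
    echo "replace (flapping)"
  else
    echo "keep"
  fi
}
```

For example, `evaluate_health 0 100 0 0 100 0 0 100 0 0` never reaches 5 unhealthy values in a row. It still has 7 unhealthy values in its last 10 samples, so it is flagged as flapping.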
If unhealthy instances are not being replaced, first check whether the metrics published by the unhealthy instance have a value of 0, which indicates unhealthy (100 indicates healthy). Please refer to the Debugging Tips->Viewing Cloud Connector Health Metrics section. If the metric stays at 0 for a long period of time (refer to FAQs->When is a Cloud Connector considered to be unhealthy and should be replaced?), next check whether the Function App is running. During creation of the Function App, a manual sync trigger must be successfully invoked for the Function App to start (refer to Caveats/Considerations->Function App Manual Sync); if the Function App is not running, unhealthy instances will not be replaced. Navigate to the Function App in the Azure Portal to invoke the Manual Sync, and view the invocations (Debugging Tips->Viewing Function App Logs->Recent Invocations) to see whether it has been running.
Termination of unhealthy instances can be disabled by setting the following Terraform variable and applying the change:
terminate_unhealthy_instances = false
It can also be configured manually in the Azure Portal through the environment variables of the Function App: Resource Group -> select Function App -> Environment variables. Select TERMINATE_UNHEALTHY_INSTANCES, set its value to false, and apply the change.
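The same app setting change can be made from the Azure CLI. This is a minimal sketch with a hypothetical helper name and `DRY_RUN` switch. If Terraform also manages this setting, a later terraform apply may revert a change made outside Terraform, so the Terraform variable remains the more durable option.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sets TERMINATE_UNHEALTHY_INSTANCES=false on the
# Function App. Prints the command by default; set DRY_RUN=0 to apply it.
disable_unhealthy_termination() {
  local app="$1" rg="$2"
  set -- az functionapp config appsettings set \
    --name "$app" \
    --resource-group "$rg" \
    --settings TERMINATE_UNHEALTHY_INSTANCES=false
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}
```

Usage: `DRY_RUN=0 disable_unhealthy_termination <function-app> <resource-group>`.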
Yes, a single Managed Identity can be used for both the Cloud Connector and the Function App. With Terraform, this is done by leaving the following variables unset: function_app_managed_identity_name and function_app_managed_identity_rg.
The Mgmt IP address is not printed after Terraform executes because, given the dynamic nature of a Scale Set, the IP addresses are not known in advance. If you wish to SSH into one of the Cloud Connectors, find the instance you are interested in on the Azure Portal to get the IP address to use for the connection.
To find this Mgmt IP, navigate to: Resource Group -> select Scale Set -> Instances (tab on left) -> select Instance -> Network Settings (tab on left). Once there, confirm you are looking at the mgmt interface by checking for "mgmt" in the interface name, then copy the IP address.
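As an alternative to clicking through the Portal, the mgmt interfaces can probably be listed with the Azure CLI. This sketch relies on two assumptions. First, because the Scale Set uses Flexible orchestration, its NICs appear as standalone network interface resources in the deployment's resource group. Second, their names contain "mgmt", as noted above. The helper name and `DRY_RUN` switch are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: lists NICs whose names contain "mgmt" together with
# their first private IP. Prints the command by default; DRY_RUN=0 runs it.
mgmt_nic_ips() {
  local rg="$1"
  local query="[?contains(name, 'mgmt')].{name:name, ip:ipConfigurations[0].privateIPAddress}"
  set -- az network nic list --resource-group "$rg" --query "$query" --output table
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}
```

The printed dry-run line drops the shell quoting around the JMESPath query, so run the helper with `DRY_RUN=0` rather than copy-pasting the echoed command.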
Name | Version |
---|---|
terraform | >= 0.13.7, < 2.0.0 |
azurerm | >= 3.108.0, <= 3.116 |
local | ~> 2.5.0 |
null | ~> 3.1.0 |
random | ~> 3.3.0 |
tls | ~> 3.4.0 |
Name | Version |
---|---|
local | ~> 2.5.0 |
random | ~> 3.3.0 |
tls | ~> 3.4.0 |
Name | Source | Version |
---|---|---|
bastion | ../../modules/terraform-zscc-bastion-azure | n/a |
cc_functionapp | ../../modules/terraform-zscc-function-app-azure | n/a |
cc_identity | ../../modules/terraform-zscc-identity-azure | n/a |
cc_lb | ../../modules/terraform-zscc-lb-azure | n/a |
cc_nsg | ../../modules/terraform-zscc-nsg-azure | n/a |
cc_vmss | ../../modules/terraform-zscc-ccvmss-azure | n/a |
network | ../../modules/terraform-zscc-network-azure | n/a |
workload | ../../modules/terraform-zscc-workload-azure | n/a |
Name | Type |
---|---|
local_file.private_key | resource |
local_file.testbed | resource |
local_file.user_data_file | resource |
random_string.suffix | resource |
tls_private_key.key | resource |
Name | Description | Type | Default | Required |
---|---|---|---|---|
accelerated_networking_enabled | Enable/Disable accelerated networking support on all Cloud Connector service interfaces | bool | true | no |
arm_location | The Azure Region where resources are to be deployed | string | "westus2" | no |
azure_vault_url | Azure Vault URL | string | n/a | yes |
bastion_nsg_source_prefix | user input for locking down SSH access to bastion to a specific IP or CIDR range | string | "*" | no |
cc_subnets | Cloud Connector Subnets to create in VNet. This is only required if you want to override the default subnets that this code creates via network_address_space variable. | list(string) | null | no |
cc_vm_managed_identity_name | Azure Managed Identity name to attach to the CC VM. E.g. zspreview-66117-mi | string | n/a | yes |
cc_vm_managed_identity_rg | Resource Group of the Azure Managed Identity name to attach to the CC VM. E.g. edgeconnector_rg_1 | string | n/a | yes |
cc_vm_prov_url | Zscaler Cloud Connector Provisioning URL | string | n/a | yes |
ccvm_image_offer | Azure Marketplace Cloud Connector Image Offer | string | "zia_cloud_connector" | no |
ccvm_image_publisher | Azure Marketplace Cloud Connector Image Publisher | string | "zscaler1579058425289" | no |
ccvm_image_sku | Azure Marketplace Cloud Connector Image SKU | string | "zs_ser_gen1_cc_01" | no |
ccvm_image_version | Azure Marketplace Cloud Connector Image Version | string | "latest" | no |
ccvm_instance_type | Cloud Connector Image size | string | "Standard_D2s_v3" | no |
ccvm_source_image_id | Custom Cloud Connector Source Image ID. Set this value to the path of a local subscription Microsoft.Compute image to override the Cloud Connector deployment instead of using the marketplace publisher | string | null | no |
encryption_at_host_enabled | User input for enabling or disabling host encryption | bool | true | no |
env_subscription_id | Azure Subscription ID where resources are to be deployed | string | n/a | yes |
environment | Customer defined environment tag. e.g.: Dev, QA, Prod, etc. | string | "Development" | no |
existing_log_analytics_workspace | Set to True if you wish to use an existing Log Analytics Workspace to associate with the AppInsights Instance. Default is false meaning Terraform module will create a new one | bool | false | no |
existing_log_analytics_workspace_id | ID of existing Log Analytics Workspace to associate with the AppInsights Instance. | string | "" | no |
existing_storage_account | Set to True if you wish to use an existing Storage Account to associate with the Function App. Default is false meaning Terraform module will create a new one | bool | false | no |
existing_storage_account_name | Name of existing Storage Account to associate with the Function App. | string | "" | no |
existing_storage_account_rg | Resource Group of existing Storage Account to associate with the Function App. | string | "" | no |
function_app_managed_identity_name | Azure Managed Identity name to attach to the Function App. E.g. zspreview-66117-mi | string | "" | no |
function_app_managed_identity_rg | Resource Group of the Azure Managed Identity name to attach to the Function App. E.g. edgeconnector_rg_1 | string | "" | no |
health_check_interval | The interval, in seconds, for how frequently to probe the endpoint for health status. Typically, the interval is slightly less than half the allocated timeout period (in seconds) which allows two full probes before taking the instance out of rotation. The default value is 15, the minimum value is 5 | number | 15 | no |
http_probe_port | Port number for Cloud Connector cloud init to enable listener port for HTTP probe from Azure LB | number | 50000 | no |
load_distribution | Azure LB load distribution method | string | "Default" | no |
managed_identity_subscription_id | Azure Subscription ID where the User Managed Identity resource exists. Only required if this Subscription ID is different than env_subscription_id | string | null | no |
name_prefix | The name prefix for all your resources | string | "zscc" | no |
network_address_space | VNet IP CIDR Range. All subnet resources that might get created (public, workload, cloud connector) are derived from this /16 CIDR. If you require creating a VNet smaller than /16, you may need to explicitly define all other subnets via public_subnets, workloads_subnets, and cc_subnets variables | string | "10.1.0.0/16" | no |
number_of_probes | The number of probes where if no response, will result in stopping further traffic from being delivered to the endpoint. This value allows endpoints to be taken out of rotation faster or slower than the typical times used in Azure | number | 1 | no |
owner_tag | Customer defined owner tag value. e.g.: Org, Dept, username, etc. | string | "zscc-admin" | no |
path_to_scripts | Path to script_directory | string | "" | no |
probe_threshold | The number of consecutive successful or failed probes in order to allow or deny traffic from being delivered to this endpoint. After failing the number of consecutive probes equal to this value, the endpoint will be taken out of rotation and require the same number of successful consecutive probes to be placed back in rotation. | number | 2 | no |
public_subnets | Public/Bastion Subnets to create in VNet. This is only required if you want to override the default subnets that this code creates via network_address_space variable. | list(string) | null | no |
run_manual_sync | Set to True if you would like terraform to run the manual sync operation to start the Function App after creation. The alternative is to navigate to the Function App on the Azure Portal UI or to manually invoke the script yourself. | bool | true | no |
scale_in_threshold | Metric threshold for determining scale in. | number | 50 | no |
scale_out_threshold | Metric threshold for determining scale out. | number | 70 | no |
scheduled_scaling_days_of_week | Days of the week to apply scheduled scaling profile. | list(string) | [ | no |
scheduled_scaling_enabled | Enable scheduled scaling on top of metric scaling. | bool | false | no |
scheduled_scaling_end_time_hour | Hour to end scheduled scaling profile. | number | 17 | no |
scheduled_scaling_end_time_min | Minute to end scheduled scaling profile. | number | 0 | no |
scheduled_scaling_start_time_hour | Hour to start scheduled scaling profile. | number | 9 | no |
scheduled_scaling_start_time_min | Minute to start scheduled scaling profile. | number | 0 | no |
scheduled_scaling_timezone | Timezone the times for the scheduled scaling profile are specified in. | string | "Pacific Standard Time" | no |
scheduled_scaling_vmss_min_ccs | Minimum number of CCs in vmss for scheduled scaling profile. | number | 2 | no |
support_access_enabled | If Network Security Group is being configured, enable a specific outbound rule for Cloud Connector to be able to establish connectivity for Zscaler support access. Default is true | bool | true | no |
terminate_unhealthy_instances | Indicate whether detected unhealthy instances are terminated or not. | bool | true | no |
tls_key_algorithm | algorithm for tls_private_key resource | string | "RSA" | no |
upload_function_app_zip | By default, this Terraform will create a new Storage Account/Container/Blob to upload the zip file. The function app will pull from the blob URL to run. Setting this value to false will prevent creation/upload of the blob file | bool | true | no |
vmss_default_ccs | Default number of CCs in vmss. | number | 2 | no |
vmss_max_ccs | Maximum number of CCs in vmss. | number | 16 | no |
vmss_min_ccs | Minimum number of CCs in vmss. | number | 2 | no |
workload_count | The number of Workload VMs to deploy | number | 1 | no |
workloads_subnets | Workload Subnets to create in VNet. This is only required if you want to override the default subnets that this code creates via network_address_space variable. | list(string) | null | no |
zones | Specify which availability zone(s) to deploy VM resources in if zones_enabled variable is set to true | list(string) | [ | no |
zones_enabled | Determine whether to provision Cloud Connector VMs explicitly in defined zones (if supported by the Azure region provided in the location variable). If left false, Azure will automatically choose a zone and module will create an availability set resource instead for VM fault tolerance | bool | false | no |
zscaler_cc_function_public_url | Publicly accessible URL path where Function App can pull its zip file build from. This is only required when var.upload_function_app_zip is set to false | string | "" | no |
Name | Description |
---|---|
testbedconfig | Azure Testbed results |