
Deploying a secure Azure Databricks environment using Infrastructure as Code

Contents

1. Solution Overview

Automating platform provisioning with Infrastructure as Code (IaC) is a recommended pattern for enterprise applications, as it delivers consistent, repeatable deployments. This practice is especially valuable for organizations that run multiple environments, such as Dev, Test, Performance Test, UAT, and Blue/Green production environments. IaC is also very effective for managing deployments when production environments are spread across multiple regions around the globe.

Tools like Azure Resource Manager (ARM), Terraform, and the Azure Command Line Interface (CLI) enable you to declaratively script the cloud infrastructure and use software engineering practices such as testing and versioning while implementing IaC.
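As an illustration of this declarative approach, an ARM template deployment can be driven from the Azure CLI. The snippet below is a sketch only; the resource group, template file, and parameter names are illustrative, not the values this sample uses. `AZ` defaults to `echo az` so the command is printed rather than executed; set `AZ=az` (after `az login`) to deploy for real.

```shell
#!/bin/sh
# Sketch: drive a declarative ARM deployment from the Azure CLI.
# AZ defaults to "echo az" so this dry-runs without an Azure login.
AZ="${AZ:-echo az}"

deploy_template() {
  # $1 = resource group, $2 = Databricks workspace name (illustrative parameter)
  $AZ deployment group create \
    --resource-group "$1" \
    --template-file azuredeploy.json \
    --parameters workspaceName="$2"
}

deploy_template "my-rg" "my-databricks-ws"
```

Because the template describes the desired end state, re-running the same command against the same resource group is idempotent in ARM's default incremental mode.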

This sample focuses on automating the provisioning of a basic Azure Databricks environment using the Infrastructure as Code pattern.

1.1. Scope

The following list captures the scope of this sample:

  1. Provision an Azure Databricks environment using ARM templates orchestrated by a shell script.
  2. The following services will be provisioned as a part of the basic Azure Databricks environment setup:
    1. Azure Databricks Workspace
    2. Azure Storage account with hierarchical namespace enabled to support ABFS
    3. Azure Key Vault to store secrets and access tokens
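Deployments like this typically derive resource names from a shared prefix. The sketch below shows one hypothetical convention (the suffixes and the sample's real naming rules may differ); note that Azure storage account names must be 3-24 lowercase alphanumeric characters, so the derived name is sanitized and truncated.

```shell
#!/bin/sh
# Hypothetical naming sketch: derive per-service resource names from a
# deployment prefix. Storage account names allow only 3-24 lowercase
# alphanumeric characters, so strip invalid characters and truncate.
derive_names() {
  prefix="$1"
  storage=$(printf '%s' "${prefix}stor" | tr -cd 'a-z0-9' | cut -c1-24)
  echo "workspace: ${prefix}-dbw"
  echo "storage:   ${storage}"
  echo "keyvault:  ${prefix}-kv"
}

derive_names "lumustest"
```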

Details about how to use this sample can be found in the later sections of this document.

1.2. Architecture

The diagram below illustrates the deployment process flow followed in this sample:

(Diagram: deployment process flow)

1.2.1. Patterns

The following cloud design patterns are used by this sample:

1.3. Technologies used

The following technologies are used to build this sample:

2. How to use this sample

This section describes how to use this sample.

2.1. Prerequisites

The following are the prerequisites for deploying this sample:

  1. GitHub account
  2. Azure account
    • Permissions needed: the ability to create and deploy to an Azure resource group, create a service principal, and grant the Contributor role to the service principal over the resource group.

    • Active subscription with the following resource providers enabled:

      • Microsoft.Databricks
      • Microsoft.DataLakeStore
      • Microsoft.Storage
      • Microsoft.KeyVault
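If any of these providers are not yet registered on the subscription, they can be registered from the Azure CLI. The loop below is a sketch; `AZ` defaults to `echo az` so it only prints the commands, and setting `AZ=az` (after `az login`) registers them for real.

```shell
#!/bin/sh
# Sketch: register the resource providers required by this sample.
# AZ defaults to "echo az" so the commands are printed, not executed.
AZ="${AZ:-echo az}"

register_providers() {
  for provider in Microsoft.Databricks Microsoft.DataLakeStore \
                  Microsoft.Storage Microsoft.KeyVault; do
    $AZ provider register --namespace "$provider"
  done
}

register_providers
```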

2.1.1. Software Prerequisites

  1. Azure CLI installed on the local machine
    • Installation instructions can be found here
  2. For Windows users:
    1. Option 1: Windows Subsystem for Linux
    2. Option 2: Use the devcontainer published here as a host for the bash shell. For more information about devcontainers, see here.

2.2. Setup and deployment

IMPORTANT NOTE: As with all Azure deployments, this sample will incur associated costs. Remember to tear down all related resources after use to avoid unnecessary costs. See here for a list of deployed resources.

The steps to deploy this sample are listed below:

  1. Fork and clone this repository. Navigate (cd) to single_tech_samples/databricks/sample1_basic_azure_databricks_environment/.

  2. The sample depends on the following environment variables to be set before the deployment script is run:

    • DEPLOYMENT_PREFIX - Prefix for the names of the resources created as a part of this deployment.
    • AZURE_SUBSCRIPTION_ID - Subscription ID of the Azure subscription where the resources should be deployed.
    • AZURE_RESOURCE_GROUP_NAME - Name of the containing resource group.
    • AZURE_RESOURCE_GROUP_LOCATION - Azure region where the resources will be deployed (e.g. australiaeast, eastus).
    • DELETE_RESOURCE_GROUP - Flag that indicates whether the clean-up step should delete the resource group.
  3. Run ./deploy.sh

    Note: The script will prompt you to log in to the Azure account for authorization to deploy resources.

    The script will validate the ARM templates and the environment variables before deploying the resources. It will also display the status of each stage of the deployment while it executes. The following screenshot displays the log for a successful run:

    Note: DEPLOYMENT_PREFIX for this deployment was set to lumustest.

    (Screenshot: deployment log for a successful run)
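A pre-flight check along the lines of the one the script performs can be sketched as follows. This is an illustration, not the actual contents of deploy.sh: it simply fails fast and names every required environment variable that is unset or empty.

```shell
#!/bin/sh
# Sketch of a pre-flight check (deploy.sh's actual validation may differ):
# report every required environment variable that is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"   # indirect lookup of the variable named $name
    if [ -z "$value" ]; then
      echo "ERROR: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

require_vars DEPLOYMENT_PREFIX AZURE_SUBSCRIPTION_ID \
  AZURE_RESOURCE_GROUP_NAME AZURE_RESOURCE_GROUP_LOCATION ||
  echo "Set the variables above before running the deployment script."
```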

2.3. Deployed Resources

The following resources will be deployed as a part of this sample once the script is executed:

1. Azure Databricks workspace.

(Screenshot: Azure Databricks workspace)

2. Azure Storage account with hierarchical namespace enabled.

(Screenshot: Azure Storage account)

3. Azure Key Vault with all the secrets configured.

(Screenshot: Azure Key Vault)

2.4. Deployment validation

The following steps can be performed to validate the correct deployment of this sample:

  1. Users with appropriate access rights should be able to:

    1. Launch the workspace from the Azure portal.
    2. Access the control plane for the storage account and the key vault through the Azure portal.
    3. View the secrets configured in Azure Key Vault.
    4. View the deployment logs in the Azure resource group.

(Screenshot: deployment logs in the resource group)
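The same spot-checks can be scripted with the Azure CLI as an alternative to the portal. The sketch below uses illustrative names, and `AZ` defaults to `echo az` so it dry-runs; set `AZ=az` (after `az login`) to query the real deployment.

```shell
#!/bin/sh
# Sketch: spot-check the deployment from the command line.
# AZ defaults to "echo az" so the queries are printed, not executed.
AZ="${AZ:-echo az}"

check_deployment() {
  # $1 = resource group name, $2 = key vault name (both illustrative)
  $AZ resource list --resource-group "$1" --output table
  $AZ keyvault secret list --vault-name "$2" --output table
}

check_deployment "my-rg" "my-key-vault"
```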

2.5. Clean-up

The clean-up script can be executed to remove the resources provisioned by this sample. Following are the steps to execute the script:

  1. Navigate to (CD) single_tech_samples/databricks/sample1_basic_azure_databricks_environment/.

  2. Run ./destroy.sh
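The deletion logic inside such a script might look like the sketch below (this is not the repo's actual destroy.sh): the resource group is removed only when DELETE_RESOURCE_GROUP is set to "true", and `AZ` defaults to `echo az` so the command dry-runs.

```shell
#!/bin/sh
# Sketch: guarded clean-up. Delete the resource group only when
# DELETE_RESOURCE_GROUP is "true". AZ defaults to "echo az" for a dry run.
AZ="${AZ:-echo az}"

teardown() {
  if [ "${DELETE_RESOURCE_GROUP:-false}" = "true" ]; then
    $AZ group delete --name "$1" --yes --no-wait
  else
    echo "DELETE_RESOURCE_GROUP is not 'true'; skipping resource group deletion"
  fi
}

teardown "my-rg"
```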

The following screenshot displays the log for a successful clean-up run:

(Screenshot: clean-up log for a successful run)

3. Next Step

Deploying Enterprise-grade Azure Databricks environment using Infrastructure as Code aligned with Anti-Data-Exfiltration Reference architecture