The Iterative Provider is a Terraform plugin that enables full lifecycle management of cloud computing resources, including GPUs, from your favorite vendors. Two types of resources are available:
- Runner (`iterative_cml_runner`)
- Machine (`iterative_machine`)
The provider is designed for benefits like:

- Unified logging for workflows run in cloud resources
- Automatic provisioning of cloud resources
- Automatic unregistration and removal of cloud resources (never forget to turn your GPU off again)
- Arguments inherited from the GitHub/GitLab runner for ease of integration (`name`, `labels`, `idle-timeout`, `repo`, `token`, and `driver`)
The runner resource is a thin wrapper over the GitLab and GitHub self-hosted runners, abstracting their functionality into a common specification that allows adjusting the main runner settings, like idle timeouts or custom runner labels.
The runner resource also provides features like unified logging and automated cloud resource provisioning and management through various vendors.
This provider requires a repository token for registering and unregistering self-hosted runners during the cloud resource lifecycle. Depending on the platform you use, the instructions to get that token may vary; please refer to your platform documentation:
This token can be passed to the provider through the `token` argument or the `CML_TOKEN` environment variable, as in the following example:
export CML_TOKEN=···
Additionally, you need to provide credentials for the cloud provider where the computing resources should be allocated. Follow the steps below to get started.
- Set up your provider credentials as environment variables

  AWS:

  ```sh
  export AWS_SECRET_ACCESS_KEY=YOUR_KEY
  export AWS_ACCESS_KEY_ID=YOUR_ID
  export CML_TOKEN=YOUR_REPO_TOKEN
  ```

  Azure:

  ```sh
  export AZURE_CLIENT_ID=YOUR_ID
  export AZURE_CLIENT_SECRET=YOUR_SECRET
  export AZURE_SUBSCRIPTION_ID=YOUR_SUBSCRIPTION_ID
  export AZURE_TENANT_ID=YOUR_TENANT_ID
  export CML_TOKEN=YOUR_REPO_TOKEN
  ```
- Save your Terraform file as `main.tf`.
  AWS:

  ```hcl
  terraform {
    required_providers {
      iterative = {
        source = "iterative/iterative"
      }
    }
  }

  provider "iterative" {}

  resource "iterative_cml_runner" "runner" {
    repo          = "https://github.com/iterative/cml"
    driver        = "github"
    labels        = "tf"
    cloud         = "aws"
    region        = "us-west"
    instance_type = "m"
    # Uncomment if a GPU is needed:
    # instance_gpu = "tesla"
  }
  ```
  Azure:

  ```hcl
  terraform {
    required_providers {
      iterative = {
        source = "iterative/iterative"
      }
    }
  }

  provider "iterative" {}

  resource "iterative_cml_runner" "runner" {
    repo          = "https://github.com/iterative/cml"
    driver        = "github"
    labels        = "tf"
    cloud         = "azure"
    region        = "us-west"
    instance_type = "m"
    # Uncomment if a GPU is needed:
    # instance_gpu = "tesla"
  }
  ```
- Launch it!

  ```sh
  terraform init
  terraform apply --auto-approve
  ```
| Variable | Values | Default | Description |
| --- | --- | --- | --- |
| `driver` | `gitlab`, `github` | | The kind of runner that you are setting up |
| `repo` | | | The Git repository to subscribe to |
| `token` | | | A personal access token. In GitHub, your token must have Workflow and Repository permissions. If not specified, the Iterative Provider looks for the `CML_TOKEN` environment variable |
| `labels` | | `cml` | Your runner will listen for workflows tagged with this label. Ideal for assigning workflows to select runners |
| `idle-timeout` | | `5min` | The maximum time for the runner to wait for jobs. After the timeout, the runner unregisters automatically from the repository and cleans up all cloud resources. If set to `0`, the runner will never time out (be warned if you've got a cloud GPU) |
| `cloud` | `aws`, `azure` | | Sets the cloud vendor |
| `region` | `us-west`, `us-east`, `eu-west`, `eu-north` | `us-west` | Sets the region where the resources will be provisioned. Native AWS or Azure region names are also accepted |
| `image` | | `iterative-cml` in AWS, `Canonical:UbuntuServer:18.04-LTS:latest` in Azure | Sets the image to be used. On AWS, the provider searches by image name (not by id), taking the latest version if multiple images share the same name. On Azure, uses the form `Publisher:Offer:SKU:Version` |
| `spot` | boolean | `false` | If `true`, launches a spot instance |
| `spot_price` | float with at most 5 decimals | `-1` | Sets the maximum price that you are willing to pay per hour. If not specified, the current spot bidding price will be used |
| `name` | | `iterative_{UID}` | Sets the instance name and names related resources accordingly. In Azure, groups everything under a resource group with that name |
| `instance_hdd_size` | | `10` | Sets the instance hard disk size in GB |
| `instance_type` | `m`, `l`, `xl` | `m` | Sets the instance CPU size. You can also specify vendor-specific machine types on AWS, e.g. `t2.micro`. See the equivalences table below |
| `instance_gpu` | ` `, `tesla`, `k80` | ` ` | Selects the desired GPU for supported `instance_type`s |
| `ssh_private` | | | An SSH private key in PEM format. If not provided, a private and public key pair will be automatically generated and returned in `terraform.tfstate` |
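Putting several of these arguments together, a runner that uses a non-default label and a longer idle timeout might look like the following sketch (the repository URL, label, and timeout values are illustrative; argument names follow the table above):

```hcl
resource "iterative_cml_runner" "runner" {
  repo          = "https://github.com/iterative/cml"
  driver        = "github"
  token         = "YOUR_REPO_TOKEN" # or set the CML_TOKEN environment variable
  labels        = "gpu-runner"      # workflows tagged with this label run here
  idle-timeout  = "10min"           # unregister and clean up after 10 idle minutes
  cloud         = "aws"
  region        = "us-west"
  instance_type = "l"
  instance_gpu  = "k80"
}
```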
Setup instructions:
- Set up your provider credentials as environment variables

  AWS:

  ```sh
  export AWS_SECRET_ACCESS_KEY=YOUR_KEY
  export AWS_ACCESS_KEY_ID=YOUR_ID
  ```

  Azure:

  ```sh
  export AZURE_CLIENT_ID=YOUR_ID
  export AZURE_CLIENT_SECRET=YOUR_SECRET
  export AZURE_SUBSCRIPTION_ID=YOUR_SUBSCRIPTION_ID
  export AZURE_TENANT_ID=YOUR_TENANT_ID
  ```
- Save your Terraform file as `main.tf`.
  AWS:

  ```hcl
  terraform {
    required_providers {
      iterative = {
        source = "iterative/iterative"
      }
    }
  }

  provider "iterative" {}

  resource "iterative_machine" "machine" {
    cloud             = "aws"
    region            = "us-west"
    name              = "machine"
    instance_hdd_size = "10"
    instance_type     = "m"
    # Uncomment if a GPU is needed:
    # instance_gpu = "tesla"
  }
  ```
  Azure:

  ```hcl
  terraform {
    required_providers {
      iterative = {
        source = "iterative/iterative"
      }
    }
  }

  provider "iterative" {}

  resource "iterative_machine" "machine" {
    cloud             = "azure"
    region            = "us-west"
    name              = "machine"
    instance_hdd_size = "10"
    instance_type     = "m"
    # Uncomment if a GPU is needed:
    # instance_gpu = "tesla"
  }
  ```
- Launch your instance

  ```sh
  terraform init
  terraform apply --auto-approve
  ```
- Stop the instance

  To destroy your instance, run:

  ```sh
  terraform destroy --auto-approve
  ```
| Variable | Values | Default | Description |
| --- | --- | --- | --- |
| `cloud` | `aws`, `azure` | | Sets the cloud vendor |
| `region` | `us-west`, `us-east`, `eu-west`, `eu-north` | `us-west` | Sets the region where the resources will be provisioned. Native AWS or Azure region names are also accepted |
| `image` | | `iterative-cml` in AWS, `Canonical:UbuntuServer:18.04-LTS:latest` in Azure | Sets the image to be used. On AWS, the provider searches by image name (not by id), taking the latest version if multiple images share the same name. On Azure, uses the form `Publisher:Offer:SKU:Version` |
| `name` | | `iterative_{UID}` | Sets the instance name and names related resources accordingly. In Azure, groups everything under a resource group with that name |
| `spot` | boolean | `false` | If `true`, launches a spot instance |
| `spot_price` | float with at most 5 decimals | `-1` | Sets the maximum price that you are willing to pay per hour. If not specified, the current spot bidding price will be used |
| `instance_hdd_size` | | `10` | Sets the instance hard disk size in GB |
| `instance_type` | `m`, `l`, `xl` | `m` | Sets the instance CPU size. You can also specify vendor-specific machine types on AWS, e.g. `t2.micro`. See the equivalences table below |
| `instance_gpu` | ` `, `tesla`, `k80` | ` ` | Sets the desired GPU for supported `instance_type`s |
| `ssh_private` | | | An SSH private key in PEM format. If not provided, a private and public key pair will be automatically generated and returned in `terraform.tfstate` |
| `startup_script` | | | Startup script, also known as `userData` on AWS and `customData` on Azure. It can be expressed as multiline text using Terraform heredoc syntax |
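For instance, the `startup_script` argument can be provided as multiline text with Terraform's heredoc syntax. A minimal sketch (the script contents are illustrative):

```hcl
resource "iterative_machine" "machine" {
  cloud         = "aws"
  region        = "us-west"
  instance_type = "m"

  # Passed as userData (AWS) or customData (Azure) and run at first boot
  startup_script = <<-END
    #!/bin/bash
    apt-get update
    apt-get install -y git
  END
}
```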
To be able to use `instance_type` and `instance_gpu`, you'll need access to launch such instances from supported cloud vendors. Please ensure that you have sufficient quotas with your cloud provider for the instances you intend to provision with the Iterative Provider. If you're just starting out with a new account, we recommend trying the Iterative Provider with approved instances, such as the `t2.micro` instance for AWS.
Example with a native AWS instance type and region:
```hcl
terraform {
  required_providers {
    iterative = {
      source  = "iterative/iterative"
      version = "0.5.1"
    }
  }
}

provider "iterative" {}

resource "iterative_machine" "machine" {
  region            = "us-west-1"
  ami               = "iterative-cml"
  instance_name     = "machine"
  instance_hdd_size = "10"
  instance_type     = "t2.micro"
}
```
The Iterative Provider currently supports AWS and Azure. Google Cloud Platform is not currently supported.
AWS instance equivalences
The instance type in AWS is calculated by joining the `instance_type` and `instance_gpu` values.
| type | gpu | aws |
| --- | --- | --- |
| m | | m5.2xlarge |
| l | | m5.8xlarge |
| xl | | m5.16xlarge |
| m | k80 | p2.xlarge |
| l | k80 | p2.8xlarge |
| xl | k80 | p2.16xlarge |
| m | tesla | p3.2xlarge |
| l | tesla | p3.8xlarge |
| xl | tesla | p3.16xlarge |
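Reading the tables this way: for example, requesting `instance_type = "xl"` together with `instance_gpu = "k80"` in the `us-east` region should provision a `p2.16xlarge` instance in `us-east-1`. A sketch of such a configuration:

```hcl
resource "iterative_machine" "machine" {
  cloud         = "aws"
  region        = "us-east" # maps to us-east-1
  instance_type = "xl"
  instance_gpu  = "k80"     # xl + k80 resolves to p2.16xlarge
}
```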
| region | aws |
| --- | --- |
| us-west | us-west-1 |
| us-east | us-east-1 |
| eu-north | eu-north-1 |
| eu-west | eu-west-1 |
Azure instance equivalences
The instance type in Azure is calculated by joining the `instance_type` and `instance_gpu` values.
| type | gpu | azure |
| --- | --- | --- |
| m | | Standard_F8s_v2 |
| l | | Standard_F32s_v2 |
| xl | | Standard_F64s_v2 |
| m | k80 | Standard_NC6 |
| l | k80 | Standard_NC12 |
| xl | k80 | Standard_NC24 |
| m | tesla | Standard_NC6s_v3 |
| l | tesla | Standard_NC12s_v3 |
| xl | tesla | Standard_NC24s_v3 |
| region | azure |
| --- | --- |
| us-west | westus2 |
| us-east | eastus |
| eu-north | northeurope |
| eu-west | westeurope |
We've created a GPU-ready image based on Ubuntu 18.04. It comes with the following stack already installed:
- Nvidia drivers
- Docker
- Nvidia-docker
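Since `iterative-cml` is already the default image on AWS, specifying it is optional; the following sketch simply makes the choice explicit:

```hcl
resource "iterative_machine" "machine" {
  cloud         = "aws"
  region        = "us-west"
  # GPU-ready Ubuntu 18.04 image with Nvidia drivers, Docker and nvidia-docker preinstalled
  image         = "iterative-cml"
  instance_type = "m"
  instance_gpu  = "tesla"
}
```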