This repo contains Ansible and Terraform scripts for spinning up CometBFT test networks on Digital Ocean (DO).
After you have all the prerequisites installed:
-
Set up your personal access token for DO
doctl auth init
If you have executed this and the following steps before, you may be able to skip to step 5. If your token has expired, you may need to force the use of the one you just generated by running
doctl auth init -t <new token>
instead, for example:
doctl auth init -t dop_v1_0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
-
Get the fingerprint of the SSH key you want to be associated with the root user on the created VMs
doctl compute ssh-key list
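To tell which listed key matches a key on your machine, you can compute the fingerprint locally and compare. The sketch below assumes the listing shows colon-separated MD5 fingerprints, which is what the example value used later in this guide looks like; the helper name and the key path are hypothetical.

```shell
# Print the MD5 fingerprint (colon-separated hex) of a public key file.
# ssh-keygen prints "<bits> MD5:<fingerprint> <comment> (<type>)";
# awk strips the "MD5:" prefix from the second field.
md5_fingerprint() {
  ssh-keygen -l -E md5 -f "$1" | awk '{ sub(/^MD5:/, "", $2); print $2 }'
}

# Example (the key path is a placeholder, adjust it to your own key):
# md5_fingerprint "$HOME/.ssh/id_ed25519.pub"
```

Compare the printed value against the fingerprint column of the listing to pick the right key.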
-
Set up your Digital Ocean credentials as Terraform variables. Be sure to write them to ./tf/terraform.tfvars, as this file is ignored in .gitignore.
cat <<EOF > ./tf/terraform.tfvars
do_token = "dop_v1_0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
ssh_keys = ["ab:cd:ef:01:23:45:67:89:ab:cd:ef:01:23:45:67:89"]
EOF
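Before moving on, it can help to confirm that the variables file is in place. The following is a minimal sketch, not part of the repo's tooling: `check_tfvars` is a hypothetical helper that only checks that the file exists and assigns `do_token` and `ssh_keys`; it does not validate the values.

```shell
# Return 0 only if the tfvars file exists and assigns both expected keys.
check_tfvars() {
  file="$1"
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
  for key in do_token ssh_keys; do
    # Look for a line of the form: <key> = ...
    grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file" \
      || { echo "no '${key}' assignment in $file" >&2; return 1; }
  done
}

# Example:
# check_tfvars ./tf/terraform.tfvars && make terraform-init
```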
-
Initialize Terraform (only needed once)
make terraform-init
After you have set up the infrastructure:
-
Set up the test you will run in the experiment.mk file:
  - Set the path to your manifest file in the variable MANIFEST.
  - Set the commit hash of CometBFT that you want to install on the nodes in the variable VERSION_TAG.
  - If you want to deploy a subset of the validators with a different version of CometBFT, set the variable VERSION2_TAG to the commit hash you want to install in that subset. Then set the proportion of nodes that will run VERSION_TAG and VERSION2_TAG in the variables VERSION_WEIGHT and VERSION2_WEIGHT, respectively.
  - If necessary, set the variables DO_INSTANCE_TAGNAME and DO_VPC_SUBNET to customized values to prevent collisions with other QA runs, including other users of the DigitalOcean project who might be running these scripts. If the subnet is allocated in the private IP address range 172.16.0.0/12, as it is in the unmodified file, a good choice is a /20 block between 172.16.16.0/20 and 172.31.240.0/20. You may also need to rename the DO project cmt-testnet in the tf/project.tf file to a unique name.
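The subnet range mentioned above can be enumerated mechanically: 172.16.0.0/12 holds 256 blocks of size /20, and the suggested range corresponds to blocks 1 through 255. The sketch below is only an illustration of that arithmetic; `subnet_for_index` is a hypothetical helper, and how the resulting value is written into experiment.mk is not shown here.

```shell
# Print the idx-th /20 subnet inside 172.16.0.0/12 (idx from 0 to 255).
subnet_for_index() {
  idx="$1"
  if [ "$idx" -lt 0 ] || [ "$idx" -gt 255 ]; then
    echo "index must be between 0 and 255" >&2
    return 1
  fi
  second=$((16 + idx / 16))    # 16 blocks of size /20 per second-octet value
  third=$(((idx % 16) * 16))   # each /20 block spans 16 third-octet values
  echo "172.${second}.${third}.0/20"
}

subnet_for_index 1     # prints 172.16.16.0/20
subnet_for_index 255   # prints 172.31.240.0/20
```

Agreeing on a distinct index per user is one simple way to avoid overlapping subnets.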
-
Create the VMs for the validators and Prometheus as specified in the manifest file. Be sure to use your actual DO token and SSH key fingerprints for the do_token and do_ssh_keys variables.
make terraform-apply
After creating the DO droplets, this command will generate two files with information about the IP addresses of the nodes: an Ansible inventory file ./ansible/hosts, and ./ansible/testnet/infrastructure-data.json for E2E's runner tool.
Note that installing the packages defined in tf/user-data.txt may take more time than expected. The installation process may not have finished even when DO reports that the droplets have been created successfully.
-
Generate the testnet configuration
make configgen
-
Install all necessary software on the created VMs using Ansible
make ansible-install
-
Initialize the Prometheus instance
make prometheus-init
-
Start the test application on all of the validators
make start-network
Initialize the load-runner node, if it's not yet running:
make loadrunners-init
The following command will start sending load until Ctrl-C is sent, so consider running this in its own terminal:
make runload
-
Once the execution is over, stop the network:
make stop-network
-
Retrieve the data produced during the execution. You can use the following command to retrieve both the Prometheus and the blockstore databases together:
make retrieve-data
To retrieve them independently, use the following for Prometheus, which will retrieve the data from all nodes:
make retrieve-prometheus-data
For the blockstore, use the command below. Note that the target node from which the data is retrieved can be changed via the environment variable RETRIEVE_TARGET_HOST:
  - "any" (default): retrieve from one random validator from the inventory;
  - "all": retrieve from all nodes (very slow!);
  - the exact name of a validator: retrieve from that particular validator.
make retrieve-blockstore
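Since RETRIEVE_TARGET_HOST is read from the environment, it can be set for a single invocation by prefixing the command. The sketch below only prints the command lines rather than running them; `retrieve_blockstore_cmd` is a hypothetical helper and `validator001` is a made-up node name.

```shell
# Print (without running) the command line for a given target.
# With no argument, fall back to "any", the documented default.
retrieve_blockstore_cmd() {
  target="${1:-any}"
  printf 'RETRIEVE_TARGET_HOST=%s make retrieve-blockstore\n' "$target"
}

retrieve_blockstore_cmd                # default target: any
retrieve_blockstore_cmd all            # every node (very slow)
retrieve_blockstore_cmd validator001   # hypothetical validator name
```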
If you need to restart the running experiment, run the following command:
make restart
This command will delete all of the prometheus data, and re-initialize the nodes on the network. The nodes will restart with the same configuration files and IDs that they previously used, but all of their data will be deleted and reset.
Do not forget to destroy the experiment's infrastructure when you are done, so that you stop being charged for it.
make terraform-destroy
You may want to keep running some nodes to retrieve data from them and destroy the others. The following commands will destroy all nodes except the Prometheus node and the last validator.
cd tf && terraform state rm digitalocean_droplet.testnet-prometheus digitalocean_droplet.testnet-node[199]
make terraform-destroy
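The index 199 in the command above matches the last validator only for a particular network size. As a rough sketch, the address can be derived from the node count, assuming node indices are zero-based and contiguous (an assumption, not something the scripts are confirmed to guarantee); `last_node_address` is a hypothetical helper.

```shell
# Print the Terraform resource address of the last node,
# assuming zero-based, contiguous node indices.
last_node_address() {
  count="$1"
  [ "$count" -ge 1 ] || { echo "need at least one node" >&2; return 1; }
  echo "digitalocean_droplet.testnet-node[$((count - 1))]"
}

last_node_address 200   # prints digitalocean_droplet.testnet-node[199]
```

When passing the result to `terraform state rm`, keep it quoted, for example `"$(last_node_address 200)"`, because square brackets are glob characters in most shells.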
Once you've completed setting up the network, take a look at your
ansible/hosts
file for the IP address of the Prometheus server, then navigate
to that address on port 9090 in your web browser in order to query collected
metrics and view their associated graphs.
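Looking up the address by hand can be replaced by a small lookup in the inventory file. The sketch below assumes the generated inventory is in Ansible's INI format and lists the Prometheus server under a group named `prometheus`; both the group name and the helper `first_host_in_group` are assumptions, so check your actual ansible/hosts file first.

```shell
# Print the first host entry listed under an INI inventory group.
first_host_in_group() {
  group="$1"
  file="$2"
  awk -v g="[$group]" '
    $0 == g  { in_group = 1; next }   # entering the wanted group
    /^\[/    { in_group = 0 }         # any other group header ends it
    in_group && NF && $1 !~ /^[#;]/ { print $1; exit }  # first non-comment entry
  ' "$file"
}

# Example (group name "prometheus" is an assumption):
# echo "http://$(first_host_in_group prometheus ./ansible/hosts):9090"
```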