HPC Toolkit demo failed on second attempt! #1175
Replies: 1 comment 1 reply
-
In general this is a sign that something was not destroyed properly during your original deployment.
To condense the error messages, the leftover infrastructure is:

1. Address slurm-gcp-v5-net-nat-ips-us-central1-0
2. Address slurm-gcp-v5-net-nat-ips-us-central1-1
3. Project metadata key slurmgcpv5-slurm-partition-compute-script-ghpc_startup_sh
4. Project metadata key slurmgcpv5-slurm-partition-debug-script-ghpc_startup_sh
5. Project metadata key slurmgcpv5-slurm-tpl-slurm-conf
6. Project metadata key slurmgcpv5-slurm-tpl-cgroup-conf
7. Project metadata key slurmgcpv5-slurm-tpl-slurmdbd-conf
8. Project metadata key slurmgcpv5-slurm-controller-script-ghpc_startup_sh
9. Project metadata key slurmgcpv5-slurm-compute-script-ghpc_startup_sh

Items 1 & 2 can be viewed and deleted in the Cloud Console. Items 3-9 can be viewed and deleted at https://console.cloud.google.com/compute/metadata. Unfortunately, none of items 1-9 show up when you search the Cloud Console, so it is understandable that you did not find them in search.

There may be more items that need to be cleaned up once these are deleted. I would suggest cleaning up the items above and then trying to deploy the cluster again; if there are additional conflicts, they will be listed.

Alternatively, if you are able, it can sometimes be good practice to create a new project for test deployments, so that when you are done you can simply delete the project and be sure nothing is left behind. This is by no means required: when it works properly, the `terraform destroy` command you used should clean up all of these artifacts.
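If it helps, the deletions can also be scripted with gcloud instead of clicking through the console. This is only a sketch, assuming gcloud is installed and authenticated, using the project ID, region, and resource names taken from the error output in this thread; it prints the commands (a dry run) so you can review them, and you would remove the leading `echo` to actually delete anything:

```shell
#!/bin/sh
# Dry-run cleanup sketch: prints the gcloud commands that would delete the
# leftover reserved addresses and project metadata keys from the errors above.
# Remove the leading "echo" on each command to execute it for real.
PROJECT=gcpbillcheking1
REGION=us-central1

# Items 1 & 2: the reserved NAT IP addresses (regional resources)
for i in 0 1; do
  echo gcloud compute addresses delete "slurm-gcp-v5-net-nat-ips-${REGION}-${i}" \
    --region="${REGION}" --project="${PROJECT}" --quiet
done

# Items 3-9: the leftover project metadata keys
for key in \
  slurmgcpv5-slurm-partition-compute-script-ghpc_startup_sh \
  slurmgcpv5-slurm-partition-debug-script-ghpc_startup_sh \
  slurmgcpv5-slurm-tpl-slurm-conf \
  slurmgcpv5-slurm-tpl-cgroup-conf \
  slurmgcpv5-slurm-tpl-slurmdbd-conf \
  slurmgcpv5-slurm-controller-script-ghpc_startup_sh \
  slurmgcpv5-slurm-compute-script-ghpc_startup_sh
do
  echo gcloud compute project-info remove-metadata \
    --keys="${key}" --project="${PROJECT}"
done
```

Note that `gcloud compute addresses delete` is regional (hence `--region`), while the metadata keys live on the project itself, which is why `project-info remove-metadata` is used for items 3-9.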
-
Hi List
I have a basic question: I tried the HPC Toolkit and followed the 'guide me' option to build an HPC cluster as defined in the tutorial page.
My first attempt was successful, but when I tried the same procedure again a week later, it gave ERROR messages after these commands:
./ghpc create community/examples/slurm-gcp-v5-hpc-centos7.yaml -l IGNORE --vars project_id=gcpbillcheking1 (it was OK!)
terraform -chdir=slurm-gcp-v5/primary init (OK!)
terraform -chdir=slurm-gcp-v5/primary validate (OK!)
terraform -chdir=slurm-gcp-v5/primary apply (entered 'yes' during its execution, but it gave an error)
Part of the ERROR message is:
EOT
+ name = (known after apply)
+ name_pref = "slurmgcpv5-compute-debug-ghpc-"
+ project = "gcpbillcheking1"
+ region = (known after apply)
+ self_link = (known after apply)
+ tags = [
+ "slurmgcpv5",
]
+ tags_fingerprint = (known after apply)
Plan: 30 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
module.slurm_controller.module.slurm_controller_instance.random_uuid.cluster_id: Creating...
module.slurm_controller.module.slurm_controller_instance.random_uuid.cluster_id: Creation complete after 0s [id=4100620d-13f1-f573-6f1c-295827941298]
module.slurm_controller.module.slurm_controller_instance.random_string.topic_suffix: Creating...
module.slurm_login.module.slurm_login_instance.random_string.suffix: Creating...
module.homefs.random_id.resource_name_suffix: Creating...
module.homefs.random_id.resource_name_suffix: Creation complete after 0s [id=Bj989g]
module.slurm_controller.module.slurm_controller_instance.random_string.topic_suffix: Creation complete after 0s [id=qQrAi9Mu]
module.slurm_login.module.slurm_login_instance.random_string.suffix: Creation complete after 0s [id=ugd5x6pt]
module.network1.module.vpc.module.vpc.google_compute_network.network: Creating...
module.compute_partition.module.slurm_partition.google_compute_project_metadata_item.partition_startup_scripts["ghpc_startup_sh"]: Creating...
module.slurm_controller.module.slurm_controller_instance.google_compute_project_metadata_item.slurmdbd_conf: Creating...
module.slurm_controller.module.slurm_controller_instance.google_compute_project_metadata_item.controller_startup_scripts["ghpc_startup_sh"]: Creating...
module.slurm_controller.module.slurm_controller_instance.google_compute_project_metadata_item.slurm_conf: Creating...
module.slurm_controller.module.slurm_controller_instance.google_compute_project_metadata_item.cgroup_conf: Creating...
module.slurm_controller.module.slurm_controller_instance.google_compute_project_metadata_item.compute_startup_scripts["ghpc_startup_sh"]: Creating...
module.network1.module.nat_ip_addresses["us-central1"].google_compute_address.ip[1]: Creating...
module.network1.module.nat_ip_addresses["us-central1"].google_compute_address.ip[0]: Creating...
module.debug_partition.module.slurm_partition.google_compute_project_metadata_item.partition_startup_scripts["ghpc_startup_sh"]: Creating...
module.slurm_login.module.slurm_login_instance.google_compute_project_metadata_item.login_startup_scripts["ghpc_startup_sh"]: Creating...
module.network1.module.vpc.module.vpc.google_compute_network.network: Still creating... [10s elapsed]
module.slurm_login.module.slurm_login_instance.google_compute_project_metadata_item.login_startup_scripts["ghpc_startup_sh"]: Still creating... [10s elapsed]
module.network1.module.vpc.module.vpc.google_compute_network.network: Still creating... [20s elapsed]
module.slurm_login.module.slurm_login_instance.google_compute_project_metadata_item.login_startup_scripts["ghpc_startup_sh"]: Still creating... [20s elapsed]
module.network1.module.vpc.module.vpc.google_compute_network.network: Creation complete after 23s [id=projects/gcpbillcheking1/global/networks/slurm-gcp-v5-net]
module.network1.module.cloud_router["us-central1"].google_compute_router.router: Creating...
module.network1.module.vpc.module.subnets.google_compute_subnetwork.subnetwork["us-central1/slurm-gcp-v5-primary-subnet"]: Creating...
module.network1.module.vpc.module.firewall_rules.google_compute_firewall.rules["slurm-gcp-v5-net-fw-allow-iap-ssh-ingress"]: Creating...
module.network1.module.vpc.module.firewall_rules.google_compute_firewall.rules["slurm-gcp-v5-net-fw-allow-internal-traffic"]: Creating...
module.slurm_login.module.slurm_login_instance.google_compute_project_metadata_item.login_startup_scripts["ghpc_startup_sh"]: Creation complete after 24s [id=slurmgcpv5-slurm-login_ugd5x6pt-script-ghpc_startup_sh]
module.slurm_login.module.slurm_login_instance.module.slurm_login_instance.data.local_file.startup: Reading...
module.slurm_login.module.slurm_login_instance.module.slurm_login_instance.data.local_file.startup: Read complete after 0s [id=de68d872e4df054209706dbeee9bfec9dca89970]
module.network1.module.cloud_router["us-central1"].google_compute_router.router: Creation complete after 3s [id=projects/gcpbillcheking1/regions/us-central1/routers/slurm-gcp-v5-net-router]
module.network1.module.vpc.module.subnets.google_compute_subnetwork.subnetwork["us-central1/slurm-gcp-v5-primary-subnet"]: Still creating... [10s elapsed]
module.network1.module.vpc.module.firewall_rules.google_compute_firewall.rules["slurm-gcp-v5-net-fw-allow-internal-traffic"]: Still creating... [10s elapsed]
module.network1.module.vpc.module.firewall_rules.google_compute_firewall.rules["slurm-gcp-v5-net-fw-allow-iap-ssh-ingress"]: Still creating... [10s elapsed]
module.network1.module.vpc.module.firewall_rules.google_compute_firewall.rules["slurm-gcp-v5-net-fw-allow-internal-traffic"]: Creation complete after 12s [id=projects/gcpbillcheking1/global/firewalls/slurm-gcp-v5-net-fw-allow-internal-traffic]
module.network1.module.vpc.module.firewall_rules.google_compute_firewall.rules["slurm-gcp-v5-net-fw-allow-iap-ssh-ingress"]: Creation complete after 12s [id=projects/gcpbillcheking1/global/firewalls/slurm-gcp-v5-net-fw-allow-iap-ssh-ingress]
module.network1.module.vpc.module.subnets.google_compute_subnetwork.subnetwork["us-central1/slurm-gcp-v5-primary-subnet"]: Creation complete after 15s [id=projects/gcpbillcheking1/regions/us-central1/subnetworks/slurm-gcp-v5-primary-subnet]
╷
│ Error: Error creating Address: googleapi: Error 409: The resource 'projects/gcpbillcheking1/regions/us-central1/addresses/slurm-gcp-v5-net-nat-ips-us-central1-0' already exists, alreadyExists
│
│ with module.network1.module.nat_ip_addresses["us-central1"].google_compute_address.ip[0],
│ on .terraform/modules/network1.nat_ip_addresses/main.tf line 52, in resource "google_compute_address" "ip":
│ 52: resource "google_compute_address" "ip" {
│
╵
╷
│ Error: Error creating Address: googleapi: Error 409: The resource 'projects/gcpbillcheking1/regions/us-central1/addresses/slurm-gcp-v5-net-nat-ips-us-central1-1' already exists, alreadyExists
│
│ with module.network1.module.nat_ip_addresses["us-central1"].google_compute_address.ip[1],
│ on .terraform/modules/network1.nat_ip_addresses/main.tf line 52, in resource "google_compute_address" "ip":
│ 52: resource "google_compute_address" "ip" {
│
╵
╷
│ Error: key "slurmgcpv5-slurm-partition-compute-script-ghpc_startup_sh" already present in metadata for project "gcpbillcheking1". Use `terraform import` to manage it with Terraform
│
│ with module.compute_partition.module.slurm_partition.google_compute_project_metadata_item.partition_startup_scripts["ghpc_startup_sh"],
│ on .terraform/modules/compute_partition.slurm_partition/terraform/slurm_cluster/modules/slurm_partition/main.tf line 168, in resource "google_compute_project_metadata_item" "partition_startup_scripts":
│ 168: resource "google_compute_project_metadata_item" "partition_startup_scripts" {
│
╵
╷
│ Error: key "slurmgcpv5-slurm-partition-debug-script-ghpc_startup_sh" already present in metadata for project "gcpbillcheking1". Use `terraform import` to manage it with Terraform
│
│ with module.debug_partition.module.slurm_partition.google_compute_project_metadata_item.partition_startup_scripts["ghpc_startup_sh"],
│ on .terraform/modules/debug_partition.slurm_partition/terraform/slurm_cluster/modules/slurm_partition/main.tf line 168, in resource "google_compute_project_metadata_item" "partition_startup_scripts":
│ 168: resource "google_compute_project_metadata_item" "partition_startup_scripts" {
│
╵
╷
│ Error: key "slurmgcpv5-slurm-tpl-slurm-conf" already present in metadata for project "gcpbillcheking1". Use `terraform import` to manage it with Terraform
│
│ with module.slurm_controller.module.slurm_controller_instance.google_compute_project_metadata_item.slurm_conf,
│ on .terraform/modules/slurm_controller.slurm_controller_instance/terraform/slurm_cluster/modules/slurm_controller_instance/main.tf line 193, in resource "google_compute_project_metadata_item" "slurm_conf":
│ 193: resource "google_compute_project_metadata_item" "slurm_conf" {
│
╵
╷
│ Error: key "slurmgcpv5-slurm-tpl-cgroup-conf" already present in metadata for project "gcpbillcheking1". Use `terraform import` to manage it with Terraform
│
│ with module.slurm_controller.module.slurm_controller_instance.google_compute_project_metadata_item.cgroup_conf,
│ on .terraform/modules/slurm_controller.slurm_controller_instance/terraform/slurm_cluster/modules/slurm_controller_instance/main.tf line 200, in resource "google_compute_project_metadata_item" "cgroup_conf":
│ 200: resource "google_compute_project_metadata_item" "cgroup_conf" {
│
╵
╷
│ Error: key "slurmgcpv5-slurm-tpl-slurmdbd-conf" already present in metadata for project "gcpbillcheking1". Use `terraform import` to manage it with Terraform
│
│ with module.slurm_controller.module.slurm_controller_instance.google_compute_project_metadata_item.slurmdbd_conf,
│ on .terraform/modules/slurm_controller.slurm_controller_instance/terraform/slurm_cluster/modules/slurm_controller_instance/main.tf line 207, in resource "google_compute_project_metadata_item" "slurmdbd_conf":
│ 207: resource "google_compute_project_metadata_item" "slurmdbd_conf" {
│
╵
╷
│ Error: key "slurmgcpv5-slurm-controller-script-ghpc_startup_sh" already present in metadata for project "gcpbillcheking1". Use `terraform import` to manage it with Terraform
│
│ with module.slurm_controller.module.slurm_controller_instance.google_compute_project_metadata_item.controller_startup_scripts["ghpc_startup_sh"],
│ on .terraform/modules/slurm_controller.slurm_controller_instance/terraform/slurm_cluster/modules/slurm_controller_instance/main.tf line 231, in resource "google_compute_project_metadata_item" "controller_startup_scripts":
│ 231: resource "google_compute_project_metadata_item" "controller_startup_scripts" {
│
╵
╷
│ Error: key "slurmgcpv5-slurm-compute-script-ghpc_startup_sh" already present in metadata for project "gcpbillcheking1". Use `terraform import` to manage it with Terraform
│
│ with module.slurm_controller.module.slurm_controller_instance.google_compute_project_metadata_item.compute_startup_scripts["ghpc_startup_sh"],
│ on .terraform/modules/slurm_controller.slurm_controller_instance/terraform/slurm_cluster/modules/slurm_controller_instance/main.tf line 243, in resource "google_compute_project_metadata_item" "compute_startup_scripts":
│ 243: resource "google_compute_project_metadata_item" "compute_startup_scripts" {
Please also note that I deleted the previous instances manually and ran the following command to clean up:
terraform -chdir=slurm-gcp-v5/primary destroy -auto-approve
Also, before this second attempt I searched for 'slurm*' in the console to find any leftover traces, but NOTHING was found!
So I guess some Terraform data is missing; please help us resolve this!
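In case it is useful to anyone hitting the same issue: the leftover pieces that console search misses (reserved addresses and project metadata) can be listed from the CLI instead. This is only a sketch, assuming gcloud is available and using the project ID from the log above; it prints the commands rather than running them, and the exact `--filter`/`--format` expressions are illustrative and may need adjusting:

```shell
#!/bin/sh
# Dry-run sketch: prints gcloud commands that list the kinds of leftover
# resources that console search does not surface. The project ID is taken
# from the log above; remove the leading "echo" to actually run the queries.
PROJECT=gcpbillcheking1

# Reserved (static) IP addresses whose names contain "slurm"
echo gcloud compute addresses list --project="${PROJECT}" --filter="name~slurm"

# All project-level metadata keys (where the slurmgcpv5-* entries live)
echo gcloud compute project-info describe --project="${PROJECT}" \
  --format="value(commonInstanceMetadata.items[].key)"
```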
Thanks in advance
Krishna