Error: REMOVE_HOST operation validation failed #201

Closed
4 tasks done
pradeep17j opened this issue Jul 24, 2024 · 5 comments
Labels
bug Bug

Comments

@pradeep17j

pradeep17j commented Jul 24, 2024

Code of Conduct

  • I have read and agree to the Code of Conduct.
  • Vote on this issue by adding a 👍 reaction to the original issue initial description to help the maintainers prioritize.
  • Do not leave "+1" or other comments that do not add relevant information or questions.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Terraform

1.9.2

Terraform Provider

0.9.1

VMware Cloud Foundation

5.2

Description

The remove-host operation on a cluster fails with the error message below.

Message: REMOVE_HOST operation validation failed due to: Cluster HA/DRS: Check if cluster has HA & DRS enabled

Remediation Message:

Reference Token: PJUHO6

Cause:

This happens because the provider sets high_availability_enabled to false by default.
The snippet of the terraform.tfstate file below shows high_availability_enabled set to false.

The provider should set this to true.

{
  "version": 4,
  "terraform_version": "1.9.2",
  "serial": 20,
  "lineage": "1a400f44-29a9-8c66-e115-5f240a8fdca6",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "vcf_domain",
      "name": "workload_domain1",
      "provider": "provider[\"registry.terraform.io/vmware/vcf\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "cluster": [
              {
                "cluster_image_id": "",
                "evc_mode": "",
                "geneve_vlan_id": 200,
                "high_availability_enabled": false,
                "host": [
                  {
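
The same value can probably be checked without opening the state file. The following is a minimal sketch, assuming the cluster block is exposed as a list on the resource (as the state snippet above suggests); the output name cluster_ha_enabled is made up for illustration, and Terraform may require sensitive = true if the provider marks parts of the cluster block as sensitive.

```hcl
# Hypothetical output, not part of the original configuration.
# Collects the HA flag of every cluster in the workload domain.
output "cluster_ha_enabled" {
  value = [
    for c in vcf_domain.workload_domain1.cluster :
    c.high_availability_enabled
  ]
}
```

After an apply, terraform output cluster_ha_enabled would then list one boolean per cluster.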

Affected Resources or Data Sources

r/vcf_domain

Terraform Configuration

resource "vcf_domain" "workload_domain1" {
  name = "sfo-w01-vc01"

  #vCenter Settings
  vcenter_configuration {
    name            = var.wld1_vcenter_configuration["name"]
    fqdn            = var.wld1_vcenter_configuration["fqdn"]
    vm_size         = var.wld1_vcenter_configuration["vm_size"]
    storage_size    = var.wld1_vcenter_configuration["storage_size"]
    ip_address      = var.wld1_vcenter_configuration["ip"]
    subnet_mask     = var.wld1_vcenter_configuration["subnet_mask"]
    gateway         = var.wld1_vcenter_configuration["gateway"]
    datacenter_name = "${var.workload_domain1_name}-dc"
    root_password   = var.wld_passwords["vcenter"]
  }

Debug Output

vcf_domain.workload_domain1: Modifying... [id=7eaebd4a-d7da-4de0-81c5-2c53faedefb3]
vcf_domain.workload_domain1: Still modifying... [id=7eaebd4a-d7da-4de0-81c5-2c53faedefb3, 10s elapsed]
vcf_domain.workload_domain1: Still modifying... [id=7eaebd4a-d7da-4de0-81c5-2c53faedefb3, 20s elapsed]
vcf_domain.workload_domain1: Still modifying... [id=7eaebd4a-d7da-4de0-81c5-2c53faedefb3, 30s elapsed]
vcf_domain.workload_domain1: Still modifying... [id=7eaebd4a-d7da-4de0-81c5-2c53faedefb3, 40s elapsed]
vcf_domain.workload_domain1: Still modifying... [id=7eaebd4a-d7da-4de0-81c5-2c53faedefb3, 50s elapsed]
vcf_domain.workload_domain1: Still modifying... [id=7eaebd4a-d7da-4de0-81c5-2c53faedefb3, 1m0s elapsed]

│ Error: Task with ID = e2ece250-7875-4cd8-a67d-7082faacafdc , Name: "Removing host(s) from cluster" Type: "CLUSTER_COMPACTION" is in state Failed

│   with vcf_domain.workload_domain1,
│   on wld1.tf line 134, in resource "vcf_domain" "workload_domain1":
│  134: resource "vcf_domain" "workload_domain1" {

Panic Output

No response

Expected Behavior

The remove-host operation on the cluster should succeed.

Actual Behavior

The host removal fails because vSphere HA is off by default.

Steps to Reproduce

  • Create a workload domain
  • Add hosts to the cluster
  • Remove a host from the cluster (fails here)

Environment Details

No response

Screenshots

No response

References

No response

pradeep17j added the bug and needs-triage labels Jul 24, 2024
github-actions bot added the pending-review label Jul 24, 2024
@tenthirtyam
Collaborator

Please use the markdown editor to format the HCL and console output for readability.

Reference: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks

tenthirtyam removed the pending-review label Jul 30, 2024
tenthirtyam added this to the Backlog milestone Jul 30, 2024
@tenthirtyam
Collaborator

Proper formatting applied for maintainer readability and historical context.

tenthirtyam changed the title from "Workload domain creation happens with high_availability_enabled=False , this causes issues while REMOVE_HOST workflow" to "Error: REMOVE_HOST operation validation failed" Jul 30, 2024
@spacegospod
Contributor

high_availability_enabled is optional, and simply setting it to true should work around the problem.

As for the default value, it should stay false.
This is an optional property according to the documentation: https://developer.broadcom.com/xapis/vmware-cloud-foundation-api/latest/data-structures/AdvancedOptions/
Furthermore, enabling HA on a cluster should be a conscious decision and should be specified explicitly by the user.
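
The workaround described above might look like the fragment below. It is a sketch, not a complete configuration: the cluster name is hypothetical, and the required vcenter_configuration, nsx_configuration, host, and storage settings are omitted. Only the attribute name high_availability_enabled and its placement inside the cluster block are taken from the state snippet in this issue.

```hcl
resource "vcf_domain" "workload_domain1" {
  name = "sfo-w01-vc01"

  # vcenter_configuration, nsx_configuration, and other settings omitted.

  cluster {
    # Hypothetical cluster name, for illustration only.
    name = "sfo-w01-cl01"

    # Explicitly opt in to vSphere HA so that the REMOVE_HOST
    # validation "Check if cluster has HA & DRS enabled" can pass.
    high_availability_enabled = true

    # host, vds, and storage blocks omitted.
  }
}
```

Whether changing this attribute on an existing cluster is applied in place is not confirmed in this thread, so reviewing the terraform plan output before applying is advisable.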

tenthirtyam removed the needs-triage label Aug 8, 2024
@tenthirtyam
Collaborator

Closing per #201 (comment).

tenthirtyam closed this as not planned Aug 13, 2024
tenthirtyam removed this from the Backlog milestone Aug 13, 2024

I'm going to lock this issue because it has been closed for 30 days. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked this issue as resolved and limited conversation to collaborators Sep 13, 2024