Commit 5d111e5: Initial commit
gojeaqui committed Oct 16, 2020 (0 parents)
Showing 15 changed files with 1,935 additions and 0 deletions.

ADD_NODE.md (new file, 117 additions)

# Adding Cluster nodes after installation

https://docs.openshift.com/container-platform/4.3/installing/installing_vsphere/installing-vsphere.html#installation-approve-csrs_installing-vsphere

Once the cluster has been installed and is in production, be careful with `terraform apply`: if Dynamic Provisioning is in use, some disks may be attached to cluster nodes that Terraform does not know about.

To deal with the drift caused by Dynamic Provisioning, the strategy is to use the `terraform apply -target` option.
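
Before applying anything on a production cluster, it is worth previewing what an unrestricted run would do; drift on dynamically provisioned disks shows up in the plan as unexpected modifications to existing VMs:
```
# Preview only; nothing is changed yet
terraform plan
```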

## Creating the node with terraform
First, add the new node to the terraform.tfvars configuration file, also increasing the node count.
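
As a sketch, assuming the node count lives in a variable like `compute_count` (the variable names are illustrative; use whatever your configuration actually defines), the change in terraform.tfvars might look like:
```
# before: compute_count = 2
# after: a third worker is added (add its IP/hostname entries as well)
compute_count = 3
```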

### Terraform manual process:
Check the terraform state to get the name of the target for the next node that you want to add.

Example:
```
[root@bastion ~]# terraform state list | grep virtual_machine.vm
module.compute.vsphere_virtual_machine.vm[0]
module.compute.vsphere_virtual_machine.vm[1]
module.control_plane.vsphere_virtual_machine.vm[0]
module.control_plane.vsphere_virtual_machine.vm[1]
module.control_plane.vsphere_virtual_machine.vm[2]
```

So, in order to add a third worker node (indices are zero-based, so the new node is `vm[2]`), execute this targeted apply:
```
terraform apply -target='module.compute.vsphere_virtual_machine.vm[2]'
```

Remember to add the new node(s) to the DHCP server.
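
For example, with ISC dhcpd a static reservation for the new worker could look like this (the MAC address and IP are placeholders):
```
host worker-2 {
    hardware ethernet 00:50:56:aa:bb:cc;
    fixed-address 10.76.54.202;
    option host-name "worker-2.ocp4.example.com";
}
```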

### Terraform automatic process:
Run `config-gen.py add terraform.tfvars` and follow the prompts; it will print the `terraform apply` command to run.

## Adding the created node
Wait until the node reaches the login prompt.

Check whether there are CSRs to approve:
```
[root@bastion ~]# oc get csr
NAME AGE REQUESTOR CONDITION
csr-f4vdb 2m53s system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-hkhsh 16m system:node:infra-0.ocp4.example.com Approved,Issued
csr-kcp9g 15m system:node:infra-0.ocp4.example.com Approved,Issued
csr-kt6sb 55m system:node:master-1.ocp4.example.com Approved,Issued
csr-zvblm 42m system:node:master-2.ocp4.example.com Approved,Issued
```

Approve the CSRs
```
[root@bastion ~]# oc adm certificate approve csr-f4vdb
```

Check again. The first CSR for a new node is requested by "system:serviceaccount:openshift-machine-config-operator:node-bootstrapper"; once it is approved, the node itself issues a second (serving) CSR:
```
[root@bastion html]# oc get csr
NAME AGE REQUESTOR CONDITION
csr-9dg9f 79m system:node:master-2.ocp4.example.com Approved,Issued
csr-9f7hx 25s system:node:worker-0.ocp4.example.com Pending
csr-bkv6s 111m system:node:infra-2.ocp4.example.com Approved,Issued
...
```

If there are many CSRs, there is a faster way to approve all pending ones (each node first issues a bootstrap CSR and then a serving CSR, so you may need to run this more than once):
```
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
```

Check the pods being created on the new node(s):
```
[root@bastion html]# oc get pods -A -o wide | grep worker-0
openshift-cluster-node-tuning-operator tuned-nkshd 0/1 ContainerCreating 0 4m50s 10.76.54.191 worker-0.ocp4.example.com
openshift-monitoring node-exporter-jzhq9 0/2 Init:0/1 0 4m50s 10.76.54.191 worker-0.ocp4.example.com
openshift-multus multus-8hfcr 0/1 Init:0/5 0 4m50s 10.76.54.191 worker-0.ocp4.example.com
openshift-sdn ovs-2l7wf 0/1 ContainerCreating 0 4m50s 10.76.54.191 worker-0.ocp4.example.com
openshift-sdn sdn-2x5vd 0/1 Init:0/1 0 4m50s 10.76.54.191 worker-0.ocp4.example.com
```
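
Instead of grep, a field selector can restrict the listing to the new node directly:
```
oc get pods -A -o wide --field-selector spec.nodeName=worker-0.ocp4.example.com
```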

Check that the node gets added to the list of nodes:
```
[root@bastion ~]# oc get nodes
NAME STATUS ROLES AGE VERSION
infra-0.ocp4.example.com Ready worker 17h v1.16.2
infra-1.ocp4.example.com Ready worker 17h v1.16.2
infra-2.ocp4.example.com NotReady worker 7s v1.16.2
master-0.ocp4.example.com Ready master 17h v1.16.2
master-1.ocp4.example.com Ready master 17h v1.16.2
master-2.ocp4.example.com Ready master 17h v1.16.2
```
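
Rather than polling `oc get nodes`, you can block until the node reports Ready (the timeout value is arbitrary):
```
oc wait --for=condition=Ready node/infra-2.ocp4.example.com --timeout=10m
```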

Eventually the node is restarted and added to the cluster in the "Ready" state:
```
[root@bastion ~]# oc get nodes
NAME STATUS ROLES AGE VERSION
infra-0.ocp4.example.com Ready worker 17h v1.16.2
infra-1.ocp4.example.com Ready worker 17h v1.16.2
infra-2.ocp4.example.com Ready worker 2m13s v1.16.2
master-0.ocp4.example.com Ready master 17h v1.16.2
master-1.ocp4.example.com Ready master 17h v1.16.2
master-2.ocp4.example.com Ready master 17h v1.16.2
```

If the new node is an infra node, it is necessary to label it as such:
```
oc label node infra-2.ocp4.example.com node-role.kubernetes.io/infra=""
```

And remove the worker label:
```
oc label node infra-2.ocp4.example.com node-role.kubernetes.io/worker-
```

Verify that the new nodes are labeled as infra and not worker:
```
oc get nodes
```
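
The role labels can also be checked with selectors; the new node should match the first query and not the second:
```
oc get nodes -l node-role.kubernetes.io/infra=
oc get nodes -l node-role.kubernetes.io/worker=
```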

DEL_NODE.md (new file, 48 additions)

# Deleting Cluster nodes after installation

https://docs.openshift.com/container-platform/3.11/admin_guide/manage_nodes.html#deleting-nodes

Once the cluster has been installed and is in production, be careful with `terraform apply`: if Dynamic Provisioning is in use, some disks may be attached to cluster nodes that Terraform does not know about.

To deal with the drift caused by Dynamic Provisioning, the strategy is to use the `terraform apply -target` option.

### List the nodes in the OpenShift Cluster
```
[openshift@bastion ~]$ oc get nodes
NAME STATUS ROLES AGE VERSION
master-0.ocp4.example.com Ready master 8d v1.16.2
master-1.ocp4.example.com Ready master 8d v1.16.2
master-2.ocp4.example.com Ready master 8d v1.16.2
infra-0.ocp4.example.com Ready infra 8d v1.16.2
infra-1.ocp4.example.com Ready infra 8d v1.16.2
infra-2.ocp4.example.com Ready infra 8d v1.16.2
worker-0.ocp4.example.com Ready worker 8d v1.16.2
worker-1.ocp4.example.com Ready worker 8d v1.16.2
worker-2.ocp4.example.com Ready worker 8d v1.16.2
```
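
### Drain the node before removal
Before deleting the node it is prudent to cordon and drain it so its workloads are rescheduled elsewhere (flags as in the 4.x-era client; adjust for your version):
```
oc adm cordon worker-2.ocp4.example.com
oc adm drain worker-2.ocp4.example.com --ignore-daemonsets --delete-local-data --force
```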

### Remove the node from the OpenShift Cluster
```
[root@bastion ~]# oc delete node worker-2.ocp4.example.com
```

After running this command you can remove the VM through VMware or through Terraform.

### Terraform manual process:
Check the terraform state to get the name of the target for the node that you want to remove.

Example:
```
[root@bastion ~]# terraform state list | grep virtual_machine.vm
module.compute.vsphere_virtual_machine.vm[0]
module.compute.vsphere_virtual_machine.vm[1]
module.control_plane.vsphere_virtual_machine.vm[0]
module.control_plane.vsphere_virtual_machine.vm[1]
module.control_plane.vsphere_virtual_machine.vm[2]
```

So, in order to remove the third worker node, execute this targeted destroy (also remove the node from terraform.tfvars and decrease the node count, so a later apply does not recreate it):
```
terraform destroy -target='module.compute.vsphere_virtual_machine.vm[2]'
```
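
Afterwards, the targeted resource should be gone from the Terraform state:
```
terraform state list | grep virtual_machine.vm
```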

OWNERS (new file, 5 additions)

# See the OWNERS docs: https://git.k8s.io/community/contributors/guide/owners.md
# This file just uses aliases defined in OWNERS_ALIASES.

approvers:
- vsphere-approvers

PERMISSIONS.md (new file, 137 additions)

## Required Privileges (terraform)

In order to use the Terraform provider as a non-privileged user, some roles within vCenter must be assigned the following privileges:

- Datastore (Role: ocp-terraform-datastore)
- Allocate space
- Low level file operations
- Profile-driven storage (Role: ocp-terraform-vcenter)
- Profile-driven storage view
- Network (Role: ocp-terraform-network)
- Assign network
- Resource (Role: ocp-terraform-resource)
- Assign vApp to resource pool
- Assign virtual machine to resource pool
- vApp (Role: ocp-terraform-vm)
- Clone
- View OVF environment
- vApp application configuration
- vApp instance configuration
- vApp resource configuration
- Virtual machine (Role: ocp-terraform-vm)
- Change Configuration (all)
- Edit Inventory (all)
- Guest operations (all)
- Interaction (all)
- Provisioning (all)

And these roles have to be granted on the following entities:
Role | Entity | Propagate to Children | Description
---- | ------ | --------- | -----------
ocp-terraform-vm | VM Folder | Yes | The folder where VMs will be allocated
ocp-terraform-vm | Virtual Machine | No | The OVA template that will be cloned
ocp-terraform-network | VM Network | No | The VM Network the VMs will attach to
ocp-terraform-datastore | Datastore | No | The Datastore where the VMs' disk0 will reside
ocp-terraform-resource | Resource Pool | No | The Resource Pool the VMs will be added to
ocp-terraform-vcenter | vCenter | No | Profile-driven storage view
Read-Only (System) | Virtual Switch | No | The Distributed Virtual Switch (\*)

(\*) If the VM Network is going to be on a Distributed Virtual Switch, then this permission needs to be applied as well.

Command line example:
```
# CLI Role creation
govc role.create ocp-terraform-network Network.Assign
govc role.create ocp-terraform-datastore Datastore.AllocateSpace Datastore.FileManagement
govc role.create ocp-terraform-vcenter StorageProfile.View
govc role.create ocp-terraform-resource Resource.AssignVAppToPool Resource.AssignVMToPool
govc role.create ocp-terraform-vm \
VApp.ApplicationConfig VApp.Clone VApp.ExtractOvfEnvironment VApp.InstanceConfig VApp.ResourceConfig \
Folder.Create Folder.Delete \
VirtualMachine.Config.AddNewDisk VirtualMachine.Config.AdvancedConfig VirtualMachine.Config.CPUCount \
VirtualMachine.Config.DiskExtend VirtualMachine.Config.EditDevice VirtualMachine.Config.Memory \
VirtualMachine.Config.Rename VirtualMachine.Config.Resource VirtualMachine.Config.Settings \
VirtualMachine.GuestOperations.Execute VirtualMachine.GuestOperations.Modify VirtualMachine.GuestOperations.ModifyAliases \
VirtualMachine.GuestOperations.Query VirtualMachine.GuestOperations.QueryAliases \
VirtualMachine.Interact.ConsoleInteract VirtualMachine.Interact.GuestControl VirtualMachine.Interact.Pause \
VirtualMachine.Interact.PowerOff VirtualMachine.Interact.PowerOn VirtualMachine.Interact.Reset \
VirtualMachine.Interact.SetCDMedia VirtualMachine.Interact.Suspend VirtualMachine.Interact.ToolsInstall \
VirtualMachine.Inventory.Create VirtualMachine.Inventory.CreateFromExisting VirtualMachine.Inventory.Delete \
VirtualMachine.Inventory.Move VirtualMachine.Inventory.Register VirtualMachine.Inventory.Unregister \
VirtualMachine.Provisioning.Clone VirtualMachine.Provisioning.CloneTemplate VirtualMachine.Provisioning.CreateTemplateFromVM \
VirtualMachine.Provisioning.Customize VirtualMachine.Provisioning.DeployTemplate VirtualMachine.Provisioning.DiskRandomAccess \
VirtualMachine.Provisioning.DiskRandomRead VirtualMachine.Provisioning.FileRandomAccess VirtualMachine.Provisioning.GetVmFiles \
VirtualMachine.Provisioning.MarkAsTemplate VirtualMachine.Provisioning.MarkAsVM VirtualMachine.Provisioning.ModifyCustSpecs \
VirtualMachine.Provisioning.PromoteDisks VirtualMachine.Provisioning.PutVmFiles VirtualMachine.Provisioning.ReadCustSpecs
# CLI Permissions set
USER="[email protected]"
FOLDER="openshift/ocp"
DATACENTER="Datacenter"
DATASTORE="Datastore"
NETWORK="VM Network"
RESOURCE="openshift"
govc permissions.set -principal $USER -role ocp-terraform-vm -propagate=true "/$DATACENTER/vm/$FOLDER"
govc permissions.set -principal $USER -role ocp-terraform-vm -propagate=false "/$DATACENTER/vm/templates/rhcos"
govc permissions.set -principal $USER -role ocp-terraform-network -propagate=false "/$DATACENTER/network/$NETWORK"
govc permissions.set -principal $USER -role ocp-terraform-datastore -propagate=false "/$DATACENTER/datastore/$DATASTORE"
govc permissions.set -principal $USER -role ocp-terraform-resource -propagate=false "/$DATACENTER/host/Cluster/Resources/$RESOURCE"
govc permissions.set -principal $USER -role ocp-terraform-vcenter -propagate=false "/"
```

The config-gen.py script generates the commands needed to create these roles and assign them to the corresponding vCenter objects.

These settings have been tested with:
- [vSphere 6.7](https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.security.doc/GUID-18071E9A-EED1-4968-8D51-E0B4F526FDA3.html)
- [vSphere 6.0](https://pubs.vmware.com/vsphere-60/index.jsp?topic=%2Fcom.vmware.vsphere.security.doc%2FGUID-18071E9A-EED1-4968-8D51-E0B4F526FDA3.html)
- [vSphere 5.5](https://pubs.vmware.com/vsphere-55/index.jsp?topic=%2Fcom.vmware.vsphere.security.doc%2FGUID-18071E9A-EED1-4968-8D51-E0B4F526FDA3.html)

## Required Privileges (dynamic provisioning)
[Permissions | vSphere Storage for Kubernetes](https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/vcp-roles.html)

Command line example:
```
# CLI Role creation
# StorageProfile.View (Profile-driven storage view) at the vCenter level
govc role.create k8s-system-read-and-spbm-profile-view StorageProfile.View
# Low level file operations on the datastore
govc role.create manage-k8s-volumes Datastore.AllocateSpace Datastore.FileManagement
# Virtual Machine Privileges
govc role.create manage-k8s-node-vms \
Resource.AssignVMToPool \
VirtualMachine.Config.AddExistingDisk \
VirtualMachine.Config.AddNewDisk \
VirtualMachine.Config.AddRemoveDevice \
VirtualMachine.Config.RemoveDisk \
VirtualMachine.Inventory.Create \
VirtualMachine.Inventory.Delete \
VirtualMachine.Config.Settings
# CLI Permissions set
USER="[email protected]"
FOLDER="openshift/ocp"
DATACENTER="Datacenter"
DATASTORE="Datastore"
NETWORK="VM Network"
HOST="Cluster"   # assumed: the compute cluster name, matching the example above
# Read-only permissions
govc permissions.set -principal $USER -role ReadOnly -propagate=false "/$DATACENTER"
govc permissions.set -principal $USER -role ReadOnly -propagate=false "/$DATACENTER/datastore/$DATASTORE"
govc permissions.set -principal $USER -role ReadOnly -propagate=false "/$DATACENTER/host/$HOST"
govc permissions.set -principal $USER -role ReadOnly -propagate=false "/$DATACENTER/vm/$FOLDER"
govc permissions.set -principal $USER -role ReadOnly -propagate=false "/$DATACENTER/network/$NETWORK"
govc permissions.set -principal $USER -role k8s-system-read-and-spbm-profile-view -propagate=false "/"
govc permissions.set -principal $USER -role manage-k8s-volumes -propagate=false "/$DATACENTER/datastore/$DATASTORE"
govc permissions.set -principal $USER -role manage-k8s-node-vms -propagate=true "/$DATACENTER/host/$HOST"
govc permissions.set -principal $USER -role manage-k8s-node-vms -propagate=true "/$DATACENTER/vm/$FOLDER"
```

For additional information on roles and permissions, please refer to official VMware documentation:
- [Managing Permissions for vCenter Components](https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.security.doc/GUID-3B78EEB3-23E2-4CEB-9FBD-E432B606011A.html)
- [Required Privileges for Common Tasks](https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.security.doc/GUID-4D0F8E63-2961-4B71-B365-BBFA24673FDB.html)
- [Using Roles to Assign Privileges](https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.security.doc/GUID-18071E9A-EED1-4968-8D51-E0B4F526FDA3.html)
- [Defined Privileges](https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.security.doc/GUID-ED56F3C4-77D0-49E3-88B6-B99B8B437B62.html)