diff --git a/content/en/apps/guides/hosting/4.x/migrating-projects-docker-compose-to-k3s-on-vmware.md b/content/en/apps/guides/hosting/4.x/migrating-projects-docker-compose-to-k3s-on-vmware.md
new file mode 100644
index 000000000..b60f1231a
--- /dev/null
+++ b/content/en/apps/guides/hosting/4.x/migrating-projects-docker-compose-to-k3s-on-vmware.md
@@ -0,0 +1,193 @@
+---
+title: "Migrating projects on VMware that were installed via docker-compose to k3s"
+linkTitle: "Migrating projects on VMware from docker-compose to k3s"
+weight: 20
+description: >
+  Here we outline the process for migrating projects that were installed on VMware via docker-compose on the VM's root disk to a k3s installation that exists in the same VMware datacenter.
+---
+
+### Prerequisites
+
+This doc assumes you have followed or read over our guides to [self-hosting-k3s-multinode.md]() and [troubleshooting-k3s-on-vmware.md](). Please ensure your terminal is authenticated to your vCenter datacenter by following the govc or curl authentication steps in those docs.
+
+### Current cht-core installation setup
+
+This doc assumes your existing cht-core projects are installed on separate VMs that were launched from a VM template in VMware, that docker-compose was used to install the cht-core project on each VM, and that all data was stored on each VM's root disk. A VMware encryption policy also ensured that all of the disks mentioned above are encrypted.
+
+### Our goal cht-core setup
+
+Our goal is to merge all of our projects into a k3s VMware deployment for ease of maintainability, support for multi-node cht-core projects, and the other benefits highlighted in our k3s VMware installation docs.
+
+To reach our goal, we will need to complete the following steps:
+* Shutting down the existing running project; this will mean downtime for the project
+* Copying the disk from the shut-down VM
+* Mounting the disk to a k3s control-plane node
+* Decrypting the disk
+* Copying data via rsync from the old disk to a container inside k3s that uses a PersistentVolumeClaim
+* Using the above PVC inside new cht-core deployment templates to re-deploy the cht-core project inside k3s
+* Updating nginx to forward traffic to our project installation
+* Verifying all services are running for cht-core
+* Taking a backup and verifying the backup policy works
+
+### Shut down existing project VM
+
+Identify which VM you wish to shut down by finding the VM name that matches your project:
+
+    govc ls /<datacenter>/vm/
+
+*Note*: Before powering off the VM, ensure you have access to the VM's terminal so you can decrypt the root disk if you ever need to re-use this VM or restart this process.
+Power the VM off:
+
+    govc vm.power -off /<datacenter>/vm/<vm-name>
+
+### Copy disk from shut-down VM
+
+We will copy the data from the recently shut-down VM above to a new vmdk disk file that we will mount onto our k3s control-plane server. Copying the disk ensures we have a backup plan if anything goes wrong during our migration process, and keeps the original project data intact.
+
+    govc device.info -vm /<datacenter>/vm/<vm-name>
+
+Example of the above output:
+
+    Name: disk-1000-0
+    Type: VirtualDisk
+    Label: Hard disk 1
+    Summary: 1,048,576,000 KB
+    Key: 2000
+    Controller: lsilogic-1000
+    Unit number: 0
+    File: [<datastore>] <vm-name>/<vm-name>.vmdk
+
+Take note of the .vmdk file name from the above output, as well as the datastore name.
+
+    govc datastore.ls -ds=<datastore> .
+
+Now run the same `device.info` command above pointing to your k3s control-plane VM. Be sure to take note of the datastore and .vmdk file name of that output as well.
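+
+If you prefer not to copy values by hand, the same information can be gathered with a couple of `govc` calls. This is only a sketch; the names (`DC1`, `vsanDatastore`, `cht-project-a`, `k3s-control-plane-1`) are hypothetical placeholders for illustration:
+
+    # hypothetical names, shown only to illustrate the flow
+    govc device.info -vm /DC1/vm/cht-project-a            # note the "File:" line, e.g. [vsanDatastore] cht-project-a/cht-project-a.vmdk
+    govc device.info -vm /DC1/vm/k3s-control-plane-1      # note the datastore backing the k3s control-plane VM
+    govc datastore.ls -ds=vsanDatastore cht-project-a     # confirm the .vmdk exists in the source datastore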
+
+Let's copy the .vmdk from our shut-down VM to the k3s control-plane server:
+
+    govc datastore.cp --ds=<source-datastore> --ds-target=<target-datastore> <vm-name>/<vm-name>.vmdk <k3s-vm-name>/<vm-name>_clone.vmdk
+
+This process will take some time to complete. We suggest running these commands from a screen session on a server inside the VMware datacenter, such as one of the k3s control-plane servers.
+
+### Mounting copied disk to k3s control-plane server VM
+
+After successfully copying the disk, we want to mount the copied file onto our k3s control-plane server VM.
+
+    govc vm.disk.attach -vm /<datacenter>/vm/<k3s-vm-name> -ds=<target-datastore> -disk=<k3s-vm-name>/<vm-name>_clone.vmdk
+
+We will have to restart the k3s control-plane server VM that the disk was attached to, in order to force the VMware SCSI controller to sync. You will notice the disk won't be listed by commands such as `lsblk` until a restart has occurred. Ideally, you would mount this disk on an isolated VM and copy the k3s credentials to that VM. That would prevent having to restart a k3s control-plane server and then waiting for it to resync to the k3s cluster.
+
+### Decrypting disk
+
+Once you've rebooted your k3s server VM, we will need to decrypt the disk, if necessary.
+*Note*: Your disk may be attached at a different location than `sdb3` below; use `lsblk` to determine the path.
+
+    sudo cryptsetup open --type luks /dev/sdb3 <mapper-name>
+    # enter encryption password
+    sudo vgdisplay    # Save the VG UUID info
+    sudo vgrename <VG-UUID> <new-vg-name>
+    sudo vgchange -a y <new-vg-name>
+    sudo lvdisplay    # Note the LV Path
+    sudo mount <LV-path> /srv
+
+Find the location of your couchdb data, most likely located in:
+`/srv/home/<user>/cht/couchdb`
+
+### Copying data via rsync
+
+Now that we have our project's data mounted on our k3s server VM, we can create a Pod inside k3s that uses a PersistentVolumeClaim backed by the VMware Container Storage Interface, which will provision a Container Network Storage volume inside VMware and manage that storage disk across failovers.
+
+We'll want to save the script from this [serverfault.com rsync q&a](https://serverfault.com/questions/741670/rsync-files-to-a-kubernetes-pod/887402#887402) as `rsync-helper.sh` on our k3s server VM.
+
+Launch a busybox Pod that uses a PVC resource using the template below:
+```
+apiVersion: v1
+kind: Pod
+metadata:
+  name: busybox
+spec:
+  containers:
+  - name: busybox
+    image: k8s.gcr.io/busybox
+    command: [ "/bin/sh", "-c", "tail -f /dev/null" ]
+    volumeMounts:
+    - name: volume1
+      mountPath: "/opt/couchdb/data"
+  volumes:
+  - name: volume1
+    persistentVolumeClaim:
+      claimName: busybox-pvc1
+  restartPolicy: Never
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: busybox-pvc1
+spec:
+  accessModes:
+  - ReadWriteOnce
+  resources:
+    requests:
+      storage: 100Gi
+  storageClassName: vmware-sc
+```
+
+Deploy the above template into your k3s cluster:
+`kubectl -n <namespace> apply -f busybox.yaml`
+
+Let's use rsync with the `rsync-helper.sh` script to move data from our original mounted disk to `busybox-pvc1`, the provisioned PVC resource inside k3s:
+
+    sudo rsync -av --progress --stats -e './rsync-helper.sh' /srv/home/<user>/cht/couchdb/data/ busybox:/opt/couchdb/data/
+
+* Optional: If you are moving a project from single-node CouchDB to multi-node CouchDB during this migration into k3s, this is the point in the process to follow the couchdb-migration steps.
+
+### Using the above PVC inside new cht-core deployment templates to re-deploy the cht-core project inside k3s
+
+We'll have to delete the resources created in the previous step, but be sure to save the volume ID of the PVC that was created first.
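+
+A minimal sketch of that clean-up, assuming the `busybox` resources above and a hypothetical `<namespace>`; patching the PersistentVolume to `Retain` first is a precaution so that deleting the claim does not release the underlying CNS volume that now holds your data:
+
+    # find the PV bound to the busybox PVC and note its volumeHandle (the CNS volume ID)
+    PV=$(kubectl -n <namespace> get pvc busybox-pvc1 -o jsonpath='{.spec.volumeName}')
+    kubectl get pv "$PV" -o jsonpath='{.spec.csi.volumeHandle}'
+    # keep the volume when the claim is deleted, then remove the temporary resources
+    kubectl patch pv "$PV" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
+    kubectl -n <namespace> delete pod busybox
+    kubectl -n <namespace> delete pvc busybox-pvc1
+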
+You can also identify this volume ID by navigating through the VMware GUI > Container Network Storage, or by running `kubectl get pvc` and `kubectl describe pvc <pvc-name>` and noting the `volumeHandle`.
+
+```
+apiVersion: v1
+kind: PersistentVolume
+metadata:
+  name: <project-name>-pv
+  annotations:
+    pv.kubernetes.io/provisioned-by: csi.vsphere.vmware.com
+spec:
+  capacity:
+    storage: 50Gi
+  accessModes:
+  - ReadWriteOnce
+  persistentVolumeReclaimPolicy: Retain
+  storageClassName: vmware-sc
+  claimRef:
+    namespace: <namespace>
+    name: <project-name>-pvc
+  csi:
+    driver: csi.vsphere.vmware.com
+    fsType: ext4 # Change fsType to xfs or ntfs based on the requirement.
+    volumeAttributes:
+      type: "vSphere CNS Block Volume"
+    volumeHandle: <volume-id> # First Class Disk (Improved Virtual Disk) ID
+---
+kind: PersistentVolumeClaim
+apiVersion: v1
+metadata:
+  name: <project-name>-pvc
+spec:
+  accessModes:
+  - ReadWriteOnce
+  resources:
+    requests:
+      storage: 50Gi
+  storageClassName: vmware-sc
+  volumeName: <project-name>-pv
+```
+
+* Save the above templates to a directory for the project name, such as `/home/ubuntu/cht-projects/project-name`.
+
+* Create a namespace per project in k3s: `kubectl create namespace <namespace>`. Ensure you edit the templates created above to reflect the namespace you created and want to deploy the project to.
+
+
diff --git a/content/en/apps/guides/hosting/4.x/self-hosting-k3s-multinode.md b/content/en/apps/guides/hosting/4.x/self-hosting-k3s-multinode.md
new file mode 100644
index 000000000..bfde1d290
--- /dev/null
+++ b/content/en/apps/guides/hosting/4.x/self-hosting-k3s-multinode.md
@@ -0,0 +1,415 @@
+---
+title: "k3s - multiple node deployment for VMware"
+linkTitle: "Self Hosting - k3s Multiple Nodes"
+weight: 20
+description: >
+  Hosting the CHT on self-run VMware infrastructure for multiple CHT-Core projects that utilize horizontally scaled CouchDB nodes
+---
+
+{{% pageinfo %}}
+This page covers an example k3s cluster setup on a VMware datacenter with vSphere 7+ for a national deployment across 50 counties, capable of supporting 20,000+ CHWs concurrently. After setup, administrators should only need to add VMs to the cluster or deploy CHT Core projects to be orchestrated.
+{{% /pageinfo %}}
+
+### About container orchestration
+
+A container orchestrator helps easily allocate hardware resources spread across a datacenter. For national-scale projects, or deployments with a large number of CHT Core instances, Medic recommends a lightweight Kubernetes orchestrator called [k3s](https://docs.k3s.io/). The orchestrator will:
+
+* monitor resources across a group of virtual machines (aka "nodes")
+* place CHT Core projects where there are available resources
+* migrate projects to spare resources if combined utilization is high or there are underlying issues.
+
+Instead of provisioning one VM per CHT Core project, we will provision larger VMs and deploy multiple CHT Core projects on one VM, with each project receiving optional resource limits, such as CPU and RAM.
+
+In this example the orchestrator is deploying 50 CHT Core projects, one for each county. We will provision 9 large VMs and place 6 CHT Core projects on each VM. This allows for spare resources for failovers and lets the orchestrator decide which VM each project lives on. Further, we get automated, efficient use of datacenter resources and avoid future manual allocations.
+
+### Nodes
+
+We'll be using two types of k3s nodes in this deployment:
+
+* [HA control-plane](https://docs.k3s.io/installation/requirements#cpu-and-memory) nodes - these enable high availability (HA) and provide access to the kube API.
+  They run the containers inside the `kube-system` namespace that are often associated with the control plane, including CoreDNS, traefik (ingress), servicelb, the VMware Cloud Provider Interface (CPI), and the VMware Container Storage Interface (CSI).
+
+* Agent or worker nodes - these run the CHT Core containers and projects. They also run services that tie in networking and storage. The VMware CSI node driver runs here, which enables agents to mount volumes from VMware Virtual SAN for block data storage. Agents also run servicelb-traefik containers, which allow the nodes to route traffic to the correct projects and handle load-balancing and internal networking.
+
+## Prerequisites
+
+### Servers / Virtual Machines
+
+As we're provisioning an example deployment here for 50 counties and over 20,000 CHWs, the RAM, CPU and storage numbers will differ for your specific deployment.
+
+To support all 50 counties, provision 3 Ubuntu servers (22.04 as of this writing) with **4 vCPU and 8 GB RAM** for the control plane. Ensure they also meet the k3s specifications for [HA etcd](https://docs.k3s.io/installation/requirements#cpu-and-memory).
+
+Provision 9 Ubuntu servers (again 22.04 as of this writing) for your k3s agent/worker servers. Each should have **48 vCPU, 192 GB RAM, and 50 GB local storage**.
+
+For any additional VMs you add to the k3s cluster, you will need to ensure the networking, roles, and extra configuration parameters noted below are configured on the VM.
+
+To ensure your hardware is not over-provisioned, add more VMs to your k3s cluster when you want to deploy more CHT Core projects. This gives you the flexibility of not needing to provision them all initially, as they can easily be added later.
+
+### Network
+
+Ensure the above provisioned VMs:
+
+* abide by the [Inbound Rules for k3s Server Nodes](https://docs.k3s.io/installation/requirements#inbound-rules-for-k3s-server-nodes)
+* follow the [firewall considerations for k3s on Ubuntu](https://docs.k3s.io/advanced#ubuntu--debian) if you're using Ubuntu's ufw
+* restrict the service ports to the IP addresses of the k3s nodes so only they can connect
+
+### Add Roles and Permissions to our VMs
+
+Following the [vSphere docs](https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-0AB6E692-AA47-4B6A-8CEA-38B754E16567.html#GUID-0AB6E692-AA47-4B6A-8CEA-38B754E16567), first create the following vSphere roles for Cloud Native Storage (CNS):
+* CNS-VM
+* CNS-DATASTORE
+* CNS-SEARCH-AND-SPBM
+
+Now, in the VM settings, we can apply these roles as described in the above document.
+
+Any VM provisioned in the previous step should receive the CNS-VM role.
+The top-level vCenter Server will receive the CNS-SEARCH-AND-SPBM role.
+The Virtual SAN datastore should receive CNS-DATASTORE.
+All servers should also have the READONLY role (this may already be active).
+
+### Enable Necessary Extra Parameters on all VMs
+
+Following along with the above document, we want to verify the VM Hardware Version is 15 or greater, and that the disk.EnableUUID parameter is configured.
+
+On each node, through the vSphere Client (GUI):
+1. disk.EnableUUID
+    1. In the vSphere Client, right-click the VM and select Edit Settings.
+    2. Click the VM Options tab and expand the Advanced menu.
+    3. Click Edit Configuration next to Configuration Parameters.
+    4. Configure the disk.EnableUUID parameter.
+       If the parameter exists, make sure that its value is set to True. If the parameter is not present, add it and set its value to True.
+
+2. Verify the VM hardware version is at 15 or higher, and upgrade if necessary
+    1. In the vSphere Client, navigate to the virtual machine.
+    2. Select Actions > Compatibility > Upgrade VM Compatibility.
+    3. Click Yes to confirm the upgrade.
+    4. Select a compatibility and click OK.
+
+3. Add a VMware Paravirtual SCSI storage controller to the VM
+    1. In the vSphere Client, right-click the VM and select Edit Settings.
+    2. On the Virtual Hardware tab, click the Add New Device button.
+    3. Select SCSI Controller from the drop-down menu.
+    4. Expand New SCSI controller and, from the Change Type menu, select VMware Paravirtual.
+    5. Click OK.
+
+### Identify vSphere Provider IDs, Node IDs, and datacenter name
+
+Bootstrap parameters for k3s on VMware require UUID identification of each node that will join the cluster.
+
+For each of the provisioned VMs, you can navigate to the VM in the vCenter interface and retrieve the UUID.
+
+Another method is to make the following calls to the vCenter Server API. You may have a VPN that you connect to before being able to access your vCenter GUI; these commands should be run from the same network that allows that access.
+
+When running the commands below, be sure to replace the placeholders with your own values:
+* `<vcenter-ip>`
+* `<username>`
+* `<password>`
+
+And any others as well!
+
+* Get an authentication token:
+
+    curl -k -X POST https://<vcenter-ip>/rest/com/vmware/cis/session -u '<username>:<password>'
+    ID=<token-value-from-the-above-response>
+
+* List all your VMs and identify the VM number of each VM that was provisioned earlier:
+
+    curl -k -X GET -H "vmware-api-session-id: $ID" https://<vcenter-ip>/api/vcenter/vm
+
+* Retrieve your instance_uuid by first making a `curl` call:
+
+    curl -k -X GET -H "vmware-api-session-id: $ID" https://<vcenter-ip>/api/vcenter/vm/vm-<number>
+
+* Inside the JSON response of the `curl` call, get the `instance_uuid`. In this case it's `215cc603-e8da-5iua-3333-a2402c05121`, but yours will be different:
+
+    "identity":{"name":"k3s_worker_node_4","instance_uuid":"215cc603-e8da-5iua-3333-a2402c05121"
+
+* Retrieve your datacenter name, to be used in the configuration files for VMware CSI and CPI:
+
+    curl -k -X GET -H "vmware-api-session-id: $ID" https://<vcenter-ip>/rest/vcenter/datacenter
+
+You will want to save the "name" of your datacenter.
+
+* Retrieve your cluster-id, to be used in the config file for VMware CSI:
+
+    curl -k -X GET -H "vmware-api-session-id: $ID" https://<vcenter-ip>/api/vcenter/cluster
+
+You can also use the [govc cli tool](https://github.com/vmware/govmomi/blob/main/govc/README.md#binaries) to retrieve this information:
+
+    export GOVC_INSECURE=1
+    export GOVC_URL='https://<username>:<password>@<vcenter-ip>'
+
+    govc ls /<datacenter>
+    /<datacenter>/vm
+    /<datacenter>/network
+    /<datacenter>/host
+    /<datacenter>/datastore
+
+    # To retrieve all Node VMs
+    govc ls /<datacenter>/vm
+    /<datacenter>/vm/<control-plane-vm-1>
+    /<datacenter>/vm/<control-plane-vm-2>
+    /<datacenter>/vm/<control-plane-vm-3>
+    /<datacenter>/vm/<worker-vm-1>
+    /<datacenter>/vm/<worker-vm-2>
+
+## Install k3s
+
+### First Control-Plane VM
+
+SSH into your first control-plane VM that was provisioned and configured above and [install docker](https://docs.docker.com/engine/install/ubuntu/).
+
+For k3s version compatibility with vCenter and VMware CPI/CSI, we will need to use k3s v1.25, CPI v1.25, and CSI v2.7.2, per the `curl` call below.
+
+Run the following CLI command inside the control-plane VM, filling out these two specific values:
+ - `<token>`: Please generate a token ID and save it.
+   This will be required for the entire life of the k3s cluster and is required to add additional servers to the k3s cluster.
+ - `<vm-uuid>`: This is the UUID for this specific VM that we identified earlier.
+```
+curl -sfL https://get.k3s.io | K3S_KUBECONFIG_MODE="644" INSTALL_K3S_EXEC="server" INSTALL_K3S_VERSION="v1.25.14+k3s1" sh -s - \
+  --docker --token <token> \
+  --cluster-init --disable-cloud-controller \
+  --kubelet-arg="cloud-provider=external" \
+  --kubelet-arg="provider-id=vsphere://<vm-uuid>"
+```
+
+### Second and third Control-Plane VMs
+
+SSH into your second/third control-plane VM.
+
+Please fill out these values below and run the CLI command:
+ - `<token>`: Required to be the same token you used in the first control-plane setup.
+ - `<control-plane-1-ip>`: This is the IP of the first control-plane server you set up, and allows this second server to discover the initial one.
+ - `<vm-uuid>`: This is the UUID for this second VM that we identified earlier. This will be different than the one you used for control-plane 1.
+
+```
+curl -sfL https://get.k3s.io | K3S_KUBECONFIG_MODE="644" INSTALL_K3S_EXEC="server" INSTALL_K3S_VERSION="v1.25.14+k3s1" sh -s - \
+--docker --token <token> \
+--server https://<control-plane-1-ip>:6443 \
+--kubelet-arg="cloud-provider=external" \
+--kubelet-arg="provider-id=vsphere://<vm-uuid>"
+```
+
+## Deploy the VMware Cloud Provider Interface (CPI) to your k3s cluster
+
+SSH into one of your control-plane servers.
+Download the template for CPI, and be aware of your current working directory; this will be the location where the CPI template is saved.
+```
+pwd
+wget https://raw.githubusercontent.com/kubernetes/cloud-provider-vsphere/release-1.25/releases/v1.25/vsphere-cloud-controller-manager.yaml
+```
+
+Modify the vsphere-cloud-controller-manager.yaml file downloaded above and update the vCenter Server information.
+
+1) Add your `<vcenter-ip>`, `<username>` and `<password>` to the Secret section below inside that yaml:
+```
+apiVersion: v1
+kind: Secret
+metadata:
+  name: vsphere-cloud-secret
+  labels:
+    vsphere-cpi-infra: secret
+    component: cloud-controller-manager
+  namespace: kube-system
+  # NOTE: this is just an example configuration, update with real values based on your environment
+stringData:
+  <vcenter-ip>.username: "<username>"
+  <vcenter-ip>.password: "<password>"
+```
+2) Please add your `<vcenter-ip>`, `<username>`, `<password>` and `<datacenter>` to the ConfigMap section inside that yaml.
+* Note: If your vCenter actively uses https with valid certificates, then inside the `global:` stanza you will want to set `insecureFlag: false`. Most set-ups will want this to remain true with `insecureFlag: true`.
+
+```
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: vsphere-cloud-config
+  labels:
+    vsphere-cpi-infra: config
+    component: cloud-controller-manager
+  namespace: kube-system
+data:
+  # NOTE: this is just an example configuration, update with real values based on your environment
+  vsphere.conf: |
+    # Global properties in this section will be used for all specified vCenters unless overridden in the VirtualCenter section.
+    global:
+      port: 443
+      # set insecureFlag to true if the vCenter uses a self-signed cert
+      insecureFlag: true
+      # settings for using k8s secret
+      secretName: vsphere-cloud-secret
+      secretNamespace: kube-system
+
+    # vcenter section
+    vcenter:
+      my-vc-name:
+        server: <vcenter-ip>
+        user: <username>
+        password: <password>
+        datacenters:
+          - <datacenter>
+```
+
+3) Deploy the template!
+```
+/usr/local/bin/k3s kubectl -n kube-system apply -f vsphere-cloud-controller-manager.yaml
+```
+
+4) Verify the CPI containers are running:
+```
+/usr/local/bin/k3s kubectl -n kube-system get pods -o wide
+/usr/local/bin/k3s kubectl -n kube-system logs vsphere-cloud-controller-manager-<pod-id>
+```
+
+You will see 3 vsphere-cloud-controller-manager pods running, one per control-plane server.
+
+Take a peek at all 3 vsphere-cloud-controller-manager pods' logs to ensure nothing is immediately erroring. Common errors are an incorrect datacenter name, incorrect UUIDs for the VMs in the k3s install command, or invalid credentials in the ConfigMap and Secret resources created in step 2 above. If one of these errors is displayed in the logs, you will want to delete the deployment (in step 3 above, replace `apply` with `delete`), edit the yaml, and re-deploy (run step 3 again).
+
+
+## Deploy the VMware Container Storage Interface (CSI) to your k3s cluster
+
+Follow the [VMware documentation for CSI](https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-A1982536-F741-4614-A6F2-ADEE21AA4588.html) with these steps:
+
+1) Run the following command from inside a control-plane server:
+```
+/usr/local/bin/k3s kubectl create namespace vmware-system-csi
+```
+
+2) Taint your control-plane node servers by running the following command. This taint may already exist; if so, that's okay. Please replace `<node-name>` with each of your control-plane servers' names. You can retrieve the names by running `/usr/local/bin/k3s kubectl get nodes -o wide`.
+```
+/usr/local/bin/k3s kubectl taint node <node-name> node-role.kubernetes.io/control-plane=:NoSchedule
+```
+
+3) Create a kubernetes secret, which will map authentication credentials and the datacenter name to the CSI containers. First, create a file `/etc/kubernetes/csi-vsphere.conf`. Be sure to replace `<cluster-id>`, `<vcenter-ip>`, `<insecure-flag>`, `<username>`, `<password>`, `<port>` and `<datacenter>` with your values:
+
+    [Global]
+    cluster-id = "<cluster-id>"
+
+    [VirtualCenter "<vcenter-ip>"]
+    insecure-flag = "<insecure-flag>"
+    user = "<username>"
+    password = "<password>"
+    port = "<port>"
+    datacenters = "<datacenter-1>, <datacenter-2>, ..."
+
+4) Create the secret resource in the namespace we created in step 1 by running the following command in the same directory where you created the csi-vsphere.conf file:
+```
+/usr/local/bin/k3s kubectl create secret generic vsphere-config-secret --from-file=csi-vsphere.conf --namespace=vmware-system-csi
+```
+
+5) Download the [vSphere CSI v2.7.2 template](https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v2.7.2/manifests/vanilla/vsphere-csi-driver.yaml)
+
+There is one minor edit, typically found on lines 217-218, under the deployment specification for vsphere-csi-controller.
+
+Before the edit (original value):
+```
+      nodeSelector:
+        node-role.kubernetes.io/control-plane: ""
+```
+
+Please add `true` as the value for this key, as seen below:
+```
+      nodeSelector:
+        node-role.kubernetes.io/control-plane: "true"
+```
+
+Now, let's deploy VMware CSI by running the following command:
+```
+/usr/local/bin/k3s kubectl -n vmware-system-csi apply -f vsphere-csi-driver.yaml
+```
+
+Follow the [verification steps seen here in Step 2 of the Procedure](https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-54BB79D2-B13F-4673-8CC2-63A772D17B3C.html)
+
+
+### Create a StorageClass in the k3s cluster
+
+We'll need to create a global [StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/) resource in our k3s cluster so CHT Core deployments will be able to ask for persistent storage volumes from the k3s cluster.
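+
+Before creating the StorageClass, you can optionally confirm that the CSI driver registered itself with the cluster. This is just a sanity-check sketch, assuming the CSI deployment above completed:
+```
+/usr/local/bin/k3s kubectl get csidrivers                        # should list csi.vsphere.vmware.com
+/usr/local/bin/k3s kubectl get csinodes                          # each node should report the vSphere driver
+/usr/local/bin/k3s kubectl -n vmware-system-csi get pods -o wide # controller and node pods should be Running
+```
+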
+Inside one of the control-plane servers, please create a file `vmware-storageclass.yaml` with the following contents:
+```
+kind: StorageClass
+apiVersion: storage.k8s.io/v1
+metadata:
+  name: vmware-sc
+  annotations:
+    storageclass.kubernetes.io/is-default-class: "true"
+provisioner: csi.vsphere.vmware.com
+parameters:
+  csi.storage.k8s.io/fstype: "ext4" # Optional parameter
+```
+
+Deploy this template to the k3s cluster via:
+```
+/usr/local/bin/k3s kubectl apply -f vmware-storageclass.yaml
+```
+
+## Deploying a CHT-Core Project to your new k3s Cluster running on VMware
+
+This step will eventually fit neatly into helm chart configurations, but here are the manual steps for the time being.
+
+Your PersistentVolumeClaim (PVC) template for all CouchDB nodes should be as shown below. Note that the `storageClassName` parameter should be identical to the name of the StorageClass we deployed earlier:
+```
+# Source: cht-chart/templates/couchdb-n-claim0-persistentvolumeclaim.yaml
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  labels:
+    cht.service: couchdb-1-claim0
+  name: couchdb-1-claim0
+spec:
+  accessModes:
+  - ReadWriteOnce
+  resources:
+    requests:
+      storage: 4Gi
+  storageClassName: vmware-sc
+status: {}
+```
+
+## Kubernetes Concepts
+Here are links to docs covering the kubernetes concepts that we use in a cht-core project deployed to a k3s cluster.
+
+* [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) - This is the main kubernetes resource that contains information regarding all the cht services that will be deployed.
+* [ConfigMaps](https://kubernetes.io/docs/concepts/configuration/configmap/) - This contains configuration files or credentials that containers can retrieve. If you edit a configmap, you should delete the corresponding containers, which will trigger new containers that pick up your edits to any configuration for that service.
+* [ServiceAccounts](https://kubernetes.io/docs/concepts/security/service-accounts/) - This is used by the upgrade-service that is running inside the cht-core pods (as a container titled upgrade-service). This serviceAccount restricts the upgrade-service from interacting with any other cht-core projects outside of its namespace, and gives the upgrade-service permission to talk to the kubernetes API to upgrade container images when a CHT admin clicks *upgrade* through the Admin interface.
+* [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) - This is what forwards traffic to a particular project or pods. In most use-cases, there is an nginx deployed outside of the k3s cluster that contains DNS entries for existing projects, and contains a proxy_pass parameter to send traffic based on the host header to any of the k3s server IPs. Inside the k3s cluster, the traefik container and servicelb-traefik containers in the kube-system namespace handle forwarding traffic to the correct cht-core containers based on URL.
+* [Persistent Volume Claim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) - This is where our project data will be stored. It is important to ensure you have configured this correctly, with retain policies intact, so the data is not deleted if the project is removed. It's also vital to ensure you have a backup policy, either set up in the VMware vCenter GUI or via the csi-snapshotter that comes with vSphere CSI.
+* [Services](https://kubernetes.io/docs/concepts/services-networking/service/) - This is utilized by CouchDB nodes to discover each other through DNS rather than internal IPs, which can change. It is also used in the COUCH_URL so API containers can discover where CouchDB is running.
+
+
diff --git a/content/en/apps/guides/hosting/4.x/troubleshooting-k3s-on-vmware.md b/content/en/apps/guides/hosting/4.x/troubleshooting-k3s-on-vmware.md
new file mode 100644
index 000000000..4c74a77af
--- /dev/null
+++ b/content/en/apps/guides/hosting/4.x/troubleshooting-k3s-on-vmware.md
@@ -0,0 +1,83 @@
+---
+title: "Troubleshooting k3s on a VMware datacenter"
+linkTitle: "k3s - multiple nodes"
+weight: 20
+description: >
+  Here we outline common VMware datacenter troubleshooting from the specific perspective of k3s cht-core deployments inside that environment.
+---
+
+### Most common tools
+
+- [govc](https://github.com/vmware/govmomi/blob/main/govc/README.md)
+- [curl](https://everything.curl.dev/get)
+
+### Setup variables in your current terminal session
+
+We'll need to export authentication variables for vCenter Server:
+
+* govc
+
+    export GOVC_URL='https://<username>:<password>@<vcenter-ip>'
+    export GOVC_INSECURE=1
+
+* curl
+
+Grab an auth token:
+
+    curl -k -X POST https://<vcenter-ip>/rest/com/vmware/cis/session -u '<username>:<password>'
+
+### Mapping VMware Storage to a k3s PersistentVolumeClaim
+
+Since we are using VMware SAN, it's important to identify the IDs that VMware gives to a storage disk, and how to identify the k3s PersistentVolume resource that it is linked to.
+
+    kubectl -n <namespace> get volumeattachments -o wide
+    kubectl -n <namespace> get volumeattachments csi-<id> -o yaml
+
+At the bottom of the above output please note:
+
+    spec.source.persistentVolumeName = the k3s PersistentVolume name
+    status.attachmentMetadata.diskUUID = the VMware storage disk ID
+
+### Perform failover or send all cht-core projects to another k3s worker server
+
+If you are receiving multiple alerts for one particular k3s worker VM, you can drain all the projects to a spare VM, and investigate or restart the original k3s worker server to fix any underlying issues. *Note*: This incurs downtime for the projects running on that k3s worker server.
+
+    kubectl get nodes -o wide
+    # Identify the IP address of the VM you wish to drain all projects from. The next command will cause downtime
+    # for all projects on that node, so please use it in emergencies or during maintenance windows.
+
+    kubectl drain <node-name> --ignore-daemonsets
+    # Wait a few minutes as k3s evicts/moves all pods gracefully to a spare worker server.
+    # This ensures graceful failover, avoids any multi-attach error when trying to attach storage disks to
+    # multiple VMs, and helps to avoid couchdb data corruption.
+
+### Failover occurred and some projects are not coming back
+
+Say one of your k3s worker server VMs failed or was restarted before a system administrator was able to follow the above section and gracefully fail over all running projects. When checking the status of pods, some are stuck in ContainerCreating, and running a describe on such a pod displays a "Multi-Attach Error", because the storage disk is still attached to the previous VM.
+
+Describe the pod, and read the Events section at the bottom of the output:
+
+    kubectl -n <namespace> describe pod <pod-name>
+
+Next, identify which VMware storage disk ID is mapped to that particular cht-core project's PersistentVolumeClaim. Since we are running multiple cht-core projects on each k3s worker VM, we want to ensure we are looking at the correct project. Save this information for the next few steps.
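+
+A short sketch of that lookup, using the commands from the mapping section above; `<namespace>` and the PVC name `couchdb-1-claim0` are placeholders for illustration:
+
+    # find the PV behind the project's CouchDB claim, then the VolumeAttachment that still pins it to the failed node
+    kubectl -n <namespace> get pvc couchdb-1-claim0 -o jsonpath='{.spec.volumeName}'
+    kubectl get volumeattachments -o wide | grep <pv-name>
+    # the attachment's status holds the VMware disk UUID we need for govc below
+    kubectl get volumeattachments csi-<id> -o yaml | grep -A2 attachmentMetadata
+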
+We will want to investigate the old, failed k3s worker VM that was previously running this project, force-detach the storage disk from that VM, and restart the k3s deployment process, which will mount the project's storage disk to an active k3s worker VM.
+
+We'll use govc in the examples below. To list our datacenter's VMs:
+
+    govc ls /<datacenter>/vm/
+    # If there is a sub-directory that nests our k3s VMs, you may have to run:
+    govc ls /<datacenter>/vm/<sub-directory>
+
+Now we'll retrieve device info about which disks are mounted to the k3s worker VM that failed or was drained in the previous step:
+
+    govc device.info -vm /<datacenter>/vm/<k3s-worker-vm> -json
+    # Now, ensure that disk UUID is the same as the one identified for that cht-core project.
+    # You will need jq installed for this next step to work
+    govc device.info -vm /<datacenter>/vm/<k3s-worker-vm> -json disk-* | jq -r .Devices[].backing.uuid
+    # From the two outputs above, identify which disk number our storage disk is attached as.
+    # This should be similar to disk-1002
+
+Let's remove that disk device from the failed k3s worker VM and clean up the stale VolumeAttachment:
+
+    govc device.remove -vm /<datacenter>/vm/<k3s-worker-vm> -keep disk-<number>
+    kubectl patch volumeattachments.storage.k8s.io csi-<id> -p '{"metadata":{"finalizers":[]}}' --type=merge
+    kubectl delete volumeattachments.storage.k8s.io csi-<id>
+
+Now the container network storage volume from that failed k3s worker VM is available for k3s to mount on other available k3s worker VMs. *Note*: The default Container Storage Interface (CSI) configuration waits 7 minutes after repeated multi-attach errors before it tries to force-detach storage disks. You may run into this delay after draining a k3s node or after an accidental k3s VM termination, even after following the above steps.
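+
+After the detach, it can be worth confirming that the stuck pods reschedule and the volume re-attaches on a healthy node. A quick check, with placeholder names:
+
+    kubectl -n <namespace> get pods -o wide -w   # the CouchDB pod should leave ContainerCreating once the volume attaches
+    kubectl get volumeattachments -o wide        # the new attachment should now point at an active worker node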