Remove manual maintenance procedure #273

Merged 3 commits on Sep 8, 2023
@@ -1,110 +1,4 @@
= Maintenance and Update of an OpenShift 4 cluster

In contrast to previous versions of OpenShift, or other Kubernetes distributions, the operating system (https://docs.openshift.com/container-platform/4.11/architecture/architecture-rhcos.html[Red Hat Enterprise Linux CoreOS (RHCOS)]) and OCP4 server components are bundled into a single unit.
In practice, this means that there is no difference between "updating the operating system" and "installing the latest OCP version": both happen in the same update.

[IMPORTANT]
====
For an upgrade to `4.12`, see xref:oc4:ROOT:how-tos/update_maintenance/v_4_12.adoc[Upgrade to OpenShift 4.12] first.
====

. Get list of available updates:
+
[source,console]
----
oc adm upgrade --as cluster-admin

Updates:

VERSION IMAGE
4.5.19 quay.io/openshift-release-dev/ocp-release@sha256:bae5510f19324d8e9c313aaba767e93c3a311902f5358fe2569e380544d9113e
4.5.20 quay.io/openshift-release-dev/ocp-release@sha256:78b878986d2d0af6037d637aa63e7b6f80fc8f17d0f0d5b077ac6aca83f792a0
4.5.24 quay.io/openshift-release-dev/ocp-release@sha256:f3ce0aeebb116bbc7d8982cc347ffc68151c92598dfb0cc45aaf3ce03bb09d11
----

or

[source,console]
----
kubectl --as cluster-admin get clusterversion version -o json | jq '.status.availableUpdates[] | {image: .image, version: .version}'
----

[NOTE]
====
If the newest release isn't listed, this might be intended.
Red Hat only offers a new update to a specific cluster once the update has no known issues.
On a stable channel, this can take a while, so you need some patience!
====
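
As a rough illustration of how the output of the `kubectl ... -o json` command above maps to the two `desiredUpdate` parameters in the next step, here is a minimal Python sketch. Several things in it are assumptions rather than part of the documented procedure: the function names are hypothetical, the sample data uses placeholder digests, versions are assumed to be plain `x.y.z` strings, and picking the highest version is just one possible policy.

```python
import json


def parse_version(version):
    """Turn '4.5.24' into (4, 5, 24) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def newest_update(clusterversion):
    """Return image and version of the highest available update, or None."""
    # availableUpdates may be null or absent when no update is offered.
    updates = clusterversion.get("status", {}).get("availableUpdates") or []
    if not updates:
        return None
    best = max(updates, key=lambda update: parse_version(update["version"]))
    return {"image": best["image"], "version": best["version"]}


# Hypothetical sample shaped like the ClusterVersion JSON; digests are placeholders.
SAMPLE = json.loads("""
{
  "status": {
    "availableUpdates": [
      {"version": "4.5.19",
       "image": "quay.io/openshift-release-dev/ocp-release@sha256:placeholder-a"},
      {"version": "4.5.24",
       "image": "quay.io/openshift-release-dev/ocp-release@sha256:placeholder-c"},
      {"version": "4.5.20",
       "image": "quay.io/openshift-release-dev/ocp-release@sha256:placeholder-b"}
    ]
  }
}
""")

desired = newest_update(SAMPLE)
# These two values would go into
# parameters.openshift4_version.spec.desiredUpdate.version and .image
print(desired["version"])
print(desired["image"])
```

The numeric comparison matters because a plain string sort would rank `4.5.9` above `4.5.19`.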

. Update the configuration hierarchy
+
Set the following parameters to the values retrieved in the previous step:
+
* `parameters.openshift4_version.spec.desiredUpdate.image`
* `parameters.openshift4_version.spec.desiredUpdate.version`

. Compile the cluster catalog

. Enjoy the show
+
Let the OpenShift operators do their job.
+
[source,console]
----
kubectl --as cluster-admin get clusterversion version --watch
----
+
. Check the upgrade state via the `oc` command:
+
[source,console]
----
$ oc adm upgrade --as cluster-admin
Cluster version is 4.5.24

No updates available.
You may force an upgrade to a specific release image, but doing so may not be supported and result in downtime or data loss.
----
+
NOTE: Even if `oc adm upgrade` shows that the upgrade has completed, it's possible that nodes are still being upgraded.

. Check node upgrade status by checking the status of the `MachineConfigPool` resources:
+
[source,console]
----
$ oc --as=cluster-admin -n openshift-machine-config-operator get machineconfigpool
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-92e100dc64d7c9ecf669b1f69cdb5dca True False False 3 3 3 0 19d
worker rendered-worker-4648c4badfb057c7e3e9f1030fa42507 True False False 6 6 6 0 19d
----
+
[IMPORTANT]
====
Applications on the cluster may get rescheduled without prior notice as long as the worker `MachineConfigPool` doesn't show `Updated=True`.

You can observe the progress of the node upgrades with

[source,console]
----
oc --as=cluster-admin get mcp -w
----
====

. The maintenance of the cluster is only finished once all nodes, including all worker nodes, have been upgraded and all `MachineConfigPools` show `Updated=True`.
+
[IMPORTANT]
====
Never leave the cluster in a state with pending node upgrades.
If the Machine Config operator can't drain a node (for example, because doing so would violate a `PodDisruptionBudget`), you may have to manually force-drain the node or even manually delete the pods that block the node drain.
Always open a follow-up ticket to investigate the underlying issues if manual intervention is required.
====
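
The "all `MachineConfigPools` show `Updated=True`" check from the last steps can also be scripted. The following Python sketch is only an illustration under stated assumptions: it expects data shaped like the output of `oc get mcp -o json` (a list with `items`, each carrying `status.conditions`, `status.machineCount` and `status.updatedMachineCount`), the function names are hypothetical, and the sample data is made up.

```python
import json


def condition_is_true(pool, condition_type):
    """Return True if the pool reports the given condition with status 'True'."""
    for condition in pool.get("status", {}).get("conditions", []):
        if condition.get("type") == condition_type:
            return condition.get("status") == "True"
    return False


def pending_pools(mcp_list):
    """Return the names of all pools that aren't fully updated yet."""
    pending = []
    for pool in mcp_list.get("items", []):
        status = pool.get("status", {})
        fully_updated = (
            condition_is_true(pool, "Updated")
            and status.get("updatedMachineCount") == status.get("machineCount")
        )
        if not fully_updated:
            pending.append(pool["metadata"]["name"])
    return pending


# Made-up sample: master is done, worker has 4 of 6 nodes updated.
SAMPLE = json.loads("""
{
  "items": [
    {"metadata": {"name": "master"},
     "status": {"machineCount": 3, "updatedMachineCount": 3,
                "conditions": [{"type": "Updated", "status": "True"},
                               {"type": "Updating", "status": "False"}]}},
    {"metadata": {"name": "worker"},
     "status": {"machineCount": 6, "updatedMachineCount": 4,
                "conditions": [{"type": "Updated", "status": "False"},
                               {"type": "Updating", "status": "True"}]}}
  ]
}
""")

still_pending = pending_pools(SAMPLE)
if still_pending:
    print("Maintenance not finished, pending pools:", ", ".join(still_pending))
else:
    print("All MachineConfigPools are updated")
```

To run this against a real cluster, the sample could be replaced by `json.load(sys.stdin)` and fed with the JSON output of the `oc` command. A pool is only treated as done when both the `Updated` condition and the machine counts agree, which is a deliberately strict choice.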

So far, the upgrade process has mostly just worked.
Nevertheless, we've started documenting how to observe the upgrade process in the following section.
More troubleshooting instructions will be added there as we gain experience.

For general information about the upgrade process, check out https://docs.openshift.com/container-platform/latest/updating/updating-cluster-between-minor.html[Updating a cluster between minor versions] of the OpenShift 4 documentation.

Also have a look at the blog post https://www.openshift.com/blog/the-ultimate-guide-to-openshift-release-and-upgrade-process-for-cluster-administrators[The Ultimate Guide to OpenShift Release and Upgrade Process for Cluster Administrators], which is an excellent resource for understanding the process.
= Maintenance troubleshooting

== Troubleshooting

@@ -86,4 +86,4 @@ kubectl patch cm admin-acks \

. Upgrade the cluster
+
Follow the steps in xref:oc4:ROOT:how-tos/update_maintenance.adoc[Update/Maintenance].
Set the desired minor version in https://github.com/appuio/component-openshift-upgrade-controller/blob/master/docs/modules/ROOT/pages/references/parameters.adoc#cluster_versionopenshiftversion[`openshift_upgrade_controller.cluster_version.openshiftVersion.Minor`]. The upgrade controller will use this upgrade channel in the next maintenance window.
@@ -85,4 +85,4 @@ kubectl patch cm admin-acks \

. Upgrade the cluster
+
Follow the steps in xref:oc4:ROOT:how-tos/update_maintenance.adoc[Update/Maintenance].
Set the desired minor version in https://github.com/appuio/component-openshift-upgrade-controller/blob/master/docs/modules/ROOT/pages/references/parameters.adoc#cluster_versionopenshiftversion[`openshift_upgrade_controller.cluster_version.openshiftVersion.Minor`]. The upgrade controller will use this upgrade channel in the next maintenance window.
4 changes: 2 additions & 2 deletions docs/modules/ROOT/partials/nav.adoc
@@ -81,12 +81,12 @@
*** xref:oc4:ROOT:how-tos/vsphere/install.adoc[Install]

* Update
** xref:oc4:ROOT:how-tos/update_maintenance.adoc[Update/Maintenance]
** xref:oc4:ROOT:how-tos/update_maintenance/automated-upgrades-at-vshn.adoc[]
** xref:oc4:ROOT:how-tos/new_minor.adoc[Get ready for new minor]
** xref:oc4:ROOT:how-tos/update_maintenance/v_4_12.adoc[Upgrade to OCP4.12]
** xref:oc4:ROOT:how-tos/update_maintenance/v_4_13.adoc[Upgrade to OCP4.13]
** xref:oc4:ROOT:references/architecture/upgrade_controller.adoc[Upgrade Controller]
** xref:oc4:ROOT:how-tos/maintenance_troubleshooting.adoc[Maintenance troubleshooting]

// Support
// Web console
@@ -175,7 +175,7 @@
// Serverless

* Day two operations
** xref:oc4:ROOT:how-tos/update_maintenance.adoc[Update/Maintenance]
** xref:oc4:ROOT:how-tos/maintenance_troubleshooting.adoc[Maintenance troubleshooting]
** xref:oc4:ROOT:how-tos/debug-nodes.adoc[Debugging Nodes]

** Runbooks