Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: overlay node image before bootstrapping if necessary #8742

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

jlebon
Copy link
Member

@jlebon jlebon commented Jul 16, 2024

As per openshift/enhancements#1637, we're trying
to get rid of all OpenShift-versioned components from the bootimages.

This means that there will no longer be oc, kubelet, or crio
binaries for example, which bootstrapping obviously relies on.

Instead, now we change things up so that early on when booting the
bootstrap node, we pull down the node image, unencapsulate it (this just
means convert it back to an OSTree commit), then mount over its /usr,
and import new /etc content.

This is done by isolating to a different systemd target to only bring
up the minimum number of services to do the pivot and then carry on
with bootstrapping.

This does not incur additional reboots and should be compatible
with AI/ABI/SNO. But it is of course, a huge conceptual shift in how
bootstrapping works. With this, we would now always be sure that we're
using the same binaries as the target version as part of bootstrapping,
which should alleviate some issues such as AI late-binding (see e.g.
https://issues.redhat.com/browse/MGMT-16705).

The big exception of course being the kernel. Relatedly, currently
/usr/lib/modules is also shadowed by the mount, but we could re-mount
it if needed.

To be conservative, the new logic only kicks in when using bootimages
which do not have oc. This will allow us to ratchet this in more
easily.

Down the line, we should be able to replace some of this with
bootc apply-live once that's available (and also works in a live
environment). (See containers/bootc#76.)

For full context, see the linked enhancement and discussions there.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 16, 2024
Copy link
Contributor

openshift-ci bot commented Jul 16, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@jlebon
Copy link
Member Author

jlebon commented Jul 16, 2024

/test e2e-aws-ovn

@jlebon
Copy link
Member Author

jlebon commented Jul 17, 2024

/retest

@jlebon
Copy link
Member Author

jlebon commented Jul 17, 2024

✔️ ci/prow/e2e-aws-ovn — Job succeeded
✔️ ci/prow/e2e-openstack-ovn — Job succeeded

Sweet! 🎉

Looks like e2e-aws-ovn-heterogeneous failed on a different issue.

@jlebon
Copy link
Member Author

jlebon commented Jul 17, 2024

/retest

@jlebon
Copy link
Member Author

jlebon commented Jul 17, 2024

/test e2e-agent-compact-ipv4
/test e2e-agent-sno-ipv4-pxe

@zaneb
Copy link
Member

zaneb commented Jul 17, 2024

/test ?

Copy link
Contributor

openshift-ci bot commented Jul 17, 2024

@zaneb: The following commands are available to trigger required jobs:

  • /test agent-integration-tests
  • /test altinfra-images
  • /test aro-unit
  • /test e2e-agent-compact-ipv4
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-edge-zones-manifest-validation
  • /test e2e-aws-ovn-upi
  • /test e2e-azure-ovn
  • /test e2e-azure-ovn-upi
  • /test e2e-gcp-ovn
  • /test e2e-gcp-ovn-upi
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-openstack-ovn
  • /test e2e-vsphere-ovn
  • /test e2e-vsphere-ovn-upi
  • /test gofmt
  • /test golint
  • /test govet
  • /test images
  • /test okd-images
  • /test okd-unit
  • /test okd-verify-codegen
  • /test openstack-manifests
  • /test shellcheck
  • /test terraform-images
  • /test terraform-verify-vendor
  • /test tf-lint
  • /test unit
  • /test verify-codegen
  • /test verify-vendor
  • /test yaml-lint

The following commands are available to trigger optional jobs:

  • /test altinfra-e2e-aws-custom-security-groups
  • /test altinfra-e2e-aws-ovn
  • /test altinfra-e2e-aws-ovn-fips
  • /test altinfra-e2e-aws-ovn-imdsv2
  • /test altinfra-e2e-aws-ovn-localzones
  • /test altinfra-e2e-aws-ovn-proxy
  • /test altinfra-e2e-aws-ovn-public-ipv4-pool
  • /test altinfra-e2e-aws-ovn-shared-vpc
  • /test altinfra-e2e-aws-ovn-shared-vpc-local-zones
  • /test altinfra-e2e-aws-ovn-shared-vpc-wavelength-zones
  • /test altinfra-e2e-aws-ovn-single-node
  • /test altinfra-e2e-aws-ovn-wavelengthzones
  • /test altinfra-e2e-azure-capi-ovn
  • /test altinfra-e2e-azure-ovn-shared-vpc
  • /test altinfra-e2e-gcp-capi-ovn
  • /test altinfra-e2e-gcp-ovn-byo-network-capi
  • /test altinfra-e2e-gcp-ovn-secureboot-capi
  • /test altinfra-e2e-gcp-ovn-xpn-capi
  • /test altinfra-e2e-ibmcloud-capi-ovn
  • /test altinfra-e2e-nutanix-capi-ovn
  • /test altinfra-e2e-openstack-capi-ccpmso
  • /test altinfra-e2e-openstack-capi-ccpmso-zone
  • /test altinfra-e2e-openstack-capi-dualstack
  • /test altinfra-e2e-openstack-capi-dualstack-upi
  • /test altinfra-e2e-openstack-capi-dualstack-v6primary
  • /test altinfra-e2e-openstack-capi-externallb
  • /test altinfra-e2e-openstack-capi-nfv-intel
  • /test altinfra-e2e-openstack-capi-ovn
  • /test altinfra-e2e-openstack-capi-proxy
  • /test altinfra-e2e-powervs-capi-ovn
  • /test altinfra-e2e-vsphere-capi-multi-vcenter-ovn
  • /test altinfra-e2e-vsphere-capi-ovn
  • /test altinfra-e2e-vsphere-capi-static-ovn
  • /test altinfra-e2e-vsphere-capi-zones
  • /test azure-ovn-marketplace-images
  • /test e2e-agent-compact-ipv4-appliance-diskimage
  • /test e2e-agent-compact-ipv4-none-platform
  • /test e2e-agent-ha-dualstack
  • /test e2e-agent-sno-ipv4-pxe
  • /test e2e-agent-sno-ipv6
  • /test e2e-aws-overlay-mtu-ovn-1200
  • /test e2e-aws-ovn-edge-zones
  • /test e2e-aws-ovn-fips
  • /test e2e-aws-ovn-heterogeneous
  • /test e2e-aws-ovn-imdsv2
  • /test e2e-aws-ovn-proxy
  • /test e2e-aws-ovn-public-subnets
  • /test e2e-aws-ovn-shared-vpc-custom-security-groups
  • /test e2e-aws-ovn-shared-vpc-edge-zones
  • /test e2e-aws-ovn-single-node
  • /test e2e-aws-ovn-upgrade
  • /test e2e-aws-ovn-workers-rhel8
  • /test e2e-aws-upi-proxy
  • /test e2e-azure-ovn-resourcegroup
  • /test e2e-azure-ovn-shared-vpc
  • /test e2e-azurestack
  • /test e2e-azurestack-upi
  • /test e2e-crc
  • /test e2e-external-aws
  • /test e2e-external-aws-ccm
  • /test e2e-gcp-ovn-byo-vpc
  • /test e2e-gcp-ovn-xpn
  • /test e2e-gcp-secureboot
  • /test e2e-gcp-upgrade
  • /test e2e-gcp-upi-xpn
  • /test e2e-ibmcloud-ovn
  • /test e2e-metal-assisted
  • /test e2e-metal-ipi-ovn
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-swapped-hosts
  • /test e2e-metal-ipi-ovn-virtualmedia
  • /test e2e-metal-single-node-live-iso
  • /test e2e-nutanix-ovn
  • /test e2e-openstack-ccpmso
  • /test e2e-openstack-ccpmso-zone
  • /test e2e-openstack-dualstack
  • /test e2e-openstack-dualstack-upi
  • /test e2e-openstack-externallb
  • /test e2e-openstack-nfv-intel
  • /test e2e-openstack-proxy
  • /test e2e-vsphere-ovn-techpreview
  • /test e2e-vsphere-ovn-upi-zones
  • /test e2e-vsphere-ovn-zones
  • /test e2e-vsphere-ovn-zones-techpreview
  • /test e2e-vsphere-static-ovn
  • /test okd-e2e-agent-compact-ipv4
  • /test okd-e2e-agent-ha-dualstack
  • /test okd-e2e-agent-sno-ipv6
  • /test okd-e2e-aws-ovn
  • /test okd-e2e-aws-ovn-upgrade
  • /test okd-e2e-gcp
  • /test okd-e2e-gcp-ovn-upgrade
  • /test okd-e2e-vsphere
  • /test okd-scos-e2e-aws-ovn
  • /test okd-scos-images
  • /test tf-fmt

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-installer-master-altinfra-images
  • pull-ci-openshift-installer-master-aro-unit
  • pull-ci-openshift-installer-master-e2e-aws-ovn
  • pull-ci-openshift-installer-master-e2e-aws-ovn-heterogeneous
  • pull-ci-openshift-installer-master-e2e-openstack-ovn
  • pull-ci-openshift-installer-master-gofmt
  • pull-ci-openshift-installer-master-golint
  • pull-ci-openshift-installer-master-govet
  • pull-ci-openshift-installer-master-images
  • pull-ci-openshift-installer-master-okd-unit
  • pull-ci-openshift-installer-master-okd-verify-codegen
  • pull-ci-openshift-installer-master-shellcheck
  • pull-ci-openshift-installer-master-tf-fmt
  • pull-ci-openshift-installer-master-tf-lint
  • pull-ci-openshift-installer-master-unit
  • pull-ci-openshift-installer-master-verify-codegen
  • pull-ci-openshift-installer-master-verify-vendor
  • pull-ci-openshift-installer-master-yaml-lint

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@zaneb
Copy link
Member

zaneb commented Jul 17, 2024

/test e2e-metal-assisted

@jlebon
Copy link
Member Author

jlebon commented Jul 18, 2024

Just surfacing some insights from internal discussions here.

The AI test here is not using the ISO from rhcos.json but instead the one hardcoded in https://github.com/openshift/assisted-service/blob/master/data/default_os_images.json.

The ABI tests here are not using the ISO from the rhcos.json but instead the ISO baked in the release payload within machine-os-images.

@jlebon
Copy link
Member Author

jlebon commented Jul 18, 2024

The ABI tests here are not using the ISO from the rhcos.json but instead the ISO baked in the release payload within machine-os-images.

It's odd to me that machine-os-images is not more tightly coupled to rhcos.json. Was that by design? ISTM like machine-os-images should basically just be output from the installer repo too instead and always match what's in rhcos.json. (Though longer term obviously related is discussions around having managed bootimages, e.g. openshift/enhancements#201.)

@zaneb
Copy link
Member

zaneb commented Jul 19, 2024

It's odd to me that machine-os-images is not more tightly coupled to rhcos.json. Was that by design? ISTM like machine-os-images should basically just be output from the installer repo too instead and always match what's in rhcos.json.

It's tightly coupled in ART builds because machine-os-images depends on the installer image. But upstream CI doesn't have any notion of dependencies between container images. The CI payload just contains the latest master build of every container image. So machine-os-images only gets rebuilt when a PR merges to it, regardless of any changes to the installer repo (let alone unmerged PRs in the installer repo).

Arguably we chose the wrong ISO here, and in the event of a conflict between what rhcos.json says and what's in the payload, we should go with downloading the former so that we can CI changes like this. The assumption is that the payload is canonical, but that's not really true in CI, and CI is theoretically the only place they can differ. (I wonder if this would break disconnected CI tests though - @bfournie do you know? It'd also mean we'd be executing different code paths in CI than users are actually getting in reality.)

@r4f4
Copy link
Contributor

r4f4 commented Jul 19, 2024

Looks like e2e-aws-ovn-heterogeneous failed on a different issue.

This job is not required and expected to fail until #8698 merges.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 6, 2024
@jlebon
Copy link
Member Author

jlebon commented Sep 6, 2024

OK, updated! Didn't rebase to make the diff easier, but also because I don't expect this to be merged before the associated enhancement is merged.

But changes now include:

  • I renamed the nomenclature from "pivot" to "overlay"
  • now with 100% less setenforce 0
  • we no longer use Restart=on-failure because it complicates dependency handling (as discussed in WIP: overlay node image before bootstrapping if necessary #8742 (comment)), and only the tiny section related to networking needs to be retried
  • I've split out the systemctl isolate back to multi-user.target into a separate unit; this is part of the work to make this work with Assisted Installer
  • we now persist /usr/lib/modules
  • we now use ostree container image pull to pull the image

This works with Assisted Installer as well but requires a patch there. I'll open that one shortly.

@jlebon
Copy link
Member Author

jlebon commented Sep 6, 2024

/test e2e-aws-ovn
/test e2e-openstack-ovn

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 6, 2024
@jlebon
Copy link
Member Author

jlebon commented Sep 6, 2024

OK, ended up doing a full rebase so try to get CI going.

/test e2e-aws-ovn
/test e2e-openstack-ovn

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sane to me

Copy link
Contributor

openshift-ci bot commented Sep 6, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cgwalters
Once this PR has been reviewed and has the lgtm label, please assign jhixson74 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jlebon added a commit to jlebon/assisted-installer that referenced this pull request Sep 6, 2024
As per openshift/enhancements#1637, we're trying
to get rid of all OpenShift-versioned components from the bootimages.

This means that there will no longer be oc, kubelet, or crio
binaries for example, which bootstrapping obviously relies on.

To adapt to this, the OpenShift installer now ships a new
`node-image-overlay.service` in its bootstrap Ignition config. This
service takes care of pulling down the node image and overlaying it,
effectively updating the system to the node image version.

Here, we accordingly also adapt assisted-installer so that we run
`node-image-overlay.service` before starting e.g. `kubelet.service` and
`bootkube.service`.

See also: openshift/installer#8742
jlebon added a commit to jlebon/assisted-installer that referenced this pull request Sep 6, 2024
As per openshift/enhancements#1637, we're trying
to get rid of all OpenShift-versioned components from the bootimages.

This means that there will no longer be oc, kubelet, or crio
binaries for example, which bootstrapping obviously relies on.

To adapt to this, the OpenShift installer now ships a new
`node-image-overlay.service` in its bootstrap Ignition config. This
service takes care of pulling down the node image and overlaying it,
effectively updating the system to the node image version.

Here, we accordingly also adapt assisted-installer so that we run
`node-image-overlay.service` before starting e.g. `kubelet.service` and
`bootkube.service`.

See also: openshift/installer#8742
@jlebon jlebon changed the title WIP: bootstrap: pivot into node image before bootstrapping WIP: overlay node image before bootstrapping if necessary Sep 6, 2024
@jlebon
Copy link
Member Author

jlebon commented Sep 6, 2024

Updated for feedback and updated commit and PR title to drop "pivot" wording.
Opened openshift/assisted-installer#899 for Assisted Installer counterpart.

@jlebon
Copy link
Member Author

jlebon commented Sep 7, 2024

/test e2e-aws-ovn
/test e2e-openstack-ovn

jlebon added a commit to jlebon/assisted-installer that referenced this pull request Sep 10, 2024
As per openshift/enhancements#1637, we're trying
to get rid of all OpenShift-versioned components from the bootimages.

This means that there will no longer be oc, kubelet, or crio
binaries for example, which bootstrapping obviously relies on.

To adapt to this, the OpenShift installer now ships a new
`node-image-overlay.service` in its bootstrap Ignition config. This
service takes care of pulling down the node image and overlaying it,
effectively updating the system to the node image version.

Here, we accordingly also adapt assisted-installer so that we run
`node-image-overlay.service` before starting e.g. `kubelet.service` and
`bootkube.service`.

See also: openshift/installer#8742
The if chain just after this expects the final filename and not the
possibly templated version. Strip the suffix in case it's a templated
file.

This notably fixes the mode for `registries.conf`, which became a
templated file and so went from 0644 to 0600. This patch brings it back
to 0644.
As per openshift/enhancements#1637, we're trying
to get rid of all OpenShift-versioned components from the bootimages.

This means that there will no longer be `oc`, `kubelet`, or `crio`
binaries for example, which bootstrapping obviously relies on.

Instead, now we change things up so that early on when booting the
bootstrap node, we pull down the node image, unencapsulate it (this just
means convert it back to an OSTree commit), then mount over its `/usr`,
and import new `/etc` content.

This is done by isolating to a different systemd target to only bring
up the minimum number of services to do the pivot and then carry on
with bootstrapping.

This does not incur additional reboots and should be compatible
with AI/ABI/SNO. But it is of course, a huge conceptual shift in how
bootstrapping works. With this, we would now always be sure that we're
using the same binaries as the target version as part of bootstrapping,
which should alleviate some issues such as AI late-binding (see e.g.
https://issues.redhat.com/browse/MGMT-16705).

The big exception of course being the kernel. Relatedly, note we do
persist `/usr/lib/modules` from the booted system so that loading kernel
modules still works.

To be conservative, the new logic only kicks in when using bootimages
which do not have `oc`. This will allow us to ratchet this in more
easily.

Down the line, we should be able to replace some of this with
`bootc apply-live` once that's available (and also works in a live
environment). (See containers/bootc#76.)

For full context, see the linked enhancement and discussions there.
These bootimages are RHEL-versioned and do not have any OCP components
in them.
golint was complaining about:

```
pkg/asset/ignition/bootstrap/common.go:406:2: ifElseChain: rewrite if-else to switch statement (gocritic)
	if parentDir == "bin" || parentDir == "dispatcher.d" || parentDir == "system-generators" {
	^
```
@jlebon
Copy link
Member Author

jlebon commented Sep 10, 2024

One update I did here is in the live ISO case, we now nuke the temporary repo completely after the checkout. This saves precious RAM space so we can fit within the 16G minimum requirements for SNO.

/test e2e-aws-ovn
/test e2e-openstack-ovn

Copy link
Contributor

openshift-ci bot commented Oct 3, 2024

@jlebon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-heterogeneous 9f5f3ee link false /test e2e-aws-ovn-heterogeneous
ci/prow/artifacts-images 9f5f3ee link true /test artifacts-images
ci/prow/e2e-openstack-ovn 4c29e7f link true /test e2e-openstack-ovn
ci/prow/integration-tests-nodejoiner 4c29e7f link true /test integration-tests-nodejoiner

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 3, 2024
@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants