POC/RFC: Support reinstalls via kexec #712

Closed
wants to merge 1 commit

Conversation

@jlebon (Member) commented Dec 10, 2021

Edit: moved to #791.

This is a proof-of-concept showing how we can use kexec to support reinstalling, assuming we have access to the rootfs image (this is strongly related to, or exactly the same as, coreos/fedora-coreos-tracker#399, depending on which flavour of "reset" we're talking about).

Demo:

$ coreos-installer reinstall /dev/vda --platform qemu --insecure
Downloading https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.10/410.84.202112091617-0/x86_64/rhcos-410.84.20.
Downloading https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.10/410.84.202112091617-0/x86_64/rhcos-410.84.202112091617-0-live-rootfs.x86_64.img...
Loading kernel and initramfs with arguments:  coreos.inst.install_dev=/dev/vda console=ttyS0 coreos.inst.platform_id=qemu coreos.inst.insecure
Pivoting
[ 1259.444402] kexec_core: Starting new kernel
[    0.000000] Linux version 4.18.0-305.28.1.el8_4.x86_64 ([email protected]) (gcc version 8.4.1 20200928 (Re1
[    0.000000] Command line:  coreos.inst.install_dev=/dev/vda console=ttyS0 coreos.inst.platform_id=qemu coreos.inst.insecure
[   15.495212] coreos-installer-service[1066]: coreos-installer install /dev/vda --platform qemu --insecure --fetch-retries infinite
[   15.660528] coreos-installer-service[1066]: Installing Red Hat Enterprise Linux CoreOS 410.84.202112091617-0 (Ootpa) x86_64 (512-byte)
[   15.951946]  vda: vda1 vda2 vda3 vda4
[   15.960578]  vda: vda1 vda2 vda3 vda4
[   16.955472] coreos-installer-service[1066]: Read disk 43.7 MiB/3.7 GiB (1%)
[  100.914384] coreos-installer-service[1066]: Read disk 3.7 GiB/3.7 GiB (100%)
[  106.201139] coreos-installer-service[1066]: Install complete.
[  107.902354] reboot: Restarting system
[  107.907561] reboot: machine restart
...
Red Hat Enterprise Linux CoreOS 410.84.202112091617-0 (Ootpa) 4.10
Ignition: ran on 2021/12/10 23:10:41 UTC (this boot)
Ignition: user-provided config was applied

[core@cosa-devsh ~]$ journalctl --list-boots --no-pager
 0 5588b683714241839237799c72ca28c6 Fri 2021-12-10 23:10:33 UTC—Fri 2021-12-10 23:12:01 UTC

(I've cut out a lot of stuff from the logs there to concentrate on the interesting bits; see this attachment for the full output if you're curious.)

It also supports specifying the target Ignition config, the image URL, using a local live initramfs and rootfs, etc.

FCOS is also supported, though there seems to be a bug in the kexec code there.

@jlebon (Member Author) commented Dec 11, 2021

We should be able to avoid reserializing to kargs by re-using the bits from #706 and #708.

@dustymabe (Member) commented:

Interesting... Do we have any concerns here about preserving data on filesystems that we want to re-use? I.e., is kexec safe enough for mounted filesystems that we want to preserve?

@jlebon (Member Author) commented Dec 13, 2021

> Interesting... Do we have any concerns here about preserving data on filesystems that we want to re-use? I.e., is kexec safe enough for mounted filesystems that we want to preserve?

If the filesystems to preserve were created by Ignition and are part of the same Ignition config passed back to coreos-installer reinstall, then their data should transparently be preserved. Otherwise, the same support for saving existing partitions that was added in #321 is also available here.

@travier (Member) commented Dec 13, 2021

We should probably unmount as much as possible and remount everything else RO before doing the kexec.

@travier (Member) commented Dec 13, 2021

Maybe creating a special systemd target with behavior similar to shutdown could help here.

@jlebon (Member Author) commented Dec 13, 2021

> Maybe creating a special systemd target with behavior similar to shutdown could help here.

We use systemctl kexec, which is pretty much this. :)
And now I realize that's what @dustymabe was actually asking.

Edit: I think my demo output above was a bit misleading because I didn't clarify that I'd edited out a lot of it. I've attached the full logs now; you can see systemd shutting down there before we kexec the new kernel.

@cgwalters (Member) left a comment

Wow cool stuff, I should clearly be watching PRs to coreos-installer more.

I think the code here would be a lot smaller if we only supported stream metadata.

I'd also say I'd be a lot more confident in offering this when it has an e2e test.

arch: &str,
retries: FetchRetries,
) -> Result<(Url, Url)> {
let release_meta_url = format!(
A Member commented:

Isn't this covered by https://github.com/coreos/stream-metadata-rust ?

Oh, I see the comment above.

But... hmm, shouldn't we mainly support re-installing from a well-known stream (i.e. usually stable)?

The use case of reinstalling from a specific release seems like it can be covered by having the user manually provide a URL to a build.

That would allow them to use a specific release from the FCOS prod builds (until we GC them) and would allow them to use mirrored builds more easily.

A Contributor commented:

stream-metadata-go already has structs for release metadata, with a comment warning that they shouldn't be used. We could do that in -rust also.

But yes, I agree. I understand the desire to reinstall the release that's currently running, but it's also odd not to follow our own recommendation to always run the latest release. @cgwalters' suggestion seems like a good compromise.
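
For reference, a rough sketch of what the stream-metadata route could look like: fetching the live PXE artifact locations for a well-known stream. This queries the published FCOS stream JSON directly with reqwest/serde_json rather than going through the stream-metadata-rust crate, so treat the dependencies and JSON paths as illustrative:

```rust
use anyhow::{Context, Result};
use serde_json::Value;

/// Rough sketch only: resolve the live PXE kernel/initramfs/rootfs URLs for a
/// stream from the published FCOS stream metadata. A real implementation
/// would presumably go through the stream-metadata-rust crate instead.
fn get_live_urls_for_stream(stream: &str, arch: &str) -> Result<(String, String, String)> {
    let url = format!(
        "https://builds.coreos.fedoraproject.org/streams/{}.json",
        stream
    );
    let meta: Value = reqwest::blocking::get(&url)
        .and_then(|resp| resp.error_for_status())
        .with_context(|| format!("fetching {}", url))?
        .json()
        .context("parsing stream metadata")?;
    let pxe = &meta["architectures"][arch]["artifacts"]["metal"]["formats"]["pxe"];
    let location = |artifact: &str| -> Result<String> {
        pxe[artifact]["location"]
            .as_str()
            .map(String::from)
            .with_context(|| format!("no {} location in stream metadata", artifact))
    };
    Ok((location("kernel")?, location("initramfs")?, location("rootfs")?))
}
```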

@jlebon (Member Author) commented:

I mentioned this when I demo'ed it but not here: the idea with using the same release was that I had a faint hope we could reuse all the artifacts we have on hand to make it a fully local experience. Obviously the rootfs squashfs would've taken some creativity. :)

Ok(f)
}

// XXX: copied from Zincati; need https://github.com/coreos/rpm-ostree/issues/2389
A Member commented:

Yeah... I guess what we should really do is take the hit of splitting the bindings out into a separate git repo and publishing them as a crate, but have rpm-ostree pull them in from git main.

That would make it harder to add a new required field, but I think we can live with that.

)?;

eprintln!("Pivoting");
runcmd!("systemctl", "kexec")?;
A Member commented:

kexec is great, but I think it can sometimes have bugs, so it might be useful to think about a mode that instead writes the kernel/initrd to e.g. /boot/coreos-installer-reprovision/{kernel,initramfs.img} and manually updates the bootloader entries to use them. Then we can do the reinstall from the initramfs on the next boot.

The hit is a double reboot, but it avoids kexec.
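
For concreteness, a very rough sketch of that alternative: stage the live kernel/initramfs under /boot and point a one-shot boot entry at them. The paths, the BLS entry format, and coexistence with the existing ostree-managed entries are assumptions here, not a worked-out design:

```rust
use anyhow::{Context, Result};
use std::fs;
use std::path::Path;

/// Very rough sketch of the non-kexec mode described above: stage the live
/// kernel and initramfs under /boot and add a boot entry for them, so the
/// reinstall runs from the live initramfs on the next normal reboot.
fn stage_reprovision_boot(kernel: &Path, initrd: &Path, kargs: &str) -> Result<()> {
    let stage_dir = Path::new("/boot/coreos-installer-reprovision");
    fs::create_dir_all(stage_dir).context("creating staging directory")?;
    fs::copy(kernel, stage_dir.join("kernel")).context("staging kernel")?;
    fs::copy(initrd, stage_dir.join("initramfs.img")).context("staging initramfs")?;

    // Hypothetical BLS entry; a real implementation would also need to make
    // this the default for exactly one boot and clean it up afterwards.
    let entry = format!(
        "title CoreOS reinstall\n\
         version 99\n\
         linux /coreos-installer-reprovision/kernel\n\
         initrd /coreos-installer-reprovision/initramfs.img\n\
         options {}\n",
        kargs
    );
    fs::write(
        "/boot/loader/entries/05-coreos-installer-reprovision.conf",
        entry,
    )
    .context("writing BLS entry")?;
    Ok(())
}
```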

@jlebon (Member Author) commented:

Yeah, possible. One thing worth mentioning, though, is that this strategy could be useful not just for reinstalling CoreOS from CoreOS, but in any situation in which you're running on top of the same block device you'd like to install to. So I could imagine this being used in the container workflow too. In that case, not having to fiddle with boot configuration is a plus, because you don't know what the setup is (and of course "reinstall" doesn't really make sense in that context; maybe it would belong better under install with a switch?).

@lukasbestle commented:

Can you explain how it works from a high-level perspective? I.e., what will the result be after the reinstall: will the root partition be completely reset, as if CoreOS were installed to a blank drive, while additional filesystems (like a separate /var partition) are left untouched? Or have I misunderstood it?

@jlebon (Member Author) commented Jan 4, 2022

> Can you explain how it works from a high-level perspective? I.e., what will the result be after the reinstall: will the root partition be completely reset, as if CoreOS were installed to a blank drive, while additional filesystems (like a separate /var partition) are left untouched? Or have I misunderstood it?

Yup, exactly. The idea is to be equivalent to coreos-installer install, except with less friction than re-hosting PXE artifacts or downloading and booting from the live ISO. Re: additional filesystems, see #712 (comment).

@bgilbert (Contributor) left a comment

Nice! This looks promising.

@@ -410,6 +410,50 @@ pub fn download_to_tempfile(url: &Url, retries: FetchRetries) -> Result<File> {
Ok(f)
}

// XXX: try to dedupe with write_image?
A Contributor commented:

Yeah, I'd favor just using write_image, or wrapping it if we want a convenience function. It looks like this implements a subset of what write_image supports?

let mut sig_url = url.clone();
sig_url.set_path(&format!("{}.sig", url.path()));
download_and_verify_to_tempfile(url, &sig_url, retries)
}
A Contributor commented:

Any reason not to let UrlLocation handle the signature fetching, and unify these paths?

}

// Hackily uses https://releases-art-rhcos.svc.ci.openshift.org; this is probably not kosher.
fn get_rhcos_live_urls(version: &str, arch: &str, retries: FetchRetries) -> Result<(Url, Url)> {
A Contributor commented:

So far we've avoided hardcoding distro-specific logic, so this is unfortunate. The bootimage semantics of RHCOS may make it difficult to avoid, though.

Ok((kver_dir.join("vmlinuz"), kver_dir.join("initramfs.img")))
}

fn concat_initrds(bottom_initrd: &mut File, top_initrd: &mut File) -> Result<()> {
A Contributor commented:

We're in the process of disrecommending concatenation; xref coreos/fedora-coreos-tracker#1055 (comment). We could teach the initrd to extract the rootfs from a file on a local partition, but that does seem like a lot of extra work.
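
(For context, the concatenation being discussed relies on the kernel accepting back-to-back cpio archives as a single initramfs. A minimal sketch of what a helper with this signature would presumably do; the PR's actual implementation isn't shown in this hunk:)

```rust
use std::fs::File;
use std::io::{copy, Seek, SeekFrom};

use anyhow::{Context, Result};

/// Sketch: append `top_initrd` to the end of `bottom_initrd`. The kernel
/// treats concatenated cpio archives as one initramfs, with members from
/// later archives overriding earlier ones.
fn concat_initrds(bottom_initrd: &mut File, top_initrd: &mut File) -> Result<()> {
    bottom_initrd
        .seek(SeekFrom::End(0))
        .context("seeking to end of bottom initrd")?;
    top_initrd
        .seek(SeekFrom::Start(0))
        .context("rewinding top initrd")?;
    copy(top_initrd, bottom_initrd).context("appending top initrd")?;
    Ok(())
}
```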

initrd
};

// build kargs
A Contributor commented:

As you said in #712 (comment), let's try to avoid this. I'd rather see us pass an Ignition config with an embedded installer config to the live boot via an appended initrd.

kargs.push_str(" coreos.inst.insecure");
}
if config.skip_reboot {
kargs.push_str(" coreos.inst.skip_reboot");
A Contributor commented:

We could handle this by having the live boot Ignition config create a unit that removes /run/coreos-installer-reboot.
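
A rough sketch of that idea as an Ignition fragment built with serde_json; the unit name, the ordering, and the install target below are assumptions about the live image's installer units rather than verified details:

```rust
use serde_json::{json, Value};

/// Sketch of the live-boot Ignition fragment described above: a oneshot unit
/// that deletes /run/coreos-installer-reboot so the live environment does not
/// reboot once the install finishes. Unit names and ordering are assumptions.
fn skip_reboot_live_config() -> Value {
    json!({
        "ignition": { "version": "3.3.0" },
        "systemd": {
            "units": [{
                "name": "coreos-installer-skip-reboot.service",
                "enabled": true,
                "contents": "[Unit]\nDescription=Suppress reboot after install\nAfter=coreos-installer.service\nBefore=coreos-installer-reboot.service\n\n[Service]\nType=oneshot\nExecStart=/usr/bin/rm -f /run/coreos-installer-reboot\n\n[Install]\nWantedBy=multi-user.target\n"
            }]
        }
    })
}
```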

let mut v = VerifyReader::new(&mut bf, Some(sig.as_slice()), VerifyKeys::Production)
.with_context(|| format!("creating verifier for {}", path))?;
copy(&mut v, &mut std::io::sink()).with_context(|| format!("reading {}", path))?;
v.verify_without_logging_failure()
A Contributor commented:

This method is intended for try_ paths where we fall back to another code path on signature mismatch. Since we do actually fail here, we should use the logging variant.

config.fetch_retries,
)?,
"rhcos" => get_rhcos_live_urls(
&booted_deployment.version,
A Contributor commented:

To preserve bootimage semantics, shouldn't we be installing the aleph-version and not the current version?
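
(For readers unfamiliar with the term: the aleph version is the version that was originally installed on the node, recorded in a JSON file on the root filesystem. A sketch of reading it; the file path and the "build" key are assumptions about the aleph file's format:)

```rust
use std::fs::File;

use anyhow::{Context, Result};
use serde_json::Value;

/// Sketch: read the node's original-install ("aleph") version. The path and
/// the "build" key are assumptions, not verified against the real file.
fn get_aleph_version() -> Result<String> {
    let f = File::open("/sysroot/.coreos-aleph-version.json")
        .context("opening aleph version file")?;
    let aleph: Value = serde_json::from_reader(f).context("parsing aleph version file")?;
    aleph["build"]
        .as_str()
        .map(String::from)
        .context("aleph file has no build field")
}
```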

@jlebon (Member Author) commented Feb 11, 2022

I think before picking this up again, we should determine what kind of factory reset we want and what constraints we need to be able to operate within. Then we can see how best to adapt this given those constraints.

@jlebon (Member Author) commented Feb 22, 2022

Re-opened in #791.

This pull request was closed.