Cannot boot from /dev/vda, install the OS from the running system to /dev/vdb, and reboot. #1640

Open
ybettan opened this issue Dec 21, 2023 · 15 comments

ybettan commented Dec 21, 2023

Describe the enhancement

It is not possible to boot from one disk and run coreos-installer install ... to a second disk and reboot.
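
For reference, the flow is roughly the following (a sketch; the target disk and Ignition config path are examples only):

# booted from one disk (/dev/vdb in my case), install to the other disk
sudo coreos-installer install /dev/vda --ignition-file /path/to/config.ign
sudo reboot

After rebooting into the newly installed disk, boot fails in the initramfs: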

[FAILED] Failed to mount /sysroot.
[    2.743748] systemd[1]: run-credentials-systemd\x2dsysusers.service.mount: Deactivated successfully.
[    2.744530] systemd[1]: sysroot.mount: Failed to load environment files: No such file or directory
See 'systemctl status sysroot.mount' for details.
[    2.745816] systemd[1]: sysroot.mount: Failed to run 'mount' task: No such file or directory
[    2.746642] systemd[1]: sysroot.mount: Failed with result 'resources'.
[    2.747253] systemd[1]: Failed to mount /sysroot.
[    2.747767] systemd[1]: Startup finished in 1.449s (kernel) + 0 (initrd) + 1.151s (userspace) = 2.600s.
Displaying logs from failed units: sysroot.mount
coreos-unique-boot.service
systemd-udev-settle.service
Dec 19 17:11:15 systemd[1]: sysroot.mount: Failed to load environment files: No such file or directory
Dec 19 17:11:15 systemd[1]: sysroot.mount: Failed to run 'mount' task: No such file or directory
Dec 19 17:11:15 systemd[1]: sysroot.mount: Failed with result 'resources'.
Dec 19 17:11:15 systemd[1]: Failed to mount /sysroot.
Dec 19 17:11:14 systemd[1]: Starting Ensure Unique `boot` Filesystem Label...
Dec 19 17:11:14 rdcore[730]: Error: System has 2 devices with a filesystem labeled 'boot': ["/dev/vdb3", "/dev/vda3"]
Dec 19 17:11:14 systemd[1]: coreos-unique-boot.service: Main process exited, code=exited, status=1/FAILURE
Dec 19 17:11:14 systemd[1]: coreos-unique-boot.service: Failed with result 'exit-code'.
Dec 19 17:11:14 systemd[1]: Failed to start Ensure Unique `boot` Filesystem Label.
Dec 19 17:11:14 systemd[1]: coreos-unique-boot.service: Triggering OnFailure= dependencies.
Dec 19 17:11:14 systemd[1]: Starting Wait for udev To Complete Device Initialization...
Dec 19 17:11:14 udevadm[591]: systemd-udev-settle.service is deprecated. Please fix multipathd-configure.service not to pull it in.
Dec 19 17:11:14 systemd[1]: systemd-udev-settle.service: Main process exited, code=killed, status=15/TERM
Dec 19 17:11:14 systemd[1]: systemd-udev-settle.service: Failed with result 'signal'.
Dec 19 17:11:14 systemd[1]: Stopped Wait for udev To Complete Device Initialization.

Generating "/run/initramfs/rdsosreport.txt"


Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.


:/#

I am adding some OCP/RHCOS information to this issue because it is hard to describe the use case without it. I hope that is fine.

In order for customers/partners to be able to install day-0 kernel modules in OCP, those drivers need to be present at the node's boot time.

In assisted-installer we use a discovery image, which is usually just a live RHCOS image plus a node-discovery agent that sends node information to the service prior to cluster installation.

In order to allow day-0 kernel modules, those drivers need to be present in the discovery image (which can be a live ISO or a disk image), since the discovery phase requires a connection to the service and therefore network drivers.

When using a disk image, a user/partner will need to build a custom disk image using image-builder/osbuild, boot from it, connect to the service, and then install RHCOS to the second disk, so that after the reboot the nodes can reach the machine-config-service on the bootstrap node and finish the installation.

The issue with this approach is that it looks like some code was added to prevent the node from rebooting in such a case because of "another filesystem labeled boot".

Here is a link describing the full process: https://github.com/ybettan/image-composer/blob/main/CUSTOMIZE_RHCOS.md#prerequisites-when-installing-from-a-disk-image

System details

No response

Additional information

Note: this is a temporary approach until we are able to have custom live ISOs.

ybettan commented Dec 21, 2023

FYI @dustymabe @jlebon

ybettan commented Jan 3, 2024

This is what the disks look like after running coreos-installer install.
I am booted from /dev/vdb and trying to install RHCOS on /dev/vda.
The partitions on /dev/vda were created by coreos-installer install.

[core@rhcos-disk-image-manual-installation ~]$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sr0     11:0    1 1024M  0 rom
vda    252:0    0   30G  0 disk
├─vda1 252:1    0    1M  0 part
├─vda2 252:2    0  127M  0 part
├─vda3 252:3    0  384M  0 part
└─vda4 252:4    0    3G  0 part
vdb    252:16   0   30G  0 disk
├─vdb1 252:17   0    1M  0 part
├─vdb2 252:18   0  127M  0 part
├─vdb3 252:19   0  384M  0 part
└─vdb4 252:20   0 29.5G  0 part /var
                                /sysroot/ostree/deploy/rhcos/var
                                /usr
                                /etc
                                /
                                /sysroot

dustymabe commented Jan 3, 2024

This check is implemented in coreos-unique-boot.service and runs on first boot (Ignition boot).

Some context on this check can be found in #976 and coreos/coreos-installer#658
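
For what it's worth, a quick way to see the duplicate labels the check complains about (a sketch; either command should do):

# list block devices with their filesystem labels
lsblk -o NAME,LABEL,FSTYPE
# or ask blkid directly which devices carry the 'boot' label
sudo blkid -t LABEL=boot -o device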

@dustymabe (Member)

A little hacky (and completely untested) but you might be able to just make the installed system ignore the vdb disk completely with something like libata.force=2.00:disable (see https://unix.stackexchange.com/a/104177). You'll have to make sure vdb is indeed 2.00.
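
If that pans out, here is a sketch of how the karg could be baked in at install time (untested; the target disk and the controller numbering are only examples and need to be verified):

# hide the old disk from the installed system via a kernel argument
sudo coreos-installer install /dev/vda \
    --append-karg libata.force=2.00:disable \
    --ignition-file /path/to/config.ign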

ybettan commented Jan 8, 2024

A little hacky (and completely untested) but you might be able to just make the installed system ignore the vdb disk completely with something like libata.force=2.00:disable (see https://unix.stackexchange.com/a/104177). You'll have to make sure vdb is indeed 2.00.

Thank you @dustymabe. I will try this one out.

jlebon commented Jan 8, 2024

That would take care of the kernel + userspace side, but even GRUB won't know which one is the right one when booting. The more comprehensive fix for this is to bind the bootfs at install time: coreos/coreos-installer#798.

@dustymabe (Member)

That would take care of the kernel + userspace side, but even GRUB won't know which one is the right one when booting. The more comprehensive fix for this is to bind the bootfs at install time: coreos/coreos-installer#798.

But won't our bootuuid.cfg make GRUB choose the right one? I guess, though, that only kicks in on the second and subsequent boots.

jlebon commented Jan 9, 2024

One more hacky thing you could do is before rebooting, relabel the bootfs to e.g. old-boot or something (tune2fs -L old-boot /dev/vdb3). That way, GRUB can't get confused.

(Would be nice if we could do that for the rootfs too. That would drop the requirement on the karg entirely. But XFS doesn't support changing label online, which makes this harder.)
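
A rough sketch of that workaround, assuming the currently booted disk is /dev/vdb (device names are examples from this issue):

# after coreos-installer install has written the new disk,
# relabel the running disk's bootfs so only the new disk carries the 'boot' label
sudo tune2fs -L old-boot /dev/vdb3
sudo reboot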

ybettan commented Jan 16, 2024

Thanks @jlebon.

This sounds hacky indeed. I can try that, but it would definitely require changes to Assisted.
Any estimate of when/if coreos/coreos-installer#798 will be implemented?

@cgwalters (Member)

FWIW, bootc install does this UUID setup automatically, as does Anaconda when doing the container install path.

jlebon commented Jan 16, 2024

@ybettan At least they'd only be temporary until we get custom live ISOs.

I think we could and should support this too, but it'd still be super confusing to have two filesystems labeled boot and two labeled root. More practically, it'd require ripping out all uses of /dev/disk/by-label/[rb]oot (or alternatively, adding a udev rule to symlink the right one with higher priority).

In that sense, the libata.force trick, while hacky, does yield a saner environment.
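
For illustration, the udev-rule idea could look roughly like this (untested sketch; the rule file name, device name, and priority value are all assumptions):

# prefer /dev/vda3's /dev/disk/by-label symlinks when two devices share a label
cat <<'EOF' | sudo tee /etc/udev/rules.d/90-prefer-installed-disk.rules
SUBSYSTEM=="block", KERNEL=="vda3", OPTIONS+="link_priority=100"
EOF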

ybettan commented Jan 17, 2024

FWIW, bootc install does this UUID setup automatically, as does Anaconda when doing the container install path.

Booting from a disk image is mainly a temporary solution until we have live ISOs. Are there plans for OCP to start using bootc install rather than coreos-installer install at some point?

ybettan commented Jan 17, 2024

In that sense, the libata.force trick, while hacky, does yield a saner environment.

@jlebon Didn't we say that it won't work because GRUB won't know which disk to boot from?

jlebon commented Jan 17, 2024

You'd need both. Relabeling boot would address the GRUB issue, and the karg would address the rest of the OS.

@justjokiing

One more hacky thing you could do is before rebooting, relabel the bootfs to e.g. old-boot or something (tune2fs -L old-boot /dev/vdb3). That way, GRUB can't get confused.

This worked perfectly for me. I was able to copy over my data and migrate to the new disk.
