T5593: Further shrink VyOS imagesize, part 1/2 #209

Closed
wants to merge 2 commits into from


Apachez-

Further shrink VyOS imagesize (ISO) by moving Linux kernel-files out of the filesystem.squashfs and into the live-directory of the ISO.

Note! There is a part 2/2 of this for vyos-build that must be merged at the same time.

For this to work, the following files have been modified:

  • scripts/install/install-image-new
    So the Linux-kernel files are copied to the persistent boot-directory during install.

  • scripts/install/install-image-existing
    So the Linux-kernel files are copied to the persistent boot-directory during upgrade.

Smoketest results:

DEBUG - vyos@vyos:~$ echo EXITCODE:$?
DEBUG - echo EXITCODE:$?
DEBUG - EXITCODE:0
 INFO - Smoketest finished successfully!
 INFO - Powering off system
DEBUG - vyos@vyos:~$ poweroff now
 INFO - Shutting down virtual machine
 INFO - Waiting for shutdown...
DEBUG - poweroff now
 INFO - Waiting for shutdown...
DEBUG - poweroff now
 INFO - VM is shut down!
 INFO - Cleaning up
 INFO - Removing disk file: testinstall-20230926-233250-071e.img

For more information see task:

@Apachez-
Author

Related (and must be merged at the same time):

PR created for part 2/2 (vyos-build): vyos/vyos-build#427

@c-po c-po requested review from a team, dmbaturin, sarthurdev, zdc, jestabro, sever-sever and c-po and removed request for a team September 28, 2023 05:17
jestabro
jestabro previously approved these changes Sep 28, 2023
@jestabro jestabro dismissed their stale review September 28, 2023 13:37

meant to add comment

@jestabro jestabro self-requested a review September 28, 2023 13:37
Contributor

@jestabro jestabro left a comment


An iso built from this and the companion PR will not boot in KVM or serial console; running make test shows:

DEBUG - error: no suitable video mode found.
DEBUG - error: no suitable video mode found.
DEBUG - Booting in blind modeBooting in blind mode

@Apachez-
Author

Booting in blind modeBooting in blind mode

This exists already in the regular nightly builds:

https://github.com/vyos/vyos-rolling-nightly-builds/actions/runs/6332921937/job/17200227229#step:10:27110

@Apachez-
Author

Using VirtualBox 7.0.10 at host Ubuntu 23.04 (Ubuntu Linux kernel 6.2.0-33-generic #33-Ubuntu SMP PREEMPT_DYNAMIC Tue Sep 5 14:49:19 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux).

Booting installation with VyOS 1.5-rolling-202309240023 works without problems.

Upgrading the above to the build that includes the 2 commits (part 1 and 2), i.e. the same build the smoketests were run against when the PRs were created, completes without problems, but the new image then fails during boot:

root@vyos:/tmp# add system image /tmp/vyos-1337-230927-2-amd64.iso
Checking SHA256 checksums of files on the ISO image... OK.
Done!
What would you like to name this image? [1337-230927-2]: 
OK.  This image will be named: 1337-230927-2
Installing "1337-230927-2" image.
Copying new release files...
Would you like to save the current configuration 
directory and config file? (Yes/No) [Yes]: 
Copying current configuration...
Would you like to save the SSH host keys from your 
current configuration? (Yes/No) [Yes]: 
Copying SSH keys...
Running post-install script...
Setting up grub configuration...
Done.

After reboot:

error: file '/boot/1337-230927-2/vmlinuz' not found.
error: you need to load the kernel first.

Press any key to continue...

and then thrown back to the grub selector.

Doing a fresh install using vyos-1337-230927-2-amd64.iso fails to boot as well (from the ISO).

Conclusion:

Something odd is going on...

Comparing the directories after the upgrade was run:

root@vyos:/usr/lib/live/mount/persistence/boot# ls -la ./1.5-rolling-202309230021/
total 383480
drwxr-xr-x  5 root root      4096 Sep 23 11:26 .
drwxr-xr-x  6 root root      4096 Sep 28 19:20 ..
-r--r--r--  1 root root 352346112 Sep 23 03:42 1.5-rolling-202309230021.squashfs
-rw-r--r--  1 root root    155136 Sep 19 21:04 config-6.1.54-amd64-vyos
drwxr-xr-x  2 root root      4096 Sep 23 11:26 grub
lrwxrwxrwx  1 root root        28 Sep 23 03:42 initrd.img -> initrd.img-6.1.54-amd64-vyos
-rw-r--r--  1 root root  30435573 Sep 23 03:42 initrd.img-6.1.54-amd64-vyos
drwxr-xr-x 10 root root      4096 Sep 23 11:31 rw
-rw-r--r--  1 root root   3440054 Sep 19 21:04 System.map-6.1.54-amd64-vyos
lrwxrwxrwx  1 root root        25 Sep 23 03:42 vmlinuz -> vmlinuz-6.1.54-amd64-vyos
-rw-r--r--  1 root root   6280288 Sep 19 21:04 vmlinuz-6.1.54-amd64-vyos
drwxr-xr-x  3 root root      4096 Sep 23 11:31 work
root@vyos:/usr/lib/live/mount/persistence/boot# ls -la ./1337-230927-2/
total 327668
drwxr-xr-x 4 root root      4096 Sep 28 19:20 .
drwxr-xr-x 6 root root      4096 Sep 28 19:20 ..
-r--r--r-- 1 root root 335511552 Sep 27 01:06 1337-230927-2.squashfs
lrwxrwxrwx 1 root root        28 Sep 27 01:06 initrd.img -> initrd.img-6.1.55-amd64-vyos
drwxr-xr-x 4 root root      4096 Sep 28 19:20 rw
lrwxrwxrwx 1 root root        25 Sep 27 01:06 vmlinuz -> vmlinuz-6.1.55-amd64-vyos
drwxr-xr-x 3 root root      4096 Sep 28 19:20 work
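
In the second listing, vmlinuz and initrd.img are symlinks whose versioned targets are not present in the directory, which fits the GRUB error shown earlier. A minimal sketch for spotting such dangling links in an image's boot directory follows; the function name is made up for illustration and is not part of any VyOS script.

```shell
# Hypothetical helper (not part of VyOS): list symlinks directly inside a
# directory whose targets do not exist.
find_dangling_links() {
    dir="$1"
    for entry in "$dir"/*; do
        # -L: the entry is a symlink; ! -e: following it leads nowhere
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            printf '%s -> %s\n' "$entry" "$(readlink "$entry")"
        fi
    done
}
```

It could be called as `find_dangling_links /usr/lib/live/mount/persistence/boot/1337-230927-2` on the upgraded system.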

Vs a clean installation:

Stuck at boot menu (no errors just thrown back to the boot menu with VyOS logo).

Mounting the ISO manually and looking at /boot/grub/grub.cfg:

# Live boot
menuentry "Live system (amd64-vyos) - KVM console" --hotkey=l {
	linux	/live/vmlinuz-6.1.55-amd64-vyos boot=live components hostname=vyos username=live nopersistence noautologin nonetworking union=overlay console=ttyS0,115200 console=tty0 net.ifnames=0 biosdevname=0 findiso=${iso_path}
	initrd	/live/initrd.img-6.1.55-amd64-vyos
}
menuentry "Live system (amd64-vyos fail-safe mode)" {
	linux	/live/vmlinuz-6.1.55-amd64-vyos live components memtest noapic noapm nodma nomce nolapic nomodeset nosmp nosplash vga=normal console=ttyS0,115200 console=tty0 net.ifnames=0 biosdevname=0
	initrd	/live/initrd.img-6.1.55-amd64-vyos
}

# Installer (if any)
source /boot/grub/install_start.cfg
menuentry "Live system (amd64-vyos) - Serial console"  {
	linux	/live/vmlinuz-6.1.55-amd64-vyos boot=live components hostname=vyos username=live nopersistence noautologin nonetworking union=overlay console=tty0 console=ttyS0,115200 net.ifnames=0 biosdevname=0 findiso=${iso_path}
	initrd	/live/initrd.img-6.1.55-amd64-vyos
}

However, when pressing TAB in the boot menu to alter the boot string, "vmlinuz-6.1.55-amd64-vyos" is nowhere to be seen?

Anyone who wants to explain that to me? :-)

The boot failure from the ISO could of course be because symlinks are supported by the Rock Ridge ISO format, while the BIOS of VirtualBox and others perhaps only uses Joliet, which doesn't support symlinks?

That would explain why the smoketests and booting through QEMU work.

However, even if the symlinks are removed (i.e. real files are placed in the /live directory of the ISO instead of the symlinks), there is still an issue with the copy script that's supposed to copy the Linux kernel files from the ISO to the persistent boot directory.

That is, the copy script for a new installation obviously works (otherwise the smoketests would have failed to start), while the one for an upgrade is the one that fails for whatever reason.

Let's pause these PRs for now; I'll return when I have explored this some more...

@Apachez-
Author

Hmm, enabling EFI in VirtualBox makes the clean install succeed, including booting afterwards (i.e. install image was successful, including the reboot into the installed image).

@Apachez-
Author

Regarding the differences in boot:

It seems that when EFI is used, /boot/grub/grub.cfg defines the boot menu.

# Live boot
menuentry "Live system (amd64-vyos) - KVM console" --hotkey=l {
	linux	/live/vmlinuz-6.1.55-amd64-vyos boot=live components hostname=vyos username=live nopersistence noautologin nonetworking union=overlay console=ttyS0,115200 console=tty0 net.ifnames=0 biosdevname=0 findiso=${iso_path}
	initrd	/live/initrd.img-6.1.55-amd64-vyos
}
menuentry "Live system (amd64-vyos fail-safe mode)" {
	linux	/live/vmlinuz-6.1.55-amd64-vyos live components memtest noapic noapm nodma nomce nolapic nomodeset nosmp nosplash vga=normal console=ttyS0,115200 console=tty0 net.ifnames=0 biosdevname=0
	initrd	/live/initrd.img-6.1.55-amd64-vyos
}

# Installer (if any)
source /boot/grub/install_start.cfg
menuentry "Live system (amd64-vyos) - Serial console"  {
	linux	/live/vmlinuz-6.1.55-amd64-vyos boot=live components hostname=vyos username=live nopersistence noautologin nonetworking union=overlay console=tty0 console=ttyS0,115200 net.ifnames=0 biosdevname=0 findiso=${iso_path}
	initrd	/live/initrd.img-6.1.55-amd64-vyos
}

But when classic BIOS is used, /isolinux/live.cfg defines the boot menu.

label live-amd64-vyos
	menu label ^Live system (amd64-vyos) - KVM console
	menu default
	linux /live/vmlinuz
	initrd /live/initrd.img
	append boot=live components hostname=vyos username=live nopersistence noautologin nonetworking union=overlay console=ttyS0,115200 console=tty0 net.ifnames=0 biosdevname=0

label live-amd64-vyos-failsafe
	menu label Live system (amd64-vyos fail-safe mode)
	linux /live/vmlinuz
	initrd /live/initrd.img
	append live components memtest noapic noapm nodma nomce nolapic nomodeset nosmp nosplash vga=normal console=ttyS0,115200 console=tty0 net.ifnames=0 biosdevname=0

label live-amd64-vyos-serial
	menu label ^Live system (amd64-vyos) - Serial console
	linux /live/vmlinuz
	initrd /live/initrd.img
	append boot=live components hostname=vyos username=live nopersistence noautologin nonetworking union=overlay console=tty0 console=ttyS0,115200 net.ifnames=0 biosdevname=0
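
The two excerpts differ in which kernel file they name: grub.cfg references the versioned /live/vmlinuz-6.1.55-amd64-vyos, while live.cfg references the unversioned /live/vmlinuz. A small sketch for pulling the kernel paths out of either file, so the two boot paths can be compared side by side, might look like this; the function name is an assumption for illustration.

```shell
# Hypothetical helper: print the distinct kernel paths named on "linux" lines
# of a boot menu file. awk splits on spaces and tabs, so the same expression
# covers both the GRUB and the isolinux excerpts shown in this thread.
kernel_paths() {
    awk '$1 == "linux" { print $2 }' "$1" | sort -u
}
```

With the ISO mounted, running `kernel_paths` on boot/grub/grub.cfg and on isolinux/live.cfg should show whether the BIOS path depends on a name that exists only as a symlink.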

@Apachez-
Author

PR updated for part 2/2 (vyos-build): vyos/vyos-build#427

@Apachez-
Author

I think I have figured out why a fresh install works but an update of an existing install does not.

When you run add system image in the existing install, it uses the install-image-existing script found in /opt/vyatta/sbin, which doesn't contain the code to copy the Linux kernel files from the /live directory to the persistent boot directory.

When doing a fresh install, on the other hand, you boot from the ISO, which does contain the updated scripts.

So any suggestions on how to resolve this "chicken or the egg" situation?

One suggestion is to alter the op-mode for add system image so that, instead of using the local copy of /opt/vyatta/sbin/install-image-existing, it mounts the ISO (as a loop device) and uses the install-image-existing found inside the ISO.

This updated op-mode could then be backported so that users on, let's say, LTS 1.3.4 could first update to LTS 1.3.5 and then make the jump to LTS 1.4.0 or 1.5-rolling.

For the PRs regarding T5593 this would mean that a nightly with the updated op-mode must be available first (along with a note in the forum and blog), and only then can the PRs for T5593 be merged.
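
The selection logic behind this suggestion can be sketched roughly as below. This is only an illustration under assumptions: the function name is invented, and where the script would live inside a mounted ISO is not stated in this thread, so the caller passes that path in.

```shell
# Hypothetical sketch: prefer the upgrade script shipped with the new image
# and fall back to the locally installed copy.
# $1: candidate script from the already mounted new image (location is an assumption)
# $2: locally installed script, e.g. /opt/vyatta/sbin/install-image-existing
choose_upgrade_script() {
    from_image="$1"
    local_copy="$2"
    if [ -n "$from_image" ] && [ -f "$from_image" ]; then
        printf '%s\n' "$from_image"
    elif [ -f "$local_copy" ]; then
        printf '%s\n' "$local_copy"
    else
        echo "no usable upgrade script found" >&2
        return 1
    fi
}
```

The fallback branch keeps today's behaviour for images that do not ship their own script.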

@Apachez-
Author

Update committed so that VyOS can now do both a fresh install and an upgrade from a previous install (provided the previous install used an image in which these commits were merged).

Note! Since VyOS prior to these PRs (involving T5593) cannot upgrade properly, either a fresh install is needed or the op-mode add system image must be updated to use the install-image-existing from the ISO rather than the locally installed version.

Updated files:

scripts/install/install-image-new

scripts/install/install-image-existing

@Apachez-
Author

Created https://vyos.dev/T5622 which must first be resolved before T5593 can get successfully merged.

@jestabro
Contributor

jestabro commented Sep 29, 2023

@Apachez- the issue that you have encountered is similar to one we are acutely familiar with in the context of vyos/vyos-1x#1768 and the reason for the revisions currently being made for its completion. Workarounds such as the above were considered in that context and (at least in that setting) rejected.

I think these investigations are worthwhile, however, a few important points:

(1) seamless upgrades of VyOS images is a notable and fundamental feature of VyOS and any disturbance beyond the occasional workaround for a broken migration should be avoided. You'll recall how disruptive was the case of the change in coreutils (T5267, T5520) --- note that a breakage in the continuity of the upgrade path never goes away: at some point, a user with an iso of that period will try to upgrade and require the workarounds as outlined for users on the forum and phabricator (as you had helped with).

(2) as mentioned in the task, the image-tools revision is being readied to move out of draft; fundamental changes to legacy tools (which we are all anxious to get rid of) will conflict with that. These investigations are useful and can be done in parallel or subsequent to that change.

(3) introducing a workaround that may later need to be revisited or reverted will leave a lasting footprint as in (1), and needs to be carefully considered.

Finally, asking a user to just do a fresh install OR download a special tool/script to update is a burden; again, something similar was considered for 1768 and rejected --- had that not been the case, we would have merged it with a notice of the breaking change.

@jestabro
Contributor

(edits above for clarity)

@Apachez-
Author

  1. Yes, and in this case the change occurs so early that there are no migration scripts that can resolve the issue.

I think this should be designed for future updates, where add system image can use updated scripts provided by the ISO itself, compared to today where you end up in a "chicken or the egg" situation, and only use the local scripts as a "fallback".

  2. Yes, this finding might affect that work as well.

  3. A really dirty solution would be to provide the design changes of add system image and call that VyOS 2.0, and by that "sorry mate, but you have to do a fresh install for 2.0 to work properly".

A less disruptive workaround would be to provide an update script which could be run prior to upgrading the image (basically overwriting the local editions of install-image-existing). But that too should be incorporated in a future redesign of how add system image works. For example, both the ISO and a (signed) "update.sh" could be fetched (unless the update script can look for the update to run first by mounting the ISO as a loop device). But then we are back to the "chicken or the egg" scenario...

The penalty for now will be that we are stuck with 33MB of redundant junk in the ISO as long as the upgrade script can't be fixed.

@jestabro
Contributor

jestabro commented Sep 29, 2023

I agree that we should keep that 33MB and your careful investigation in mind: at the moment, I will happily pay 33MB to avoid disgruntled users, our time providing workarounds, and the break in a strong feature of VyOS.

One idea that has been discussed, similar to your suggestion, is the possibility of providing a tool that updates earlier images of a system, overwriting their file systems with newer image tools --- this would only be provided as an option, with severe warnings of router destruction, for use at the discretion of an expert. etc. etc. ... it should never be required or default to introduce a breaking change in upgrade continuity. This was tested out of curiosity a few months ago, but would only be introduced after guaranteeing a seamless non-interactive update path, if at all.

@Apachez-
Author

I suppose you have already considered the below, but I have a suggestion on how to resolve this issue:

TLDR:

Before production users can upgrade into LTS 1.4.0 they first must (if on 1.2.x) upgrade to 1.2.10 or (if on 1.3.x) upgrade to 1.3.10.

From the 1.x.10 version they can upgrade into LTS 1.4.0.

Another option for production users is to do a fresh install using LTS 1.4.0.

The above will be stated in the release notes for LTS 1.4.0.

1.5-rolling users on a version prior to 2023-09-30 must first upgrade to 1.5-rolling 2023-10-01 before going to whatever 1.5-rolling version they prefer (which is 2023-10-01 or newer).

Another option for the rolling users is to do a fresh install using 1.5-rolling 2023-10-01.

The above will be stated on blog and forum.

Longer edition:

  1. The python-edition of the install/upgrade scripts gets upgraded to use the method in PR209 (vyatta-cfg-system):
find ${boot_dir} -maxdepth 1 \( -type f \( -name "config-*" -o -name "initrd.img" -o -name "initrd.img-*" -o -name "System.map-*" -o -name "vmlinuz" -o -name "vmlinuz-*" \) -o -type l \( -name "initrd.img" -o -name "vmlinuz" \) \)
  2. This switch to the python-edition is what happens in 1.2.10 and 1.3.10 (the 1.x.10 versions contain ONLY that change compared to the previous version of each tree).

  3. LTS 1.4.0 and 1.5-rolling 2023-10-01 get the new python-edition as well.

  4. Seamless upgrade path, with the only bump being that 1.2.x users must first upgrade to 1.2.10 and 1.3.x users must first upgrade to 1.3.10 before going for LTS 1.4.0 or 1.5-rolling 2023-10-01 or newer.

  • 1.2.9-S1 -> 1.2.10 -> LTS 1.4.0

  • LTS 1.3.4 -> LTS 1.3.10 -> LTS 1.4.0

  • 1.5-rolling 2023-09-29 -> 1.5-rolling 2023-10-01 -> 1.5-rolling...

  5. Then for LTS 1.4.1 and 1.5-rolling 2023-10-02 the T5593 fix gets fully implemented (since the new python-edition now exists, which will do the magic T5593 needs during installs and upgrades) and 34MB can be shaved from the ISO size.

  6. Happy users and a smaller image size?
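
For reference, the find expression from step 1 can be wrapped into a copy step along these lines. This is a sketch only: the function name, the two directory arguments and the use of `cp -P` (copy symlinks as symlinks rather than following them) are assumptions, not the code from the PR.

```shell
# Hypothetical sketch: copy kernel-related files, plus the vmlinuz and
# initrd.img symlinks, from one directory to another using the find
# expression quoted in step 1. Both directories are chosen by the caller.
copy_kernel_files() {
    src_dir="$1"
    dst_dir="$2"
    mkdir -p "$dst_dir" || return 1
    find "$src_dir" -maxdepth 1 \( \
            -type f \( -name "config-*" -o -name "initrd.img" -o -name "initrd.img-*" \
                       -o -name "System.map-*" -o -name "vmlinuz" -o -name "vmlinuz-*" \) \
            -o -type l \( -name "initrd.img" -o -name "vmlinuz" \) \
        \) -exec cp -P -- {} "$dst_dir"/ \;
}
```

Anything else in the source directory, such as the squashfs file, is left untouched because it matches none of the name patterns.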

@jestabro
Contributor

jestabro commented Sep 29, 2023

So, maintaining a non-intrusive upgrade path is a high bar --- any requirement of the user for workarounds or extra steps would be considered a critical failing of that feature.

Secondly, I can imagine that some of the changes planned for development in the near future could easily add or subtract several 10s of MB, so let us not worry about the difference now, and certainly not in the face of the high bar set by our requirement.

Now, I very much appreciate interesting workarounds, but it is not worth jeopardizing any stability; better to focus on increasing stability under the design of evolving legacy -> modern --- that to me is the more challenging technical problem, and one of the many cool things about the VyOS project ...

@Apachez-
Author

I would still consider it a flaw that the upgrade script depends on what's currently installed rather than on what's provided through the ISO itself.

Even if other improvements shave off further tens of MB, there will still be ~33MB of junk which could have been shaved off as well, were the upgrade scripts not locked in a position where they can't seem to be upgraded without some additional step (whether that step is running a specific script or being forced to pass through a specific version).

That is, I really hope that the new python-based upgrade script will be able to use the scripting provided by the ISO itself (the one being upgraded to).

@Apachez-
Author

Apachez- commented Jan 5, 2024

Perhaps this can be reopened for VyOS 2.0 whenever that happens?

The point of T5593 is to remove redundant information from the ISO itself and thereby shrink it considerably.

@Apachez- Apachez- deleted the T5593 branch January 5, 2024 12:55