aarch64 failing to upgrade with /boot filesystem full #1637

Closed
dustymabe opened this issue Dec 19, 2023 · 28 comments · Fixed by ostreedev/ostree#3130

Comments

@dustymabe
Member

We've seen this before but our mitigation of autopruning old deployments has been working well. For some reason I think we've hit a corner case in the autopruning code and it's not kicking in on certain upgrade paths. This was caught in our extended upgrade tests: kola-upgrade#2487.

Here is what the failure looks like:

[core@qemu0 ~]$ rpm-ostree status 
State: idle
AutomaticUpdatesDriver: Zincati
  DriverState: active; periodically polling for updates (last checked Tue 2023-12-19 16:05:45 UTC)
Deployments:
● fedora:fedora/aarch64/coreos/next
                  Version: 39.20231204.1.0 (2023-12-05T11:06:12Z)
                   Commit: 23ab01229a932eaefa10288dca1448cfbaf368138b163ee54e0fcac045a0001b
             GPGSignature: Valid signature by E8F23996F23218640CB44CBE75CF5AC418B8E74C

  fedora:fedora/aarch64/coreos/next
                  Version: 38.20230902.1.1 (2023-09-08T04:56:41Z)
                   Commit: 52ac14f114b6c535ab1a5c89c3897a35a850aab15e0f41aedeaee9326a10b24a
             GPGSignature: Valid signature by 6A51BBABBA3D5467B6171221809A8D7CEB10B464
[core@qemu0 ~]$ rpm -q ostree
ostree-2023.7-2.fc39.aarch64
[core@qemu0 ~]$ df -kh /boot/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda3       350M  224M  103M  69% /boot
[core@qemu0 ~]$ du -sh /boot/ostree/*        
122M    /boot/ostree/fedora-coreos-061fe0d85629f0fc67f295fbd867c050a5cd6c3bb1a90afa93ee70f6abf2613f
103M    /boot/ostree/fedora-coreos-df484abdb771e2dee2cdb4d3418d0b7e6252217388bbc5b4049d20f734106715
[core@qemu0 ~]$ 
[core@qemu0 ~]$ sudo systemctl stop zincati
[core@qemu0 ~]$ sudo rpm-ostree rebase fedora-compose:
Rebasing to fedora-compose:fedora/aarch64/coreos/next
Receiving objects; 95% (6100/6396) 28.0 MB/s 280.0 MB                                                                                                                                                                                     1367 metadata, 5410 content objects fetched; 294403 KiB transferred in 12 seconds; 576.2 MB content written
Receiving objects; 95% (6100/6396) 28.0 MB/s 280.0 MB... done
Staging deployment... done
Freed: 927.0 MB (pkgcache branches: 0)
Upgraded:
  aardvark-dns 1.8.0-1.fc39 -> 1.9.0-1.fc39
  amd-gpu-firmware 20231111-1.fc39 -> 20231211-1.fc39
  atheros-firmware 20231111-1.fc39 -> 20231211-1.fc39
  bootupd 0.2.12-2.fc39 -> 0.2.16-2.fc39
  brcmfmac-firmware 20231111-1.fc39 -> 20231211-1.fc39
  chrony 4.4-1.fc39 -> 4.5-1.fc39
  container-selinux 2:2.224.0-1.fc39 -> 2:2.226.0-1.fc39
  containerd 1.6.19-2.fc39 -> 1.6.23-2.fc39
  coreos-installer 0.18.0-1.fc39 -> 0.18.0-2.fc39
  coreos-installer-bootinfra 0.18.0-1.fc39 -> 0.18.0-2.fc39
  criu 3.18-3.fc39 -> 3.19-2.fc39
  criu-libs 3.18-3.fc39 -> 3.19-2.fc39
  curl 8.2.1-3.fc39 -> 8.2.1-4.fc39
  elfutils-default-yama-scope 0.190-1.fc39 -> 0.190-4.fc39
  elfutils-libelf 0.190-1.fc39 -> 0.190-4.fc39
  elfutils-libs 0.190-1.fc39 -> 0.190-4.fc39
  fwupd 1.9.9-1.fc39 -> 1.9.10-1.fc39
  glib2 2.78.1-1.fc39 -> 2.78.3-1.fc39
  glibc 2.38-11.fc39 -> 2.38-14.fc39
  glibc-common 2.38-11.fc39 -> 2.38-14.fc39
  glibc-gconv-extra 2.38-11.fc39 -> 2.38-14.fc39
  glibc-minimal-langpack 2.38-11.fc39 -> 2.38-14.fc39
  gnutls 3.8.2-1.fc39 -> 3.8.2-2.fc39
  grub2-common 1:2.06-109.fc39 -> 1:2.06-110.fc39
  grub2-efi-aa64 1:2.06-109.fc39 -> 1:2.06-110.fc39
  grub2-tools 1:2.06-109.fc39 -> 1:2.06-110.fc39
  grub2-tools-minimal 1:2.06-109.fc39 -> 1:2.06-110.fc39
  intel-gpu-firmware 20231111-1.fc39 -> 20231211-1.fc39
  kernel 6.6.3-200.fc39 -> 6.6.6-200.fc39
  kernel-core 6.6.3-200.fc39 -> 6.6.6-200.fc39
  kernel-modules 6.6.3-200.fc39 -> 6.6.6-200.fc39
  kernel-modules-core 6.6.3-200.fc39 -> 6.6.6-200.fc39
  libatomic 13.2.1-4.fc39 -> 13.2.1-6.fc39
  libcurl-minimal 8.2.1-3.fc39 -> 8.2.1-4.fc39
  libgcc 13.2.1-4.fc39 -> 13.2.1-6.fc39
  libnfsidmap 1:2.6.4-0.fc39 -> 1:2.6.4-0.rc2.fc39
  libsolv 0.7.25-1.fc39 -> 0.7.27-1.fc39
  libstdc++ 13.2.1-4.fc39 -> 13.2.1-6.fc39
  linux-firmware 20231111-1.fc39 -> 20231211-1.fc39
  linux-firmware-whence 20231111-1.fc39 -> 20231211-1.fc39
  mt7xxx-firmware 20231111-1.fc39 -> 20231211-1.fc39
  netavark 1.8.0-2.fc39 -> 1.9.0-1.fc39
  nfs-utils-coreos 1:2.6.4-0.fc39 -> 1:2.6.4-0.rc2.fc39
  nvidia-gpu-firmware 20231111-1.fc39 -> 20231211-1.fc39
  podman 5:4.7.2-1.fc39 -> 5:4.8.1-1.fc39
  podman-plugins 5:4.7.2-1.fc39 -> 5:4.8.1-1.fc39
  realtek-firmware 20231111-1.fc39 -> 20231211-1.fc39
  rpm-ostree 2023.10-3.fc39 -> 2023.11-1.fc39
  rpm-ostree-libs 2023.10-3.fc39 -> 2023.11-1.fc39
  rpm-sequoia 1.5.0-1.fc39 -> 1.5.0-2.fc39
  skopeo 1:1.13.3-1.fc39 -> 1:1.14.0-1.fc39
  tpm2-tss 4.0.1-4.fc39 -> 4.0.1-6.fc39
  vim-data 2:9.0.2120-1.fc39 -> 2:9.0.2167-1.fc39
  vim-minimal 2:9.0.2120-1.fc39 -> 2:9.0.2167-1.fc39
Added:
  amd-ucode-firmware-20231211-1.fc39.noarch
  tpm2-tss-fapi-4.0.1-6.fc39.aarch64
Changes queued for next boot. Run "systemctl reboot" to start a reboot
[core@qemu0 ~]$ sudo ostree admin finalize-staged     
Copying /etc changes: 7 modified, 0 removed, 33 added
error: Installing kernel: Copying t6002-j375d.dtb: regfile copy: No space left on device
[core@qemu0 ~]$ df -kh /boot/                         
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda3       350M  346M     0 100% /boot
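
The numbers above already show the problem: /boot had 103M available before the upgrade, while each existing directory under /boot/ostree takes 103M to 122M. A minimal sketch of that comparison follows. The function names are hypothetical, and it assumes the incoming deployment needs about as much space as an existing boot directory, which the output above does not confirm.

```shell
#!/bin/sh
# Available space on the filesystem holding $1, in KiB (POSIX df output).
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Disk usage of directory $1, in KiB.
dir_kib() {
    du -sk "$1" | awk '{ print $1 }'
}

# headroom_verdict AVAIL_KIB NEED_KIB
# Print "fits" if NEED_KIB fits into AVAIL_KIB, else the shortfall.
headroom_verdict() {
    if [ "$1" -ge "$2" ]; then
        echo "fits"
    else
        echo "short by $(($2 - $1)) KiB"
    fi
}

# check_headroom MOUNTPOINT EXISTING_BOOT_DIR
# Assumption: the new boot directory is about as large as EXISTING_BOOT_DIR.
check_headroom() {
    headroom_verdict "$(avail_kib "$1")" "$(dir_kib "$2")"
}

# The figures from the transcript, converted to KiB: 103M free, 122M needed.
headroom_verdict 105472 124928
```

With the transcript's figures the last line reports a shortfall of 19456 KiB (19M), so the copy cannot succeed unless an old deployment is pruned first.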

I think this is a corner case where the autopruning logic doesn't
kick in. For example, if I run the update on a system of mine that
has been following the stream and auto-updating every week, the
logic does kick in:

core@rpi4:~$ rpm-ostree status
State: idle
AutomaticUpdatesDriver: Zincati
  DriverState: active; periodically polling for updates (last checked Tue 2023-12-19 14:24:26 UTC)
Deployments:
● fedora:fedora/aarch64/coreos/next
                  Version: 39.20231204.1.0 (2023-12-05T11:06:12Z)
               BaseCommit: 23ab01229a932eaefa10288dca1448cfbaf368138b163ee54e0fcac045a0001b
             GPGSignature: Valid signature by E8F23996F23218640CB44CBE75CF5AC418B8E74C
          LayeredPackages: tailscale

  fedora:fedora/aarch64/coreos/next
                  Version: 39.20231119.1.0 (2023-11-20T20:07:31Z)
               BaseCommit: 14228363436e852e6525d73d949381aaf59bb47b1cf465dc1aeecff6bd6650bb
             GPGSignature: Valid signature by E8F23996F23218640CB44CBE75CF5AC418B8E74C
          LayeredPackages: tailscale
core@rpi4:~$
core@rpi4:~$
core@rpi4:~$
core@rpi4:~$ df -kh /boot/
Filesystem      Size  Used Avail Use% Mounted on
/dev/mmcblk2p3  350M  242M   86M  74% /boot
core@rpi4:~$ cd /boot/ostree/
core@rpi4:/boot/ostree$ ls
fedora-coreos-061fe0d85629f0fc67f295fbd867c050a5cd6c3bb1a90afa93ee70f6abf2613f  fedora-coreos-80c79c63ee1629d7eab5aa43ea45daede7213e5788794f394168524d46bf919b
core@rpi4:/boot/ostree$ du -sh ./*
122M    ./fedora-coreos-061fe0d85629f0fc67f295fbd867c050a5cd6c3bb1a90afa93ee70f6abf2613f
120M    ./fedora-coreos-80c79c63ee1629d7eab5aa43ea45daede7213e5788794f394168524d46bf919b
core@rpi4:/boot/ostree$ 
core@rpi4:/boot/ostree$ sudo systemctl stop zincati
core@rpi4:/boot/ostree$ sudo rpm-ostree rebase fedora-compose:
Rebasing to fedora-compose:fedora/aarch64/coreos/next
⠴ Receiving objects; 98% (6593/6714) 7.0 MB/s 292.4 MB                                                                                                                                                                                      1361 metadata, 5405 content objects fetched; 294262 KiB transferred in 61 seconds; 576.0 MB content written
Receiving objects; 98% (6593/6714) 7.0 MB/s 292.4 MB... done
Checking out tree 639b5b1... done
Enabled rpm-md repositories: fedora-cisco-openh264 updates fedora tailscale-stable updates-archive
Updating metadata for 'updates'... done
Updating metadata for 'fedora'... done
Updating metadata for 'tailscale-stable'... done
Updating metadata for 'updates-archive'... done
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2023-03-14T10:57:01Z solvables: 4
rpm-md repo 'updates'; generated: 2023-12-19T01:03:31Z solvables: 14089
rpm-md repo 'fedora'; generated: 2023-11-01T00:12:29Z solvables: 61705
rpm-md repo 'tailscale-stable'; generated: 2023-12-15T19:51:27Z solvables: 90
rpm-md repo 'updates-archive'; generated: 2023-12-19T01:15:32Z solvables: 17377
Resolving dependencies... done
Will download: 1 package (29.7 MB)
Downloading from 'tailscale-stable'... done
Importing packages... done
Checking out packages... done
Running pre scripts... done
Running post scripts... done
Running posttrans scripts... done
Writing rpmdb... done
Writing OSTree commit... done
Staging deployment... done
Freed: 463.7 MB (pkgcache branches: 1)
Upgraded:
  aardvark-dns 1.8.0-1.fc39 -> 1.9.0-1.fc39
  amd-gpu-firmware 20231111-1.fc39 -> 20231211-1.fc39
  atheros-firmware 20231111-1.fc39 -> 20231211-1.fc39
  bootupd 0.2.12-2.fc39 -> 0.2.16-2.fc39
  brcmfmac-firmware 20231111-1.fc39 -> 20231211-1.fc39
  chrony 4.4-1.fc39 -> 4.5-1.fc39
  container-selinux 2:2.224.0-1.fc39 -> 2:2.226.0-1.fc39
  containerd 1.6.19-2.fc39 -> 1.6.23-2.fc39
  coreos-installer 0.18.0-1.fc39 -> 0.18.0-2.fc39
  coreos-installer-bootinfra 0.18.0-1.fc39 -> 0.18.0-2.fc39
  criu 3.18-3.fc39 -> 3.19-2.fc39
  criu-libs 3.18-3.fc39 -> 3.19-2.fc39
  curl 8.2.1-3.fc39 -> 8.2.1-4.fc39
  elfutils-default-yama-scope 0.190-1.fc39 -> 0.190-4.fc39
  elfutils-libelf 0.190-1.fc39 -> 0.190-4.fc39
  elfutils-libs 0.190-1.fc39 -> 0.190-4.fc39
  fwupd 1.9.9-1.fc39 -> 1.9.10-1.fc39
  glib2 2.78.1-1.fc39 -> 2.78.3-1.fc39
  glibc 2.38-11.fc39 -> 2.38-14.fc39
  glibc-common 2.38-11.fc39 -> 2.38-14.fc39
  glibc-gconv-extra 2.38-11.fc39 -> 2.38-14.fc39
  glibc-minimal-langpack 2.38-11.fc39 -> 2.38-14.fc39
  gnutls 3.8.2-1.fc39 -> 3.8.2-2.fc39
  grub2-common 1:2.06-109.fc39 -> 1:2.06-110.fc39
  grub2-efi-aa64 1:2.06-109.fc39 -> 1:2.06-110.fc39
  grub2-tools 1:2.06-109.fc39 -> 1:2.06-110.fc39
  grub2-tools-minimal 1:2.06-109.fc39 -> 1:2.06-110.fc39
  intel-gpu-firmware 20231111-1.fc39 -> 20231211-1.fc39
  kernel 6.6.3-200.fc39 -> 6.6.6-200.fc39
  kernel-core 6.6.3-200.fc39 -> 6.6.6-200.fc39
  kernel-modules 6.6.3-200.fc39 -> 6.6.6-200.fc39
  kernel-modules-core 6.6.3-200.fc39 -> 6.6.6-200.fc39
  libatomic 13.2.1-4.fc39 -> 13.2.1-6.fc39
  libcurl-minimal 8.2.1-3.fc39 -> 8.2.1-4.fc39
  libgcc 13.2.1-4.fc39 -> 13.2.1-6.fc39
  libnfsidmap 1:2.6.4-0.fc39 -> 1:2.6.4-0.rc2.fc39
  libsolv 0.7.25-1.fc39 -> 0.7.27-1.fc39
  libstdc++ 13.2.1-4.fc39 -> 13.2.1-6.fc39
  linux-firmware 20231111-1.fc39 -> 20231211-1.fc39
  linux-firmware-whence 20231111-1.fc39 -> 20231211-1.fc39
  mt7xxx-firmware 20231111-1.fc39 -> 20231211-1.fc39
  netavark 1.8.0-2.fc39 -> 1.9.0-1.fc39
  nfs-utils-coreos 1:2.6.4-0.fc39 -> 1:2.6.4-0.rc2.fc39
  nvidia-gpu-firmware 20231111-1.fc39 -> 20231211-1.fc39
  podman 5:4.7.2-1.fc39 -> 5:4.8.1-1.fc39
  podman-plugins 5:4.7.2-1.fc39 -> 5:4.8.1-1.fc39
  realtek-firmware 20231111-1.fc39 -> 20231211-1.fc39
  rpm-ostree 2023.10-3.fc39 -> 2023.11-1.fc39
  rpm-ostree-libs 2023.10-3.fc39 -> 2023.11-1.fc39
  rpm-sequoia 1.5.0-1.fc39 -> 1.5.0-2.fc39
  skopeo 1:1.13.3-1.fc39 -> 1:1.14.0-1.fc39
  tailscale 1.54.1-1 -> 1.56.1-1
  tpm2-tss 4.0.1-4.fc39 -> 4.0.1-6.fc39
  vim-data 2:9.0.2120-1.fc39 -> 2:9.0.2167-1.fc39
  vim-minimal 2:9.0.2120-1.fc39 -> 2:9.0.2167-1.fc39
Added:
  amd-ucode-firmware-20231211-1.fc39.noarch
  tpm2-tss-fapi-4.0.1-6.fc39.aarch64
Changes queued for next boot. Run "systemctl reboot" to start a reboot
core@rpi4:/boot/ostree$
core@rpi4:/boot/ostree$ sudo ostree admin finalize-staged
Copying /etc changes: 7 modified, 0 removed, 31 added
Insufficient space left in bootfs; updating bootloader in two steps
Bootloader updated; bootconfig swap: yes; bootversion: boot.0.1, deployment count change: -1
Bootloader updated; bootconfig swap: yes; bootversion: boot.1.1, deployment count change: 1
core@rpi4:/boot/ostree$ rpm-ostree status
State: idle
AutomaticUpdatesDriver: Zincati
  DriverState: inactive
Deployments:
  fedora-compose:fedora/aarch64/coreos/next
                  Version: 39.20231217.1.1 (2023-12-19T06:19:47Z)
               BaseCommit: 639b5b1a12b49977d98de014c82e3017f0ceb6c054b0dc53e82cdfd6c587f95e
             GPGSignature: Valid signature by E8F23996F23218640CB44CBE75CF5AC418B8E74C
                     Diff: 55 upgraded, 2 added
          LayeredPackages: tailscale

● fedora:fedora/aarch64/coreos/next
                  Version: 39.20231204.1.0 (2023-12-05T11:06:12Z)
               BaseCommit: 23ab01229a932eaefa10288dca1448cfbaf368138b163ee54e0fcac045a0001b
             GPGSignature: Valid signature by E8F23996F23218640CB44CBE75CF5AC418B8E74C
          LayeredPackages: tailscale

Notice the "Insufficient space left in bootfs; updating bootloader in two steps" message: on this system the autopruning logic did kick in.
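
The two "Bootloader updated" lines above (deployment count change -1, then 1) suggest the decision sketched below. This is not ostree's implementation, only an illustration with a hypothetical function name. The 122M size for the new deployment is an assumption based on the existing directory sizes; the 86M free and 120M rollback figures come from the rpi4 output.

```shell
#!/bin/sh
# plan_boot_update FREE NEW ROLLBACK   (all sizes in MiB)
# FREE:     space currently available in /boot
# NEW:      space the new deployment's boot files need (assumed)
# ROLLBACK: space held by the rollback deployment that could be pruned
plan_boot_update() {
    free=$1 new=$2 rollback=$3
    if [ "$free" -ge "$new" ]; then
        # Enough room: write the new deployment directly.
        echo "one-step"
    elif [ $((free + rollback)) -ge "$new" ]; then
        # Drop the rollback first (-1), then add the new deployment (+1).
        echo "two-step"
    else
        # Even pruning the rollback would not free enough space.
        echo "insufficient"
    fi
}

# rpi4: 86M free, 120M rollback directory, roughly 122M needed (assumed).
plan_boot_update 86 122 120
```

For the rpi4 figures this prints "two-step", matching the message in the log.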

@dustymabe
Member Author

Since we're getting pretty far into the holiday season, rather than try to debug this further, I think the simplest thing to do is to pin the kernel (which should cause no new space in boot to get used) and figure out where the bug is after the new year.

@dustymabe
Member Author

I think the simplest thing to do is to pin the kernel (which should cause no new space in boot to get used)

Well I think I was wrong on this front. Even with the same kernel I'm seeing two entries in the /boot filesystem and 244M (122*2) space getting used.

On a test system, this is the only difference between the two /boot/ostree/*/ directories:

[core@cosa-devsh ~]$ diff -ur 1.txt 2.txt 
--- 1.txt       2023-12-19 16:45:20.080950992 +0000
+++ 2.txt       2023-12-19 16:45:53.270974947 +0000
@@ -1,5 +1,5 @@
 311bc3ca405d4047c938237f1ecc19f5  ./.vmlinuz-6.6.6-200.fc39.aarch64.hmac
-f8404227f6951488a19a48a5bd592c1d  ./initramfs-6.6.6-200.fc39.aarch64.img
+5e2e4eeb3a7066607e3152e978072b32  ./initramfs-6.6.6-200.fc39.aarch64.img
 f2230e1d27b62f8ed08a2b916138266d  ./dtb/qcom/sc7180-trogdor-quackingstick-r0-lte.dtb
 b503e94e075498907eb6cf23d381e315  ./dtb/qcom/qcs404-evb-1000.dtb
 2646000546b573032c7e10a0d909d767  ./dtb/qcom/ipq5332-rdp468.dtb
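
A listing like 1.txt and 2.txt can be produced by checksumming every file in each /boot/ostree/*/ directory. The sketch below is one way to do it, not necessarily the commands used above; `checksum_listing` is a hypothetical helper, and the directory names in the usage comment are placeholders.

```shell
#!/bin/sh
# checksum_listing DIR
# Print "md5sum  ./relative/path" for every regular file under DIR,
# sorted by path so that two listings line up when compared with diff.
checksum_listing() {
    (cd "$1" && find . -type f -exec md5sum {} + | sort -k 2)
}

# Usage (placeholders, reading /boot may require root):
#   checksum_listing /boot/ostree/<first-dir>  > 1.txt
#   checksum_listing /boot/ostree/<second-dir> > 2.txt
#   diff -ur 1.txt 2.txt
```

Sorting by path keeps the diff limited to files whose contents really differ, such as the initramfs here.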

@dustymabe
Member Author

Interesting. If I just install a package instead of rebasing, there is no new entry in /boot/ostree/ and the usage remains low:

[core@cosa-devsh ~]$ sudo rpm-ostree install htop -y 
Checking out tree 639b5b1... done
Enabled rpm-md repositories: fedora-cisco-openh264 updates fedora updates-archive
Updating metadata for 'fedora-cisco-openh264'... done
Updating metadata for 'updates'... done
Updating metadata for 'fedora'... done
Updating metadata for 'updates-archive'... done
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264'; generated: 2023-03-14T10:57:01Z solvables: 4
rpm-md repo 'updates'; generated: 2023-12-19T01:03:31Z solvables: 14089
rpm-md repo 'fedora'; generated: 2023-11-01T00:12:29Z solvables: 61705
rpm-md repo 'updates-archive'; generated: 2023-12-19T01:15:32Z solvables: 17377
Resolving dependencies... done
Will download: 1 package (188.0 kB)
Downloading from 'fedora'... done
Importing packages... done
Checking out packages... done
Running pre scripts... done
Running post scripts... done
Running posttrans scripts... done
Writing rpmdb... done
Writing OSTree commit... done
Staging deployment... done
Added:
  htop-3.2.2-3.fc39.aarch64
Changes queued for next boot. Run "systemctl reboot" to start a reboot
[core@cosa-devsh ~]$ 
[core@cosa-devsh ~]$ sudo ostree admin finalize-staged           
Copying /etc changes: 6 modified, 0 removed, 38 added
Bootloader updated; bootconfig swap: yes; bootversion: boot.0.1, deployment count change: 1
[core@cosa-devsh ~]$ 
[core@cosa-devsh ~]$ df -kh /boot/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda3       350M  122M  205M  38% /boot
[core@cosa-devsh ~]$ 
[core@cosa-devsh ~]$ ls /boot/ostree/  
fedora-coreos-3b245a827bf3afef5d8fa52dbd28332644377f45cfad5b7b7a8bc1ba9648d8a7
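
This fits the earlier diff: the directory name under /boot/ostree ends in a checksum, a rebuilt initramfs produced a second directory, and a layered package did not. The sketch below assumes the suffix is a SHA-256 over the kernel and initramfs contents. That is an assumption not confirmed in this thread, `boot_key` is a hypothetical name, and ostree may hash additional inputs such as the dtb files.

```shell
#!/bin/sh
# boot_key KERNEL INITRAMFS
# Print a SHA-256 over the concatenated contents of both files.
# Assumption: a key like this decides whether two deployments can
# share one directory under /boot/ostree.
boot_key() {
    cat "$1" "$2" | sha256sum | awk '{ print $1 }'
}

# Same kernel + same initramfs  -> same key, directory can be shared.
# Same kernel + new initramfs   -> new key, a second directory is needed.
```

Under this assumption, pinning the kernel alone cannot avoid a second copy in /boot when the initramfs was regenerated.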

@c4rt0
Member

c4rt0 commented Dec 19, 2023

[core@cosa-devsh ~]$ sudo ostree admin finalize-staged

Just a few days ago @jbtrystram was also hitting an issue with ostree-finalize-staged.

@travier
Member

travier commented Dec 19, 2023

Arg, this makes me realize that I've added the amd-ucode-firmware package to all architectures but it's only useful on x86_64.

@dustymabe
Member Author

Arg, this makes me realize that I've added the amd-ucode-firmware package to all architectures but it's only useful on x86_64.

The package description does mention ARM SEV?

[core@cosa-devsh ~]$ rpm -qi amd-ucode-firmware 
Name        : amd-ucode-firmware
Version     : 20231211
Release     : 1.fc39
Architecture: noarch
Install Date: Mon Dec 18 12:06:56 2023
Group       : Unspecified
Size        : 241846
License     : Redistributable, no modification permitted
Signature   : RSA/SHA256, Wed Dec 13 18:45:14 2023, Key ID 75cf5ac418b8e74c
Source RPM  : linux-firmware-20231211-1.fc39.src.rpm
Build Date  : Wed Dec 13 16:48:12 2023
Build Host  : buildvm-x86-26.iad2.fedoraproject.org
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : http://www.kernel.org/
Bug URL     : https://bugz.fedoraproject.org/linux-firmware
Summary     : Microcode updates for AMD CPUs
Description :
Microcode updates for AMD CPUs, ARM SEV amd TEE.

@travier
Member

travier commented Dec 19, 2023

Oh, never mind, Jonathan correctly fixed it in coreos/fedora-coreos-config#2760

@travier
Member

travier commented Dec 19, 2023

I think it's this SEV: https://www.amd.com/en/developer/sev.html and it's only on x86_64 AFAIK

@cgwalters
Member

Hmm, it looks like the way we gather journals with kola, we lose all journal messages between the reboot request and the next boot. And that's where we'd potentially see more information about this error beyond what's captured by ostree-boot-complete.service.

@travier
Member

travier commented Dec 19, 2023

The test almost fully fills the disk: https://github.com/ostreedev/ostree/pull/2847/files#diff-2122c6b56458bc3c16e273279584cd76af1974e19060bfa87eb1217f8a67b82bR30. That might explain why we missed the case where the partition is not full enough to fail right away but does fail after the first kernel/initrd copy.

@dustymabe
Member Author

Hmm, it looks like the way we gather journals with kola, we lose all journal messages between the reboot request and the next boot. And that's where we'd potentially see more information about this error beyond what's captured by ostree-boot-complete.service.

That would be nice to fix somehow, but I don't think that would help us much here. I reproduced this on a system and the full journal didn't give much more than this:

error: Installing kernel: Copying t6002-j375d.dtb: regfile copy: No space left on device

cgwalters added a commit to cgwalters/ostree that referenced this issue Dec 19, 2023
To aid debugging issues like coreos/fedora-coreos-tracker#1637

If we're hitting this path where we think we have enough space,
let's log what we calculated here to aid in diagnosing why we
may later fail with ENOSPC.
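
The commit message describes a gap between the space the check calculated and the space the copy really needs. A sketch of the three possible outcomes follows, using a hypothetical function and sizes in MiB. The estimate of 100 is invented for illustration; only the 103 and 122 figures come from the output earlier in this issue.

```shell
#!/bin/sh
# classify_space_check FREE ESTIMATE ACTUAL   (all sizes in MiB)
# FREE:     space available in /boot
# ESTIMATE: what the pre-copy check believes the new files need
# ACTUAL:   what the copy really consumes
classify_space_check() {
    free=$1 estimate=$2 actual=$3
    if [ "$free" -lt "$estimate" ]; then
        # The check notices the shortage and can prune before copying.
        echo "prune-first"
    elif [ "$free" -lt "$actual" ]; then
        # The check passes, but the copy runs out of space part way.
        echo "enospc-mid-copy"
    else
        echo "ok"
    fi
}

classify_space_check 5 100 122     # nearly full disk, as in the existing test
classify_space_check 103 100 122   # the corner case discussed here
classify_space_check 205 100 122   # plenty of room
```

The middle case is the one a nearly-full-disk test would not exercise, since it only appears when free space falls between the estimate and the real requirement.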
@cgwalters
Member

Hmm, so we didn't see any of the other log messages there? I just opened ostreedev/ostree#3123; it would be interesting to see its output here.

(It'd still be handy to have a "continuous" stream tracking git main that we do some CI on like this, then we could just merge that PR and get relatively quick feedback too)

cgwalters added a commit to cgwalters/ostree that referenced this issue Dec 19, 2023
To aid debugging issues like coreos/fedora-coreos-tracker#1637

If we're hitting this path where we think we have enough space,
let's log what we calculated here to aid in diagnosing why we
may later fail with ENOSPC.
@dustymabe
Member Author

Thanks!

cgwalters added a commit to cgwalters/ostree that referenced this issue Dec 20, 2023
To aid debugging issues like coreos/fedora-coreos-tracker#1637

If we're hitting this path where we think we have enough space,
let's log what we calculated here to aid in diagnosing why we
may later fail with ENOSPC.
@travier travier added the meeting topics for meetings label Dec 20, 2023
@dustymabe
Member Author

We discussed this in the community meeting today:

dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Jan 5, 2024
Contains the fix for the corner case issue described in
coreos/fedora-coreos-tracker#1637
dustymabe added a commit to coreos/fedora-coreos-config that referenced this issue Jan 5, 2024
We need this hack again to work around a new corner case
in the /boot ENOSPC wars.

See coreos/fedora-coreos-tracker#1637

(cherry picked from commit 09fbb20)
dustymabe added a commit to coreos/fedora-coreos-config that referenced this issue Jan 5, 2024
Contains the fix for the corner case issue described in
coreos/fedora-coreos-tracker#1637
dustymabe added a commit to dustymabe/fedora-coreos-streams that referenced this issue Jan 10, 2024
jlebon pushed a commit to coreos/fedora-coreos-streams that referenced this issue Jan 10, 2024
@prestist
Contributor

This was discussed in the meeting today; see below:
AGREED: continue with the proposed mitigation/fix for this in https://github.com/coreos/fedora-coreos-tracker/issues/1637#issuecomment-1878186381, which we are actively executing.

@prestist prestist removed the meeting topics for meetings label Jan 10, 2024
@dustymabe dustymabe added status/pending-testing-release Fixed upstream. Waiting on a testing release. status/pending-next-release Fixed upstream. Waiting on a next release. labels Jan 10, 2024
@dustymabe
Member Author

dustymabe commented Jan 10, 2024

ostreedev/ostree#3130 merged and made it into ostree-2023.8-3.fc39.

As an extra mitigation, we also removed some dtb files in coreos/fedora-coreos-config#2788 and got that into the three most recent releases (stable 39.20231204.3.3, testing 39.20240104.2.0, next 39.20240104.1.0).

@dustymabe
Member Author

The fix for this went into next stream release 39.20240104.1.0. Please try out the new release and report issues.

@dustymabe
Member Author

The fix for this went into testing stream release 39.20240104.2.0. Please try out the new release and report issues.

@dustymabe dustymabe added status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. and removed status/pending-testing-release Fixed upstream. Waiting on a testing release. status/pending-next-release Fixed upstream. Waiting on a next release. labels Jan 10, 2024
@dustymabe
Member Author

The fix for this went into stable stream release 39.20240104.3.0.

@dustymabe dustymabe removed the status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. label Jan 17, 2024
dustymabe added a commit to dustymabe/fedora-coreos-streams that referenced this issue Jan 17, 2024
dustymabe added a commit to coreos/fedora-coreos-streams that referenced this issue Jan 17, 2024
@dustymabe
Member Author

Added the barrier to stable in coreos/fedora-coreos-streams@a01649c

dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Feb 7, 2024
The software fixing coreos/fedora-coreos-tracker#1637
is now in all streams and barriers have been added. We can drop this.
jlebon pushed a commit to coreos/fedora-coreos-config that referenced this issue Feb 7, 2024
The software fixing coreos/fedora-coreos-tracker#1637
is now in all streams and barriers have been added. We can drop this.
aaradhak pushed a commit to aaradhak/fedora-coreos-config that referenced this issue Mar 18, 2024
We need this hack again to work around a new corner case
in the /boot ENOSPC wars.

See coreos/fedora-coreos-tracker#1637
aaradhak pushed a commit to aaradhak/fedora-coreos-config that referenced this issue Mar 18, 2024
Contains the fix for the corner case issue described in
coreos/fedora-coreos-tracker#1637
aaradhak pushed a commit to aaradhak/fedora-coreos-config that referenced this issue Mar 18, 2024
The software fixing coreos/fedora-coreos-tracker#1637
is now in all streams and barriers have been added. We can drop this.