
OCI upgrade fails with Processing tar via ostree: Failed to commit tar: ExitStatus(unix_wait_status(256)) #4820

Closed
p5 opened this issue Feb 8, 2024 · 8 comments


p5 commented Feb 8, 2024

Describe the bug

Since the upgrade from 2023.11.1 to 2024.2-2, my image-based Fedora 39 system has been failing to rpm-ostree upgrade to the latest version. I do not know whether this is directly related to the rpm-ostree package upgrade, or whether it is a coincidence that it started happening on the image containing the new version.

Error:

error: Importing: Parsing layer blob sha256:dd5afed243054023811f2d6f521aaec4f3393dcf6aac3f032ebbdf7f92172989: error: ostree-tar: Processing deferred hardlink var/cache/akmods/nvidia/.last.log: Failed to find object: No such file or directory: var: Processing tar via ostree: Failed to commit tar: ExitStatus(unix_wait_status(256))

My system does have Nvidia drivers pre-packaged and included in the OCI images, which appears to be what is causing the problems.

Reproduction steps

  1. Rebase to ostree-unverified-registry:ghcr.io/rsturla/eternal-linux/lumina@sha256:3de893fb538794b4126d34170ad26885dae27ec7632da94b5db36413d0945ce2
  2. Run rpm-ostree rebase ostree-image-signed:registry:ghcr.io/rsturla/eternal-linux/lumina:39-nvidia
  3. Notice errors during rebase

I have followed these steps in a VM, so the issue should be reproducible (although time-consuming).
Please do not try these steps on a production system, as there are automations in place that will change your Flatpaks on boot.

Expected behavior

The upgrade should succeed without errors.

Actual behavior

The upgrade failed with error: ostree-tar: Processing deferred hardlink var/cache/akmods/nvidia/.last.log: Failed to find object: No such file or directory: var: Processing tar via ostree: Failed to commit tar: ExitStatus(unix_wait_status(256))

System details

rpm-ostree db diff (between known-good and bad versions)

ostree diff commit from: rollback deployment (ccbf28bd0d8f55270636766e32034a703469a41fc0eb31208d8d440a60a79fce)
ostree diff commit to:   booted deployment (a35d537a3b45e532ee7142bed135c5ae902a46e24846c569d946d501f70fb1da)
Upgraded:
  aardvark-dns 1.9.0-1.fc39 -> 1.10.0-1.fc39
  alsa-lib 1.2.10-3.fc39 -> 1.2.11-2.fc39
  alsa-ucm 1.2.10-3.fc39 -> 1.2.11-2.fc39
  alsa-utils 1.2.10-1.fc39 -> 1.2.11-1.fc39
  docker-ce 3:25.0.2-1.fc39 -> 3:25.0.3-1.fc39
  docker-ce-cli 1:25.0.2-1.fc39 -> 1:25.0.3-1.fc39
  docker-ce-rootless-extras 25.0.2-1.fc39 -> 25.0.3-1.fc39
  efivar-libs 38-8.fc39 -> 39-1.fc39
  google-chrome-stable 121.0.6167.139-1 -> 121.0.6167.160-1
  grub2-common 1:2.06-110.fc39 -> 1:2.06-116.fc39
  grub2-efi-ia32 1:2.06-110.fc39 -> 1:2.06-116.fc39
  grub2-efi-x64 1:2.06-110.fc39 -> 1:2.06-116.fc39
  grub2-pc 1:2.06-110.fc39 -> 1:2.06-116.fc39
  grub2-pc-modules 1:2.06-110.fc39 -> 1:2.06-116.fc39
  grub2-tools 1:2.06-110.fc39 -> 1:2.06-116.fc39
  grub2-tools-minimal 1:2.06-110.fc39 -> 1:2.06-116.fc39
  inih 57-2.fc39 -> 58-1.fc39
  initscripts-service 10.19-2.fc39 -> 10.20-1.fc39
  libfprint 1.94.5-3.fc39 -> 1.94.6-1.fc39
  lmdb-libs 0.9.31-2.fc39 -> 0.9.32-1.fc39
  pipewire 1.0.2-1.fc39 -> 1.0.3-1.fc39
  pipewire-alsa 1.0.2-1.fc39 -> 1.0.3-1.fc39
  pipewire-gstreamer 1.0.2-1.fc39 -> 1.0.3-1.fc39
  pipewire-jack-audio-connection-kit 1.0.2-1.fc39 -> 1.0.3-1.fc39
  pipewire-jack-audio-connection-kit-libs 1.0.2-1.fc39 -> 1.0.3-1.fc39
  pipewire-libs 1.0.2-1.fc39 -> 1.0.3-1.fc39
  pipewire-pulseaudio 1.0.2-1.fc39 -> 1.0.3-1.fc39
  pipewire-utils 1.0.2-1.fc39 -> 1.0.3-1.fc39
  python3-rpm 4.19.1-1.fc39 -> 4.19.1-2.fc39
  rpm 4.19.1-1.fc39 -> 4.19.1-2.fc39
  rpm-build-libs 4.19.1-1.fc39 -> 4.19.1-2.fc39
  rpm-libs 4.19.1-1.fc39 -> 4.19.1-2.fc39
  rpm-ostree 2023.11-1.fc39 -> 2024.2-2.fc39
  rpm-ostree-libs 2023.11-1.fc39 -> 2024.2-2.fc39
  rpm-plugin-selinux 4.19.1-1.fc39 -> 4.19.1-2.fc39
  rpm-sign-libs 4.19.1-1.fc39 -> 4.19.1-2.fc39

rpm-ostree --version

rpm-ostree:
 Version: '2024.2'
 Git: 3d9a8755ddd96395a5c1d02b42243eb54ea01193
 Features:
  - rust
  - compose
  - container
  - fedora-integration

Additional information

Error log:

rpm-ostree update
note: automatic updates (stage) are enabled
Pulling manifest: ostree-image-signed:docker://ghcr.io/rsturla/eternal-linux/lumina:39-nvidia
Importing: ostree-image-signed:docker://ghcr.io/rsturla/eternal-linux/lumina:39-nvidia (digest: sha256:aa3d48058eb601e9700695579758fd4da70e1eff72f90db6173f898c9981630e)
ostree chunk layers already present: 65
custom layers already present: 8
custom layers needed: 24 (1.6 GB)
error: Importing: Parsing layer blob sha256:dd5afed243054023811f2d6f521aaec4f3393dcf6aac3f032ebbdf7f92172989: error: ostree-tar: Processing deferred hardlink var/cache/akmods/nvidia/.last.log: Failed to find object: No such file or directory: var: Processing tar via ostree: Failed to commit tar: ExitStatus(unix_wait_status(256))
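The failing entry in the error above is a tar hardlink whose target lives under var/. As an illustration (my own standalone sketch, not ostree-ext code), a layer tarball can be scanned for hardlink entries to spot the offending file before an image is pushed; the paths below mimic the ones in the error message:

```python
import io
import tarfile

def list_hardlinks(fileobj):
    """Return (name, linkname) pairs for every hardlink entry in a tar stream."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as tf:
        return [(m.name, m.linkname) for m in tf.getmembers() if m.islnk()]

# Build a tiny in-memory layer resembling the failing one:
# a regular file plus a hardlink entry pointing back at it.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    data = b"akmods build log"
    info = tarfile.TarInfo("var/cache/akmods/nvidia/.last.log")
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))

    link = tarfile.TarInfo("var/cache/akmods/nvidia/build.log")
    link.type = tarfile.LNKTYPE
    link.linkname = "var/cache/akmods/nvidia/.last.log"
    tf.addfile(link)

buf.seek(0)
print(list_hardlinks(buf))
# → [('var/cache/akmods/nvidia/build.log', 'var/cache/akmods/nvidia/.last.log')]
```

Any hardlink whose linkname points into a path the importer handles specially (such as var/) is a candidate for this failure.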

p5 commented Feb 8, 2024

And if it helps to identify the issue: I am able to successfully roll back to the pinned deployment from 4 or 5 days ago and upgrade to today's image, but I cannot upgrade from yesterday's image to today's.
The db diff is posted in the "System details" section of the issue.


jmarrero commented Feb 8, 2024

Can you share the Dockerfile or the lines installing the nvidia akmod packages?


p5 commented Feb 8, 2024

Installing the Nvidia akmods is done in this Dockerfile:
https://github.com/rsturla/eternal-main/blob/main/Containerfile.nvidia

And building the akmod package is done from this one:
https://github.com/rsturla/akmods/blob/main/nvidia/Containerfile

Please see the rsturla/akmods/nvidia/scripts directory for the build and install scripts for both Dockerfiles.

Neither repo has had changes for weeks, which is why I am looking at the dependencies that differ between the good and bad images.

@cgwalters added labels: bug, container-native, triaged (Feb 8, 2024)
@cgwalters

Hmm, this may be a regression from ostreedev/ostree-rs-ext#569

@cgwalters

OK yeah, digging into your image, the problem here is:

6481d0e877ab58ab3ec0fe15705b851cdc5b96e0860f1f5f722cfbf810fa2447.txt:drwxr-xr-x 981/973           0 2024-02-08 07:18 var/cache/akmods/
6481d0e877ab58ab3ec0fe15705b851cdc5b96e0860f1f5f722cfbf810fa2447.txt:drwxr-xr-x 0/0               0 2024-02-08 07:18 var/cache/akmods/nvidia/
6481d0e877ab58ab3ec0fe15705b851cdc5b96e0860f1f5f722cfbf810fa2447.txt:-rw-r--r-- 0/0         8666839 2024-02-08 07:18 var/cache/akmods/nvidia/.last.log
6481d0e877ab58ab3ec0fe15705b851cdc5b96e0860f1f5f722cfbf810fa2447.txt:hrw-r--r-- 0/0               0 2024-02-08 07:18 var/cache/akmods/nvidia/545.29.06-3-for-6.7.3-200.fc39.x86_64.log link to var/cache/akmods/nvidia/.last.log
6481d0e877ab58ab3ec0fe15705b851cdc5b96e0860f1f5f722cfbf810fa2447.txt:-rw-r--r-- 0/0        44630425 2024-02-08 07:18 var/cache/akmods/nvidia/kmod-nvidia-6.7.3-200.fc39.x86_64-545.29.06-3.fc39.x86_64.rpm
6481d0e877ab58ab3ec0fe15705b851cdc5b96e0860f1f5f722cfbf810fa2447.txt:-rw-r--r-- 0/0             244 2024-02-08 07:18 var/cache/akmods/nvidia-vars

Note the hardlink.

A simple minimal reproducer is:

FROM quay.io/fedora/fedora-coreos:stable
RUN mkdir -p /var/lib && echo hello world > /var/lib/foo && ln /var/lib/foo{,2}

So...technically this is an ostree-ext bug and it is almost certainly a regression from ostreedev/ostree-rs-ext#569

However, there's a simple workaround (and arguably what you want anyways): rm /var/cache/akmods -rf...ah, which I think you are trying to do around here https://github.com/rsturla/eternal-main/blob/124d16308317c76af3d2914bc4741974b35323a4/Containerfile.nvidia#L27
except the problem is you likely aren't doing podman build --squash so you're just carrying around the dead weight from the COPY above.

@cgwalters

Moving this one to ostreedev/ostree-rs-ext#598


p5 commented Feb 8, 2024

Thank you for looking into this issue. For now I'll be switching to Buildah to allow the squashing.
It's clear I don't fully understand Docker layers, and I wouldn't have run into this issue if I had optimised them correctly.

Looking into each individual file and the layers that manage them is a good way to debug. I'll be doing that from now on :)

If you've moved the issue to the ostree-rs-ext repo, I'm happy for this issue to be closed.

@cgwalters

I believe this is fixed now, please reopen if not.
