
Errors encountered when upgrading incus with sudo apt upgrade #997

Closed
1 task done
vikrantrathore opened this issue Jul 15, 2024 · 6 comments
Labels: Incomplete (Waiting on more information from reporter)

vikrantrathore commented Jul 15, 2024

Required information

  • Distribution: Ubuntu
  • Distribution version: 24.04.4
  • The output of "incus info" (it failed after the upgrade, works after restart):
driver: lxc | qemu
 driver_version: 6.0.1 | 9.0.1
 firewall: nftables
 kernel: Linux
 kernel_architecture: x86_64
 kernel_features:
   idmapped_mounts: "true"
   netnsid_getifaddrs: "true"
   seccomp_listener: "true"
   seccomp_listener_continue: "true"
   uevent_injection: "true"
   unpriv_binfmt: "false"
   unpriv_fscaps: "true"
 kernel_version: 5.15.0-113-generic
 lxc_features:
   cgroup2: "true"
   core_scheduling: "true"
   devpts_fd: "true"
   idmapped_mounts_v2: "true"
   mount_injection_file: "true"
   network_gateway_device_route: "true"
   network_ipvlan: "true"
   network_l2proxy: "true"
   network_phys_macvlan_mtu: "true"
   network_veth_router: "true"
   pidfd: "true"
   seccomp_allow_deny_syntax: "true"
   seccomp_notify: "true"
   seccomp_proxy_send_notify_fd: "true"
 os_name: Ubuntu
 os_version: "22.04"
 project: default
 server: incus
 server_clustered: true
 server_event_mode: full-mesh
 server_name: insan02
 server_pid: 1407
 server_version: "6.3"
 storage: btrfs
 storage_version: 5.16.2
 storage_supported_drivers:
 - name: dir
   version: "1"
   remote: false
 - name: lvm
   version: 2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.45.0
   remote: false
 - name: lvmcluster
   version: 2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.45.0
   remote: true
 - name: btrfs
   version: 5.16.2
   remote: false

Issue description

Upgrading Incus with sudo apt upgrade on Ubuntu fails, and the problem recurs with every upgrade. Incus is installed from the zabbly apt repositories.

Steps to reproduce

  1. Run sudo apt upgrade
  2. After waiting for a substantial amount of time, it shows the following error:
See "systemctl status incus.service" and "journalctl -xeu incus.service" for details.
dpkg: error processing package incus-base (--configure):
 installed incus-base package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of incus:
 incus depends on incus-base (= 1:6.3-202407130507-ubuntu22.04); however:
  Package incus-base is not configured yet.

dpkg: error processing package incus (--configure):
 dependency problems - leaving unconfigured
  3. Run sudo apt upgrade again and it installs the upgrade.
  4. Restart the machine to make incus work again.
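
Put as commands, the workaround above is roughly the following sketch; whether restarting only the incus service is enough instead of a full reboot is an assumption, since the reporter rebooted the whole machine:

# Re-run the failed upgrade; per the report it completes on the second attempt
sudo apt upgrade
# The reporter then rebooted the machine; restarting just the service may be
# enough, but that is an assumption rather than something verified here
sudo systemctl restart incus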

Information to attach

  • Any relevant kernel output (dmesg)
[3013479.164106] systemd[1]: systemd 249.11-0ubuntu3.12 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
[3013479.183788] systemd[1]: Detected architecture x86-64.
[3013479.252691] systemd[1]: Configuration file /run/systemd/system/netplan-ovs-cleanup.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[3013479.344532] systemd[1]: Stopping Journal Service...
[3013479.344562] systemd-journald[554]: Received SIGTERM from PID 1 (systemd).
[3013479.345022] systemd[1]: Stopping Open Virtual Network host control daemon...
[3013479.345496] systemd[1]: Stopping Open Virtual Network central control daemon...
[3013479.346252] systemd[1]: Stopping Open vSwitch Record Hostname...
[3013479.346441] systemd[1]: Stopping PackageKit Daemon...
[3013479.346457] systemd[1]: systemd-networkd-wait-online.service: Deactivated successfully.
[3013479.346593] systemd[1]: Stopped Wait for Network to be Configured.
[3013479.346669] systemd[1]: Stopping Wait for Network to be Configured...
[3013479.346825] systemd[1]: Stopping Network Configuration...
[3013479.347965] systemd[1]: ovs-record-hostname.service: Deactivated successfully.
[3013479.348254] systemd[1]: Stopped Open vSwitch Record Hostname.
[3013479.348684] systemd[1]: Stopping Network Name Resolution...
[3013479.349172] systemd[1]: Stopping Network Time Synchronization...
[3013479.353163] systemd[1]: Stopping Disk Manager...
[3013479.353304] systemd[1]: Stopping Daemon for power management...
[3013479.354235] systemd[1]: upower.service: Deactivated successfully.
[3013479.354527] systemd[1]: Stopped Daemon for power management.
[3013479.354551] systemd[1]: upower.service: Consumed 8min 43.099s CPU time.
[3013479.356240] systemd[1]: Starting Daemon for power management...
[3013479.361890] systemd[1]: systemd-journald.service: Deactivated successfully.
[3013479.362318] systemd[1]: Stopped Journal Service.
[3013479.362384] systemd[1]: systemd-journald.service: Consumed 9min 57.595s CPU time.
[3013479.364488] systemd[1]: Starting Journal Service...
[3013479.367608] systemd[1]: packagekit.service: Deactivated successfully.
[3013479.367955] systemd[1]: Stopped PackageKit Daemon.
[3013479.367997] systemd[1]: packagekit.service: Consumed 28.303s CPU time.
[3013479.369102] systemd[1]: Starting PackageKit Daemon...
[3013479.376547] systemd[1]: systemd-timesyncd.service: Deactivated successfully.
[3013479.376872] systemd[1]: Stopped Network Time Synchronization.
[3013479.376916] systemd[1]: systemd-timesyncd.service: Consumed 7.831s CPU time.
[3013479.377695] systemd[1]: systemd-resolved.service: Deactivated successfully.
[3013479.378005] systemd[1]: Stopped Network Name Resolution.
[3013479.378031] systemd[1]: systemd-resolved.service: Consumed 17.220s CPU time.
[3013479.378715] systemd[1]: udisks2.service: Deactivated successfully.
[3013479.379023] systemd[1]: Stopped Disk Manager.
[3013479.379045] systemd[1]: udisks2.service: Consumed 4.704s CPU time.
[3013479.380791] systemd[1]: Starting Network Time Synchronization...
[3013479.381756] systemd[1]: Starting Disk Manager...
[3013479.381980] systemd[1]: Started Journal Service.
[3013479.783673] No such timeout policy "ovs_test_tp"
[3013479.783676] Failed to associated timeout policy `ovs_test_tp'
@stgraber (Member)

Hmm, so this is confusing. You're saying this is 24.04 but then all logs point to the system being 22.04. You're also reporting that the upgrade failed and hung but incus is running and correctly reporting the version as 6.3?

What happens if you do apt dist-upgrade again, or potentially dpkg --configure -a if the former is failing?
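
For anyone hitting the same error, those suggestions, together with the diagnostics the dpkg output itself points at, look roughly like this (a sketch, not an official recovery procedure):

# Inspect why the incus-base post-installation script failed
sudo systemctl status incus.service
sudo journalctl -xeu incus.service

# Then retry the upgrade, or just finish configuring the half-installed packages
sudo apt dist-upgrade
sudo dpkg --configure -a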

stgraber added the Incomplete (Waiting on more information from reporter) label on Jul 15, 2024
@vikrantrathore (Author)

No, I probably wrote it by mistake; it's 22.04.4. As mentioned in the issue, when I run it again it works. The issue is that this problem occurs every time an upgrade is done remotely using Ansible and it brings down the whole cluster. I am able to upgrade, and after a restart Incus works fine.

@stgraber (Member)

Ah, it's a cluster upgrade. For clusters you must always update all servers at the same time, otherwise the first server to update will notice it's ahead of the others and will hang there, waiting for the rest to match its version before continuing with its startup.
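
In practice, "at the same time" can be as simple as starting the upgrade on every member in parallel. The member names below are placeholders and the ssh loop is only one way to do it; with Ansible, the equivalent is to run the upgrade task against every cluster member in the same play rather than host by host:

# Hypothetical member names; start the upgrade on all members in parallel so
# no single node sits alone waiting for the others to catch up
for member in insan01 insan02 insan03; do
    ssh "$member" "sudo apt-get update && sudo apt-get -y dist-upgrade" &
done
wait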

knutov commented Jul 15, 2024

I'm wondering too: how exactly does one upgrade a cluster correctly?

@stgraber (Member)

I've been upgrading production clusters, first on LXD and now on Incus, for the past 5-6 years and never had an issue, so long as you make sure that everything is clean in incus cluster list prior to the upgrade and that all servers are updated at the same time.

As mentioned in the documentation, the servers will basically check a stable database table (one we can never change the schema of) to compare their own DB and API version with the rest of the cluster. If they notice they're behind, they'll refuse to start; if they notice they're ahead, they'll enter a loop waiting for all other servers to reach the same version.

Then as soon as all servers reach the same DB and API version, the startup sequence continues on all servers at the same time. The leader then goes on to apply any schema updates needed, the remaining servers perform any local data migration needed, and then the cluster API becomes available to users again.
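
As a rough illustration of why a lone upgraded member appears to hang, the per-member startup check described above can be pictured as the following shell pseudocode. This is not Incus source code: read_cluster_versions and all_members_at_version are hypothetical placeholders, and the DB/API versions are treated as a single integer for simplicity.

# Compare this member's versions against the rest of the cluster
mine=$(read_cluster_versions --local)     # hypothetical helper
newest=$(read_cluster_versions --max)     # hypothetical helper

if [ "$mine" -lt "$newest" ]; then
    echo "Member is behind the cluster, refusing to start" >&2
    exit 1
elif [ "$mine" -gt "$newest" ]; then
    # The first member to upgrade lands here and appears to "hang"
    until all_members_at_version "$mine"; do   # hypothetical helper
        sleep 10
    done
fi
# All members match: the leader applies schema updates, the others run any
# local data migration, and the cluster API becomes available again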
