cannot checkpoint a vnc server container #2459

coldbloodx · 2024-08-01T07:07:20Z

Description
A VNC server container cannot be checkpointed.

Steps to reproduce the issue:

1. Create a VNC server image from the ubuntu 24.04 base image; just install the packages below:
apt install -y xfce4 xfce4-session xfce4-terminal tightvncserver xauth
2. Initialize the VNC server files (password and xstartup script). Here is my xstartup script:

[root@laworks .vnc]# cat xstartup
#!/bin/sh

xrdb "$HOME/.Xresources"
xsetroot -solid grey
x-terminal-emulator -geometry 80x24+10+10 -ls -title "$VNCDESKTOP Desktop" &
x-window-manager &
# Fix to make GNOME work
export XKL_XMODMAP_DISABLE=1
/etc/X11/Xsession

3. Run the container with the docker command below:

[root@laworks ~]# docker run -d -v /root:/root -e USER=root --net=host --name cstest ubuntuvnc:v1 bash -c 'vncserver; tail -f /dev/null'
[root@laworks ~]# docker ps
CONTAINER ID   IMAGE          COMMAND                  CREATED       STATUS       PORTS     NAMES
3c0043e01e67   ubuntuvnc:v1   "bash -c 'vncserver;…"   3 hours ago   Up 3 hours             cstest

4. Connect to the vncserver with vncviewer, open a terminal in the viewer, then run the command below:

Describe the results you received:
Checkpointing the vncserver container above fails with the error below:

[root@laworks ~]# docker checkpoint create cstest cp001
Error response from daemon: Cannot checkpoint container cstest: runc did not terminate successfully: exit status 1: criu failed: type NOTIFY errno 0 path= /run/containerd/io.containerd.runtime.v2.task/moby/3c0043e01e6729d2a9745d26bbc2a3baa3faddb4c169cb217d777dc24fe868d0/criu-dump.log: unknown

Relevant part of criu-dump.log:

(00.392350) sockets: Searching for socket 0x79317 family 1
(00.392376) sockets: No filter for socket
(00.392378) unix: Dumping unix socket at 6
(00.392379) unix:       Dumping: ino 496407 peer_ino 0 family    1 type    1 state 10 name /root/.cache/ibus/dbus-mHfDLJSx
(00.392384) unix:       Dumped: id 0x29 ino 496407 peer 0 type 1 state 10 name 32 bytes
(00.392394) 77977 fdinfo 7: pos:                0 flags:                2/0x1
(00.392402) Error (criu/files-ext.c:94): Can't dump file 7 of that type [600] (anon anon_inode:[pidfd])   ---> this line.
(00.392410) ----------------------------------------
(00.392424) Error (criu/cr-dump.c:1674): Dump files (pid: 77977) failed with -1
(00.392434) Waiting for 77977 to trap
(00.392453) Daemon 77977 exited trapping
(00.392459) Sent msg to daemon 3 0 0

The full logs are attached to this ticket.
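For anyone who wants to check which file descriptors trigger this error before attempting a dump, here is a minimal sketch. It assumes the link target under `/proc/<pid>/fd` reads `anon_inode:[pidfd]`, as in the log line above; the function names (`is_pidfd_link`, `has_file_type`, `find_pidfds`) are made up for illustration and are not part of CRIU.

```python
import os
import stat

# Link target as it appears in the criu-dump.log line above (assumption:
# other kernels may present pidfds differently).
PIDFD_LINK = "anon_inode:[pidfd]"


def is_pidfd_link(target):
    """Return True if a /proc/<pid>/fd/<n> link target names a pidfd."""
    return target == PIDFD_LINK


def has_file_type(mode):
    """Return True if st_mode carries any file-type bits (S_IFREG, S_IFSOCK, ...)."""
    return stat.S_IFMT(mode) != 0


def find_pidfds(pid):
    """Return a list of (fd, st_mode) for every pidfd the process holds open."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    try:
        names = os.listdir(fd_dir)
    except OSError:
        return found  # process gone, no /proc, or no permission
    for name in names:
        path = os.path.join(fd_dir, name)
        try:
            if is_pidfd_link(os.readlink(path)):
                # os.stat() follows the /proc magic link to the open file.
                found.append((int(name), os.stat(path).st_mode))
        except OSError:
            continue  # fd was closed while we were scanning
    return found
```

Called as root with the pid from the log (77977), one would expect this to report fd 7. The `[600]` printed in the log looks like an octal `st_mode` without file-type bits, which is what `has_file_type(0o600)` returning `False` is meant to illustrate.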

Describe the results you expected:
The container should be checkpointed successfully.

Additional information you deem important (e.g. issue happens only occasionally):
Below is the process info of my vncserver container.
I've tried this case several times, and none of the attempts succeeded with docker.

[root@laworks ~]# ps -ewf |grep shim
root       77840       1  0 11:53 ?        00:00:01 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 3c0043e01e6729d2a9745d26bbc2a3baa3faddb4c169cb217d777dc24fe868d0 -address /run/containerd/containerd.sock
root       97514   64766  0 14:34 pts/2    00:00:00 grep --color=auto shim
[root@laworks ~]# pstree -cups 77840
systemd(1)───containerd-shim(77840)─┬─tail(77860)─┬─Xtightvnc(77884)
                                    │             ├─ibus-daemon(77977)─┬─ibus-dconf(77981)─┬─{ibus-dconf}(77986)
                                    │             │                    │                   ├─{ibus-dconf}(77987)
                                    │             │                    │                   ├─{ibus-dconf}(77989)
                                    │             │                    │                   └─{ibus-dconf}(77991)
                                    │             │                    ├─ibus-engine-sim(78023)─┬─{ibus-engine-sim}(78024)
                                    │             │                    │                        ├─{ibus-engine-sim}(78025)
                                    │             │                    │                        └─{ibus-engine-sim}(78026)
                                    │             │                    ├─ibus-extension-(77983)─┬─{ibus-extension-}(77997)
                                    │             │                    │                        ├─{ibus-extension-}(78001)
                                    │             │                    │                        ├─{ibus-extension-}(78004)
                                    │             │                    │                        └─{ibus-extension-}(78013)
                                    │             │                    ├─ibus-ui-gtk3(77982)─┬─{ibus-ui-gtk3}(77998)
                                    │             │                    │                     ├─{ibus-ui-gtk3}(78000)
                                    │             │                    │                     ├─{ibus-ui-gtk3}(78006)
                                    │             │                    │                     ├─{ibus-ui-gtk3}(78011)
                                    │             │                    │                     └─{ibus-ui-gtk3}(78015)
                                    │             │                    ├─{ibus-daemon}(77978)
                                    │             │                    ├─{ibus-daemon}(77979)
                                    │             │                    └─{ibus-daemon}(77990)
                                    │             ├─ibus-x11(77985)─┬─{ibus-x11}(77996)
                                    │             │                 ├─{ibus-x11}(77999)
                                    │             │                 └─{ibus-x11}(78005)
                                    │             ├─x-window-manage(77894)
                                    │             ├─xfce4-terminal(77893)─┬─bash(78020)───sleep(97530)
                                    │             │                       ├─{xfce4-terminal}(77914)
                                    │             │                       ├─{xfce4-terminal}(77915)
                                    │             │                       ├─{xfce4-terminal}(77944)
                                    │             │                       └─{xfce4-terminal}(78019)
                                    │             └─xstartup(77890)
                                    ├─{containerd-shim}(77841)
                                    ├─{containerd-shim}(77842)
                                    ├─{containerd-shim}(77843)
                                    ├─{containerd-shim}(77844)
                                    ├─{containerd-shim}(77845)
                                    ├─{containerd-shim}(77846)
                                    ├─{containerd-shim}(77847)
                                    ├─{containerd-shim}(77848)
                                    ├─{containerd-shim}(77866)
                                    ├─{containerd-shim}(78098)
                                    └─{containerd-shim}(81087)

CRIU logs and information:

CRIU full dump/restore logs:

[criu-dump.log](https://github.com/user-attachments/files/16451457/criu-dump.log)

Output of `criu --version`:

[root@laworks ~]# criu --version
Version: 3.19

Output of `criu check --all`:

[root@laworks ~]# criu check --all
Warn  (criu/cr-check.c:1346): Nftables based locking requires libnftables and set concatenations support
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.

Additional environment details:

adrianreber · 2024-08-01T07:13:10Z

This sounds like it might be solved with #2449.

coldbloodx · 2024-08-01T07:21:12Z

> This sounds like it might be solved with #2449.

I checked #2449; it is still in draft state. When will the fix be merged? I'd like to try it immediately.

adrianreber · 2024-08-01T07:28:22Z

> > This sounds like it might be solved with #2449.
>
> checked #2449, it is still in draft state, when will the fix be merged? I'd like to try it immediately.

This is almost impossible to predict. But you can try it already now and build CRIU yourself with the patches from #2449 applied to see if it actually solves your problem.

coldbloodx · 2024-08-01T07:33:51Z

@adrianreber
Thanks bro, I'll try it later

Process file descriptors (pidfds) were introduced to provide a stable handle on a process. They solve the problem of pid recycling. For a detailed explanation, see https://lwn.net/Articles/801319/ and http://www.corsix.org/content/what-is-a-pidfd

Before Linux 6.9, anonymous inodes were used for the implementation of pidfds. So, we detect them in a fashion similar to other fd types that use anonymous inodes, by calling `readlink()`.

Starting with 6.9, pidfs (a file system for pidfds) was introduced. In 6.9 `S_ISREG()` returned true for pidfds, but this changed again with 6.10 (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/pidfs.c?h=v6.11-rc2#n285). After this change, pidfs inodes have no file type in st_mode in userspace. We use `PID_FS_MAGIC` to detect pidfds for kernel >= 6.9. Hence, the check for pidfds occurs before the check for regular files.

For pidfds that refer to dead processes, we lose the pid of the process, as the Pid and NSpid fields in /proc/<pid>/fdinfo/<pidfd> change to -1. So, we create a temporary process for each unique inode and open pidfds that refer to this process. After all pidfds have been opened, we kill this temporary process.

This commit does not include support for pidfds that point to a specific thread, i.e. pidfds opened with the `PIDFD_THREAD` flag.

Fixes: checkpoint-restore#2258 checkpoint-restore#2459
Signed-off-by: Bhavik Sachdev <[email protected]>
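The dead-process case described in the commit message can be illustrated with a rough Python sketch. This is not CRIU's implementation (CRIU is written in C); the function names and the sample fdinfo text are hypothetical. It assumes Python 3.9+ on Linux 5.3+ for `os.pidfd_open()`, and that fdinfo exposes `Pid` and `NSpid` lines as the commit message states.

```python
import os
import signal


def parse_pidfd_fdinfo(text):
    """Extract the Pid and NSpid fields from the text of a pidfd's fdinfo file."""
    info = {"pid": None, "nspid": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "Pid":
            info["pid"] = int(value)
        elif key == "NSpid":
            # One entry per nested pid namespace.
            info["nspid"] = [int(v) for v in value.split()]
    return info


def refers_to_dead_process(info):
    """Per the commit message, Pid becomes -1 once the target is gone."""
    return info["pid"] == -1


def open_pidfds_to_dead_process(count):
    """Fork a throwaway helper, open `count` pidfds to it, then kill and reap it.

    Mirrors the idea of the temporary process in the commit message:
    the returned pidfds stay valid but refer to a process that no longer exists.
    """
    child = os.fork()
    if child == 0:
        signal.pause()  # helper only waits to be killed
        os._exit(0)
    fds = []
    try:
        for _ in range(count):
            fds.append(os.pidfd_open(child))
    finally:
        os.kill(child, signal.SIGKILL)
        os.waitpid(child, 0)  # reap so the helper does not linger as a zombie
    return fds


# Hypothetical fdinfo contents, for illustration only.
SAMPLE_LIVE = "pos:\t0\nflags:\t02000002\nPid:\t77890\nNSpid:\t77890\t12\n"
SAMPLE_DEAD = "pos:\t0\nflags:\t02000002\nPid:\t-1\nNSpid:\t-1\n"
```

Reading `/proc/self/fdinfo/<fd>` for each descriptor returned by `open_pidfds_to_dead_process()` and passing the text to `parse_pidfd_fdinfo()` should then show the `-1` values that the commit message mentions.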

github-actions · 2024-09-01T00:57:30Z

A friendly reminder that this issue had no activity for 30 days.

coldbloodx · 2024-09-14T09:13:39Z

I have now successfully dumped and restored GUI applications on a remote VNC icewm desktop.

bsach64 added a commit to bsach64/criu that referenced this issue Aug 3, 2024

criu: Support C/R of pidfds

ede55ce

Fixes: checkpoint-restore#2258 checkpoint-restore#2459 Signed-off-by: Bhavik Sachdev <[email protected]>

bsach64 added a commit to bsach64/criu that referenced this issue Aug 8, 2024

criu: Support C/R of pidfds

8bc6449

Fixes: checkpoint-restore#2258 checkpoint-restore#2459 Signed-off-by: Bhavik Sachdev <[email protected]>

github-actions bot added the stale-issue label Sep 1, 2024

coldbloodx closed this as completed Sep 14, 2024
