
tmpcopyup runc-specific extension prevents container from starting up #32

Closed · vrubiolo opened this issue Feb 10, 2021 · 30 comments

@vrubiolo

My systemd-based LXC container fails to start with the following error:

lxc ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68 20210210222236.599 DEBUG    storage - storage/storage.c:storage_query:233 - Detected rootfs type "dir"
lxc ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68 20210210222236.601 DEBUG    conf - conf.c:lxc_mount_rootfs:1262 - Mounted rootfs "/var/lib/containers/storage/overlay/1164bb7a6218ebd528572647a87e0335ca248c2e363c75da2d250f85072e1cf4/merged" onto "/usr/local/lib/lxc/rootfs" with options "(null)"
lxc ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68 20210210222236.618 DEBUG    conf - conf.c:mount_entry:2009 - Mounted "proc" on "/usr/local/lib/lxc/rootfs//proc" with filesystem type "proc"
lxc ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68 20210210222236.625 DEBUG    conf - conf.c:mount_entry:2009 - Mounted "tmpfs" on "/usr/local/lib/lxc/rootfs//dev" with filesystem type "tmpfs"
lxc ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68 20210210222236.656 DEBUG    conf - conf.c:mount_entry:2009 - Mounted "devpts" on "/usr/local/lib/lxc/rootfs//dev/pts" with filesystem type "devpts"
lxc ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68 20210210222236.681 DEBUG    conf - conf.c:mount_entry:2009 - Mounted "mqueue" on "/usr/local/lib/lxc/rootfs//dev/mqueue" with filesystem type "mqueue"
lxc ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68 20210210222236.685 DEBUG    conf - conf.c:mount_entry:2009 - Mounted "sysfs" on "/usr/local/lib/lxc/rootfs//sys" with filesystem type "sysfs"
lxc ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68 20210210222236.740 ERROR    utils - utils.c:safe_mount:1197 - Invalid argument - Failed to mount "tmpfs" onto "/usr/local/lib/lxc/rootfs//run"
lxc ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68 20210210222236.742 ERROR    conf - conf.c:mount_entry:1940 - Invalid argument - Failed to mount "tmpfs" on "/usr/local/lib/lxc/rootfs//run"
lxc ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68 20210210222236.743 ERROR    conf - conf.c:lxc_setup:3338 - Failed to setup mount entries
lxc ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68 20210210222236.744 ERROR    start - start.c:do_start:1267 - Failed to setup container "ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68"
lxc ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68 20210210222236.757 ERROR    sync - sync.c:__sync_wait:36 - An error occurred in another process (expected sequence number 5)
lxc ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68 20210210222236.765 ERROR    start - start.c:__lxc_start:2082 - Failed to spawn container "ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68"
lxc ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68 20210210222236.766 WARN     start - start.c:lxc_abort:1012 - No such process - Failed to send SIGKILL via pidfd 17 for process 511687

Looking at the generated LXC config, I see the following:

lxc.mount.entry = tmpfs /var/lib/containers/storage/overlay/b302f8e1e4ccdd5a182653c2664f57f00fdf0f3544af7026848e8844944d98d7/merged/run tmpfs rw,rprivate,noexec,nosuid,nodev,tmpcopyup,create=dir
lxc.mount.entry = tmpfs /var/lib/containers/storage/overlay/b302f8e1e4ccdd5a182653c2664f57f00fdf0f3544af7026848e8844944d98d7/merged/run/lock tmpfs rw,rprivate,noexec,nosuid,nodev,tmpcopyup,create=dir
lxc.mount.entry = tmpfs /var/lib/containers/storage/overlay/b302f8e1e4ccdd5a182653c2664f57f00fdf0f3544af7026848e8844944d98d7/merged/tmp tmpfs rw,rprivate,noexec,nosuid,nodev,tmpcopyup,create=dir
lxc.mount.entry = tmpfs /var/lib/containers/storage/overlay/b302f8e1e4ccdd5a182653c2664f57f00fdf0f3544af7026848e8844944d98d7/merged/var/log/journal tmpfs rw,rprivate,noexec,nosuid,nodev,tmpcopyup,create=dir

It looks like the tmpcopyup mount option is a non-OCI, runc-specific extension, and this is what is making the mount fail.

If that is true, would there be a workaround? Are you running systemd-based containers on your cluster?

Here are the logs (this is a shareable testcase; I can provide more if needed, including the rootfs):
02.10_22.36.27.zip

Edit: I see tmpcopyup is also implemented by crun
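
For reference, the failure can be reproduced outside the container stack. tmpcopyup is implemented in userspace by runc/crun (mount a tmpfs, then copy the underlying directory contents onto it); it is not a kernel tmpfs option, so passing it straight through to the mount syscall is rejected. A minimal sketch (hypothetical test path; the exact error text may vary):

# tmpfs does not know the 'tmpcopyup' option, so the kernel returns EINVAL,
# which is the same "Invalid argument" seen in the LXC log above:
mkdir -p /tmp/copyup-test
mount -t tmpfs -o rw,noexec,nosuid,nodev,tmpcopyup tmpfs /tmp/copyup-test
# mount: /tmp/copyup-test: wrong fs type, bad option, bad superblock ...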

@r10r (Contributor) commented Feb 11, 2021

Thanks for the report. Unfortunately the dump is missing the output for the failing container ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68

@r10r (Contributor) commented Feb 11, 2021

Can you please attach the dump containing the output for container ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68? Thanks.
Or you can just reproduce it and create another dump.

@vrubiolo (Author) commented Feb 11, 2021

Can you please attach the dump containing the output for container ac70e773b77df4a731f1c5d420baaa0c247515b3e696b8c4385f6cbe6a596b68? Thanks.
Or you can just reproduce it and create another dump.

Sorry, I had put the output of one container and given you a dump of another (also failing). Let me redo it and redump.

Edit: done. New logs uploaded and issue quotes updated as well.

@r10r (Contributor) commented Feb 11, 2021

Thanks! I'll take a look at it.

@vrubiolo (Author)

Thanks! I'll take a look at it.

Thank you. I'm looking forward to your analysis, especially regarding a workaround, as this is blocking my deployments right now.

@r10r (Contributor) commented Feb 11, 2021

Please try the branch https://github.com/Drachenfels-GmbH/crio-lxc/tree/mount-option-filtering

NOTE: Since the branch is based on the dev branch you also have to rebuild liblxc from upstream master.

@r10r (Contributor) commented Feb 11, 2021

@vrubiolo Can you share your Kubernetes deployment file? I want to find out what triggers cri-o to set these mount options.

@vrubiolo (Author)

@vrubiolo Can you share your Kubernetes deployment file? I want to find out what triggers cri-o to set these mount options.

I am actually not using a deployment file per se, just running via kubectl run --restart=Never --image=localhost/selftest rp-selftest.

Let me know if you want a full dump of the manifest once it is running.

@r10r (Contributor) commented Feb 11, 2021

Let me know if you want a full dump of the manifest once it is running.

Yes please!

@r10r (Contributor) commented Feb 11, 2021

Did you try the fixed branch?

@vrubiolo (Author)

Did you try the fixed branch?

Not yet, I should be able to do that today.

@vrubiolo (Author)

Here is the YAML output of kubectl get all; let me know if you want more information:
get-all-yaml.zip

@r10r (Contributor) commented Feb 11, 2021

Thanks. At first sight this doesn't reveal any further information.
I think cri-o is doing some magic init command detection for the container.command here.
I'll see if I can set up a test case to verify that.

Btw. updating to the mount-option-filtering branch is really easy (a command-level sketch follows the list):

  • Check out lxc from upstream master, build and install
  • Check out the mount-option-filtering branch, build and install (ignore complaints about crio-lxc-start being in use; only the crio-lxc binary must be updated)

After that you're done. No need to stop crio or shut down the cluster.
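
A hedged command-level sketch of those two steps (it assumes the autotools build liblxc used at the time and a make-based install for crio-lxc; adjust to the actual Makefile targets):

# 1) liblxc from upstream master:
git clone https://github.com/lxc/lxc.git && cd lxc
./autogen.sh && ./configure && make && sudo make install

# 2) crio-lxc from the fixed branch (only the crio-lxc binary is replaced):
git clone -b mount-option-filtering https://github.com/Drachenfels-GmbH/crio-lxc.git
cd crio-lxc && make && sudo make install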

@vrubiolo (Author)

Thanks. At first sight this doesn't reveal any further information.
I think cri-o is doing some magic init command detection for the container.command here.

From what I had found, it does so when it detects something like /run, but I might be wrong: https://github.com/cri-o/cri-o/blob/f9f73da6d6787565660510ec3a6c6a93ecf55d06/server/container_create_linux.go#L160

I'll see if I can set up a test case to verify that.

Great, that would be nice indeed!

Btw. updating to the mount-option-filtering branch is really easy:

* Check out lxc from upstream master, build and install

* Check out the `mount-option-filtering` branch, build and install (ignore complaints about `crio-lxc-start` being in use; only the `crio-lxc` binary must be updated)

After that you're done. No need to stop crio or shut down the cluster.

Great. I will try this out then.

@vrubiolo (Author)

Btw. updating to the mount-option-filtering branch is really easy:

* Check out lxc from upstream master, build and install

* Check out the `mount-option-filtering` branch, build and install (ignore complaints about `crio-lxc-start` being in use; only the `crio-lxc` binary must be updated)

After that you're done. No need to stop crio or shut down the cluster.

OK, I just tried it and things are definitely better: we no longer stumble on the tmpfs issue. We now fail on the cgroup2 mount, however:

lxc b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e 20210211165657.782 DEBUG    conf - conf.c:mount_entry:2055 - Mounted "proc" on "/usr/local/lib/lxc/rootfs//proc" with filesystem type "proc"
lxc b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e 20210211165657.782 DEBUG    conf - conf.c:mount_entry:2055 - Mounted "tmpfs" on "/usr/local/lib/lxc/rootfs//dev" with filesystem type "tmpfs"
lxc b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e 20210211165657.782 DEBUG    conf - conf.c:mount_entry:2055 - Mounted "devpts" on "/usr/local/lib/lxc/rootfs//dev/pts" with filesystem type "devpts"
lxc b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e 20210211165657.782 DEBUG    conf - conf.c:mount_entry:2055 - Mounted "mqueue" on "/usr/local/lib/lxc/rootfs//dev/mqueue" with filesystem type "mqueue"
lxc b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e 20210211165657.782 DEBUG    conf - conf.c:mount_entry:2055 - Mounted "sysfs" on "/usr/local/lib/lxc/rootfs//sys" with filesystem type "sysfs"
lxc b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e 20210211165657.782 DEBUG    conf - conf.c:mount_entry:2055 - Mounted "tmpfs" on "/usr/local/lib/lxc/rootfs//run" with filesystem type "tmpfs"
lxc b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e 20210211165657.783 DEBUG    conf - conf.c:mount_entry:2055 - Mounted "tmpfs" on "/usr/local/lib/lxc/rootfs//run/lock" with filesystem type "tmpfs"
lxc b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e 20210211165657.783 DEBUG    conf - conf.c:mount_entry:2055 - Mounted "tmpfs" on "/usr/local/lib/lxc/rootfs//tmp" with filesystem type "tmpfs"
lxc b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e 20210211165657.783 DEBUG    conf - conf.c:mount_entry:2055 - Mounted "tmpfs" on "/usr/local/lib/lxc/rootfs//var/log/journal" with filesystem type "tmpfs"
lxc b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e 20210211165657.783 ERROR    utils - utils.c:safe_mount:1203 - Invalid argument - Failed to mount "cgroup2" onto "/usr/local/lib/lxc/rootfs//sys/fs/cgroup"
lxc b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e 20210211165657.783 ERROR    conf - conf.c:mount_entry:1986 - Invalid argument - Failed to mount "cgroup2" on "/usr/local/lib/lxc/rootfs//sys/fs/cgroup"
lxc b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e 20210211165657.783 ERROR    conf - conf.c:lxc_setup:3447 - Failed to setup mount entries
lxc b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e 20210211165657.783 ERROR    start - start.c:do_start:1261 - Failed to setup container "b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e"
lxc b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e 20210211165657.783 ERROR    sync - sync.c:sync_wait:36 - An error occurred in another process (expected sequence number 5)
lxc b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e 20210211165657.784 ERROR    start - start.c:__lxc_start:2069 - Failed to spawn container "b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e"
lxc b690e79e6a3186fa88c9a3d6900a7f175e432ce5c82982d452ab0ad00fc0e39e 20210211165657.784 WARN     start - start.c:lxc_abort:1012 - No such process - Failed to send SIGKILL via pidfd 16 for process 1939661

I am not sure what this means, since I had enabled cgroups v2 as per your instructions:

$ cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=559c8e62-62d0-4f52-a3b3-5526fffcc2d5 rw net.ifnames=0 rootflags=compress-force=zstd systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all

It looks like another case of a custom option; the kernel indeed reports an unknown mount parameter in dmesg:

# dmesg | tail
[74287.670363] cgroup2: Unknown parameter 'private'
[74302.600968] audit: type=1334 audit(1613062945.709:15213): prog-id=7177 op=LOAD
[74302.600973] audit: type=1334 audit(1613062945.712:15214): prog-id=7177 op=UNLOAD
[74302.609624] cgroup2: Unknown parameter 'private'
[74314.601654] audit: type=1334 audit(1613062957.706:15215): prog-id=7178 op=LOAD
[74314.601789] audit: type=1334 audit(1613062957.709:15216): prog-id=7178 op=UNLOAD
[74314.609332] cgroup2: Unknown parameter 'private'
[74327.640657] audit: type=1334 audit(1613062970.742:15217): prog-id=7179 op=LOAD
[74327.640805] audit: type=1334 audit(1613062970.742:15218): prog-id=7179 op=UNLOAD
[74327.644713] cgroup2: Unknown parameter 'private'
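
(For reference, a quick way to confirm that the unified hierarchy really is active on the host, independent of the failing container mount:)

# prints "cgroup2fs" on a pure cgroup v2 (unified) host
stat -fc %T /sys/fs/cgroup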

Cluster logs:
02.11_16.58.53.zip

@r10r (Contributor) commented Feb 11, 2021

Same issue as with tmpfs. Wait a minute.

@r10r (Contributor) commented Feb 11, 2021

Please git pull and reinstall crio-lxc.

@r10r (Contributor) commented Feb 11, 2021

It should work now. The mount option set for cgroup2 is private, not rprivate.
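
The distinction: private/rprivate are mount propagation flags, not filesystem parameters, so the cgroup2 option parser rejects them (the "Unknown parameter 'private'" seen in dmesg above). A minimal sketch of the correct two-step form:

# mount cgroup2 without propagation flags in the option string ...
mount -t cgroup2 cgroup2 /sys/fs/cgroup
# ... then change propagation as a separate operation:
mount --make-private /sys/fs/cgroup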

@vrubiolo (Author) commented Feb 11, 2021

Thanks for the additional fix; we now get much further, past the mount issue.

It still fails, however, this time with a 255 exit code:

lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.688 INFO     conf - conf.c:lxc_transient_proc:3020 - Caller's PID is 1; /proc/self points to 1
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.688 DEBUG    conf - conf.c:lxc_setup_devpts_child:1533 - No new devpts instance will be mounted since no pts devices are requested
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.688 INFO     conf - conf.c:lxc_setup_ttys:910 - Finished setting up 0 /dev/tty<N> device(s)
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.688 DEBUG    conf - conf.c:setup_caps:2465 - Capabilities have been setup
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.688 NOTICE   conf - conf.c:lxc_setup:3552 - The container "b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2" is set up
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.689 DEBUG    start - start.c:lxc_spawn:1924 - Preserved time namespace via fd 21
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.689 NOTICE   utils - utils.c:lxc_drop_groups:1350 - Dropped supplimentary groups
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.689 NOTICE   start - start.c:start:2155 - Exec'ing "/.crio-lxc/init"
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.689 NOTICE   start - start.c:post_start:2166 - Started "/.crio-lxc/init" with pid "516243"
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.689 NOTICE   start - start.c:signal_handler:404 - Received 17 from pid 516242 instead of container init 516243
lxc 20210211221420.689 DEBUG    commands - commands.c:lxc_cmd_rsp_recv:172 - Response data length for command "get_state" is 0
lxc 20210211221420.689 DEBUG    commands - commands.c:lxc_cmd_get_state:821 - Container "b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2" is in "RUNNING" state
lxc 20210211221420.689 DEBUG    commands - commands.c:lxc_cmd_rsp_recv:172 - Response data length for command "get_init_pid" is 0
{"l":"info","cmd":"create","cid":"b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2","pid":516237,"t":"20210211221420.689","c":"create.go:115","m":"init process is running, container is created"}
{"l":"debug","cmd":"create","cid":"b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2","file":"/usr/local/bin/crio-lxc-backup.sh","t":"20210211221420.690","c":"cli.go:308","m":"execute create hook"}
{"l":"debug","cmd":"create","cid":"b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2","duration":28.625221,"t":"20210211221420.695","c":"cli.go:227","m":"cmd completed"}
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.700 DEBUG    commands - commands.c:lxc_cmd_rsp_recv:172 - Response data length for command "get_state" is 0
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.700 DEBUG    commands - commands.c:lxc_cmd_get_state:821 - Container "b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2" is in "RUNNING" state
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.700 DEBUG    commands - commands.c:lxc_cmd_rsp_recv:172 - Response data length for command "get_init_pid" is 0
{"l":"info","cmd":"state","cid":"b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2","pid":516237,"status":"created","t":"20210211221420.700","c":"runtime.go:550","m":"container state"}
{"l":"debug","cmd":"state","cid":"b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2","duration":1.296478,"t":"20210211221420.700","c":"cli.go:227","m":"cmd completed"}
{"l":"info","cmd":"start","cid":"b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2","t":"20210211221420.709","c":"runtime.go:448","m":"notify init to start container process"}
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.711 DEBUG    commands - commands.c:lxc_cmd_rsp_recv:172 - Response data length for command "get_state" is 0
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.711 DEBUG    commands - commands.c:lxc_cmd_get_state:821 - Container "b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2" is in "RUNNING" state
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.711 DEBUG    commands - commands.c:lxc_cmd_rsp_recv:172 - Response data length for command "get_init_pid" is 0
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.712 DEBUG    commands - commands.c:lxc_cmd_rsp_recv:172 - Response data length for command "get_init_pid" is 0
{"l":"debug","cmd":"start","cid":"b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2","duration":2.740617,"t":"20210211221420.712","c":"cli.go:227","m":"cmd completed"}
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.722 DEBUG    commands - commands.c:lxc_cmd_rsp_recv:172 - Response data length for command "get_state" is 0
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.722 DEBUG    commands - commands.c:lxc_cmd_get_state:821 - Container "b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2" is in "RUNNING" state
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.723 DEBUG    commands - commands.c:lxc_cmd_rsp_recv:172 - Response data length for command "get_init_pid" is 0
{"l":"info","cmd":"state","cid":"b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2","pid":516237,"status":"running","t":"20210211221420.723","c":"runtime.go:550","m":"container state"}
{"l":"debug","cmd":"state","cid":"b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2","duration":4.44571,"t":"20210211221420.723","c":"cli.go:227","m":"cmd completed"}
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.784 DEBUG    start - start.c:signal_handler:422 - Container init process 516243 exited
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.785 INFO     error - error.c:lxc_error_set_and_log:28 - Child <516243> ended on error (255)
lxc b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2 20210211221420.785 DEBUG    network - network.c:lxc_delete_network:3987 - Deleted network devices
{"l":"info","cmd":"state","cid":"b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2","pid":516237,"status":"stopped","t":"20210211221420.808","c":"runtime.go:550","m":"container state"}
{"l":"debug","cmd":"state","cid":"b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2","duration":0.919083,"t":"20210211221420.808","c":"cli.go:227","m":"cmd completed"}
{"l":"info","cmd":"state","cid":"b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2","pid":516237,"status":"stopped","t":"20210211221420.812","c":"runtime.go:550","m":"container state"}
{"l":"debug","cmd":"state","cid":"b6bbe23a9bd2366078113688cf21115f36f85895b669e0ea36d20d7b2fd533e2","duration":0.831701,"t":"20210211221420.813","c":"cli.go:227","m":"cmd completed"}
lxc d6077b9371884b0a027aebce07f56d265c22d8b332589220bfff5d6b191648fe 20210211221421.150 DEBUG    commands - commands.c:lxc_cmd_rsp_recv:172 - Response data length for command "get_state" is 0
lxc d6077b9371884b0a027aebce07f56d265c22d8b332589220bfff5d6b191648fe 20210211221421.150 DEBUG    commands - commands.c:lxc_cmd_get_state:821 - Container "d6077b9371884b0a027aebce07f56d265c22d8b332589220bfff5d6b191648fe" is in "RUNNING" state
{"l":"info","cmd":"kill","cid":"d6077b9371884b0a027aebce07f56d265c22d8b332589220bfff5d6b191648fe","signum":15,"t":"20210211221421.015","c":"runtime.go:359","m":"killing container process"}

So it looks like the containerized environment was set up and /sbin/init could run. However, it failed with exit code 255, and I am now trying to investigate. Note that the same container runs fine using LXC on my other machine (in privileged mode).

The LXC source has:

/*---------------------------------------------------------------------------*/
/* lxc_error_set_and_log
 * function is here to convert
 * the reported status to an exit code as detailed here:
 *
 *   0-126       exit code of the application
 *   128+n       signal n received by the application
 *   255         lxc error
 */
int lxc_error_set_and_log(int pid, int status)

so it might be an LXC issue itself, but I am unable to get more information about it.

Another possibility is that this is a systemd error, but I can't see any logs whatsoever (kubectl logs rp-selftest returns nothing), nor do I see a console enabled in the LXC config file (lxc.console.path = none).

Is there a way to get output here to know what went wrong? I'd also like to pass additional flags to systemd, but I am not sure how to do that in a containerized environment (this is done via the kernel cmdline on a real system, IIUC).

Logs: 02.11_22.16.05.zip

@r10r (Contributor) commented Feb 11, 2021

I'd also like to pass additional flags to systemd but I am not sure how to do that in a containerized environment

Environment variables; see man 1 systemd.
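
For example (a hedged sketch; the variable values are illustrative, see the Environment section of man 1 systemd):

# forward systemd debug settings into the pod environment
kubectl run rp-selftest --restart=Never --image=localhost/selftest \
  --env=SYSTEMD_LOG_LEVEL=debug --env=SYSTEMD_LOG_TARGET=console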

@vrubiolo (Author)

I'd also like to pass additional flags to systemd but I am not sure how to do that in a containerized environment

Environment variables; see man 1 systemd.

Good point, thank you.

But then, where does the output go if, as far as I can see, there is no console (I could be wrong)?

@r10r (Contributor) commented Feb 11, 2021

The output from the container init process goes to stdio (stdout, stderr).
You'll see the output in kubectl logs ...

The container process inherits stdio from crio-lxc-start, which inherits it from crio-lxc, which inherits it from conmon (which is started by cri-o). Conmon then writes it to the container logfiles in /var/log/pods/...
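
A hedged way to locate the exact file conmon writes for a given container (assuming the runtime populates the logPath field of the CRI container status):

crictl ps -a                                     # find the container id
crictl inspect <container-id> | grep -i logpath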

@vrubiolo (Author)

The output from the container init process goes to stdio (stdout, stderr).
You'll see the output in kubectl logs ...

That's my issue: there are no logs that I can see here:

[vagrant@vagrant-k8s ~]$ k get pods
NAME                                  READY   STATUS             RESTARTS   AGE
cilium-66pbl                          1/1     Running            0          8d
cilium-operator-65c5fc987f-79hxb      1/1     Running            0          8d
coredns-74ff55c5b-5spgp               0/1     CrashLoopBackOff   353        4d15h
coredns-74ff55c5b-qqplv               0/1     CrashLoopBackOff   353        8d
etcd-vagrant-k8s                      1/1     Running            0          8d
kuard                                 1/1     Running            0          7d22h
kube-apiserver-vagrant-k8s            1/1     Running            0          8d
kube-controller-manager-vagrant-k8s   1/1     Running            0          8d
kube-proxy-4wfjm                      1/1     Running            0          8d
kube-scheduler-vagrant-k8s            1/1     Running            0          8d
rp-selftest                           0/1     Error              0          8h

[vagrant@vagrant-k8s ~]$ k logs rp-selftest 
[nothing]

The container process inherits stdio from crio-lxc-start, which inherits it from crio-lxc, which inherits it from conmon (which is started by cri-o). Conmon then writes it to the container logfiles in /var/log/pods/...

Thanks for the explanation. I don't see anything under /var/log/pods/... either, which is exactly my debugging problem:

# crictl ps -a | grep rp-selftest
ea7fbf657eada       localhost/selftest@sha256:159f158159d399bcbc83fc250d385097655ba5018de26541f78bcc5f58a9c0f1   9 hours ago          Exited              rp-selftest               0                   fbb0e5a590f42
[root@vagrant-k8s pods]# cd /var/log/pods/kube-system_rp-selftest_f796ce48-c892-4908-b2c6-99ab651312bf/
[root@vagrant-k8s kube-system_rp-selftest_f796ce48-c892-4908-b2c6-99ab651312bf]# ls -al
total 0
drwxr-xr-x 1 root root  158 Feb 11 22:51 .
drwxr-xr-x 1 root root 2028 Feb 11 22:51 ..
-rw------- 1 root root    0 Feb 11 22:51 fbb0e5a590f422235a1842dcd064e466b64e4259137a4e2aad7f293f36405589.log
drwxr-xr-x 1 root root   10 Feb 11 22:51 rp-selftest
[root@vagrant-k8s kube-system_rp-selftest_f796ce48-c892-4908-b2c6-99ab651312bf]# cat fbb0e5a590f422235a1842dcd064e466b64e4259137a4e2aad7f293f36405589.log 
[root@vagrant-k8s kube-system_rp-selftest_f796ce48-c892-4908-b2c6-99ab651312bf]# cd rp-selftest/
[root@vagrant-k8s rp-selftest]# ls -al
total 0
drwxr-xr-x 1 root root  10 Feb 11 22:51 .
drwxr-xr-x 1 root root 158 Feb 11 22:51 ..
-rw------- 1 root root   0 Feb 11 22:51 0.log

[root@vagrant-k8s rp-selftest]# cat 0.log 
[root@vagrant-k8s rp-selftest]#

Would you have another suggestion here?

@vrubiolo (Author)

Would you have another suggestion here?

Pretty please? 🙏

@r10r (Contributor) commented Feb 12, 2021

Pretty please? 🙏

No begging please. ;) https://www.youtube.com/watch?v=-ZRUaDGW7WQ

I'll take a look at it.

@vrubiolo (Author)

Pretty please? 🙏

No begging please. ;) https://www.youtube.com/watch?v=-ZRUaDGW7WQ

Good quote, indeed (sorry if I conveyed unintentional context, I just meant to ask, not to be insistent).

I'll take a look at it.

Thanks for any pointers you may have; it's the systemd early-startup part that I am unfamiliar with.

@r10r (Contributor) commented Feb 12, 2021

Running the container in privileged mode (privileged: true, see manifest below) should work fine with crio-lxc from the mount-option-filtering branch, as long as your systemd version is recent enough.

Documentation

I recommend reading both documents here to get a better understanding of the security implications:

Example

I created an Ubuntu 20.04 container from an LXC template:

c=test-systemd
lxc-create -n $c --template download -- --dist ubuntu --release focal --arch amd64
n=$(buildah from scratch)   # create an empty working container
p=$(buildah mount $n)       # mount its root filesystem
cd $p
rsync -avH /usr/local/var/lib/lxc/$c/rootfs/ .
cd
buildah commit $n ubuntu-focal:0.1

Now I can run the container in privileged mode using the following manifest:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: systemd-test
spec:
  selector:
    matchLabels:
      app: systemd-test
  serviceName: "systemd-test"
  replicas: 1 
  template:
    metadata:
      labels:
        app: systemd-test
    spec:
      containers:
        - name: systemd-test
          image: localhost/ubuntu-focal:0.1
          imagePullPolicy: Always
          command:
            - /sbin/init
          workingDir: /
          env:
            # PATH and TERM are set automatically
          securityContext:
            privileged: true

[root@k8s-cluster8-controller sgg-k8s]# kubectl create -f systemd-test.yml
statefulset.apps/systemd-test created
[root@k8s-cluster8-controller sgg-k8s]# kubectl exec -it pod/systemd-test-0   -- /bin/bash
root@systemd-test-0:/# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  3.6  0.0 103008 10852 ?        Ss   16:58   0:00 /sbin/init
root          21  3.6  0.0  35300 10356 ?        S<s  16:58   0:00 /lib/systemd/systemd-journald
systemd+      32  1.6  0.0  18604  6680 ?        Ss   16:58   0:00 /lib/systemd/systemd-networkd
systemd+      55  2.0  0.1  24164 12720 ?        Ss   16:58   0:00 /lib/systemd/systemd-resolved
root          57  0.0  0.0   9420  2960 ?        Ss   16:58   0:00 /usr/sbin/cron -f
message+      58  0.3  0.0   7416  4184 ?        Ss   16:58   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root          61  2.0  0.1  31620 18428 ?        Ss   16:58   0:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
syslog        62  0.0  0.0 224356  4364 ?        Ssl  16:58   0:00 /usr/sbin/rsyslogd -n -iNONE
root          63  1.6  0.0  16680  6584 ?        Ss   16:58   0:00 /lib/systemd/systemd-logind
root         247  0.0  0.0   8436  1756 tty1     Ss+  16:58   0:00 /sbin/agetty -o -p -- \u --noclear tty1 linux
root         248  0.0  0.0   4248  3436 ?        S    16:58   0:00 /bin/bash
root         255  0.0  0.0   5904  2940 ?        R+   16:58   0:00 ps aux

@vrubiolo (Author)

Thank you! 👍

This is what I was missing here (we also run our LXC containers privileged, but I had forgotten that because all my lxc-XXX commands are aliased to prefix them with sudo). Great point about overriding the image entry point via the manifest; this is much quicker than editing the image configuration itself via buildah.

Let me try your suggestion and report back!

@vrubiolo (Author) commented Feb 14, 2021

Hi Ruben,

I just wanted to check back in to say thanks for the latest manifest, and to confirm that running in privileged mode allowed me to start my existing systemd-based container in Kubernetes. Thank you for the links you pointed to (I had read the systemd container interface document when debugging my startup issues; a very good read indeed).

This is great 🎆, thank you for your excellent work on this project!

My next step will be to stress-test this container, then run Sonobuoy over the sample cluster and expand the cluster capacity to more nodes while I split our current monolith into more containers, now that Kubernetes is up and running. I will also look into picking a distribution for the node OS (likely in the Red Hat family, as we work with those right now). I will let you know how it goes!

@r10r r10r transferred this issue from another repository Mar 24, 2021
@r10r r10r transferred this issue from another repository Apr 8, 2021
@r10r (Contributor) commented Apr 30, 2021

This is related to #31

@r10r r10r closed this as completed Apr 30, 2021