Hi guys,
on a GPU-powered Kubernetes node (v1.29.10), with the NVIDIA runtime set as the default runtime, containers kept crashing in an infinite restart loop.
Regenerating the patched containerd `config.toml` with:

```sh
nvidia-ctk runtime configure --runtime=containerd
```
I realized there is a drift from the configuration shipped in the containerd package of this module. The configuration rendered by `nvidia-ctk` is:
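A representative excerpt of the relevant section (exact paths such as `BinaryName` and the surrounding keys may differ per installation):

```toml
[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "nvidia"

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
      privileged_without_host_devices = false
      runtime_engine = ""
      runtime_root = ""
      runtime_type = "io.containerd.runc.v2"

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
        BinaryName = "/usr/bin/nvidia-container-runtime"
        SystemdCgroup = true
```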
As you can see, the option `SystemdCgroup = true` is included under the `[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]` section; this matches the kubelet's systemd cgroup driver, and without it runc manages cgroups via cgroupfs, which is exactly what causes the crash loop. Also, the runc runtime snippet is not removed. I propose adding at least the missing plugin option to the upstream `config.toml.j2` Jinja template.
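A minimal sketch of the proposed change, assuming the template keeps the section layout shown above (the surrounding blocks of `config.toml.j2` are not reproduced here):

```jinja
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
  BinaryName = "/usr/bin/nvidia-container-runtime"
  {# Currently missing: must match the kubelet's systemd cgroup driver #}
  SystemdCgroup = true
```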
Tested on:

- Fury 1.29 legacy
- On-prem module v1.31.4
- NVIDIA Container Toolkit v1.14.6
- Node with an NVIDIA 1080 Ti