fix github runner gvisor failure and add gocache #977

Merged
merged 1 commit into from
Apr 25, 2024


@JooyoungPark73 JooyoungPark73 commented Apr 22, 2024

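The gVisor runner fails consistently.
There are two reasons:

A wrong endpoint caused the whole error, and containers were not cleaned up properly. In the log below, each word of the endpoint deprecation warning appears to be treated as a pod sandbox ID, so every StopPodSandbox call fails with NotFound: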

W0424 11:02:04.475265  259894 cleanupnode.go:99] [reset] Failed to remove containers: [failed to stop running pod I0424: output: I0424 11:01:52.572194  260177 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:52.579298  260177 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"I0424\": not found" podSandboxID="I0424"
time="2024-04-24T11:01:52Z" level=fatal msg="stopping the pod sandbox \"I0424\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"I0424\": not found"
: exit status 1, failed to stop running pod 11:01:52.417320: output: I0424 11:01:52.707750  260250 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:52.711341  260250 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"11:01:52.417320\": not found" podSandboxID="11:01:52.417320"
time="2024-04-24T11:01:52Z" level=fatal msg="stopping the pod sandbox \"11:01:52.417320\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"11:01:52.417320\": not found"
[reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
: exit status 1, failed to stop running pod 260012: output: I0424 11:01:52.833727  260320 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:52.837811  260320 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"260012\": not found" podSandboxID="260012"
time="2024-04-24T11:01:52Z" level=fatal msg="stopping the pod sandbox \"260012\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"260012\": not found"
: exit status 1, failed to stop running pod util_unix.go:103]: output: I0424 11:01:52.942331  260371 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:52.946834  260371 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"util_unix.go:103]\": not found" podSandboxID="util_unix.go:103]"
time="2024-04-24T11:01:52Z" level=fatal msg="stopping the pod sandbox \"util_unix.go:103]\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"util_unix.go:103]\": not found"
: exit status 1, failed to stop running pod "Using: output: I0424 11:01:53.049928  260431 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.055111  260431 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"\\\"Using\": not found" podSandboxID="\"Using"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"\\\"Using\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"\\\"Using\": not found"
: exit status 1, failed to stop running pod this: output: I0424 11:01:53.188970  260495 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.192391  260495 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"this\": not found" podSandboxID="this"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"this\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"this\": not found"
: exit status 1, failed to stop running pod endpoint: output: I0424 11:01:53.299874  260564 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.303466  260564 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"endpoint\": not found" podSandboxID="endpoint"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"endpoint\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"endpoint\": not found"
: exit status 1, failed to stop running pod is: output: I0424 11:01:53.405669  260629 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.410281  260629 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"is\": not found" podSandboxID="is"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"is\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"is\": not found"
: exit status 1, failed to stop running pod deprecated,: output: I0424 11:01:53.513228  260677 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.516442  260677 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"deprecated,\": not found" podSandboxID="deprecated,"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"deprecated,\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"deprecated,\": not found"
: exit status 1, failed to stop running pod please: output: I0424 11:01:53.624314  260748 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.628372  260748 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"please\": not found" podSandboxID="please"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"please\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"please\": not found"
: exit status 1, failed to stop running pod consider: output: I0424 11:01:53.731128  260819 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.735064  260819 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"consider\": not found" podSandboxID="consider"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"consider\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"consider\": not found"
: exit status 1, failed to stop running pod using: output: I0424 11:01:53.832924  260872 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.836874  260872 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"using\": not found" podSandboxID="using"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"using\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"using\": not found"
: exit status 1, failed to stop running pod full: output: I0424 11:01:53.927486  260933 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.931793  260933 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"full\": not found" podSandboxID="full"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"full\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"full\": not found"
: exit status 1, failed to stop running pod URL: output: I0424 11:01:54.036985  261003 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:54.040244  261003 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"URL\": not found" podSandboxID="URL"
time="2024-04-24T11:01:54Z" level=fatal msg="stopping the pod sandbox \"URL\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"URL\": not found"
: exit status 1, failed to stop running pod format": output: I0424 11:01:54.143087  261054 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:54.149392  261054 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"format\\\"\": not found" podSandboxID="format\""
time="2024-04-24T11:01:54Z" level=fatal msg="stopping the pod sandbox \"format\\\"\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"format\\\"\": not found"
: exit status 1, failed to stop running pod endpoint="/etc/vhive-cri/vhive-cri.sock": output: I0424 11:01:54.270325  261170 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:54.273350  261170 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"endpoint=\\\"/etc/vhive-cri/vhive-cri.sock\\\"\": not found" podSandboxID="endpoint=\"/etc/vhive-cri/vhive-cri.sock\""
time="2024-04-24T11:01:54Z" level=fatal msg="stopping the pod sandbox \"endpoint=\\\"/etc/vhive-cri/vhive-cri.sock\\\"\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"endpoint=\\\"/etc/vhive-cri/vhive-cri.sock\\\"\": not found"
: exit status 1, failed to stop running pod URL="unix:///etc/vhive-cri/vhive-cri.sock": output: I0424 11:01:54.370869  261262 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:54.374940  261262 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"URL=\\\"unix:///etc/vhive-cri/vhive-cri.sock\\\"\": not found" podSandboxID="URL=\"unix:///etc/vhive-cri/vhive-cri.sock\""
time="2024-04-24T11:01:54Z" level=fatal msg="stopping the pod sandbox \"URL=\\\"unix:///etc/vhive-cri/vhive-cri.sock\\\"\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"URL=\\\"unix:///etc/vhive-cri/vhive-cri.sock\\\"\": not found"
: exit status 1]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/super-admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
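
The pattern in the log above can be reproduced in a few lines. The sketch below is an illustration, not the code from this PR: it assumes the cleanup step splits the runtime CLI's output on whitespace to obtain pod IDs, and shows how a single deprecation warning line then turns into bogus IDs. `normalizeCRIEndpoint` and `podIDsFromOutput` are hypothetical helper names; the only fix taken from the log itself is the `unix://` URL form that the warning recommends.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeCRIEndpoint (hypothetical helper) converts a bare socket path into
// the full URL form that the deprecation warning asks for.
func normalizeCRIEndpoint(endpoint string) string {
	if strings.Contains(endpoint, "://") {
		return endpoint // already a URL, leave unchanged
	}
	return "unix://" + endpoint
}

// podIDsFromOutput (hypothetical helper) mimics a naive parser that treats
// every whitespace-separated token of the command output as a pod ID.
func podIDsFromOutput(output string) []string {
	return strings.Fields(output)
}

func main() {
	// Warning line copied from the log above. Assumption: with a bare socket
	// path, this line ends up in the output that is parsed for pod IDs.
	warning := `I0424 11:01:52.572194  260177 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"`

	for _, id := range podIDsFromOutput(warning) {
		fmt.Printf("would try to stop pod %q\n", id)
	}

	// prints "unix:///etc/vhive-cri/vhive-cri.sock"
	fmt.Println(normalizeCRIEndpoint("/etc/vhive-cri/vhive-cri.sock"))
}
```

Passing the endpoint in URL form should suppress the warning, so the output that gets parsed would contain only real pod IDs.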

Also, the filesystem cleanup does not work properly:

Cleaning /run/gvisor-containerd/gvisor-containerd.sock /run/gvisor-containerd/gvisor-containerd.sock.ttrpc /run/gvisor-containerd/io.containerd.runtime.v1.linux /run/gvisor-containerd/io.containerd.runtime.v2.task
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/16/rootfs': Device or resource busy
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/15/rootfs': Device or resource busy
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/14/rootfs': Device or resource busy
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/13/rootfs': Device or resource busy
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/12/rootfs': Device or resource busy
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/11/rootfs': Device or resource busy
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/10/rootfs': Device or resource busy
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/9/rootfs': Device or resource busy
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/8/rootfs': Device or resource busy
Cleaning /var/lib/gvisor-containerd/containerd
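
A common cause of `Device or resource busy` on a `rootfs` directory is that it is still a mount point. The sketch below is written under that assumption (the log itself does not confirm it) and is not the fix from this PR. It lists mount points under a directory from `/proc/mounts`-style text, longest paths first, so nested mounts come before their parents. `mountPointsUnder` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// mountPointsUnder (hypothetical helper) returns the mount points listed in
// /proc/mounts-style text that are equal to or nested under root. Longer
// paths come first, so nested mounts precede their parents.
func mountPointsUnder(mounts, root string) []string {
	root = strings.TrimSuffix(root, "/")
	var found []string
	for _, line := range strings.Split(mounts, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue // blank or malformed line
		}
		mp := fields[1] // second column is the mount point
		if mp == root || strings.HasPrefix(mp, root+"/") {
			found = append(found, mp)
		}
	}
	sort.Slice(found, func(i, j int) bool {
		if len(found[i]) != len(found[j]) {
			return len(found[i]) > len(found[j])
		}
		return found[i] < found[j]
	})
	return found
}

func main() {
	data, err := os.ReadFile("/proc/self/mounts")
	if err != nil {
		fmt.Println("cannot read mount table:", err)
		return
	}
	for _, mp := range mountPointsUnder(string(data), "/run/gvisor-containerd") {
		// A cleanup script would unmount each of these before removing
		// the parent directory.
		fmt.Println("still mounted:", mp)
	}
}
```

If this prints any `rootfs` paths on the runner, unmounting them before the `rm` step would be one way to avoid the busy error.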

I also added Go caching to the GitHub Actions workflows (a minor change).
The workflows now read the go.mod file to get the Go version automatically, except for the build test.
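
For reference, the `actions/setup-go` action accepts a `go-version-file` input that can point at go.mod; whether this PR uses that input is not shown here. The sketch below only illustrates what reading the version from go.mod amounts to: it returns the value of the first `go` directive. It is a simplified parser with a hypothetical helper name (`goVersionFromMod`), not the action's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// goVersionFromMod (hypothetical helper) returns the version named by the
// first "go" directive in go.mod content, e.g. "1.21" for the line "go 1.21".
func goVersionFromMod(content string) (string, bool) {
	for _, line := range strings.Split(content, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == "go" {
			return fields[1], true
		}
	}
	return "", false
}

func main() {
	// Example content; the module path and version are made up.
	mod := "module example.com/demo\n\ngo 1.21\n\nrequire example.com/dep v1.0.0\n"
	if v, ok := goVersionFromMod(mod); ok {
		fmt.Println("go version:", v) // prints "go version: 1.21"
	}
}
```

Keeping the version in one place (go.mod) means the workflows do not need to be edited when the module's Go version changes.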

@JooyoungPark73 JooyoungPark73 force-pushed the runner-gocache branch 12 times, most recently from 3215f90 to 71b2519 on April 24, 2024 12:34
@JooyoungPark73 JooyoungPark73 changed the title from "add go cache to github runner" to "fix github runner gvisor failure and add gocache" on Apr 24, 2024
@JooyoungPark73 JooyoungPark73 requested a review from lrq619 April 24, 2024 12:45
@lrq619 lrq619 merged commit 617f6c7 into main Apr 25, 2024
23 checks passed
@lrq619 lrq619 deleted the runner-gocache branch April 25, 2024 01:54