Added WSL2 fix for kubelet #2390

Closed

Conversation

networkop
Contributor

This PR fixes #2323.

@k8s-ci-robot
Contributor

Welcome @networkop!

It looks like this is your first PR to kubernetes-sigs/kind 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kind has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 27, 2021
@k8s-ci-robot
Contributor

Hi @networkop. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]


@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 27, 2021
@networkop networkop changed the title from "Added WSL2 workaround for kubelet" to "Added WSL2 fix for kubelet" Jul 27, 2021
@networkop
Contributor Author

signed the CLA

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jul 27, 2021
@@ -204,6 +204,10 @@ fix_cgroup() {
# See: https://man7.org/linux/man-pages/man7/cgroups.7.html
local current_cgroup
current_cgroup=$(grep -E '^[^:]*:([^:]*,)?cpu(,[^,:]*)?:.*' /proc/self/cgroup | cut -d: -f3)
# on WSL2 systems /sys/fs/cgroup/systemd dir is missing which eventually leads to kubelet failing to start.
Member

Is it WSL2 or just distros without systemd?
Are there WSL2 distros with systemd?

Contributor Author

At this stage I'm only certain about WSL2. I can reword this to say "and potentially other distros without systemd"?

Member

SGTM

# on WSL2 systems /sys/fs/cgroup/systemd dir is missing which eventually leads to kubelet failing to start.
# see: https://github.com/kubernetes-sigs/kind/issues/2323
# the following mkdir command is a common workaround, see: https://github.com/microsoft/WSL/issues/4189#issuecomment-758439957
mkdir -p /sys/fs/cgroup/systemd/
Member

Technically, I don't think this (the cgroup controller hierarchy) has to be mounted at /sys/fs/cgroup, though I've never encountered it mounted elsewhere.

see: https://man7.org/linux/man-pages/man7/cgroups.7.html

This directory would also normally be configured like mount -t cgroup -o none,name=systemd cgroup /sys/fs/cgroup/systemd

Member

We actually already depend on it being under /sys/fs/cgroup below though, regarding the first point.

Member

... And it's not just kind assuming this in the ecosystem https://kubernetes.slack.com/archives/C0BP8PW9G/p1627412176214900

Contributor Author

awesome, that's good to know.

Contributor Author

As for the mounting, I believe this will be done by this line.
The current placement of this mkdir command is deliberate, so that the new directory gets picked up by cgroup_subsystems on the next line and then passed to mount_kubelet_cgroup_root "/kubelet" "${subsystem}".

At least, this is how it works for me. I added just this one line; that allowed the cluster to start up and resulted in the following mount being created by the entrypoint:

root@k8s-guide-control-plane:/# mount | grep kubelet | head -n 1
cgroup on /sys/fs/cgroup/cpuset/kubelet type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
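
The check above only shows the first kubelet mount. As a rough sketch of how the same check could be repeated per controller, the snippet below parses `mount` output for a cgroup mount at /sys/fs/cgroup/<controller>/kubelet. The helper name `has_kubelet_mount` is hypothetical and not part of kind, and the sample line is the one quoted above rather than fresh output.

```shell
#!/bin/sh
# Hypothetical helper (not part of kind): reads `mount` output on stdin and
# succeeds only if a cgroup mount exists at /sys/fs/cgroup/<controller>/kubelet.
has_kubelet_mount() {
  controller="$1"
  awk -v target="/sys/fs/cgroup/${controller}/kubelet" '
    # mount output fields: <source> on <target> type <fstype> (<options>)
    $2 == "on" && $3 == target && $5 == "cgroup" { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Sample line copied from the mount output quoted in this comment.
sample='cgroup on /sys/fs/cgroup/cpuset/kubelet type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)'

if printf '%s\n' "$sample" | has_kubelet_mount cpuset; then
  echo "cpuset: kubelet cgroup mounted"
fi
if ! printf '%s\n' "$sample" | has_kubelet_mount systemd; then
  echo "systemd: kubelet cgroup missing"
fi
```

On a real node the input would come from something like `docker exec kind-control-plane mount`, piped into the helper once per controller of interest.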

networkop added a commit to networkop/tkng-labs that referenced this pull request Jul 28, 2021
@networkop networkop marked this pull request as draft July 29, 2021 13:29
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 29, 2021
@networkop
Contributor Author

something's not right, investigating

@BenTheElder BenTheElder self-assigned this Jul 29, 2021
@networkop networkop marked this pull request as ready for review August 1, 2021 15:44
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 1, 2021
@networkop
Contributor Author

I think I've finally found a stable workaround which works across reboots and docker restarts.
The reason it didn't quite work previously was that any modifications made by the entrypoint (e.g. mkdir -p /sys/fs/cgroup/systemd/kubelet) were wiped out when systemd started. Mounting to self like it's done here didn't work either.

The conclusion is that any changes made by the entrypoint to any path under /sys/fs/cgroup/systemd do not survive /sbin/init on WSL2.

So the alternative is to add an extra ExecStartPre to kubelet's drop-in unit and create /sys/fs/cgroup/systemd/kubelet when kubelet starts.
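
A minimal sketch of what such a pre-start step might do, assuming the guard condition from the `docker exec` command quoted later in this thread. The function name `ensure_kubelet_cgroup` and its `cgroup_root` argument are hypothetical, added only so the logic can be tried against a scratch directory instead of the real /sys/fs/cgroup; the exact ExecStartPre line in the kubelet drop-in may differ.

```shell
#!/bin/sh
# Hypothetical helper: the name and the cgroup_root parameter are illustrative
# assumptions, not taken from the kind source.
ensure_kubelet_cgroup() {
  cgroup_root="${1:-/sys/fs/cgroup}"
  # A cgroup.controllers file at the cgroup root indicates cgroup v2 (unified
  # hierarchy); in that case there is no separate systemd hierarchy to create.
  if [ ! -f "${cgroup_root}/cgroup.controllers" ] && [ ! -d "${cgroup_root}/systemd/kubelet" ]; then
    mkdir -p "${cgroup_root}/systemd/kubelet"
  fi
}

# Try it against a scratch directory rather than the real cgroup filesystem.
scratch="$(mktemp -d)"
ensure_kubelet_cgroup "$scratch"
if [ -d "${scratch}/systemd/kubelet" ]; then
  echo "created ${scratch}/systemd/kubelet"
fi
rm -rf "$scratch"
```

Running the check at kubelet start rather than in the entrypoint matters here because, per the observation above, anything created under /sys/fs/cgroup/systemd before /sbin/init runs is lost on WSL2.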

@aojea
Contributor

aojea commented Aug 1, 2021

/ok-to-test
/lgtm
Thanks

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 1, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 1, 2021
@stealthybox

I was able to reproduce the failed log message on WSL2.
I can confirm running the ExecStartPre during cluster creation fixes my problem:

docker exec -it kind-control-plane /bin/sh -euc "if [ ! -f /sys/fs/cgroup/cgroup.controllers ] && [ ! -d /sys/fs/cgroup/systemd/kubelet ]; then mkdir -p /sys/fs/cgroup/systemd/kubelet; fi"

/lgtm

@BenTheElder
Member

BenTheElder commented Aug 25, 2021

I'm taking a break from some other things I've been on and doing some catch-up today. #2408 will go in first as it's an earlier PR. Edit: Not caffeinated, it's not. However, I've already pushed that image and it shouldn't take long to get back to this one.

@BenTheElder BenTheElder force-pushed the bug-wsl-cgroup-kubelet branch from 75cc065 to de1f245 Compare August 25, 2021 19:50
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 25, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: networkop
To complete the pull request process, please ask for approval from aojea after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@BenTheElder
Member

rebased, squashed, pushed base image. pushing node image.

@aojea
Contributor

aojea commented Aug 25, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 25, 2021
@aojea
Contributor

aojea commented Aug 25, 2021

/retest

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 25, 2021
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@BenTheElder
Member

/retest

@BenTheElder
Member

the docker rootless CI actions flake seems somewhat concerning, running it again and the prow / 1.19 CI

@aojea
Contributor

aojea commented Aug 26, 2021

> the docker rootless CI actions flake seems somewhat concerning, running it again and the prow / 1.19 CI

@BenTheElder #2421

@powerman

Can someone test it using environment from #2440, to make sure it actually fixes it? Or tell me how to test it myself…

@aojea
Contributor

aojea commented Sep 18, 2021

/retest

@aojea aojea mentioned this pull request Sep 18, 2021
@aojea aojea added this to the v0.12.0 milestone Oct 8, 2021
@aojea
Contributor

aojea commented Oct 8, 2021

I'll incorporate this in the base bump #2465 for 0.12

@BenTheElder
Member

rebased version merged in #2465, thank you!

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

WSL2 ERROR: failed to create cluster
6 participants