-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GA support for Windows hostprocess containers #7857
Conversation
- Add windows calico-node daemonset with hostprocess containers. - Add windows kube-proxy daemonset using sig-windows hostprocess container image. - Add optional (required only for windows) fields KUBERNETES_SERVICE_CIDR and KUBERNETES_DNS_SERVERS to kubernetes-services-endpoint configmap. Manually tested with a clean installation and VXLAN setup and images from projectcalico/calico#7857. Still a work in progress. Pending: - Tests for windows calico-node rendering (pkg/render/windows_test.go) - Possibly make CNI BIN and CONF dirs configurable for Windows in the installation - Possibly roll out our own kube-proxy image instead of using the one from sig-windows - Possibly adding
- Add windows calico-node daemonset with hostprocess containers. - Add windows kube-proxy daemonset using sig-windows hostprocess container image. - Add optional (required only for windows) fields KUBERNETES_SERVICE_CIDR and KUBERNETES_DNS_SERVERS to kubernetes-services-endpoint configmap. Manually tested with a clean installation and VXLAN setup and images from projectcalico/calico#7857. Still a work in progress. Pending: - Tests for windows calico-node rendering (pkg/render/windows_test.go) - Possibly make CNI BIN and CONF dirs configurable for Windows in the installation - Possibly roll out our own kube-proxy image instead of using the one from sig-windows - Possibly adding an initContainer to clean up "old" style non-HPC Calico installations before proceeding
- Add windows calico-node daemonset with hostprocess containers. - Add windows kube-proxy daemonset using sig-windows hostprocess container image. - Add optional (required only for windows) fields KUBERNETES_SERVICE_CIDR and KUBERNETES_DNS_SERVERS to kubernetes-services-endpoint configmap. Manually tested with a clean installation and VXLAN setup and images from projectcalico/calico#7857. Still a work in progress. Pending: - Tests for windows calico-node rendering (pkg/render/windows_test.go) - Possibly make CNI BIN and CONF dirs configurable for Windows in the installation - Possibly roll out our own kube-proxy image instead of using the one from sig-windows - Possibly adding an initContainer to clean up "old" style non-HPC Calico installations before proceeding
693c0f2
to
dbf4a29
Compare
cni-plugin/pkg/install/install.go
Outdated
// In the case of Windows HostProcess containers this prepends the CONTAINER_SANDBOX_MOUNT_POINT env variable | ||
// for other operating systems or if the sandbox env variable is not set it returns the standard mount points | ||
// see https://kubernetes.io/docs/tasks/configure-pod-container/create-hostprocess-pod/#volume-mounts | ||
func getHostPath(path string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you only support Containerd 1.7+ this shouldn't be needed and would clean up this section of code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so then just referring to the paths as /host/foo
would work on Windows without any need for change? Thanks, I hadn't tried that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, if they are volumemounts to the /host/
location that should work. This is why the In-Cluster config now works 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsturtevant I got rid of this getHostPath()
function on my latest commit, though I still needed to use the $CONTAINER_SANDBOX_MOUNT_POINT
env var when referring to the binaries inside the container, which I thought wouldn't be needed. Will tag you on the relevant chunk of code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick update: I ended up implementing support for containerd v1.6, so we're using $CONTAINER_SANDBOX_MOUNT_POINT
just about everywhere that's necessary. Everything's working on my tests so far, and that way we can support a containerd version that's widely adopted right now...
containerBinDir := "/opt/cni/bin/" | ||
// The binaries dir in the container needs to be prepended by the CONTAINER_SANDBOX_MOUNT_POINT env var on Windows Host Process Containers | ||
// see https://kubernetes.io/docs/tasks/configure-pod-container/create-hostprocess-pod/#containerd-v1-7-and-greater | ||
if runtime.GOOS == "windows" { | ||
containerBinDir = os.Getenv("CONTAINER_SANDBOX_MOUNT_POINT") + "/" + containerBinDir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsturtevant I still needed to use the $CONTAINER_SANDBOX_MOUNT_POINT
env var when referring to the binaries inside the container here, which from what I read on https://kubernetes.io/docs/tasks/configure-pod-container/create-hostprocess-pod/#containerd-v1-7-and-greater shouldn't be needed, right? Could you please take a look and let me know if I missed something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does the yaml look like for your deployment? I don't see it in the PR here. Could it be that the Working directory is set to something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've mostly been working with the tigera operator with the changes from this PR: tigera/operator#2732, but the deployment also works with this manifest: https://gist.github.com/coutinhop/7528503be5d0aeb22001d69d82fa61ed
No working dir on the install-cni initContainer, but I did include $env:CONTAINER_SANDBOX_MOUNT_POINT
in the path for command
in it, could that cause issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No working dir on the install-cni initContainer, but I did include
$env:CONTAINER_SANDBOX_MOUNT_POINT
in the path forcommand
in it, could that cause issues?
yes I think that would do it. could you try with something like command: ["opt/cni/bin/install.exe"]
or /opt/cni/bin/install.exe
depending on where those files live?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I've tried it (removing mentions to $env:CONTAINER_SANDBOX_MOUNT_POINT
), but I got
'/opt/cni/bin/install.exe' is not recognized as an internal or external command
I see that the yaml for kube-proxy also still uses $env:CONTAINER_SANDBOX_MOUNT_POINT
https://github.com/kubernetes-sigs/sig-windows-tools/blob/3ed28295f8694dffd03e144af4423137b2b8c30b/hostprocess/calico/kube-proxy/kube-proxy.yml#L25
Apparently the binaries (and scripts, I couldn't manage to make the uninstall-calico initContainer work either) are not in these absolute paths, I'm guessing some path in c:/hpc/...
would work, but then why not just use $env:CONTAINER_SANDBOX_MOUNT_POINT
, right?
This is by far not a deal breaker, stuff was working so I wouldn't block anything by it (unless you see $env:CONTAINER_SANDBOX_MOUNT_POINT
being dropped at some point and these containers as they are becoming unsupported).
One other quick question: is the presence of $env:CONTAINER_SANDBOX_MOUNT_POINT
a reliable indicator that we're inside a host process container? We've used it in a couple of places, I wonder if we should look into changing that...
Once again, thanks so much!
- Add windows calico-node daemonset with hostprocess containers. - Add windows kube-proxy daemonset using sig-windows hostprocess container image. - Add optional (required only for windows) fields KUBERNETES_SERVICE_CIDR and KUBERNETES_DNS_SERVERS to kubernetes-services-endpoint configmap. Manually tested with a clean installation and VXLAN setup and images from projectcalico/calico#7857. Still a work in progress. Pending: - Tests for windows calico-node rendering (pkg/render/windows_test.go) - Possibly make CNI BIN and CONF dirs configurable for Windows in the installation - Possibly roll out our own kube-proxy image instead of using the one from sig-windows - Possibly adding an initContainer to clean up "old" style non-HPC Calico installations before proceeding
Build on the changes from projectcalico#7260: - Add a Windows cni-plugin hostprocess image, which will install the cni binaries and config on the host. - Add a Windows node hostprocess image, which no longer needs to match the hosts' Windows version (e.g. only 1809 needs to be used and is supported on 2022).
base for windows HPC images. Add uninstall-calico powershell script to be run in initContainer, which will clean up non-HPC Calico installations from the host before starting Calico Windows HPC. Fix bug in confd-service.ps1
HPC when using containerd v1.7+
before substitution (this helps with operator testing).
1ac6289
to
e202970
Compare
…h containerd v1.6+ (instead of requiring v1.7+).
…d with '$env:CONTAINER_SANDBOX_MOUNT_POINT'). Fix logic regarding CNI paths in uninstall-calico-hpc.ps1.
cccded6
to
c81279d
Compare
Fix bug in GetHostPath()
- Add windows calico-node daemonset with hostprocess containers. - Add windows kube-proxy daemonset using sig-windows hostprocess container image. - Add optional (required only for windows) fields KUBERNETES_SERVICE_CIDR and KUBERNETES_DNS_SERVERS to kubernetes-services-endpoint configmap. Manually tested with a clean installation and VXLAN setup and images from projectcalico/calico#7857. Still a work in progress. Pending: - Tests for windows calico-node rendering (pkg/render/windows_test.go) - Possibly make CNI BIN and CONF dirs configurable for Windows in the installation - Possibly roll out our own kube-proxy image instead of using the one from sig-windows - Possibly adding an initContainer to clean up "old" style non-HPC Calico installations before proceeding
…ags() to winutils.go and use them instead of upstream's InClusterConfig(), BuildConfigFromFlags() in order to get Windows HPC to work with containerd v1.6. (tests intentionally left untouched in order to try to catch discrepancies in behavior from upstream if/when they happen). Fix PATH in Windows dockerfiles.
Use winutils.GetHostPath() on file paths in felix param_types.go. Add resolving of $env:CONTAINER_SANDBOX_MOUNT_POINT to winutils.GetHostPath().
- clarify HPC env var comments - add logging when skipping chmod on windows - fix kubeconfigPath prefix issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Remove calico windows upgrader code * Add support for Windows hostprocess containers - Add windows calico-node daemonset with hostprocess containers. Manually tested with a clean installation and VXLAN setup and images from projectcalico/calico#7857. * Add uninstall-calico initContainer to the calico windows daemonset * Make CNI conf filename configurable for Windows * Don't render windows daemonset if disabled in installation resource. * Fix windows HPC support checking and add tests for it * Make windows HPC requirements be containerd v1.6 (instead of v1.7) and fix file paths to make them compatible with v1.6 * Add nodes/status permissions to cni-plugin clusterrole. Add externalnetworks plugin to node clusterrole when on EE. Fix CNI paths and template in order for HPC to work with containerd v1.6. * Move windows daemonset rendering from core-controller into its own windows-controller. Remove ServiceCIDRs and DNSServers from k8s endpoint configmap. Move ServiceCIDRs to InstallationSpec and auto-detect dns servers from the cluster in windows-controller. Add getWindowsBackend() function to windows rendering, to deduplicate this calculation. Remove containerd v1.7+ requirement for windows daemonset (HPC) as it is no longer necessary. Fix mixups of calls to cniDirectories() in windows rendering. * Add old calico-windows-upgrade clean up and make default WindowsDataplane disabled on every provider except AKS. * Fix openshift and RKE2 dns service values on windows_controller * Remove finalizer logic from windows controller and rendering. * Further add windows rendering UTs and fix some edge cases. * Further add windows render UTs * Address review comments: Fill in windows default cni paths in installation resource Remove 'terminating' logic from windows objects Misc name changes and fixes * Further address review comments: - Only populate Installation.WindowsNodes when windows is enabled - Use same serviceaccount/clusterroles/bindings for windows and linux calico-node - Remove unused env vars from calico-node-windows containers - Test changes to accomodate code changes * Fix WindowsNodeSpec defaults * Fix path env vars * - Further address review comments - Move validation out of windows_controller - Remove unnecessary env vars from windows node containers - Test adjustments
The windows upgrade code was cleaned up in projectcalico#7857, but there were leftover calls to it in node-service.ps1, leading to errors: flag provided but not defined: -should-install-windows-upgrade In addition to that, the token refresher, when run in a non-HPC scenario, would fail (on top of being unnecessary in non-HPC since that already requires the kubeconfig to be copied into the windows host): 2024-07-16 20:38:31.133 [FATAL][5064] cni-config-monitor/token_watch.go 59: Failed to read service account namespace file error=open /var/run/secrets/kubernetes.io/serviceaccount/namespace: The system cannot find the path specified. These were masked by the fact that logs are complex to get in non-HPC windows (requires RDP access into the host machine, so e2e failures retrieve nothing), and kubectl get nodes would show the windows nodes as 'Ready' even though these errors were occurring.
The windows upgrade code was cleaned up in projectcalico#7857, but there were leftover calls to it in node-service.ps1, leading to errors: flag provided but not defined: -should-install-windows-upgrade In addition to that, the token refresher, when run in a non-HPC scenario, would fail (on top of being unnecessary in non-HPC since that already requires the kubeconfig to be copied into the windows host): 2024-07-16 20:38:31.133 [FATAL][5064] cni-config-monitor/token_watch.go 59: Failed to read service account namespace file error=open /var/run/secrets/kubernetes.io/serviceaccount/namespace: The system cannot find the path specified. These were masked by the fact that logs are complex to get in non-HPC windows (requires RDP access into the host machine, so e2e failures retrieve nothing), and kubectl get nodes would show the windows nodes as 'Ready' even though these errors were occurring.
The windows upgrade code was cleaned up in projectcalico#7857, but there were leftover calls to it in node-service.ps1, leading to errors: flag provided but not defined: -should-install-windows-upgrade In addition to that, the token refresher, when run in a non-HPC scenario, would fail (on top of being unnecessary in non-HPC since that already requires the kubeconfig to be copied into the windows host): 2024-07-16 20:38:31.133 [FATAL][5064] cni-config-monitor/token_watch.go 59: Failed to read service account namespace file error=open /var/run/secrets/kubernetes.io/serviceaccount/namespace: The system cannot find the path specified. These were masked by the fact that logs are complex to get in non-HPC windows (requires RDP access into the host machine, so e2e failures retrieve nothing), and kubectl get nodes would show the windows nodes as 'Ready' even though these errors were occurring.
Build on the changes from #7260:
Manually tested with an operator image from tigera/operator#2732 and a VXLAN setup.
Many thanks to @jsturtevant and @fabi200123
Description
Related issues/PRs
Todos
Release Note
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one
docs-*
label.docs-pr-required
: This change requires a change to the documentation that has not been completed yet.docs-completed
: This change has all necessary documentation completed.docs-not-required
: This change has no user-facing impact and requires no docs.Every PR needs one
release-note-*
label.release-note-required
: This PR has user-facing changes. Most PRs should have this label.release-note-not-required
: This PR has no user-facing changes.Other optional labels:
cherry-pick-candidate
: This PR should be cherry-picked to an earlier release. For bug fixes only.needs-operator-pr
: This PR is related to install and requires a corresponding change to the operator.