
Calico Windows HostProcess manifests do not authorize with the projected service token, they use KUBECONFIG env var incorrectly #7445

Closed
doctorpangloss opened this issue Mar 9, 2023 · 4 comments

doctorpangloss commented Mar 9, 2023

The calico-windows-configmap omits KUBECONFIG: "C:\\CalicoWindows\\calico-kube-config". The calico-node.exe API client never looks for the projected service account token on Windows in the HostProcess environment, because config.ps1 always sets KUBECONFIG.

If KUBECONFIG happens to be set to the node's own joined kubeconfig, and mutual authentication is used to join nodes, you will see the errors in this ticket.

If KUBECONFIG isn't set, it defaults to C:\\CalicoWindows\\calico-kube-config, which takes precedence over the service account token.

Thus the service account token is never used by Calico in Windows HostProcess containers (HPC).

Adding KUBECONFIG: "C:\\CalicoWindows\\calico-kube-config" to the configmap on machines that already have KUBECONFIG set resolves the issue, but only accidentally. Eventually the token in that file will expire.
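
The precedence described above can be sketched as a small shell function. This only illustrates the behaviour reported in this issue and is not Calico's actual code; the function name `pick_credentials` and its output label are hypothetical.

```shell
# Hypothetical sketch of the behaviour reported in this issue; not Calico's code.
# $1 is the value of KUBECONFIG seen by the install script (may be empty).
pick_credentials() {
  kubeconfig_env="$1"
  # Default reported in this issue when KUBECONFIG is unset (written as in the configmap).
  default_path='C:\\CalicoWindows\\calico-kube-config'
  if [ -n "$kubeconfig_env" ]; then
    printf 'kubeconfig:%s\n' "$kubeconfig_env"
  else
    printf 'kubeconfig:%s\n' "$default_path"
  fi
  # Neither branch falls back to the projected service account token.
}

pick_credentials 'C:/k/config'
pick_credentials ''
```

Both calls end up on a kubeconfig file, which is why the projected token is never consulted.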

Related: #7337
Might be related: #5910

Expected Behavior

The HostProcess container install method of Calico for Windows should authenticate with its projected service account token.

Current Behavior

Instead, when KUBECONFIG is set, calico-node uses the node's kubeconfig, which authenticates as a different user that has no access to Calico resources. This is unexpected.

install logs:

Environment variable KUBE_NETWORK is not set. Setting it to the default value: Calico.*
Environment variable CALICO_NETWORKING_BACKEND is already set: windows-bgp
Environment variable K8S_SERVICE_CIDR is already set: 10.152.184.0/24
Environment variable DNS_NAME_SERVERS is already set: 10.152.184.10
Environment variable DNS_SEARCH is not set. Setting it to the default value: svc.cluster.local
Environment variable CALICO_DATASTORE_TYPE is not set. Setting it to the default value: kubernetes
Environment variable KUBECONFIG is already set: C:/k/config
Environment variable ETCD_ENDPOINTS is not set. Setting it to the default value: 
Environment variable ETCD_KEY_FILE is not set. Setting it to the default value: 
Environment variable ETCD_CERT_FILE is not set. Setting it to the default value: 
Environment variable ETCD_CA_CERT_FILE is not set. Setting it to the default value: 
Environment variable CNI_BIN_DIR is already set: c:\opt\cni\bin
Environment variable CNI_CONF_DIR is already set: c:\etc\cni\net.d
Environment variable CNI_CONF_FILENAME is not set. Setting it to the default value: 10-calico.conf
Environment variable CNI_IPAM_TYPE is not set. Setting it to the default value: calico-ipam
Environment variable VXLAN_VNI is not set. Setting it to the default value: 4096
Environment variable VXLAN_MAC_PREFIX is already set: 0E-2A
Environment variable VXLAN_ADAPTER is not set. Setting it to the default value: 
Environment variable NODENAME is already set: appmana-006
Environment variable CALICO_K8S_NODE_REF is not set. Setting it to the default value: appmana-006
Environment variable STARTUP_VALID_IP_TIMEOUT is not set. Setting it to the default value: 90
Environment variable IP is not set. Setting it to the default value: autodetect
Environment variable CALICO_LOG_DIR is not set. Setting it to the default value: C:\CalicoWindows\logs
Environment variable FELIX_LOGSEVERITYFILE is not set. Setting it to the default value: none
Environment variable FELIX_LOGSEVERITYSYS is not set. Setting it to the default value: none
Root dir c:\CalicoWindows exists. Removing Calico CNI plugin if installed...
Removing Calico CNI conf file at c:\etc\cni\net.d\10-calico.conf ...
Removing Calico CNI binaries at c:\opt\cni\bin/calico*.exe ...
WARNING: The names of some imported commands from the module 'helper' include unapproved verbs that might make them 
less discoverable. To find the commands with unapproved verbs, run the Import-Module command again with the Verbose 
parameter. For a list of approved verbs, type Get-Verb.
Unzip Calico for Windows release...
Setup Calico for Windows...
Install script is running in a HostProcess container. This namespace is kube-system
Install script is running in a HostProcess container, using mounted serviceaccount ca cert and token.
Using existing kubeconfig at c:\\k\\config for API server host and port. server: https://10.2.0.19:6443
Backend networking is windows-bgp

Start Calico for Windows install...

Setting environment variables if not set...
Environment variable KUBE_NETWORK is already set: Calico.*
Environment variable CALICO_NETWORKING_BACKEND is already set: windows-bgp
Environment variable K8S_SERVICE_CIDR is already set: 10.152.184.0/24
Environment variable DNS_NAME_SERVERS is already set: 10.152.184.10
Environment variable DNS_SEARCH is already set: svc.cluster.local
Environment variable CALICO_DATASTORE_TYPE is already set: kubernetes
Environment variable KUBECONFIG is already set: C:/k/config
Environment variable ETCD_ENDPOINTS is not set. Setting it to the default value: 
Environment variable ETCD_KEY_FILE is not set. Setting it to the default value: 
Environment variable ETCD_CERT_FILE is not set. Setting it to the default value: 
Environment variable ETCD_CA_CERT_FILE is not set. Setting it to the default value: 
Environment variable CNI_BIN_DIR is already set: c:\opt\cni\bin
Environment variable CNI_CONF_DIR is already set: c:\etc\cni\net.d
Environment variable CNI_CONF_FILENAME is already set: 10-calico.conf
Environment variable CNI_IPAM_TYPE is already set: calico-ipam
Environment variable VXLAN_VNI is already set: 4096
Environment variable VXLAN_MAC_PREFIX is already set: 0E-2A
Environment variable VXLAN_ADAPTER is not set. Setting it to the default value: 
Environment variable NODENAME is already set: appmana-006
Environment variable CALICO_K8S_NODE_REF is already set: appmana-006
Environment variable STARTUP_VALID_IP_TIMEOUT is already set: 90
Environment variable IP is already set: autodetect
Environment variable CALICO_LOG_DIR is already set: C:\CalicoWindows\logs
Environment variable FELIX_LOGSEVERITYFILE is already set: none
Environment variable FELIX_LOGSEVERITYSYS is already set: none
Validating configuration...
Copying CNI binaries to c:\opt\cni\bin
Writing CNI configuration to c:\etc\cni\net.d\10-calico.conf.
Wrote CNI configuration.
CONTAINER_SANDBOX_MOUNT_POINT is set, skipping service installation

Calico for Windows installed

node logs:

2023-03-09 12:21:46.701 [WARNING][2200] startup/utils.go 49: Terminating
Calico node initialisation failed, will retry...
2023-03-09 12:21:47.763 [INFO][2820] startup/startup.go 427: Early log level set to info
2023-03-09 12:21:47.764 [INFO][2820] startup/utils.go 127: Using NODENAME environment for node name appmana-006
2023-03-09 12:21:47.764 [INFO][2820] startup/utils.go 139: Determined node name: appmana-006
2023-03-09 12:21:47.764 [INFO][2820] startup/startup.go 94: Starting node appmana-006 with version v3.23.5
2023-03-09 12:21:47.768 [INFO][2820] startup/startup.go 106: Skipping datastore connection test
2023-03-09 12:21:47.781 [ERROR][2820] startup/startup.go 111: Unable to ensure datastore is migrated. error=unable to query ClusterInformation to determine Calico version: connection is unauthorized: clusterinformations.crd.projectcalico.org "default" is forbidden: User "system:node:appmana-006" cannot get resource "clusterinformations" in API group "crd.projectcalico.org" at the cluster scope
2023-03-09 12:21:47.781 [WARNING][2820] startup/utils.go 49: Terminating
Calico node initialisation failed, will retry...

Observe that calico-node is using the authorization token in c:\\k\\config instead of the calico-node service account's token as specified.
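
To confirm which identity the API server saw, the user name can be pulled out of the "forbidden" error in the node log. This is a minimal sketch that assumes the log line format shown above; the function name `forbidden_user` is hypothetical and the sample line is abbreviated.

```shell
# Print the user named in a Kubernetes "is forbidden: User ..." error line read from stdin.
forbidden_user() {
  sed -n 's/.*is forbidden: User "\([^"]*\)".*/\1/p'
}

# Abbreviated copy of the error line from the node logs in this issue.
log_line='clusterinformations.crd.projectcalico.org "default" is forbidden: User "system:node:appmana-006" cannot get resource "clusterinformations" in API group "crd.projectcalico.org" at the cluster scope'
printf '%s\n' "$log_line" | forbidden_user
```

A `system:node:` prefix means the kubelet's credentials were used. A request made as the calico-node service account would instead show a user of the form `system:serviceaccount:<namespace>:calico-node`.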

I believe this is because node is looking in the wrong path on containerd 1.7.0-beta.3

Possible Solution

Use the right path to the service account token on 1.7.0-beta.3 and later.

Steps to Reproduce (for bugs)

  1. Create a cluster that uses mutual authentication for nodes. This is part of the kubeadm workflow, and also how I connect Windows nodes to a k0s cluster. This means you will use a bootstrapping kubeconfig for the node, then approve a CSR for the serving node. For example:

On your machine, get bootstrapping kubeconfig

# some control plane node address using k0s
CONTROL_PLANE=10.2.0.10
WORKER=10.2.0.11
# also works with kubeadm I believe
# base64 -D is the macOS flag; with GNU coreutils on Linux use base64 -d
ssh administrator@"${CONTROL_PLANE}" -- sudo k0s token create --role=worker | base64 -D | gunzip - > bootstrap-kubeconfig.yaml
scp bootstrap-kubeconfig.yaml administrator@"$WORKER":C:/bootstrap-kubeconfig.yaml

On Windows worker, install Kubelet

Invoke-WebRequest https://docs.tigera.io/calico/3.25/scripts/Install-Containerd.ps1 -OutFile c:\Install-Containerd.ps1
c:\Install-Containerd.ps1 -ContainerDVersion 1.7.0-beta.3 -CNIConfigPath "c:/etc/cni/net.d" -CNIBinPath "c:/opt/cni/bin"
Invoke-WebRequest https://docs.tigera.io/calico/3.25/scripts/PrepareNode.ps1 -OutFile c:\PrepareNode.ps1
c:\PrepareNode.ps1 -KubernetesVersion v1.26.2 -ContainerRuntime ContainerD
mkdir -pv C:/var/lib/kubelet/pki

C:\k\kubelet.exe --bootstrap-kubeconfig=/bootstrap-kubeconfig.yaml --rotate-certificates --rotate-server-certificates --cert-dir=$env:SYSTEMDRIVE/var/lib/kubelet/pki --kubeconfig=/k/config --hostname-override=$(hostname) --pod-infra-container-image=`"mcr.microsoft.com/oss/kubernetes/pause:3.6`" --enable-debugging-handlers --cgroups-per-qos=false --enforce-node-allocatable=`"`" --resolv-conf=`"`" --container-runtime-endpoint=npipe:////.//pipe//containerd-containerd  --cluster-dns=10.96.0.10 --cluster-domain=cluster.local

Set KUBECONFIG

setx /m KUBECONFIG C:/k/config

Approve the csr

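# WORKER must hold the worker's hostname here (for example appmana-006), not the IP address used above;
# with an IP address, ${WORKER%%.*} would keep only the first octet
BARE_WORKER=${WORKER%%.*}
# finds the CSRs associated with the worker and approves them
kubectl get csr -o json | jq -r ".items[] | select(.spec.username == \"system:node:${BARE_WORKER}\") | .metadata.name" | xargs kubectl certificate approve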
  2. Observe C:/k/config is authorizing system:node:worker-hostname. Mine in these logs is appmana-006, a test machine.
  3. Use the hostprocess manifests to install Calico on Windows.
  4. Observe the service account in the manifests is correctly calico-node.
  5. Observe the logs for node indicate it has authorized with system:node:appmana-006. It should be calico-node.
  6. Observe this user is not permitted to access clusterinformation.
  7. Observe adding KUBECONFIG: "C:\\CalicoWindows\\calico-kube-config" to the configmap resolves the issue.
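
The workaround in the last step could be applied as a merge patch. This is a sketch under assumptions: the ConfigMap name is taken from the description at the top of this issue and the namespace from the install log, so check both against your manifests. `kubeconfig_patch` is a hypothetical helper. Since the token in that file eventually expires, treat this as a stopgap only.

```shell
# Build a JSON merge patch that sets KUBECONFIG in the ConfigMap data.
kubeconfig_patch() {
  # Single quotes keep the doubled backslashes, which JSON decodes to single ones.
  printf '{"data":{"KUBECONFIG":"%s"}}\n' 'C:\\CalicoWindows\\calico-kube-config'
}

# Show the patch that would be sent.
kubeconfig_patch

# Assumed names; uncomment after verifying them in your cluster:
# kubectl -n kube-system patch configmap calico-windows-configmap \
#   --type merge -p "$(kubeconfig_patch)"
```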

Your Environment

  • Calico version: 3.23.5
  • Orchestrator version: kubernetes 1.26.2
  • Operating System and version: Windows 2022
doctorpangloss changed the title from "Calico Windows HostProcess manifests do not find service token in the right place with containerd 1.7.0-beta.3" to "Calico Windows HostProcess manifests do not find service token in the right place" on Mar 9, 2023
doctorpangloss changed the title from "Calico Windows HostProcess manifests do not find service token in the right place" to "Calico Windows HostProcess manifests do not authorize with the projected service token, they use KUBECONFIG env var incorrectly" on Mar 9, 2023
@sridhartigera
Member

@coutinhop PTAL

@doctorpangloss
Author

doctorpangloss commented Mar 21, 2023

This reproduces in Calico 3.25 with containerd 1.7.0. The core issue is how the Kubernetes API client is constructed in Calico's libraries.

@coutinhop
Contributor

This will be fixed by #7857, which will use the in-cluster config correctly on Windows HPC.

@coutinhop
Contributor

Closing this as fixed by #7857 and tigera/operator#2732
