
Can't use kubectl top pods/nodes on RKE2 #5805

Open
KhalilSantana opened this issue Nov 11, 2024 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@KhalilSantana

What happened:

Hi, I'm attempting to use Karmada v1.11 on three RKE2 1.28.x clusters: one for Karmada itself (cluster "vm2") and two for workloads (clusters "vm1" and "vm3"). Additionally, my kubeconfig is configured to use the Karmada resource proxy for pods and nodes, as described in the docs.
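
For context, going through the resource proxy means ordinary node/pod requests are served under the proxying path (/apis/search.karmada.io/v1alpha1/proxying/karmada/proxy/...), so the usual client calls keep working unchanged. A rough sketch with the Kubernetes Python client, assuming a kubeconfig whose server URL already points at that proxy prefix (the path below is just the one I use):

    from kubernetes import client, config

    # Kubeconfig whose cluster server URL points at the Karmada resource proxy,
    # e.g. https://<karmada-apiserver>/apis/search.karmada.io/v1alpha1/proxying/karmada/proxy
    config.load_kube_config("~/.kube/redacted-karmada-proxy.yaml")

    # Same request kubectl get nodes makes; through the proxy it returns
    # nodes from all member clusters.
    for node in client.CoreV1Api().list_node().items:
        print(node.metadata.name)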

However, I can't seem to get kubectl top nodes working when going through Karmada (it works fine with each cluster's own kubeconfig):

kubectl top nodes
error: Metrics API not available

I've deployed the following plugins to the cluster: karmada-descheduler, karmada-metrics-adapter, and karmada-search.

Accordingly, the metrics-adapter APIServices are registered in the cluster:

kubectl get apiservices.apiregistration.k8s.io|grep metrics
v1beta1.custom.metrics.k8s.io          karmada-system/karmada-metrics-adapter        True        12d
v1beta2.custom.metrics.k8s.io          karmada-system/karmada-metrics-adapter        True        12d

What you expected to happen:

kubectl top nodes should be able to get its metrics and present them to the user, just like karmadactl can:

karmadactl top nodes
NAME          CLUSTER   CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
k8s-worker3   vm3       207m         5%     3522Mi          44%       
rpi-0-xxxx    vm1       277m         6%     1571Mi          20%       
rpi-1-xxxx    vm1       346m         8%     2010Mi          53%       
rpi-2-xxxx    vm1       345m         8%     1752Mi          46%       
rpi-3-xxxx    vm1       309m         7%     1602Mi          42%       
rpi-4-xxxx    vm1       346m         8%     1863Mi          49%       
rpi-5-xxxx    vm3       341m         8%     1898Mi          50%       
rpi-6-xxxx    vm3       338m         8%     1994Mi          52%       
rpi-7-xxxx    vm3       330m         8%     1918Mi          50%       
rpi-8-xxxx    vm3       332m         8%     2351Mi          62%       
rpi-9-xxxx    vm3       351m         8%     1966Mi          51%       
v2-k8s-vm1    vm1       193m         4%     2225Mi          56%       

How to reproduce it (as minimally and precisely as possible):

  1. Install RKE2 1.28 on three clusters (with Cilium, although that is probably not the issue)
  2. From one of the clusters, join the other clusters with karmadactl join (push mode)
  3. Install the metrics-adapter addon for karmada via karmadactl
  4. Wait for everything to settle and try kubectl top nodes

Anything else we need to know?:

Environment:

  • Karmada version 1.11
  • kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version): kubectl karmada version: version.Info{GitVersion:"v1.11.2", GitCommit:"80841d1cf6802c32c40af34ebe004633d74f4141", GitTreeState:"clean", BuildDate:"2024-10-31T11:00:02Z", GoVersion:"go1.22.7", Compiler:"gc", Platform:"linux/amd64"}
  • Others: I've also installed prometheus-community/prometheus-adapter in the member clusters because the karmada metrics-adapter was complaining about custom.metrics.k8s.io missing:
$ export KUBECONFIG=~/.kube/cluster-vm2.yaml # mgmt/karmada host cluster
$ kubectl logs -f karmada-metrics-adapter-587867b98c-bbz8x -n karmada-system
W1111 11:48:17.850919       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(vm1)
W1111 11:48:17.852096       1 custommetrics.go:306] custom.metrics.k8s.io not found in cluster(vm1)
@KhalilSantana added the kind/bug label on Nov 11, 2024
@RainbowMango
Member

Hi @XiShanYongYe-Chang, please help confirm whether this is a known issue.

@KhalilSantana Is there any specific reason that you must use kubectl?

@XiShanYongYe-Chang
Member

Hi @KhalilSantana @RainbowMango, this is a known issue.

[screenshot: kubectl top sending one request for Node resources and one for NodeMetrics]

We can see that kubectl top actually sends two requests: one to list Node resources and one to list NodeMetrics.

For the karmada-apiserver, the kubectl command cannot obtain the nodes of all member clusters (karmadactl can do this), while NodeMetrics resources can be obtained through the karmada-metrics-adapter component. So the end result is what is shown in the screenshot above.
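
To make the two requests concrete, here is a minimal sketch with the Kubernetes Python client (the kubeconfig is whichever one points at the karmada-apiserver): the first call lists Node resources, the second lists NodeMetrics from the metrics.k8s.io aggregated API, and kubectl top nodes only works if both succeed.

    from kubernetes import client, config

    config.load_kube_config()  # kubeconfig pointing at the karmada-apiserver

    # Request 1: Node resources (api/v1/nodes)
    nodes = client.CoreV1Api().list_node()

    # Request 2: NodeMetrics (apis/metrics.k8s.io/v1beta1/nodes),
    # served by karmada-metrics-adapter through API aggregation
    node_metrics = client.CustomObjectsApi().list_cluster_custom_object(
        "metrics.k8s.io", "v1beta1", "nodes")

    print(len(nodes.items), "nodes;", len(node_metrics["items"]), "node metrics")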

@KhalilSantana
Author

@RainbowMango I did write kubectl in the issue, but what we actually care about is the API that kubectl consumes/invokes, since the application that manages this cluster uses the Kubernetes Python client. Our code does basically this:

    from kubernetes import client, config
    import kubernetes

    # Load the Karmada proxy kubeconfig; load_kube_config() returns None, so
    # ApiClient() below falls back to the default configuration it just set up.
    config.load_kube_config(kubeconfig_path)
    self.kubeapi = client.CoreV1Api()
    with kubernetes.client.ApiClient() as api_client:
        self.api_instance = kubernetes.client.CustomObjectsApi(api_client)
    # Lists NodeMetrics from the metrics.k8s.io aggregated API
    k8s_nodes_metrics = self.api_instance.list_cluster_custom_object("metrics.k8s.io", "v1beta1", "nodes")

This is a request the Karmada metrics adapter should be answering, since it registers the corresponding APIServices:

% kubectl get apiservices.apiregistration.k8s.io|grep metrics
v1beta1.custom.metrics.k8s.io          karmada-system/karmada-metrics-adapter        True        22h
v1beta1.metrics.k8s.io                 karmada-system/karmada-metrics-adapter        True        22h
v1beta2.custom.metrics.k8s.io          karmada-system/karmada-metrics-adapter        True        22h

However, looking more closely, I've noticed that on the host cluster of the karmada-apiserver, that APIService is failing:

kubectl get apiservices.apiregistration.k8s.io|grep metrics
v1beta1.metrics.k8s.io                 karmada-system/karmada-metrics-adapter   False (FailedDiscoveryCheck)   37h

And the metrics-adapter logs say it can't verify a certificate:

E1112 01:34:21.567227       1 authentication.go:73] "Unable to authenticate the request" err="[x509: certificate signed by unknown authority, verifying certificate SN=557047679018993809, SKID=, AKID=DB:06:E9:B9:72:A5:40:66:3D:8A:B7:7C:0C:CA:4C:C6:62:83:90:B3 failed: x509: certificate signed by unknown authority]"
E1112 01:34:21.567227       1 authentication.go:73] "Unable to authenticate the request" err="[x509: certificate signed by unknown authority, verifying certificate SN=557047679018993809, SKID=, AKID=DB:06:E9:B9:72:A5:40:66:3D:8A:B7:7C:0C:CA:4C:C6:62:83:90:B3 failed: x509: certificate signed by unknown authority]"
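
One way I could narrow this down (just a debugging sketch, not something from the Karmada docs): check whether the client CA bundle the adapter trusts, assuming it lives at /etc/karmada/pki/ca.crt, is actually the authority named by the AKID in that error. Assuming the cryptography Python package is available wherever that file is mounted (e.g. inside the adapter pod):

    from cryptography import x509

    # Client CA bundle the metrics adapter is configured with (assumed path)
    with open("/etc/karmada/pki/ca.crt", "rb") as f:
        ca = x509.load_pem_x509_certificate(f.read())

    # The trusted CA's Subject Key Identifier should match the AKID reported in
    # the error (DB:06:E9:...:90:B3); if it doesn't, the client certificate was
    # signed by a different CA than the one this adapter trusts.
    skid = x509.SubjectKeyIdentifier.from_public_key(ca.public_key())
    print(skid.digest.hex(":").upper())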

@XiShanYongYe-Chang:

For karmada-apiserver, the kubectl command cannot obtain the nodes of all member clusters (karmadactl can do this).

I'm actually observing different behavior: I can see the nodes (since I've deployed the Proxy Global Resources feature), so kubectl is able to list the nodes from the member clusters:

$ kubectl get nodes -v=6
I1112 08:26:08.208517  158012 loader.go:395] Config loaded from file:  /home/khalil/.kube/redacted-karmada-proxy.yaml
I1112 08:26:08.328486  158012 round_trippers.go:553] GET https://redacted:32443/apis/search.karmada.io/v1alpha1/proxying/karmada/proxy/api/v1/nodes?limit=500 200 OK in 117 milliseconds
NAME          CREATED AT
rpi-0-xxxx    2024-10-30T19:37:02Z
rpi-1-xxxx    2024-10-14T20:31:36Z
rpi-2-xxxx    2024-10-14T20:31:28Z
rpi-3-xxxx    2024-10-14T20:31:26Z
rpi-4-xxxx    2024-10-14T20:31:26Z
v2-k8s-vm1    2024-10-14T20:22:43Z
k8s-worker3   2024-09-28T01:51:00Z
rpi-5-xxxx    2024-10-10T20:51:53Z
rpi-6-xxxx    2024-09-28T01:55:12Z
rpi-7-xxxx    2024-09-28T01:55:14Z
rpi-8-xxxx    2024-09-28T01:55:15Z
rpi-9-xxxx    2024-10-24T19:00:23Z

So it seems I've gotten Karmada into some sort of invalid state (at least w.r.t. the metrics-adapter). I'll reinstall it, try again, and report back with more details. Sorry for the delay in responding.

@KhalilSantana
Author

OK, I had to re-install the entire host cluster for Karmada, because attempting to re-install Karmada alone didn't work (even after cleaning out its directories under /var/lib and /etc). After this, the metrics-adapter pod seems to be doing OK, but still no metrics are available.

$ export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
$ sudo -E kubectl logs -f karmada-metrics-adapter-587867b98c-7zqfc -n karmada-system
I1113 17:34:03.098227       1 options.go:137] karmada-metrics-adapter version: version.Info{GitVersion:"v1.11.1", GitCommit:"5efee1388a2f3d75bc4590348b776587e76e3527", GitTreeState:"clean", BuildDate:"2024-09-14T09:57:12Z", GoVersion:"go1.22.7", Compiler:"gc", Platform:"linux/amd64"}
I1113 17:34:03.542369       1 serving.go:374] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
I1113 17:34:04.139085       1 handler.go:286] Adding GroupVersion metrics.k8s.io v1beta1 to ResourceManager
I1113 17:34:04.155220       1 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/karmada/pki/ca.crt"
I1113 17:34:04.155236       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I1113 17:34:04.155170       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I1113 17:34:04.155278       1 shared_informer.go:313] Waiting for caches to sync for RequestHeaderAuthRequestController
I1113 17:34:04.155263       1 shared_informer.go:313] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1113 17:34:04.155713       1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::apiserver.local.config/certificates/apiserver.crt::apiserver.local.config/certificates/apiserver.key"
I1113 17:34:04.156032       1 secure_serving.go:213] Serving securely on [::]:443
I1113 17:34:04.156094       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I1113 17:34:04.256080       1 shared_informer.go:320] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1113 17:34:04.256103       1 shared_informer.go:320] Caches are synced for RequestHeaderAuthRequestController
I1113 17:34:04.261809       1 controller.go:236] Try to build informer manager for cluster iot
I1113 17:34:04.385230       1 controller.go:236] Try to build informer manager for cluster x86

Then, when I probe the endpoints, I see that the node metrics endpoint isn't responding, even though the metrics adapter seems to have registered itself for the metrics endpoint and looks healthy (no errors, at least):

$ export KUBECONFIG=~/.kube/redacted-karmada-proxy.yaml
$ kubectl get --raw '/apis/metrics.k8s.io/v1beta1/nodes'
Error from server (ServiceUnavailable): the server is currently unable to handle the request
$ kubectl get svc -A |grep metrics
karmada-system   karmada-metrics-adapter        ExternalName   <none>       karmada-metrics-adapter.karmada-system.svc.cluster.local   <none>    32m
$ kubectl get apiservices.apiregistration.k8s.io|grep metrics
v1beta1.custom.metrics.k8s.io          karmada-system/karmada-metrics-adapter        True        32m
v1beta1.metrics.k8s.io                 karmada-system/karmada-metrics-adapter        True        32m
v1beta2.custom.metrics.k8s.io          karmada-system/karmada-metrics-adapter        True        32m
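
For what it's worth, a ServiceUnavailable on the raw request while the APIService still shows True made me want to look at the APIService conditions directly. A small sketch (using the same Karmada proxy kubeconfig as above) that prints the Available condition and its message for v1beta1.metrics.k8s.io:

    from kubernetes import client, config

    config.load_kube_config("~/.kube/redacted-karmada-proxy.yaml")

    svc = client.ApiregistrationV1Api().read_api_service("v1beta1.metrics.k8s.io")
    for cond in svc.status.conditions:
        # A False Available condition carries the reason (e.g. FailedDiscoveryCheck)
        # and a message explaining why the aggregated API cannot be reached.
        print(cond.type, cond.status, cond.reason, cond.message)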

@XiShanYongYe-Chang Just to be clear, in your previous comment do you mean that the endpoint for NodeMetrics (/apis/custom.metrics.k8s.io/v1beta1/nodes) should be available/working, but the NodeResources endpoint (api/v1/nodes) shouldn't be?

If so, what could explain what I'm observing? The NodeResources endpoint is working (AFAIK), since I have the Proxy Global Resources feature, but the metrics-adapter component seems to be non-operational.
