Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: duplicate custom resources #2066

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

murphd40
Copy link
Contributor

What this PR does / why we need it:

Fix issue where custom resources are duplicated

How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)

No change

Which issue(s) this PR fixes:
Fixes # #2044


Testing:

Before change

$ go run main.go --port=8080 --telemetry-port=8081 --kubeconfig=$KUBECONFIG \
--custom-resource-state-config='{"spec":{"resources":[{"groupVersionKind":{"group":"operators.coreos.com","version":"v1alpha1","kind":"ClusterServiceVersion"},"metrics":[{"name":"csv_info","help":"Cluster Service Version install status","each":{"type":"Info","info":{"labelsFromPath":{"name":["metadata","name"],"status":["status","phase"]}}}}]}]}}' \
--resources clusterserviceversions
I0514 02:05:12.826277   10899 wrapper.go:98] "Starting kube-state-metrics"
I0514 02:05:12.827409   10899 server.go:191] "Used resources" resources=[clusterserviceversions clusterserviceversions]
I0514 02:05:12.827474   10899 types.go:184] "Using all namespaces"
I0514 02:05:12.827484   10899 server.go:218] "Metric allow-denylisting" allowDenyStatus="Excluding the following lists that were on denylist: "
I0514 02:05:12.829363   10899 server.go:357] "Tested communication with server"
I0514 02:05:13.400436   10899 server.go:362] "Run with Kubernetes cluster version" major="1" minor="26" gitVersion="v1.26.3+k3s1" gitTreeState="clean" gitCommit="01ea3ff27be0b04f945179171cec5a8e11a14f7b" platform="linux/amd64"
I0514 02:05:13.400496   10899 server.go:363] "Communication with server successful"
I0514 02:05:13.402199   10899 server.go:314] "Started metrics server" metricsServerAddress="[::]:8080"
I0514 02:05:13.402238   10899 metrics_handler.go:99] "Autosharding disabled"
I0514 02:05:13.402307   10899 server.go:303] "Started kube-state-metrics self metrics server" telemetryAddress="[::]:8081"
I0514 02:05:13.402448   10899 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=[kube_customresource_csv_info]
I0514 02:05:13.402740   10899 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=[kube_customresource_csv_info]
I0514 02:05:13.403061   10899 builder.go:246] "Active resources" activeStoreNames="clusterserviceversions,clusterserviceversions"
I0514 02:05:13.403142   10899 server.go:73] levelinfomsgListening onaddress[::]:8080
I0514 02:05:13.403162   10899 server.go:73] levelinfomsgTLS is disabled.http2falseaddress[::]:8080
I0514 02:05:13.403197   10899 server.go:73] levelinfomsgListening onaddress[::]:8081
I0514 02:05:13.403296   10899 server.go:73] levelinfomsgTLS is disabled.http2falseaddress[::]:8081
$ curl localhost:8080/metrics
# HELP kube_customresource_csv_info Cluster Service Version install status
# TYPE kube_customresource_csv_info info
kube_customresource_csv_info{customresource_group="operators.coreos.com",customresource_kind="ClusterServiceVersion",customresource_version="v1alpha1",name="t8c-client-operator.v1.2.2",status="Succeeded"} 1
kube_customresource_csv_info{customresource_group="operators.coreos.com",customresource_kind="ClusterServiceVersion",customresource_version="v1alpha1",name="packageserver",status="Succeeded"} 1
# HELP kube_customresource_csv_info Cluster Service Version install status
# TYPE kube_customresource_csv_info info
kube_customresource_csv_info{customresource_group="operators.coreos.com",customresource_kind="ClusterServiceVersion",customresource_version="v1alpha1",name="packageserver",status="Succeeded"} 1
kube_customresource_csv_info{customresource_group="operators.coreos.com",customresource_kind="ClusterServiceVersion",customresource_version="v1alpha1",name="t8c-client-operator.v1.2.2",status="Succeeded"} 1

After Change

$ go run main.go --port=8080 --telemetry-port=8081 --kubeconfig=$KUBECONFIG \
--custom-resource-state-config='{"spec":{"resources":[{"groupVersionKind":{"group":"operators.coreos.com","version":"v1alpha1","kind":"ClusterServiceVersion"},"metrics":[{"name":"csv_info","help":"Cluster Service Version install status","each":{"type":"Info","info":{"labelsFromPath":{"name":["metadata","name"],"status":["status","phase"]}}}}]}]}}' \
--resources clusterserviceversions     
I0514 02:04:35.340743   10694 wrapper.go:98] "Starting kube-state-metrics"
I0514 02:04:35.341747   10694 server.go:191] "Used resources" resources=[clusterserviceversions]
I0514 02:04:35.341813   10694 types.go:184] "Using all namespaces"
I0514 02:04:35.341824   10694 server.go:218] "Metric allow-denylisting" allowDenyStatus="Excluding the following lists that were on denylist: "
I0514 02:04:35.343590   10694 server.go:357] "Tested communication with server"
I0514 02:04:35.931327   10694 server.go:362] "Run with Kubernetes cluster version" major="1" minor="26" gitVersion="v1.26.3+k3s1" gitTreeState="clean" gitCommit="01ea3ff27be0b04f945179171cec5a8e11a14f7b" platform="linux/amd64"
I0514 02:04:35.931434   10694 server.go:363] "Communication with server successful"
I0514 02:04:35.933402   10694 server.go:303] "Started kube-state-metrics self metrics server" telemetryAddress="[::]:8081"
I0514 02:04:35.933432   10694 metrics_handler.go:99] "Autosharding disabled"
I0514 02:04:35.933422   10694 server.go:314] "Started metrics server" metricsServerAddress="[::]:8080"
I0514 02:04:35.933617   10694 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=[kube_customresource_csv_info]
I0514 02:04:35.933989   10694 builder.go:246] "Active resources" activeStoreNames="clusterserviceversions"
I0514 02:04:35.934409   10694 server.go:73] levelinfomsgListening onaddress[::]:8080
I0514 02:04:35.934447   10694 server.go:73] levelinfomsgTLS is disabled.http2falseaddress[::]:8080
I0514 02:04:35.934422   10694 server.go:73] levelinfomsgListening onaddress[::]:8081
I0514 02:04:35.934512   10694 server.go:73] levelinfomsgTLS is disabled.http2falseaddress[::]:8081
curl localhost:8080/metrics
# HELP kube_customresource_csv_info Cluster Service Version install status
# TYPE kube_customresource_csv_info info
kube_customresource_csv_info{customresource_group="operators.coreos.com",customresource_kind="ClusterServiceVersion",customresource_version="v1alpha1",name="packageserver",status="Succeeded"} 1
kube_customresource_csv_info{customresource_group="operators.coreos.com",customresource_kind="ClusterServiceVersion",customresource_version="v1alpha1",name="t8c-client-operator.v1.2.2",status="Succeeded"} 1

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 14, 2023
@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 14, 2023
Copy link
Member

@rexagod rexagod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplication can be suppressed by specifying --custom-resources-only flag for the native metrics, which will force KSM to generate metrics only from the CRS configuration, and which has been the supported behaviour till now for when the same (internal) object is specified under both flags.

resources = append(resources, opts.Resources.AsSlice()...)
resources = opts.Resources.AsSlice()
Copy link
Member

@rexagod rexagod May 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One case that satisfies the default case is when a custom-resource-state-only flag is not specified, and the resources flag is non-nil. In this case if we also specify custom-resource-state-config flag (with a CRS configuration), we'd get duplicate metrics if the entries in both overlap (for a common native resource).

resources in the code snippet above are the extracted resources we obtain from the CRS configuration, and removing them here would result in removing the current ability for KSM to generate and serve native and custom metrics (for different objects), for instance, kube-state-metrics --kubeconfig=$KUBECONFIG --custom-resource-state-config-file=... --resources=....

IIUC with this change, the CRS resources extracted will be ignored entirely (for the aforementioned case), which will be a breaking change, since that's the expected behaviour ever since CRS was introduced.

@murphd40
Copy link
Contributor Author

Hi @rexagod . In that case would it be better to add logic to remove duplicates from the resources list?

@murphd40 murphd40 force-pushed the 2044-duplicate-custom-resources branch from 74c3477 to 6f7be35 Compare May 14, 2023 21:49
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 14, 2023
@dgrisonnet
Copy link
Member

Holding since I am not a fan of using custom-state-metrics as a way to hack around kube-state-metrics. I am moving the discussion back to the issue.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 14, 2023
@dgrisonnet
Copy link
Member

/triage accepted
/assign

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 29, 2023
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2024
@@ -53,6 +53,7 @@ import (
"k8s.io/kube-state-metrics/v2/pkg/metricshandler"
"k8s.io/kube-state-metrics/v2/pkg/optin"
"k8s.io/kube-state-metrics/v2/pkg/options"
"k8s.io/kube-state-metrics/v2/pkg/util/collections"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use sets here instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@murphd40 any progress? :)

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: murphd40
Once this PR has been reviewed and has the lgtm label, please ask for approval from dgrisonnet. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rexagod
Copy link
Member

rexagod commented Jan 21, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 21, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 20, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 20, 2024
@rexagod
Copy link
Member

rexagod commented May 20, 2024

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 20, 2024
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 11, 2024
@murphd40 murphd40 changed the title fix duplicate custom resources fix: duplicate custom resources Jul 11, 2024
@murphd40
Copy link
Contributor Author

Hi @ikarlashov @rexagod. Apologies for the delay. I have updated the PR as suggested.

However, when I run

go run main.go --port=8080 --telemetry-port=8081 --kubeconfig=$KUBECONFIG \
--custom-resource-state-config='{"spec":{"resources":[{"groupVersionKind":{"group":"operators.coreos.com","version":"v1alpha1","kind":"ClusterServiceVersion"},"metrics":[{"name":"csv_info","help":"Cluster Service Version install status","each":{"type":"Info","info":{"labelsFromPath":{"name":["metadata","name"],"status":["status","phase"]}}}}]}]}}' \
--resources clusterserviceversions

It fails:

go run main.go --port=8080 --telemetry-port=8081 --kubeconfig=$KUBECONFIG \
--custom-resource-state-config='{"spec":{"resources":[{"groupVersionKind":{"group":"operators.coreos.com","version":"v1alpha1","kind":"ClusterServiceVersion"},"metrics":[{"name":"csv_info","help":"Cluster Service Version install status","each":{"type":"Info","info":{"labelsFromPath":{"name":["metadata","name"],"status":["status","phase"]}}}}]}]}}' \
--resources clusterserviceversions
I0711 16:23:01.705060   97371 wrapper.go:120] "Starting kube-state-metrics"
I0711 16:23:01.706299   97371 server.go:203] "Used resources" resources={"clusterserviceversions":{}}
E0711 16:23:01.706366   97371 wrapper.go:40] "Failed to run kube-state-metrics" err="failed to set up resources: resource clusterserviceversions does not exist. Available resources: networkpolicies,validatingwebhookconfigurations,horizontalpodautoscalers,ingresses,jobs,cronjobs,namespaces,rolebindings,replicasets,resourcequotas,storageclasses,clusterrolebindings,endpointslices,poddisruptionbudgets,endpoints,serviceaccounts,services,persistentvolumes,secrets,volumeattachments,leases,mutatingwebhookconfigurations,persistentvolumeclaims,ingressclasses,pods,replicationcontrollers,statefulsets,deployments,limitranges,nodes,daemonsets,roles,certificatesigningrequests,clusterroles,configmaps"
exit status 1

It is failing on main as well so it is not related to my code. Am I missing something?

It is failing at this point:

https://github.com/kubernetes/kube-state-metrics/blob/85d1423b5b896f4fbce8394afed20cfe17371251/pkg/app/server.go#L205C12-L207

	if err := storeBuilder.WithEnabledResources(resources); err != nil {
		return fmt.Errorf("failed to set up resources: %v", err)
	}

It seems that the custom resource config is no longer added to storeBuilder before this check so the store builder does not recognize it...

@mrueg
Copy link
Member

mrueg commented Jul 11, 2024

Can you add a unit test that produces duplicate CRs as well?

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 12, 2024
@ikarlashov
Copy link
Contributor

@murphd40 Please :)

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 8, 2024
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sebastiangaiser
Copy link

Is this PR still needed as #2502 got merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants