Typical kubernetes workload metrics as telemetry input to enable dashboarding and alerting #972

Open
11 of 13 tasks
a-thaler opened this issue Apr 12, 2024 · 1 comment
Labels
area/metrics MetricPipeline kind/feature Categorizes issue or PR as related to a new feature.
Comments

@a-thaler
Collaborator

a-thaler commented Apr 12, 2024

Description
The MetricPipeline already supports an input type runtime, which emits metrics about container and pod resource consumption. Other typical metrics are still missing:

  • from the apiserver: the configured resource limits
  • from the apiserver: the state of workloads
  • from the kubelet: volume statistics
  • from the kubelet: node statistics
    These are mainly the typical metrics emitted by the kubeletstatsreceiver and the k8sclusterreceiver.
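To make the list above concrete, the sketch below groups a few example metric names by the receiver that produces them: the k8sclusterreceiver reads object state from the apiserver, while the kubeletstatsreceiver queries the kubelet. The selection of names is illustrative only and is not the set the MetricPipeline actually enables; the `SOURCES` dictionary and the `source_of` helper are hypothetical.

```python
from __future__ import annotations

# Illustrative grouping of the missing metric categories. The metric names are
# examples from the OpenTelemetry Collector receivers; the dictionary layout
# and the helper below are hypothetical, not part of any MetricPipeline API.
SOURCES: dict[str, dict[str, list[str]]] = {
    "k8sclusterreceiver": {  # data from the apiserver
        "resource_limits": ["k8s.container.cpu_limit", "k8s.container.memory_limit"],
        "workload_state": ["k8s.deployment.available", "k8s.deployment.desired", "k8s.pod.phase"],
    },
    "kubeletstatsreceiver": {  # data from the kubelet
        "volume_stats": ["k8s.volume.available", "k8s.volume.capacity"],
        "node_stats": ["k8s.node.cpu.time", "k8s.node.memory.usage"],
    },
}


def source_of(metric_name: str) -> tuple[str, str] | None:
    """Return (receiver, category) for a known metric name, else None."""
    for receiver, categories in SOURCES.items():
        for category, names in categories.items():
            if metric_name in names:
                return receiver, category
    return None


print(source_of("k8s.volume.capacity"))  # ('kubeletstatsreceiver', 'volume_stats')
```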

With these metrics available, basic troubleshooting of Kubernetes workloads, including alerting, becomes possible.

Goal
Provide a way to collect a typical set of metrics for basic workload troubleshooting (comparable to the metrics used by the dashboards provided by the kube-prometheus-stack)

Criteria

  • The typical metrics needed for troubleshooting can be collected, covering:
    • Pod compute resources
    • Node resource usage
    • Volume resource usage
    • Health of workloads (for example, a stuck deployment)
  • Namespace-specific metrics can be enabled per namespace (probably independently of non-namespaced resources)
  • Node- and volume-related metrics can be enabled optionally, in addition to workload-related metrics
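The last two criteria describe a selection logic that can be sketched as follows, assuming a hypothetical selector configuration: workload metrics are filtered by the `k8s.namespace.name` attribute, node metrics ignore the namespace filter and are switched on separately, and volume metrics have their own switch. `SelectorConfig`, its fields, and `keep` are illustrative names, not the MetricPipeline API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical selector settings mirroring the criteria above. None of these
# field names are taken from the MetricPipeline resource.
@dataclass
class SelectorConfig:
    include_namespaces: set[str] = field(default_factory=set)  # empty set = all namespaces
    node_metrics: bool = False
    volume_metrics: bool = False


def keep(metric_name: str, attributes: dict[str, str], cfg: SelectorConfig) -> bool:
    """Decide whether a data point passes the hypothetical selectors."""
    if metric_name.startswith("k8s.node."):
        # Node metrics are not namespaced, so only their own switch applies.
        return cfg.node_metrics
    if metric_name.startswith("k8s.volume.") and not cfg.volume_metrics:
        return False
    namespace = attributes.get("k8s.namespace.name")
    if namespace is None or not cfg.include_namespaces:
        # Cluster-scoped data points, or no namespace restriction configured.
        return True
    return namespace in cfg.include_namespaces


cfg = SelectorConfig(include_namespaces={"shop"}, volume_metrics=True)
print(keep("k8s.pod.phase", {"k8s.namespace.name": "shop"}, cfg))
print(keep("k8s.node.memory.usage", {}, cfg))
```

Keeping the node switch separate from the namespace filter reflects the note that namespace selection is probably independent of non-namespaced resources.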

Actions

Reasons
The current feature set is a good start, but it lacks apiserver-related details such as limits, which are needed to get a complete picture for troubleshooting and to define relevant alerts. Furthermore, typical workload health metrics from the apiserver are missing. Volume and node statistics are also important in daily operations.
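The need for apiserver data in alerting can be illustrated with two conditions, sketched below under the assumption that the latest values of metrics such as container.memory.usage, k8s.container.memory_limit, k8s.deployment.available, and k8s.deployment.desired are already at hand as plain numbers. The function names and thresholds are hypothetical; real alerts would typically be defined in the monitoring backend.

```python
from __future__ import annotations

# Sketch of two alert conditions that combine runtime metrics with apiserver
# data. Inputs are plain numbers standing in for the latest metric values;
# function names and thresholds are hypothetical.


def memory_near_limit(usage_bytes: float, limit_bytes: float, threshold: float = 0.9) -> bool:
    """True if memory usage reaches `threshold` of the configured limit.

    Without the limit from the apiserver, this ratio cannot be computed.
    """
    if limit_bytes <= 0:
        # No limit configured (modeled here as 0): a ratio is meaningless.
        return False
    return usage_bytes / limit_bytes >= threshold


def deployment_stuck(samples: list[tuple[int, int]], min_samples: int = 3) -> bool:
    """True if available < desired in each of the last `min_samples` samples.

    Each sample is an (available, desired) pair taken at successive scrapes.
    """
    recent = samples[-min_samples:]
    return len(recent) == min_samples and all(a < d for a, d in recent)
```

Requiring several consecutive samples for the stuck-deployment check avoids alerting on the short gap between desired and available replicas during a normal rollout.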

Attachments

Release Notes


@a-thaler a-thaler added kind/feature Categorizes issue or PR as related to a new feature. area/metrics MetricPipeline labels Apr 12, 2024
@a-thaler a-thaler changed the title Metric inputs to cover typical workload operations Typical kubernetes workload metrics as telemetry input to enable dashboarding and alerting Apr 23, 2024
@a-thaler a-thaler added this to the 1.27.0 milestone Oct 31, 2024
@a-thaler
Collaborator Author

The feature will be fully rolled out with version 1.27.0. Afterwards, the defaults will be changed so that the sub-selectors are enabled by default for new clusters.
