Metric API proposal to cover typical Kubernetes metrics #1001

a-thaler · 2024-04-22T13:23:50Z

Description
As outlined in #972, typical metrics should be easy to collect so that typical Kubernetes workload monitoring becomes possible. The metrics should be based on the kubeletstatsreceiver and k8sclusterreceiver only.
A concrete API proposal is needed on how users enable the metric collection. For this, consider which metrics are usually enabled together because they are always used in combination. Also, the selection by namespace should be applied to namespace-typical metrics only.
The input name should trigger the right expectations.

Criteria

Concrete API proposal for selecting the typical metrics, outlining which chunk of metrics is enabled by which input
Users should be able to limit metrics to certain namespaces
Users should be able to enable bigger chunks selectively

Ideas

  input:
    cluster:
      enabled: true
      namespaces:
        include:
        - myNamespace
    host:
      enabled: true
    runtime:
      enabled: true
      namespaces:
        include:
        - myNamespace


chrkl · 2024-06-07T06:33:21Z

Our discussions showed that it is hard to split the metrics from k8sclusterreceiver and kubeletstatsreceiver into the three sections cluster, host, and runtime. Instead, we propose a single input section that gives the user the option to choose which resources are included in the metric output:

input:
  runtime:
    enabled: true
    resources:
      pod:
        enabled: true
      container:
        enabled: true
      node:
        enabled: true
      volume:
        enabled: true
      daemonset:
        enabled: false
      deployment:
        enabled: false
      statefulset:
        enabled: false
      quota:
        enabled: false
      job:
        enabled: false
      hpa:
        enabled: false
    namespaces:
      include:
      - myNamespace

The individual resources should include the following metrics:

pod

k8s.pod.cpu.time (kubeletstats)
k8s.pod.cpu.utilization (kubeletstats)
k8s.pod.filesystem.available (kubeletstats)
k8s.pod.filesystem.capacity (kubeletstats)
k8s.pod.filesystem.usage (kubeletstats)
k8s.pod.memory.available (kubeletstats)
k8s.pod.memory.usage (kubeletstats)
k8s.pod.network.errors (kubeletstats)
k8s.pod.network.io (kubeletstats)
k8s.pod.cpu.usage (kubeletstats)
k8s.pod.phase (k8scluster)

container

k8s.container.cpu_limit (k8scluster)
k8s.container.cpu_request (k8scluster)
k8s.container.ephemeralstorage_limit (k8scluster)
k8s.container.ephemeralstorage_request (k8scluster)
k8s.container.memory_limit (k8scluster)
k8s.container.memory_request (k8scluster)
k8s.container.restarts (k8scluster)
container.cpu.utilization (kubeletstats)
container.cpu.usage (kubeletstats)
container.cpu.time (kubeletstats)
container.filesystem.available (kubeletstats)
container.filesystem.capacity (kubeletstats)
container.filesystem.usage (kubeletstats)
container.memory.usage (kubeletstats)

node

k8s.node.cpu.utilization (kubeletstats)
k8s.node.cpu.usage (kubeletstats)
k8s.node.filesystem.available (kubeletstats)
k8s.node.filesystem.capacity (kubeletstats)
k8s.node.filesystem.usage (kubeletstats)
k8s.node.memory.available (kubeletstats)
k8s.node.network.errors (kubeletstats)
k8s.node.network.io (kubeletstats)

volume

k8s.volume.available (kubeletstats)
k8s.volume.capacity (kubeletstats)

daemonset

k8s.daemonset.current_scheduled_nodes (k8scluster)
k8s.daemonset.desired_scheduled_nodes (k8scluster)
k8s.daemonset.misscheduled_nodes (k8scluster)
k8s.daemonset.ready_nodes (k8scluster)

deployment

k8s.deployment.available (k8scluster)
k8s.deployment.desired (k8scluster)

statefulset

k8s.statefulset.current_pods (k8scluster)
k8s.statefulset.desired_pods (k8scluster)
k8s.statefulset.ready_pods (k8scluster)
k8s.statefulset.updated_pods (k8scluster)

quota

k8s.resource_quota.hard_limit (k8scluster)
k8s.resource_quota.used (k8scluster)

job

k8s.cronjob.active_jobs (k8scluster)
k8s.job.active_pods (k8scluster)
k8s.job.desired_successful_pods (k8scluster)
k8s.job.failed_pods (k8scluster)
k8s.job.max_parallel_pods (k8scluster)
k8s.job.successful_pods (k8scluster)

hpa

k8s.hpa.current_replicas (k8scluster)
k8s.hpa.desired_replicas (k8scluster)
k8s.hpa.max_replicas (k8scluster)
k8s.hpa.min_replicas (k8scluster)

The metrics marked in bold are already part of the runtime input (release 1.17).
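
To illustrate how the resource toggles and the metric lists above would combine, here is a minimal sketch of a configuration that enables only two workload resources. It follows the proposed structure and is not a confirmed final API. All resources are set explicitly because default values are not defined in this proposal.

```yaml
# Hypothetical configuration based on the proposal above.
input:
  runtime:
    enabled: true
    resources:
      pod:
        enabled: false
      container:
        enabled: false
      node:
        enabled: false
      volume:
        enabled: false
      daemonset:
        enabled: false
      deployment:
        enabled: true     # k8s.deployment.available, k8s.deployment.desired
      statefulset:
        enabled: true     # k8s.statefulset.{current,desired,ready,updated}_pods
      quota:
        enabled: false
      job:
        enabled: false
      hpa:
        enabled: false
    namespaces:
      include:
      - myNamespace
```

With this input, the metric output would presumably contain only the six deployment and statefulset metrics listed above, all sourced from k8scluster and limited to myNamespace.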

chrkl · 2024-06-10T11:23:10Z

The proposal shown above will be implemented as a MetricPipeline input in a follow-up.
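
As a rough sketch of where the proposed input could sit inside a MetricPipeline resource: the apiVersion, the metadata, and the output section below are assumptions that do not appear in this issue, and the endpoint is a placeholder.

```yaml
# Sketch only; apiVersion, metadata, and output are assumptions, not part of this issue.
apiVersion: telemetry.kyma-project.io/v1alpha1
kind: MetricPipeline
metadata:
  name: sample                    # hypothetical name
spec:
  input:
    runtime:
      enabled: true
      resources:
        pod:
          enabled: true
        container:
          enabled: true
      namespaces:
        include:
        - myNamespace
  output:
    otlp:
      endpoint:
        value: https://backend.example.com:4317   # placeholder endpoint
```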

a-thaler added the labels area/metrics (MetricPipeline) and kind/decision (Marks a decision document) Apr 22, 2024

a-thaler mentioned this issue Apr 22, 2024

Typical kubernetes workload metrics as telemetry input to enable dashboarding and alerting #972

Open

13 tasks

a-thaler changed the title ~~Metric API PoC to cover typical Kubernetes metrics~~ Metric API proposal to cover typical Kubernetes metrics Apr 22, 2024

chrkl self-assigned this Jun 4, 2024

chrkl closed this as completed Jun 10, 2024

This was referenced Jun 17, 2024

Option to disable pod and/or container metrics #1183

Closed

Enhance runtime input with k8sclusterreciver metrics #1184

Closed

This was referenced Jul 25, 2024

Enhance runtime input with selectors for nodes #1300

Closed

Enhance runtime input with selectors for PVC volumes #1301

Closed

shorim mentioned this issue Aug 19, 2024

feat: Add K8s cluster receiver #1343

Merged

5 tasks
