From 9a60cd063b905c2e502368b0b2b0dfa8d92b65bf Mon Sep 17 00:00:00 2001
From: shwethamuralikrishnaa
Date: Fri, 23 Aug 2024 14:40:40 +0200
Subject: [PATCH] Add readme for Crossplane metrics

Signed-off-by: shwethamuralikrishnaa shwetha.muralikrishnaa@sap.com
Signed-off-by: shwethamuralikrishnaa
---
 content/master/concepts/_index.md  |  3 ++
 content/master/concepts/metrics.md | 44 ++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)
 create mode 100644 content/master/concepts/metrics.md

diff --git a/content/master/concepts/_index.md b/content/master/concepts/_index.md
index 3c821d9e..20c9d7bd 100644
--- a/content/master/concepts/_index.md
+++ b/content/master/concepts/_index.md
@@ -78,3 +78,6 @@ building and managing external resources through Kubernetes.
 * [**Packages**]({{}}) are a convenient way to package up an entire custom platform and define
   any other Crossplane related requirements. Packages define how to install Providers,
   custom APIs or composition functions.
+
+* [**Metrics**]({{}}) are essential for monitoring Crossplane's
+  operations, helping to quickly identify and resolve potential issues.
diff --git a/content/master/concepts/metrics.md b/content/master/concepts/metrics.md
new file mode 100644
index 00000000..f4f6fa71
--- /dev/null
+++ b/content/master/concepts/metrics.md
@@ -0,0 +1,44 @@
+---
+title: Metrics
+weight: 60
+description: "Metrics are essential for monitoring Crossplane's operations, helping to quickly identify and resolve potential issues."
+---
+
+This page offers explanations of various metrics gathered from Crossplane, which are essential for effective monitoring and alerting within your Crossplane environment.
+Understanding these metrics will help you maintain the health and performance of your resources, ensuring that any issues can be quickly identified and addressed.
+Please note that this document focuses exclusively on Crossplane-specific metrics and does not cover standard Go metrics.
+
+
+{{}}
+| Metric Name | Description | Further Explanation |
+| --- | --- | --- |
+| {{}}certwatcher_read_certificate_errors_total{{}} | Total number of certificate read errors | |
+| {{}}certwatcher_read_certificate_total{{}} | Total number of certificate reads | |
+| {{}}composition_run_function_seconds_bucket{{}} | Histogram of RunFunctionResponse latency (seconds) | |
+| {{}}controller_runtime_active_workers{{}} | Number of currently used workers per controller | The number of threads that currently process jobs from the work queue. |
+| {{}}controller_runtime_max_concurrent_reconciles{{}} | Maximum number of concurrent reconciles per controller | Describes how many reconciles can happen in parallel. |
+| {{}}controller_runtime_reconcile_errors_total{{}} | Total number of reconciliation errors per controller | A counter of reconcile errors. A sharp or continuous rise in this metric may indicate a problem. |
+| {{}}controller_runtime_reconcile_time_seconds_bucket{{}} | Length of time per reconciliation per controller | |
+| {{}}controller_runtime_reconcile_total{{}} | Total number of reconciliations per controller | |
+| {{}}controller_runtime_webhook_latency_seconds_bucket{{}} | Histogram of the latency of processing admission requests | |
+| {{}}controller_runtime_webhook_requests_in_flight{{}} | Current number of admission requests being served | |
+| {{}}controller_runtime_webhook_requests_total{{}} | Total number of admission requests by HTTP status code | |
+| {{}}rest_client_requests_total{{}} | Number of HTTP requests, partitioned by status code, method, and host | |
+| {{}}workqueue_adds_total{{}} | Total number of adds handled by workqueue | |
+| {{}}workqueue_depth{{}} | Current depth of workqueue | |
+| {{}}workqueue_longest_running_processor_seconds{{}} | How many seconds has the longest running processor for workqueue been running | |
+| {{}}workqueue_queue_duration_seconds_bucket{{}} | How long in seconds an item stays in workqueue before being requested | The time it takes from the moment a job is added to the workqueue until the processing of this job starts. |
+| {{}}workqueue_retries_total{{}} | Total number of retries handled by workqueue | |
+| {{}}workqueue_unfinished_work_seconds{{}} | How many seconds of work has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. | |
+| {{}}workqueue_work_duration_seconds_bucket{{}} | How long in seconds processing an item from workqueue takes | The time it takes from the moment the job is picked up until it is finished (either successfully or with an error). |
+| {{}}crossplane_managed_resource_exists{{}} | The number of managed resources that exist | |
+| {{}}crossplane_managed_resource_ready{{}} | The number of managed resources in Ready=True state | |
+| {{}}crossplane_managed_resource_synced{{}} | The number of managed resources in Synced=True state | |
+| {{}}upjet_resource_ext_api_duration_bucket{{}} | Measures in seconds how long it takes a Cloud SDK call to complete | |
+| {{}}upjet_resource_external_api_calls_total{{}} | The number of external API calls | The number of calls to cloud providers, with labels describing which endpoints and resources were queried. |
+| {{}}upjet_resource_reconcile_delay_seconds_bucket{{}} | Measures in seconds how long the reconciles for a resource have been delayed from the configured poll periods | |
+| {{}}crossplane_managed_resource_deletion_seconds_bucket{{}} | The time it took for a managed resource to be deleted | |
+| {{}}crossplane_managed_resource_first_time_to_readiness_seconds_bucket{{}} | The time it took for a managed resource to become ready first time after creation | |
+| {{}}crossplane_managed_resource_first_time_to_reconcile_seconds_bucket{{}} | The time it took for a managed resource to be detected by the controller | |
+| {{}}upjet_resource_ttr_bucket{{}} | Measures in seconds the time-to-readiness (TTR) for managed resources | |
+{{< /table >}}