diff --git a/content/master/concepts/_index.md b/content/master/concepts/_index.md
index 20c9d7bd..c5f11ce1 100644
--- a/content/master/concepts/_index.md
+++ b/content/master/concepts/_index.md
@@ -79,5 +79,5 @@ building and managing external resources through Kubernetes.
 entire custom platform and define any other Crossplane related requirements.
 Packages define how to install Providers, custom APIs or composition functions.
-* [**Metrics**]({{}}) are essential for monitoring Crossplane's
-  operations, helping to quickly identify and resolve potential issues.
+* [**Metrics**]({{}}) are essential for monitoring Crossplane's
+  operations, helping to identify and resolve potential issues.
diff --git a/content/master/concepts/metrics.md b/content/master/concepts/metrics.md
index f4f6fa71..b2303633 100644
--- a/content/master/concepts/metrics.md
+++ b/content/master/concepts/metrics.md
@@ -4,41 +4,41 @@ weight: 60
 description: "Metrics are essential for monitoring Crossplane's operations, helping to quickly identify and resolve potential issues."
 ---
-This page offers explanations of various metrics gathered from Crossplane, which are essential for effective monitoring and alerting within your Crossplane environment.
-Understanding these metrics will help you maintain the health and performance of your resources, ensuring that any issues can be quickly identified and addressed.
-Please note that this document focuses exclusively on Crossplane-specific metrics and does not cover standard Go metrics.
+This page offers explanations of various metrics gathered from Crossplane, which are essential for effective monitoring and alerting in your Crossplane environment.
+Understanding these metrics helps you maintain the health and performance of your resources, ensuring that any issues can be identified and addressed.
+Please note that this document focuses on Crossplane-specific metrics and doesn't cover standard Go metrics.
-{{}}
+{{}}
 | Metric Name | Description | Further Explanation |
 | --- | --- | --- |
 | {{}}certwatcher_read_certificate_errors_total{{}} | Total number of certificate read errors | |
 | {{}}certwatcher_read_certificate_total{{}} | Total number of certificate reads | |
 | {{}}composition_run_function_seconds_bucket{{}} | Histogram of RunFunctionResponse latency (seconds) | |
-| {{}}controller_runtime_active_workers{{}} | Number of currently used workers per controller | The number of threads that currently process jobs from the work queue. |
-| {{}}controller_runtime_max_concurrent_reconciles{{}} | Maximum number of concurrent reconciles per controller | Describes how many reconciles can happen in parallel. |
-| {{}}controller_runtime_reconcile_errors_total{{}} | Total number of reconciliation errors per controller | A counter that counts reconcile errors. Sharp or non-stop rising of this metric might be a problem. |
+| {{}}controller_runtime_active_workers{{}} | Number of used workers per controller | The number of threads that currently process jobs from the work queue. |
+| {{}}controller_runtime_max_concurrent_reconciles{{}} | Maximum number of concurrent reconciles per controller | Describes how many reconciliations can run in parallel. |
+| {{}}controller_runtime_reconcile_errors_total{{}} | Total number of reconciliation errors per controller | A counter that counts reconcile errors. A sharp or non-stop rise in this metric might indicate a problem. |
 | {{}}controller_runtime_reconcile_time_seconds_bucket{{}} | Length of time per reconciliation per controller | |
 | {{}}controller_runtime_reconcile_total{{}} | Total number of reconciliations per controller | |
 | {{}}controller_runtime_webhook_latency_seconds_bucket{{}} | Histogram of the latency of processing admission requests | |
-| {{}}controller_runtime_webhook_requests_in_flight{{}} | Current number of admission requests being served | |
+| {{}}controller_runtime_webhook_requests_in_flight{{}} | Current number of admission requests served | |
 | {{}}controller_runtime_webhook_requests_total{{}} | Total number of admission requests by HTTP status code | |
 | {{}}rest_client_requests_total{{}} | Number of HTTP requests, partitioned by status code, method, and host | |
-| {{}}workqueue_adds_total{{}} | Total number of adds handled by workqueue | |
-| {{}}workqueue_depth{{}} | Current depth of workqueue | |
-| {{}}workqueue_longest_running_processor_seconds{{}} | How many seconds has the longest running processor for workqueue been running | |
-| {{}}workqueue_queue_duration_seconds_bucket{{}} | How long in seconds an item stays in workqueue before being requested | The time it takes from the moment a job is added to the workqueue until the processing of this job starts. |
-| {{}}workqueue_retries_total{{}} | Total number of retries handled by workqueue | |
-| {{}}workqueue_unfinished_work_seconds{{}} | How many seconds of work has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. | |
-| {{}}workqueue_work_duration_seconds_bucket{{}} | How long in seconds processing an item from workqueue takes | The time it takes from the moment the job is picked up until it is finished (either successfully or with an error). |
+| {{}}workqueue_adds_total{{}} | Total number of adds handled by `workqueue` | |
+| {{}}workqueue_depth{{}} | Current depth of `workqueue` | |
+| {{}}workqueue_longest_running_processor_seconds{{}} | How many seconds has the longest running processor for `workqueue` been running | |
+| {{}}workqueue_queue_duration_seconds_bucket{{}} | How long in seconds an item stays in `workqueue` before being requested | The time it takes from the moment a job enters the `workqueue` until the processing of this job starts. |
+| {{}}workqueue_retries_total{{}} | Total number of retries handled by `workqueue` | |
+| {{}}workqueue_unfinished_work_seconds{{}} | The number of seconds of work done that's in progress and hasn't been observed by `work_duration`. Large values indicate stuck threads. | |
+| {{}}workqueue_work_duration_seconds_bucket{{}} | How long in seconds processing an item from `workqueue` takes | The time it takes from the moment the job starts until it finishes (either successfully or with an error). |
 | {{}}crossplane_managed_resource_exists{{}} | The number of managed resources that exist | |
-| {{}}crossplane_managed_resource_ready{{}} | The number of managed resources in Ready=True state | |
-| {{}}crossplane_managed_resource_synced{{}} | The number of managed resources in Synced=True state | |
+| {{}}crossplane_managed_resource_ready{{}} | The number of managed resources in `Ready=True` state | |
+| {{}}crossplane_managed_resource_synced{{}} | The number of managed resources in `Synced=True` state | |
 | {{}}upjet_resource_ext_api_duration_bucket{{}} | Measures in seconds how long it takes a Cloud SDK call to complete | |
-| {{}}upjet_resource_external_api_calls_total{{}} | The number of external API calls | The number of calls to cloud providers, with labels describing which endpoints resources have been queried. |
-| {{}}upjet_resource_reconcile_delay_seconds_bucket{{}} | Measures in seconds how long the reconciles for a resource have been delayed from the configured poll periods | |
-| {{}}crossplane_managed_resource_deletion_seconds_bucket{{}} | The time it took for a managed resource to be deleted | |
+| {{}}upjet_resource_external_api_calls_total{{}} | The number of external API calls | The number of calls to cloud providers, with labels describing the endpoints and resources. |
+| {{}}upjet_resource_reconcile_delay_seconds_bucket{{}} | Measures in seconds how long the reconciles for a resource are delayed from the configured poll periods | |
+| {{}}crossplane_managed_resource_deletion_seconds_bucket{{}} | The time it took to delete a managed resource | |
 | {{}}crossplane_managed_resource_first_time_to_readiness_seconds_bucket{{}} | The time it took for a managed resource to become ready first time after creation | |
-| {{}}crossplane_managed_resource_first_time_to_reconcile_seconds_bucket{{}} | The time it took for a managed resource to be detected by the controller | |
-| {{}}upjet_resource_ttr_bucket{{}} | Measures in seconds the time-to-readiness (TTR) for managed resources | |
+| {{}}crossplane_managed_resource_first_time_to_reconcile_seconds_bucket{{}} | The time it took for the controller to detect a managed resource | |
+| {{}}upjet_resource_ttr_bucket{{}} | Measures in seconds the `time-to-readiness` `(TTR)` for managed resources | |
 {{< /table >}}