
📖 (doc): Add a guide to help users consume the metrics and integrate them with other solutions #1524

Open · wants to merge 1 commit into main
Conversation

@camilamacedo86 (Contributor) commented Dec 13, 2024

  • Improved Metrics Consumption Documentation:
    • Added a comprehensive guide for consuming metrics exposed by the Operator-Controller and CatalogD services.
    • Detailed steps to enable metrics, validate access using curl, and integrate securely with the Prometheus Operator using a ServiceMonitor.

The commands in this guide were executed successfully, as shown below.

Operator-Controller Metrics

$ kubectl create clusterrolebinding operator-controller-metrics-binding \
>    --clusterrole=operator-controller-metrics-reader \
>    --serviceaccount=olmv1-system:operator-controller-controller-manager
clusterrolebinding.rbac.authorization.k8s.io/operator-controller-metrics-binding created
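
For a declarative setup, the same binding can also be expressed as a manifest instead of the imperative command (a minimal sketch equivalent to the kubectl create command above):

$ kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: operator-controller-metrics-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: operator-controller-metrics-reader
subjects:
- kind: ServiceAccount
  name: operator-controller-controller-manager
  namespace: olmv1-system
EOF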

$ TOKEN=$(kubectl create token operator-controller-controller-manager -n olmv1-system)

$ echo $TOKEN
<VALUE HIDDEN>
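
Note: tokens created with kubectl create token are short-lived by default. If the default lifetime is too short for a testing session, a longer-lived token can be requested, assuming the cluster allows it:

$ TOKEN=$(kubectl create token operator-controller-controller-manager -n olmv1-system --duration=1h)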

$ kubectl apply -f - <<EOF
> apiVersion: v1
> kind: Pod
> metadata:
>   name: curl-metrics
>   namespace: olmv1-system
> spec:
>   serviceAccountName: operator-controller-controller-manager
>   containers:
>   - name: curl
>     image: curlimages/curl:latest
>     command:
>     - sh
>     - -c
>     - sleep 3600
>     securityContext:
>       runAsNonRoot: true
>       readOnlyRootFilesystem: true
>       runAsUser: 1000
>       runAsGroup: 1000
>       allowPrivilegeEscalation: false
>       capabilities:
>         drop:
>         - ALL
>     volumeMounts:
>     - mountPath: /tmp/cert
>       name: olm-cert
>       readOnly: true
>   volumes:
>   - name: olm-cert
>     secret:
>       secretName: olmv1-cert
>   securityContext:
>     runAsNonRoot: true
>   restartPolicy: Never
> EOF
pod/curl-metrics created
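
Before opening a shell in the pod, it can help to wait until the pod reports Ready:

$ kubectl wait --for=condition=Ready pod/curl-metrics -n olmv1-system --timeout=60s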

$ kubectl exec -it curl-metrics -n olmv1-system -- sh

/home/curl_user $ curl -v -k -H "Authorization: Bearer <TOKEN HIDDEN VALUE>" https://operator-controller-service.olmv1-system.svc.cluster.local:8443/metrics
* Host operator-controller-service.olmv1-system.svc.cluster.local:8443 was resolved.
* IPv6: (none)
* IPv4: 10.96.239.246
*   Trying 10.96.239.246:8443...
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 / x25519 / id-ecPublicKey
* ALPN: server accepted http/1.1
* Server certificate:
*  subject: 
*  start date: Feb  6 18:10:38 2025 GMT
*  expire date: May  7 18:10:38 2025 GMT
*  issuer: CN=olmv1-ca
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.

$ ls -la /tmp/cert/
total 4
drwxrwxrwt    3 root     root           140 Feb  6 19:39 .
drwxrwxrwt    1 root     root          4096 Feb  6 19:39 ..
drwxr-xr-x    2 root     root           100 Feb  6 19:39 ..2025_02_06_19_39_47.4112203720
lrwxrwxrwx    1 root     root            32 Feb  6 19:39 ..data -> ..2025_02_06_19_39_47.4112203720
lrwxrwxrwx    1 root     root            13 Feb  6 19:39 ca.crt -> ..data/ca.crt
lrwxrwxrwx    1 root     root            14 Feb  6 19:39 tls.crt -> ..data/tls.crt
lrwxrwxrwx    1 root     root            14 Feb  6 19:39 tls.key -> ..data/tls.key


$ curl -v --cacert /tmp/cert/ca.crt --cert /tmp/cert/tls.crt --key /tmp/cert/tls.key -H "Authorization: Bearer <HIDDEN VALUE TOKEN>" https://operator-controller-service.olmv1-system.svc.cluster.local:8443/metrics
* Host operator-controller-service.olmv1-system.svc.cluster.local:8443 was resolved.
* IPv6: (none)
* IPv4: 10.96.239.246
*   Trying 10.96.239.246:8443...
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /tmp/cert/ca.crt
*  CApath: none
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 / x25519 / id-ecPublicKey
* ALPN: server accepted http/1.1
* Server certificate:
*  subject: 
*  start date: Feb  6 18:10:38 2025 GMT
*  expire date: May  7 18:10:38 2025 GMT
*  subjectAltName: host "operator-controller-service.olmv1-system.svc.cluster.local" matched cert's "operator-controller-service.olmv1-system.svc.cluster.local"
*  issuer: CN=olmv1-ca
*  SSL certificate verify ok.
*   Certificate level 0: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA256
*   Certificate level 1: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA256
* Connected to operator-controller-service.olmv1-system.svc.cluster.local (10.96.239.246) port 8443
* using HTTP/1.x
> GET /metrics HTTP/1.1
> Host: operator-controller-service.olmv1-system.svc.cluster.local:8443
> User-Agent: curl/8.12.0
> Accept: */*
> Authorization: Bearer <HIDDEN VALUE TOKEN>
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Request completely sent off
< HTTP/1.1 200 OK
< Content-Type: text/plain; version=0.0.4; charset=utf-8; escaping=values
< Date: Thu, 06 Feb 2025 19:59:39 GMT
< Transfer-Encoding: chunked
< 
# HELP certwatcher_read_certificate_errors_total Total number of certificate read errors
# TYPE certwatcher_read_certificate_errors_total counter
certwatcher_read_certificate_errors_total 0
# HELP certwatcher_read_certificate_total Total number of certificate reads
# TYPE certwatcher_read_certificate_total counter
certwatcher_read_certificate_total 182
# HELP controller_runtime_active_workers Number of currently used workers per controller
# TYPE controller_runtime_active_workers gauge
controller_runtime_active_workers{controller="clustercatalog"} 0
controller_runtime_active_workers{controller="clusterextension"} 0
# HELP controller_runtime_max_concurrent_reconciles Maximum number of concurrent reconciles per controller
# TYPE controller_runtime_max_concurrent_reconciles gauge
controller_runtime_max_concurrent_reconciles{controller="clustercatalog"} 1
controller_runtime_max_concurrent_reconciles{controller="clusterextension"} 1
# HELP controller_runtime_reconcile_errors_total Total number of reconciliation errors per controller
# TYPE controller_runtime_reconcile_errors_total counter
controller_runtime_reconcile_errors_total{controller="clustercatalog"} 0
controller_runtime_reconcile_errors_total{controller="clusterextension"} 0
# HELP controller_runtime_reconcile_panics_total Total number of reconciliation panics per controller
# TYPE controller_runtime_reconcile_panics_total counter
controller_runtime_reconcile_panics_total{controller="clustercatalog"} 0
controller_runtime_reconcile_panics_total{controller="clusterextension"} 0
# HELP controller_runtime_reconcile_time_seconds Length of time per reconciliation per controller
# TYPE controller_runtime_reconcile_time_seconds histogram
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.005"} 2
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.01"} 2
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.025"} 2
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.05"} 2
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.1"} 2
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.15"} 2
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.2"} 2
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.25"} 2
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.3"} 2
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.35"} 2
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.4"} 2
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.45"} 2
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.5"} 2
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.6"} 2
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.7"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.8"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.9"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="1"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="1.25"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="1.5"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="1.75"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="2"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="2.5"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="3"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="3.5"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="4"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="4.5"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="5"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="6"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="7"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="8"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="9"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="10"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="15"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="20"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="25"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="30"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="40"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="50"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="60"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="+Inf"} 5
controller_runtime_reconcile_time_seconds_sum{controller="clustercatalog"} 2.0012384990000003
controller_runtime_reconcile_time_seconds_count{controller="clustercatalog"} 5
# HELP controller_runtime_reconcile_total Total number of reconciliations per controller
# TYPE controller_runtime_reconcile_total counter
controller_runtime_reconcile_total{controller="clustercatalog",result="error"} 0
controller_runtime_reconcile_total{controller="clustercatalog",result="requeue"} 0
controller_runtime_reconcile_total{controller="clustercatalog",result="requeue_after"} 0
controller_runtime_reconcile_total{controller="clustercatalog",result="success"} 5
controller_runtime_reconcile_total{controller="clusterextension",result="error"} 0
controller_runtime_reconcile_total{controller="clusterextension",result="requeue"} 0
controller_runtime_reconcile_total{controller="clusterextension",result="requeue_after"} 0
controller_runtime_reconcile_total{controller="clusterextension",result="success"} 0
# HELP controller_runtime_terminal_reconcile_errors_total Total number of terminal reconciliation errors per controller
# TYPE controller_runtime_terminal_reconcile_errors_total counter
controller_runtime_terminal_reconcile_errors_total{controller="clustercatalog"} 0
controller_runtime_terminal_reconcile_errors_total{controller="clusterextension"} 0
# HELP controller_runtime_webhook_panics_total Total number of webhook panics
# TYPE controller_runtime_webhook_panics_total counter
controller_runtime_webhook_panics_total 0
# HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.5417e-05
go_gc_duration_seconds{quantile="0.25"} 4.5082e-05
go_gc_duration_seconds{quantile="0.5"} 5.7084e-05
go_gc_duration_seconds{quantile="0.75"} 7.7666e-05
go_gc_duration_seconds{quantile="1"} 0.00097725
go_gc_duration_seconds_sum 0.01147825
go_gc_duration_seconds_count 91
# HELP go_gc_gogc_percent Heap size target percentage configured by the user, otherwise 100. This value is set by the GOGC environment variable, and the runtime/debug.SetGCPercent function. Sourced from /gc/gogc:percent
# TYPE go_gc_gogc_percent gauge
go_gc_gogc_percent 100
# HELP go_gc_gomemlimit_bytes Go runtime memory limit configured by the user, otherwise math.MaxInt64. This value is set by the GOMEMLIMIT environment variable, and the runtime/debug.SetMemoryLimit function. Sourced from /gc/gomemlimit:bytes
# TYPE go_gc_gomemlimit_bytes gauge
go_gc_gomemlimit_bytes 9.223372036854776e+18
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 74
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.23.6"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated in heap and currently in use. Equals to /memory/classes/heap/objects:bytes.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 6.307e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated in heap until now, even if released already. Equals to /gc/heap/allocs:bytes.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 5.01226256e+08
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table. Equals to /memory/classes/profiling/buckets:bytes.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.570115e+06
# HELP go_memstats_frees_total Total number of heap objects frees. Equals to /gc/heap/frees:objects + /gc/heap/tiny/allocs:objects.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 6.630091e+06
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata. Equals to /memory/classes/metadata/other:bytes.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 3.927888e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and currently in use, same as go_memstats_alloc_bytes. Equals to /memory/classes/heap/objects:bytes.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 6.307e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used. Equals to /memory/classes/heap/released:bytes + /memory/classes/heap/free:bytes.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 1.7276928e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use. Equals to /memory/classes/heap/objects:bytes + /memory/classes/heap/unused:bytes
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.0215424e+07
# HELP go_memstats_heap_objects Number of currently allocated objects. Equals to /gc/heap/objects:objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 32116
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS. Equals to /memory/classes/heap/released:bytes.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 1.6474112e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system. Equals to /memory/classes/heap/objects:bytes + /memory/classes/heap/unused:bytes + /memory/classes/heap/released:bytes + /memory/classes/heap/free:bytes.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 2.7492352e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.7388718963945072e+09
# HELP go_memstats_mallocs_total Total number of heap objects allocated, both live and gc-ed. Semantically a counter version for go_memstats_heap_objects gauge. Equals to /gc/heap/allocs:objects + /gc/heap/tiny/allocs:objects.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 6.662207e+06
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures. Equals to /memory/classes/metadata/mcache/inuse:bytes.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 13200
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system. Equals to /memory/classes/metadata/mcache/inuse:bytes + /memory/classes/metadata/mcache/free:bytes.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 15600
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures. Equals to /memory/classes/metadata/mspan/inuse:bytes.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 300320
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system. Equals to /memory/classes/metadata/mspan/inuse:bytes + /memory/classes/metadata/mspan/free:bytes.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 424320
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place. Equals to /gc/heap/goal:bytes.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 1.3188984e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations. Equals to /memory/classes/other:bytes.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 2.275845e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes obtained from system for stack allocator in non-CGO environments. Equals to /memory/classes/heap/stacks:bytes.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 1.867776e+06
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator. Equals to /memory/classes/heap/stacks:bytes + /memory/classes/os-stacks:bytes.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 1.867776e+06
# HELP go_memstats_sys_bytes Number of bytes obtained from system. Equals to /memory/classes/total:byte.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 3.7573896e+07
# HELP go_sched_gomaxprocs_threads The current runtime.GOMAXPROCS setting, or the number of operating system threads that can execute user-level Go code simultaneously. Sourced from /sched/gomaxprocs:threads
# TYPE go_sched_gomaxprocs_threads gauge
go_sched_gomaxprocs_threads 11
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 16
# HELP leader_election_master_status Gauge of if the reporting system is master of the relevant lease, 0 indicates backup, 1 indicates master. 'name' is the string used to identify the lease. Please make sure to group by name.
# TYPE leader_election_master_status gauge
leader_election_master_status{name="9c4404e7.operatorframework.io"} 1
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 5.6
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_network_receive_bytes_total Number of bytes received by the process over the network.
# TYPE process_network_receive_bytes_total counter
process_network_receive_bytes_total 2.3431279e+07
# HELP process_network_transmit_bytes_total Number of bytes sent by the process over the network.
# TYPE process_network_transmit_bytes_total counter
process_network_transmit_bytes_total 287708
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 13
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 9.7468416e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.73887016829e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.631985664e+09
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP rest_client_requests_total Number of HTTP requests, partitioned by status code, method, and host.
# TYPE rest_client_requests_total counter
rest_client_requests_total{code="200",host="10.96.0.1:443",method="GET"} 18
rest_client_requests_total{code="200",host="10.96.0.1:443",method="PUT"} 70
rest_client_requests_total{code="201",host="10.96.0.1:443",method="POST"} 7
rest_client_requests_total{code="404",host="10.96.0.1:443",method="GET"} 1
# HELP workqueue_adds_total Total number of adds handled by workqueue
# TYPE workqueue_adds_total counter
workqueue_adds_total{controller="clustercatalog",name="clustercatalog"} 5
workqueue_adds_total{controller="clusterextension",name="clusterextension"} 0
# HELP workqueue_depth Current depth of workqueue
# TYPE workqueue_depth gauge
workqueue_depth{controller="clustercatalog",name="clustercatalog"} 0
workqueue_depth{controller="clusterextension",name="clusterextension"} 0
# HELP workqueue_longest_running_processor_seconds How many seconds has the longest running processor for workqueue been running.
# TYPE workqueue_longest_running_processor_seconds gauge
workqueue_longest_running_processor_seconds{controller="clustercatalog",name="clustercatalog"} 0
workqueue_longest_running_processor_seconds{controller="clusterextension",name="clusterextension"} 0
# HELP workqueue_queue_duration_seconds How long in seconds an item stays in workqueue before being requested
# TYPE workqueue_queue_duration_seconds histogram
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1e-08"} 0
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1e-07"} 0
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1e-06"} 0
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="9.999999999999999e-06"} 2
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="9.999999999999999e-05"} 3
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.001"} 4
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.01"} 5
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.1"} 5
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1"} 5
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="10"} 5
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="100"} 5
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1000"} 5
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="+Inf"} 5
workqueue_queue_duration_seconds_sum{controller="clustercatalog",name="clustercatalog"} 0.0023819590000000003
workqueue_queue_duration_seconds_count{controller="clustercatalog",name="clustercatalog"} 5
workqueue_queue_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="1e-08"} 0
workqueue_queue_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="1e-07"} 0
workqueue_queue_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="1e-06"} 0
workqueue_queue_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="9.999999999999999e-06"} 0
workqueue_queue_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="9.999999999999999e-05"} 0
workqueue_queue_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="0.001"} 0
workqueue_queue_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="0.01"} 0
workqueue_queue_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="0.1"} 0
workqueue_queue_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="1"} 0
workqueue_queue_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="10"} 0
workqueue_queue_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="100"} 0
workqueue_queue_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="1000"} 0
workqueue_queue_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="+Inf"} 0
workqueue_queue_duration_seconds_sum{controller="clusterextension",name="clusterextension"} 0
workqueue_queue_duration_seconds_count{controller="clusterextension",name="clusterextension"} 0
# HELP workqueue_retries_total Total number of retries handled by workqueue
# TYPE workqueue_retries_total counter
workqueue_retries_total{controller="clustercatalog",name="clustercatalog"} 0
workqueue_retries_total{controller="clusterextension",name="clusterextension"} 0
# HELP workqueue_unfinished_work_seconds How many seconds of work has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases.
# TYPE workqueue_unfinished_work_seconds gauge
workqueue_unfinished_work_seconds{controller="clustercatalog",name="clustercatalog"} 0
workqueue_unfinished_work_seconds{controller="clusterextension",name="clusterextension"} 0
# HELP workqueue_work_duration_seconds How long in seconds processing an item from workqueue takes.
# TYPE workqueue_work_duration_seconds histogram
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1e-08"} 0
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1e-07"} 0
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1e-06"} 0
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="9.999999999999999e-06"} 0
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="9.999999999999999e-05"} 1
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.001"} 1
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.01"} 2
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.1"} 2
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1"} 5
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="10"} 5
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="100"} 5
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1000"} 5
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="+Inf"} 5
workqueue_work_duration_seconds_sum{controller="clustercatalog",name="clustercatalog"} 2.0018936270000003
workqueue_work_duration_seconds_count{controller="clustercatalog",name="clustercatalog"} 5
workqueue_work_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="1e-08"} 0
workqueue_work_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="1e-07"} 0
workqueue_work_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="1e-06"} 0
workqueue_work_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="9.999999999999999e-06"} 0
workqueue_work_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="9.999999999999999e-05"} 0
workqueue_work_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="0.001"} 0
workqueue_work_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="0.01"} 0
workqueue_work_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="0.1"} 0
workqueue_work_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="1"} 0
workqueue_work_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="10"} 0
workqueue_work_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="100"} 0
workqueue_work_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="1000"} 0
workqueue_work_duration_seconds_bucket{controller="clusterextension",name="clusterextension",le="+Inf"} 0
workqueue_work_duration_seconds_sum{controller="clusterextension",name="clusterextension"} 0
workqueue_work_duration_seconds_count{controller="clusterextension",name="clusterextension"} 0
* Connection #0 to host operator-controller-service.olmv1-system.svc.cluster.local left intact
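
To spot-check individual metrics without paging through the full dump, the output can also be filtered inside the pod; for example, the same request as above piped through grep:

/home/curl_user $ curl -s --cacert /tmp/cert/ca.crt --cert /tmp/cert/tls.crt --key /tmp/cert/tls.key -H "Authorization: Bearer <TOKEN>" https://operator-controller-service.olmv1-system.svc.cluster.local:8443/metrics | grep controller_runtime_reconcile_errors_total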

CatalogD Metrics

$ kubectl create clusterrolebinding catalogd-metrics-binding \
>    --clusterrole=catalogd-metrics-reader \
>    --serviceaccount=olmv1-system:catalogd-controller-manager
clusterrolebinding.rbac.authorization.k8s.io/catalogd-metrics-binding created

$ TOKEN=$(kubectl create token catalogd-controller-manager -n olmv1-system)
$ echo $TOKEN
<HIDDEN VALUE>


$ OLM_SECRET=$(kubectl get secret -n olmv1-system -o jsonpath="{.items[*].metadata.name}" | tr ' ' '\n' | grep '^catalogd-service-cert')
$ echo $OLM_SECRET
catalogd-service-cert-v1.2.0-rc4-19-g099a6cf

$ kubectl apply -f - <<EOF
> apiVersion: v1
> kind: Pod
> metadata:
>   name: curl-metrics-catalogd
>   namespace: olmv1-system
> spec:
>   serviceAccountName: catalogd-controller-manager
>   containers:
>   - name: curl
>     image: curlimages/curl:latest
>     command:
>     - sh
>     - -c
>     - sleep 3600
>     securityContext:
>       runAsNonRoot: true
>       readOnlyRootFilesystem: true
>       runAsUser: 1000
>       runAsGroup: 1000
>       allowPrivilegeEscalation: false
>       capabilities:
>         drop:
>         - ALL
>     volumeMounts:
>     - mountPath: /tmp/cert
>       name: catalogd-cert
>       readOnly: true
>   volumes:
>   - name: catalogd-cert
>     secret:
>       secretName: $OLM_SECRET
>   securityContext:
>     runAsNonRoot: true
>   restartPolicy: Never
> EOF
pod/curl-metrics-catalogd created

$ kubectl exec -it curl-metrics-catalogd -n olmv1-system -- sh

$ curl -v -k -H "Authorization: Bearer <HIDDEN VALUE>" https://catalogd-service.olmv1-system.svc.cluster.local:7443/metrics
* Host catalogd-service.olmv1-system.svc.cluster.local:7443 was resolved.
* IPv6: (none)
* IPv4: 10.96.37.5
*   Trying 10.96.37.5:7443...
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 / x25519 / id-ecPublicKey
* ALPN: server accepted http/1.1
* Server certificate:
*  subject: 
*  start date: Feb  6 18:10:38 2025 GMT
*  expire date: May  7 18:10:38 2025 GMT
*  issuer: CN=olmv1-ca
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
*   Certificate level 0: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA256
* Connected to catalogd-service.olmv1-system.svc.cluster.local (10.96.37.5) port 7443
* using HTTP/1.x
> GET /metrics HTTP/1.1
> Host: catalogd-service.olmv1-system.svc.cluster.local:7443
> User-Agent: curl/8.12.0
> Accept: */*
> Authorization: Bearer <HIDDEN VALUE>
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Request completely sent off
< HTTP/1.1 200 OK
< Content-Type: text/plain; version=0.0.4; charset=utf-8; escaping=values
< Date: Thu, 06 Feb 2025 20:07:49 GMT
< Transfer-Encoding: chunked
< 
# HELP catalogd_http_request_duration_seconds Histogram of request duration in seconds
# TYPE catalogd_http_request_duration_seconds histogram
catalogd_http_request_duration_seconds_bucket{code="200",le="0.1"} 0
catalogd_http_request_duration_seconds_bucket{code="200",le="0.2"} 0
catalogd_http_request_duration_seconds_bucket{code="200",le="0.3"} 3



$ curl -v --cacert /tmp/cert/ca.crt --cert /tmp/cert/tls.crt --key /tmp/cert/tls.key -H "Authorization: Bearer <HIDDEN VALUE>" https://catalogd-service.olmv1-system.svc.cluster.local:7443/metrics
* Host catalogd-service.olmv1-system.svc.cluster.local:7443 was resolved.
* IPv6: (none)
* IPv4: 10.96.37.5
*   Trying 10.96.37.5:7443...
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /tmp/cert/ca.crt
*  CApath: none
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 / x25519 / id-ecPublicKey
* ALPN: server accepted http/1.1
* Server certificate:
*  subject: 
*  start date: Feb  6 18:10:38 2025 GMT
*  expire date: May  7 18:10:38 2025 GMT
*  subjectAltName: host "catalogd-service.olmv1-system.svc.cluster.local" matched cert's "catalogd-service.olmv1-system.svc.cluster.local"
*  issuer: CN=olmv1-ca
*  SSL certificate verify ok.
*   Certificate level 0: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA256
*   Certificate level 1: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA256
* Connected to catalogd-service.olmv1-system.svc.cluster.local (10.96.37.5) port 7443
* using HTTP/1.x
> GET /metrics HTTP/1.1
> Host: catalogd-service.olmv1-system.svc.cluster.local:7443
> User-Agent: curl/8.12.0
> Accept: */*
> Authorization: Bearer <HIDDEN VALUE>
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Request completely sent off
< HTTP/1.1 200 OK
< Content-Type: text/plain; version=0.0.4; charset=utf-8; escaping=values
< Date: Thu, 06 Feb 2025 20:09:16 GMT
< Transfer-Encoding: chunked
< 
# HELP catalogd_http_request_duration_seconds Histogram of request duration in seconds
# TYPE catalogd_http_request_duration_seconds histogram
catalogd_http_request_duration_seconds_bucket{code="200",le="0.1"} 0
catalogd_http_request_duration_seconds_bucket{code="200",le="0.2"} 0
catalogd_http_request_duration_seconds_bucket{code="200",le="0.3"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="0.4"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="0.5"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="0.6"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="0.7"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="0.8"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="0.9"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="1"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="1.2"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="1.6"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="2"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="2.4"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="2.8"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="3.2"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="3.6"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="4"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="10"} 3
catalogd_http_request_duration_seconds_bucket{code="200",le="+Inf"} 3
catalogd_http_request_duration_seconds_sum{code="200"} 0.73957221
catalogd_http_request_duration_seconds_count{code="200"} 3
# HELP certwatcher_read_certificate_errors_total Total number of certificate read errors
# TYPE certwatcher_read_certificate_errors_total counter
certwatcher_read_certificate_errors_total 0
# HELP certwatcher_read_certificate_total Total number of certificate reads
# TYPE certwatcher_read_certificate_total counter
certwatcher_read_certificate_total 239
# HELP controller_runtime_active_workers Number of currently used workers per controller
# TYPE controller_runtime_active_workers gauge
controller_runtime_active_workers{controller="clustercatalog"} 0
# HELP controller_runtime_max_concurrent_reconciles Maximum number of concurrent reconciles per controller
# TYPE controller_runtime_max_concurrent_reconciles gauge
controller_runtime_max_concurrent_reconciles{controller="clustercatalog"} 1
# HELP controller_runtime_reconcile_errors_total Total number of reconciliation errors per controller
# TYPE controller_runtime_reconcile_errors_total counter
controller_runtime_reconcile_errors_total{controller="clustercatalog"} 0
# HELP controller_runtime_reconcile_panics_total Total number of reconciliation panics per controller
# TYPE controller_runtime_reconcile_panics_total counter
controller_runtime_reconcile_panics_total{controller="clustercatalog"} 0
# HELP controller_runtime_reconcile_time_seconds Length of time per reconciliation per controller
# TYPE controller_runtime_reconcile_time_seconds histogram
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.005"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.01"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.025"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.05"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.1"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.15"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.2"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.25"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.3"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.35"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.4"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.45"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.5"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.6"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.7"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.8"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="0.9"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="1"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="1.25"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="1.5"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="1.75"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="2"} 5
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="2.5"} 6
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="3"} 6
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="3.5"} 6
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="4"} 6
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="4.5"} 6
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="5"} 6
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="6"} 6
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="7"} 7
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="8"} 7
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="9"} 8
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="10"} 8
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="15"} 9
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="20"} 9
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="25"} 9
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="30"} 9
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="40"} 9
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="50"} 9
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="60"} 9
controller_runtime_reconcile_time_seconds_bucket{controller="clustercatalog",le="+Inf"} 9
controller_runtime_reconcile_time_seconds_sum{controller="clustercatalog"} 27.022783096000005
controller_runtime_reconcile_time_seconds_count{controller="clustercatalog"} 9
# HELP controller_runtime_reconcile_total Total number of reconciliations per controller
# TYPE controller_runtime_reconcile_total counter
controller_runtime_reconcile_total{controller="clustercatalog",result="error"} 0
controller_runtime_reconcile_total{controller="clustercatalog",result="requeue"} 0
controller_runtime_reconcile_total{controller="clustercatalog",result="requeue_after"} 8
controller_runtime_reconcile_total{controller="clustercatalog",result="success"} 1
# HELP controller_runtime_terminal_reconcile_errors_total Total number of terminal reconciliation errors per controller
# TYPE controller_runtime_terminal_reconcile_errors_total counter
controller_runtime_terminal_reconcile_errors_total{controller="clustercatalog"} 0
# HELP controller_runtime_webhook_latency_seconds Histogram of the latency of processing admission requests
# TYPE controller_runtime_webhook_latency_seconds histogram
controller_runtime_webhook_latency_seconds_bucket{webhook="/mutate-olm-operatorframework-io-v1-clustercatalog",le="0.005"} 1
controller_runtime_webhook_latency_seconds_bucket{webhook="/mutate-olm-operatorframework-io-v1-clustercatalog",le="0.01"} 1
controller_runtime_webhook_latency_seconds_bucket{webhook="/mutate-olm-operatorframework-io-v1-clustercatalog",le="0.025"} 1
controller_runtime_webhook_latency_seconds_bucket{webhook="/mutate-olm-operatorframework-io-v1-clustercatalog",le="0.05"} 1
controller_runtime_webhook_latency_seconds_bucket{webhook="/mutate-olm-operatorframework-io-v1-clustercatalog",le="0.1"} 1
controller_runtime_webhook_latency_seconds_bucket{webhook="/mutate-olm-operatorframework-io-v1-clustercatalog",le="0.25"} 1
controller_runtime_webhook_latency_seconds_bucket{webhook="/mutate-olm-operatorframework-io-v1-clustercatalog",le="0.5"} 1
controller_runtime_webhook_latency_seconds_bucket{webhook="/mutate-olm-operatorframework-io-v1-clustercatalog",le="1"} 1
controller_runtime_webhook_latency_seconds_bucket{webhook="/mutate-olm-operatorframework-io-v1-clustercatalog",le="2.5"} 1
controller_runtime_webhook_latency_seconds_bucket{webhook="/mutate-olm-operatorframework-io-v1-clustercatalog",le="5"} 1
controller_runtime_webhook_latency_seconds_bucket{webhook="/mutate-olm-operatorframework-io-v1-clustercatalog",le="10"} 1
controller_runtime_webhook_latency_seconds_bucket{webhook="/mutate-olm-operatorframework-io-v1-clustercatalog",le="+Inf"} 1
controller_runtime_webhook_latency_seconds_sum{webhook="/mutate-olm-operatorframework-io-v1-clustercatalog"} 0.00074775
controller_runtime_webhook_latency_seconds_count{webhook="/mutate-olm-operatorframework-io-v1-clustercatalog"} 1
# HELP controller_runtime_webhook_panics_total Total number of webhook panics
# TYPE controller_runtime_webhook_panics_total counter
controller_runtime_webhook_panics_total 0
# HELP controller_runtime_webhook_requests_in_flight Current number of admission requests being served.
# TYPE controller_runtime_webhook_requests_in_flight gauge
controller_runtime_webhook_requests_in_flight{webhook="/mutate-olm-operatorframework-io-v1-clustercatalog"} 0
# HELP controller_runtime_webhook_requests_total Total number of admission requests by HTTP status code.
# TYPE controller_runtime_webhook_requests_total counter
controller_runtime_webhook_requests_total{code="200",webhook="/mutate-olm-operatorframework-io-v1-clustercatalog"} 1
controller_runtime_webhook_requests_total{code="500",webhook="/mutate-olm-operatorframework-io-v1-clustercatalog"} 0
# HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.0042e-05
go_gc_duration_seconds{quantile="0.25"} 0.000109126
go_gc_duration_seconds{quantile="0.5"} 0.000406292
go_gc_duration_seconds{quantile="0.75"} 0.000706125
go_gc_duration_seconds{quantile="1"} 0.002504167
go_gc_duration_seconds_sum 0.268305997
go_gc_duration_seconds_count 576
# HELP go_gc_gogc_percent Heap size target percentage configured by the user, otherwise 100. This value is set by the GOGC environment variable, and the runtime/debug.SetGCPercent function. Sourced from /gc/gogc:percent
# TYPE go_gc_gogc_percent gauge
go_gc_gogc_percent 100
# HELP go_gc_gomemlimit_bytes Go runtime memory limit configured by the user, otherwise math.MaxInt64. This value is set by the GOMEMLIMIT environment variable, and the runtime/debug.SetMemoryLimit function. Sourced from /gc/gomemlimit:bytes
# TYPE go_gc_gomemlimit_bytes gauge
go_gc_gomemlimit_bytes 9.223372036854776e+18
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 46
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.23.6"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated in heap and currently in use. Equals to /memory/classes/heap/objects:bytes.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 5.411736e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated in heap until now, even if released already. Equals to /gc/heap/allocs:bytes.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 3.104861088e+09
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table. Equals to /memory/classes/profiling/buckets:bytes.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.811397e+06
# HELP go_memstats_frees_total Total number of heap objects frees. Equals to /gc/heap/frees:objects + /gc/heap/tiny/allocs:objects.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 3.3519681e+07
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata. Equals to /memory/classes/metadata/other:bytes.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 3.983232e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and currently in use, same as go_memstats_alloc_bytes. Equals to /memory/classes/heap/objects:bytes.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 5.411736e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used. Equals to /memory/classes/heap/released:bytes + /memory/classes/heap/free:bytes.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 2.678784e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use. Equals to /memory/classes/heap/objects:bytes + /memory/classes/heap/unused:bytes
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 8.798208e+06
# HELP go_memstats_heap_objects Number of currently allocated objects. Equals to /gc/heap/objects:objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 29726
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS. Equals to /memory/classes/heap/released:bytes.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 2.5485312e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system. Equals to /memory/classes/heap/objects:bytes + /memory/classes/heap/unused:bytes + /memory/classes/heap/released:bytes + /memory/classes/heap/free:bytes.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 3.5586048e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.7388725002675047e+09
# HELP go_memstats_mallocs_total Total number of heap objects allocated, both live and gc-ed. Semantically a counter version for go_memstats_heap_objects gauge. Equals to /gc/heap/allocs:objects + /gc/heap/tiny/allocs:objects.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 3.3549407e+07
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures. Equals to /memory/classes/metadata/mcache/inuse:bytes.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 13200
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system. Equals to /memory/classes/metadata/mcache/inuse:bytes + /memory/classes/metadata/mcache/free:bytes.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 15600
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures. Equals to /memory/classes/metadata/mspan/inuse:bytes.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 276960
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system. Equals to /memory/classes/metadata/mspan/inuse:bytes + /memory/classes/metadata/mspan/free:bytes.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 456960
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place. Equals to /gc/heap/goal:bytes.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 1.1352128e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations. Equals to /memory/classes/other:bytes.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.684435e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes obtained from system for stack allocator in non-CGO environments. Equals to /memory/classes/heap/stacks:bytes.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 2.162688e+06
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator. Equals to /memory/classes/heap/stacks:bytes + /memory/classes/os-stacks:bytes.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 2.162688e+06
# HELP go_memstats_sys_bytes Number of bytes obtained from system. Equals to /memory/classes/total:byte.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 4.570036e+07
# HELP go_sched_gomaxprocs_threads The current runtime.GOMAXPROCS setting, or the number of operating system threads that can execute user-level Go code simultaneously. Sourced from /sched/gomaxprocs:threads
# TYPE go_sched_gomaxprocs_threads gauge
go_sched_gomaxprocs_threads 11
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 16
# HELP leader_election_master_status Gauge of if the reporting system is master of the relevant lease, 0 indicates backup, 1 indicates master. 'name' is the string used to identify the lease. Please make sure to group by name.
# TYPE leader_election_master_status gauge
leader_election_master_status{name="catalogd-operator-lock"} 1
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.41
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_network_receive_bytes_total Number of bytes received by the process over the network.
# TYPE process_network_receive_bytes_total counter
process_network_receive_bytes_total 2.77225892e+08
# HELP process_network_transmit_bytes_total Number of bytes sent by the process over the network.
# TYPE process_network_transmit_bytes_total counter
process_network_transmit_bytes_total 2.4226949e+07
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 12
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 6.3533056e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.73887016837e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.315606528e+09
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP rest_client_requests_total Number of HTTP requests, partitioned by status code, method, and host.
# TYPE rest_client_requests_total counter
rest_client_requests_total{code="200",host="10.96.0.1:443",method="GET"} 8
rest_client_requests_total{code="200",host="10.96.0.1:443",method="PUT"} 97
rest_client_requests_total{code="201",host="10.96.0.1:443",method="POST"} 5
rest_client_requests_total{code="404",host="10.96.0.1:443",method="GET"} 1
# HELP workqueue_adds_total Total number of adds handled by workqueue
# TYPE workqueue_adds_total counter
workqueue_adds_total{controller="clustercatalog",name="clustercatalog"} 9
# HELP workqueue_depth Current depth of workqueue
# TYPE workqueue_depth gauge
workqueue_depth{controller="clustercatalog",name="clustercatalog"} 0
# HELP workqueue_longest_running_processor_seconds How many seconds has the longest running processor for workqueue been running.
# TYPE workqueue_longest_running_processor_seconds gauge
workqueue_longest_running_processor_seconds{controller="clustercatalog",name="clustercatalog"} 0
# HELP workqueue_queue_duration_seconds How long in seconds an item stays in workqueue before being requested
# TYPE workqueue_queue_duration_seconds histogram
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1e-08"} 0
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1e-07"} 0
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1e-06"} 0
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="9.999999999999999e-06"} 2
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="9.999999999999999e-05"} 6
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.001"} 9
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.01"} 9
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.1"} 9
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1"} 9
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="10"} 9
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="100"} 9
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1000"} 9
workqueue_queue_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="+Inf"} 9
workqueue_queue_duration_seconds_sum{controller="clustercatalog",name="clustercatalog"} 0.000706877
workqueue_queue_duration_seconds_count{controller="clustercatalog",name="clustercatalog"} 9
# HELP workqueue_retries_total Total number of retries handled by workqueue
# TYPE workqueue_retries_total counter
workqueue_retries_total{controller="clustercatalog",name="clustercatalog"} 8
# HELP workqueue_unfinished_work_seconds How many seconds of work has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases.
# TYPE workqueue_unfinished_work_seconds gauge
workqueue_unfinished_work_seconds{controller="clustercatalog",name="clustercatalog"} 0
# HELP workqueue_work_duration_seconds How long in seconds processing an item from workqueue takes.
# TYPE workqueue_work_duration_seconds histogram
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1e-08"} 0
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1e-07"} 0
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1e-06"} 0
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="9.999999999999999e-06"} 0
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="9.999999999999999e-05"} 0
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.001"} 4
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.01"} 5
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.1"} 5
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1"} 5
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="10"} 8
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="100"} 9
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1000"} 9
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="+Inf"} 9
workqueue_work_duration_seconds_sum{controller="clustercatalog",name="clustercatalog"} 27.023021222000008
workqueue_work_duration_seconds_count{controller="clustercatalog",name="clustercatalog"} 9
* Connection #0 to host catalogd-service.olmv1-system.svc.cluster.local left intact
/home/curl_user $ 
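For ad-hoc validation, the dump above can be filtered instead of read end to end. A minimal sketch, assuming the same curl-metrics pod and bearer token used earlier; the port placeholder is deliberate, since the catalogd metrics port is whatever was used for the request above:

```shell
# Hypothetical helper, run from inside the curl-metrics pod: fetch the
# endpoint again and keep only the clustercatalog workqueue metrics.
# Replace <PORT> with the catalogd metrics port used above, and <TOKEN>
# with the bearer token created earlier.
curl -s -k -H "Authorization: Bearer <TOKEN>" \
  "https://catalogd-service.olmv1-system.svc.cluster.local:<PORT>/metrics" \
  | grep '^workqueue_.*clustercatalog'
```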

Prometheus

$ kubectl apply --server-side -f https://github.com/prometheus-operator/prometheus-operator/releases/download/v0.77.1/bundle.yaml
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheusagents.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/scrapeconfigs.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator serverside-applied
clusterrole.rbac.authorization.k8s.io/prometheus-operator serverside-applied
deployment.apps/prometheus-operator serverside-applied
serviceaccount/prometheus-operator serverside-applied
service/prometheus-operator serverside-applied
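Note: installing the Prometheus Operator only deploys the operator itself; a Prometheus resource still has to be created so the ServiceMonitors below are actually scraped. A minimal sketch, assuming a prometheus ServiceAccount with the usual scrape RBAC (get/list/watch on pods, services, endpoints) already exists; both the name and the RBAC are assumptions, not something this guide creates:

```shell
# Hypothetical sketch: a Prometheus instance that picks up any
# ServiceMonitor in any namespace (empty selectors match everything).
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: olmv1-system
spec:
  serviceAccountName: prometheus   # assumed SA with scrape RBAC
  serviceMonitorSelector: {}
  serviceMonitorNamespaceSelector: {}
EOF
```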

Operator-Controller

$ kubectl apply -f - <<EOF
> apiVersion: monitoring.coreos.com/v1
> kind: ServiceMonitor
> metadata:
>   labels:
>     control-plane: operator-controller-controller-manager
>   name: controller-manager-metrics-monitor
>   namespace: olmv1-system
> spec:
>   endpoints:
>     - path: /metrics
>       port: https
>       scheme: https
>       bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
>       tlsConfig:
>         insecureSkipVerify: false 
>         serverName: operator-controller-service.olmv1-system.svc
>         ca:
>           secret:
>             name: olmv1-cert
>             key: ca.crt
>         cert:
>           secret:
>             name: olmv1-cert
>             key: tls.crt
>         keySecret:
>           name: olmv1-cert
>           key: tls.key
>   selector:
>     matchLabels:
>       control-plane: operator-controller-controller-manager
> EOF
servicemonitor.monitoring.coreos.com/controller-manager-metrics-monitor created
$ kubectl get servicemonitor -n olmv1-system
NAME                                 AGE
controller-manager-metrics-monitor   33s
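To sanity-check that the ServiceMonitor is actually being picked up, the Prometheus targets API can be queried. A hedged sketch, assuming the Prometheus instance outlined after the operator install exists; prometheus-operated is the governing Service the operator normally creates for it:

```shell
# Port-forward the Prometheus web port (run in its own terminal).
kubectl -n olmv1-system port-forward svc/prometheus-operated 9090:9090

# In another terminal: confirm the operator-controller endpoint shows up
# among the active scrape targets.
curl -s http://localhost:9090/api/v1/targets | grep operator-controller-service
```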

Catalogd

$ OLM_SECRET=$(kubectl get secret -n olmv1-system -o jsonpath="{.items[*].metadata.name}" | tr ' ' '\n' | grep '^catalogd-service-cert')
$ echo $OLM_SECRET
catalogd-service-cert-v1.2.0-rc4-19-g099a6cf
$ kubectl apply -f - <<EOF
> apiVersion: monitoring.coreos.com/v1
> kind: ServiceMonitor
> metadata:
>   labels:
>     control-plane: catalogd-controller-manager
>   name: catalogd-metrics-monitor
>   namespace: olmv1-system
> spec:
>   endpoints:
>     - path: /metrics
>       port: https
>       scheme: https
>       bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
>       tlsConfig:
>         serverName: catalogd-service.olmv1-system.svc
>         insecureSkipVerify: false
>         ca:
>           secret:
>             name: $OLM_SECRET
>             key: ca.crt
>         cert:
>           secret:
>             name: $OLM_SECRET
>             key: tls.crt
>         keySecret:
>           name: $OLM_SECRET
>           key: tls.key
>   selector:
>     matchLabels:
>       control-plane: catalogd-controller-manager
> EOF
servicemonitor.monitoring.coreos.com/catalogd-metrics-monitor created
$ kubectl get servicemonitor -n olmv1-system
NAME                                 AGE
catalogd-metrics-monitor             27s
controller-manager-metrics-monitor   3m46s
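With both monitors scraping, aggregate queries become possible. A hypothetical example, reusing the port-forward from the earlier check; the query only uses a metric name that appears in the scrape output above:

```shell
# Kubernetes API request rate reported by each scraped job over 5 minutes,
# broken down by HTTP status code.
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=sum(rate(rest_client_requests_total[5m])) by (job, code)'
```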

@camilamacedo86 camilamacedo86 requested a review from a team as a code owner December 13, 2024 22:39

netlify bot commented Dec 13, 2024

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 1d0c654
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/67a63880963b3e00087aef4a
😎 Deploy Preview https://deploy-preview-1524--olmv1.netlify.app

@camilamacedo86 camilamacedo86 changed the title ⚠️ remove prometheus manifests 🐛 Improving Security: Removing Unused Manifests for Prometheus Dec 13, 2024
@camilamacedo86 camilamacedo86 force-pushed the remove-monitor-prometheues branch from bac6ff6 to 3b00bd8 Compare December 13, 2024 22:45
@camilamacedo86 camilamacedo86 changed the title 🐛 Improving Security: Removing Unused Manifests for Prometheus 🐛 (fix) Removing Unused and Insecure Manifests for Prometheus Dec 13, 2024
@camilamacedo86 camilamacedo86 force-pushed the remove-monitor-prometheues branch from 3b00bd8 to 85f94da Compare December 13, 2024 22:46
@camilamacedo86
Contributor Author

/hold

Just to ensure that we are all aligned

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 13, 2024

codecov bot commented Dec 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.24%. Comparing base (d72e551) to head (1d0c654).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1524   +/-   ##
=======================================
  Coverage   68.24%   68.24%           
=======================================
  Files          58       58           
  Lines        4988     4988           
=======================================
  Hits         3404     3404           
  Misses       1342     1342           
  Partials      242      242           
Flag Coverage Δ
e2e 52.92% <ø> (ø)
unit 55.45% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bentito
Contributor

bentito commented Dec 16, 2024

We should probably propose fixing this, if needed, rather than removing it, since Prometheus is the primary way people would likely want to monitor metrics for op-con doing its work.

@camilamacedo86
Contributor Author

Hi @bentito

I think the best approach might be:

  • a) Remove it, since we do not support the integration now.
  • b) Create an epic for an RFC on how we will support it.
  • c) Discuss how we do it with the install script, etc., and let it follow the priority list.

WDYT?


@camilamacedo86 camilamacedo86 force-pushed the remove-monitor-prometheues branch from 85f94da to 23cc069 Compare December 16, 2024 19:47
@tmshort
Contributor

tmshort commented Dec 17, 2024

Could we move a fixed monitor.yaml to documents, rather than deleting it?

@joelanford
Member

Could we move a fixed monitor.yaml to documents, rather than deleting it?

Not opposed, but we need to be careful about how we do this because our service names are not part of our public API and may change at any time. If we were the ones managing the ServiceMonitor, we could make sure that it was updated appropriately if/when we change our service names. By documenting it, we may inadvertently put the service name and namespace in our public API, which IMO requires a MUCH bigger discussion.

@camilamacedo86 camilamacedo86 force-pushed the remove-monitor-prometheues branch from 23cc069 to cbdc3a7 Compare December 18, 2024 10:52
@camilamacedo86 camilamacedo86 changed the title 🐛 (fix) Removing Unused and Insecure Manifests for Prometheus 🐛 (fix/doc): add metrics consumption guide and remove unsafe configurations Dec 18, 2024
@camilamacedo86
Contributor Author

/hold cancel

It seems reasonable to merge this now.
All suggestions have been addressed.
And in the community meeting on Dec 17, we discussed it and agreed to remove the unused and unsafe config.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 18, 2024
@camilamacedo86
Contributor Author

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 19, 2024
@camilamacedo86
Contributor Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 19, 2024
@camilamacedo86
Contributor Author

camilamacedo86 commented Feb 4, 2025

Hi @azych,

Thank you for raising this! Your comment brings up an important point.

However, I believe this behavior depends on how Prometheus is configured. For example, it is possible to configure Prometheus to monitor all namespaces, as outlined in the official documentation:
Monitoring All Namespaces.

Given this flexibility in configuration, I don't think we should add details about serviceMonitorSelector in this guide. The ServiceMonitor here is just an example; its primary purpose is to clarify how to pass the serviceName and certificates, rather than to enforce a specific Prometheus setup. That said, a note alerting users about this was added.

P.S.: See https://github.com/prometheus-operator/kube-prometheus/blob/23b33729e2d31660539ca43f5c553907c3f0b823/manifests/prometheus-prometheus.yaml#L48-L49; by default, it seems to be configured to watch all namespaces.

Let me know what you think!
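For readers following this thread, a hypothetical sketch of the two selector fields under discussion; an empty serviceMonitorSelector matches every ServiceMonitor, while the namespace selector below narrows scraping to olmv1-system (the kubernetes.io/metadata.name label is set on namespaces automatically by Kubernetes):

```shell
# Hypothetical: scope a Prometheus instance to ServiceMonitors in
# olmv1-system only, instead of the kube-prometheus default of all
# namespaces mentioned above.
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: olmv1-system
spec:
  serviceAccountName: prometheus   # assumed SA with scrape RBAC
  serviceMonitorSelector: {}
  serviceMonitorNamespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: olmv1-system
EOF
```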

@camilamacedo86 camilamacedo86 force-pushed the remove-monitor-prometheues branch 2 times, most recently from 2a713a8 to 5b50966 Compare February 4, 2025 09:10
@camilamacedo86
Contributor Author

Hi @LalatenduMohanty @tmshort @bentito @michaelryanpeter

It is good to review now.
Please feel free to share your thoughts.

Thank you for the help.

@LalatenduMohanty
Member

@camilamacedo86 Do you want us to test the manifests and commands in this PR?

@camilamacedo86
Contributor Author

Hi @LalatenduMohanty

@camilamacedo86 Do you want us to test the manifests and commands in this PR?

In the past, I did mainly the same as what is here and added the outputs to the PR descriptions; see:

Since then, the only change was the port (fixed now).
Also, we do mainly the same in our e2e tests:

I can test it again now, since it has been open for a while and I have addressed some comments.

@camilamacedo86 camilamacedo86 force-pushed the remove-monitor-prometheues branch 2 times, most recently from fd28a8c to dc78cdd Compare February 6, 2025 20:15
@camilamacedo86
Contributor Author

camilamacedo86 commented Feb 6, 2025

Hi @tylerslaton @LalatenduMohanty,

I went through all the steps again since it had been a while, and I needed to make two tweaks.
To assist with the review, I’ve included the full execution of the steps in the description so you can validate them easily.

I really appreciate your time and help!
Feel free to check.

Thanks again!


```shell
curl -v -k -H "Authorization: Bearer <TOKEN>" \
https://operator-controller-service.olmv1-system.svc.cluster.local:8443/metrics
```
Contributor Author


@tylerslaton Thank you for catching that something was wrong.
The name of the service was incorrect when I copied and pasted the command.
It is fine now, as you can verify in the description.

c/c @LalatenduMohanty


```shell
OLM_SECRET=$(kubectl get secret -n olmv1-system -o jsonpath="{.items[*].metadata.name}" | tr ' ' '\n' | grep '^catalogd-service-cert')
echo $OLM_SECRET
```
Contributor Author


@tylerslaton @LalatenduMohanty

This one was not working either; now it is fine:

$ OLM_SECRET=$(kubectl get secret -n olmv1-system -o jsonpath="{.items[*].metadata.name}" | tr ' ' '\n' | grep '^catalogd-service-cert')
$ echo $OLM_SECRET
catalogd-service-cert-v1.2.0-rc4-19-g099a6cf

@camilamacedo86 camilamacedo86 force-pushed the remove-monitor-prometheues branch 2 times, most recently from eb14a7c to a6a73ec Compare February 6, 2025 20:41
Contributor

@michaelryanpeter michaelryanpeter left a comment


If you have questions, please feel free to reach out.

I don't think anything I mentioned is a blocker; only suggestions to improve clarity and scannability.

@camilamacedo86 camilamacedo86 force-pushed the remove-monitor-prometheues branch 2 times, most recently from 0ba1078 to fe87ffa Compare February 7, 2025 16:15
Contributor

@michaelryanpeter michaelryanpeter left a comment


Just a couple of final nits. Otherwise, LGTM.

…metrics and integrate it with other solutions