diff --git a/README.md b/README.md index 682e185..a7ef2d4 100644 --- a/README.md +++ b/README.md @@ -40,7 +40,7 @@ managers (SLURM, Openstack, k8s) - Provides targets using [HTTP Discovery Component](https://grafana.com/docs/alloy/latest/reference/components/discovery/discovery.http/) to [Grafana Alloy](https://grafana.com/docs/alloy/latest) to continuously profile compute units - Realtime access to metrics *via* Grafana dashboards -- Access control to Prometheus datasource in Grafana +- Access control to Prometheus and Pyroscope datasources in Grafana - Stores aggregated metrics in a separate DB that can be retained for long time - CEEMS apps are [capability aware](https://tbhaxor.com/understanding-linux-capabilities/) diff --git a/build/config/ceems_lb/ceems_lb.yml b/build/config/ceems_lb/ceems_lb.yml index 9d4a196..092ceeb 100644 --- a/build/config/ceems_lb/ceems_lb.yml +++ b/build/config/ceems_lb/ceems_lb.yml @@ -12,7 +12,7 @@ # --- ceems_lb: - # Load balancing strategy. Three possibilites + # Load balancing strategy. Two possibilities # # - round-robin # - least-connection @@ -20,7 +20,7 @@ ceems_lb: # Round robin and least connection are classic strategies and are # self explanatory. # - strategy: resource-based + strategy: round-robin # List of backends for each cluster # diff --git a/build/package/ceems_exporter/ceems_exporter.service b/build/package/ceems_exporter/ceems_exporter.service index 3993c77..9447b9f 100644 --- a/build/package/ceems_exporter/ceems_exporter.service +++ b/build/package/ceems_exporter/ceems_exporter.service @@ -18,6 +18,9 @@ StartLimitInterval=0 ProtectHome=read-only +# CEEMS Exporter is capability aware, which means it drops all unnecessary capabilities based on +# runtime configuration. Thus, these capabilities will not be set on the actual process if +# the collectors that need them are not enabled. 
AmbientCapabilities=CAP_SYS_PTRACE CAP_DAC_READ_SEARCH CAP_SETUID CAP_SETGID CAP_DAC_OVERRIDE CAP_BPF CAP_PERFMON CAP_SYS_RESOURCE CapabilityBoundingSet=CAP_SYS_PTRACE CAP_DAC_READ_SEARCH CAP_SETUID CAP_SETGID CAP_DAC_OVERRIDE CAP_BPF CAP_PERFMON CAP_SYS_RESOURCE diff --git a/pkg/lb/base/base.go b/pkg/lb/base/base.go index 9dc904e..6450e7e 100644 --- a/pkg/lb/base/base.go +++ b/pkg/lb/base/base.go @@ -11,7 +11,7 @@ const CEEMSLoadBalancerAppName = "ceems_lb" // CEEMSLoadBalancerApp is kingpin CLI app. var CEEMSLoadBalancerApp = *kingpin.New( CEEMSLoadBalancerAppName, - "Prometheus load balancer to query from different instances.", + "CEEMS load balancer for TSDB and Pyroscope servers with access control support.", ) // Backend defines backend server. diff --git a/website/docs/00-introduction.md b/website/docs/00-introduction.md index 745ce2b..afa56cf 100644 --- a/website/docs/00-introduction.md +++ b/website/docs/00-introduction.md @@ -31,28 +31,31 @@ managers (SLURM, Openstack, k8s) - Provides targets using [HTTP Discovery Component](https://grafana.com/docs/alloy/latest/reference/components/discovery/discovery.http/) to [Grafana Alloy](https://grafana.com/docs/alloy/latest) to continuously profile compute units - Realtime access to metrics *via* Grafana dashboards -- Access control to Prometheus datasource in Grafana +- Access control to Prometheus and Pyroscope datasources in Grafana - Stores aggregated metrics in a separate DB that can be retained for long time - CEEMS apps are [capability aware](https://tbhaxor.com/understanding-linux-capabilities/) ## Components -CEEMS provide a set of components that enable operators to monitor the consumption of +CEEMS provides a set of components that enable operators and end users to monitor the consumption of resources of the compute units of different resource managers like SLURM, Openstack and Kubernetes. 
- CEEMS Prometheus exporter is capable of exporting compute unit metrics including energy consumption, performance, IO and network metrics from different resource managers in a -unified manner. +unified manner. In addition, CEEMS exporter is capable of providing targets to +[Grafana Alloy](https://grafana.com/docs/alloy/latest/reference/components/discovery/discovery.http/) +for continuously profiling compute units using +[eBPF](https://grafana.com/docs/alloy/latest/reference/components/pyroscope/pyroscope.ebpf/). - CEEMS API server can store the aggregate metrics and metadata of each compute unit originating from different resource managers. -- CEEMS load balancer provides basic access control on TSDB so that compute unit metrics +- CEEMS load balancer provides basic access control on TSDB and Pyroscope so that compute unit metrics from different projects/tenants/namespaces are isolated. "Compute Unit" in the current context has a wider scope. It can be a batch job in HPC, -a VM in cloud, a pod in k8s, _etc_. The main objective of the stack is to quantify +a VM in cloud, a pod in k8s, *etc*. The main objective of the stack is to quantify the energy consumed and estimate emissions by each "compute unit". The repository itself does not provide any frontend apps to show dashboards and it is meant to use along with Grafana and Prometheus to show statistics to users. diff --git a/website/docs/02-objectives.md b/website/docs/02-objectives.md index f9343ed..43958a5 100644 --- a/website/docs/02-objectives.md +++ b/website/docs/02-objectives.md @@ -1,6 +1,6 @@ # Objectives -The objectives of the current stack are two-fold: +The objectives of the current stack are manifold: - For end users to be able to monitor their compute units in real time. 
Besides the conventional metrics like CPU usage, memory usage, _etc_, the stack also exposes diff --git a/website/docs/components/ceems-lb.md b/website/docs/components/ceems-lb.md index 9f35de0..422b7a1 100644 --- a/website/docs/components/ceems-lb.md +++ b/website/docs/components/ceems-lb.md @@ -6,13 +6,13 @@ sidebar_position: 3 ## Background -The motivation behind creating CEEMS load balancer component is that Prometheus TSDB -do not enforce any sort of access control over its metrics querying. This means once -a user has been given the permissions to query a Prometheus TSDB, they can query _any_ -metrics stored in the TSDB. +The motivation behind creating the CEEMS load balancer component is that neither Prometheus TSDB +nor Grafana Pyroscope enforces any sort of access control over querying of its metrics/profiles. +This means once a user has been given the permissions to query a Prometheus TSDB/Grafana +Pyroscope server, they can query _any_ metrics/profiles stored in the server. -Generally, it is not necessary to expose TSDB to end users directly and it is done -using Grafana as Prometheus datasource. Dashboards that are exposed to the end users +Generally, it is not necessary to expose the TSDB/Pyroscope server to end users directly; access is instead provided +through Grafana as a Prometheus/Pyroscope datasource. Dashboards that are exposed to the end users need to have query access on the underlying datasource that the dashboard uses. Although a regular user with [`Viewer`](https://grafana.com/docs/grafana/latest/administration/roles-and-permissions/access-control/#basic-roles) @@ -23,28 +23,28 @@ This effectively means, the user can make _any_ query to the underlying datasour Prometheus, using the browser cookie that is set by Grafana auth. The consequence is that the user can query the metrics of _any_ user or _any_ compute unit. Straight forward solutions to this problem is to create a Prometheus instance for each project/namespace. 
-However, this is not a scalable solution when they are thousands of projects/namespaces -exist. +However, this is not a scalable solution when there are thousands of projects/namespaces +in a given deployment. This can pose few issues in multi tenant systems like HPC and cloud computing platforms. Ideally, we do not want one user to be able to access the compute unit metrics of other users. CEEMS load balancer component has been created to address this issue. CEEMS Load Balancer addresses this issue by acting as a gate keeper to introspect the -query before deciding whether to proxy the request to TSDB or not. It means when a user -makes a TSDB query for a given compute unit, CEEMS load balancer will check if the user +query before deciding whether to proxy the request to TSDB/Pyroscope or not. It means when a user +makes a TSDB/Pyroscope query for a given compute unit, CEEMS load balancer will check if the user owns that compute unit by verifying with CEEMS API server. ## Objectives The main objectives of the CEEMS load balancer are two-fold: -- To provide access control on the TSDB so that compute units of each project/namespace +- To provide access control on the TSDB/Pyroscope so that compute units of each project/namespace are only accessible to the members of that project/namespace -- To provide basic load balancing for replicated TSDB instances. +- To provide basic load balancing for replicated TSDB/Pyroscope instances. -Thus, CEEMS load balancer can be configured as Prometheus data source in Grafana and -the load balancer will take care of routing traffic to backend TSDB instances and at +Thus, CEEMS load balancer can be configured as Prometheus and Pyroscope data sources in Grafana and +the load balancer will take care of routing traffic to backend TSDB/Pyroscope instances and at the same time enforcing access control. ## Load balancing @@ -53,6 +53,14 @@ CEEMS load balancer supports classic load balancing strategies like round-robin connection methods. 
Besides these two, it supports resource based strategy that is based on retention time. Let's take a look at this strategy in-detail. +:::warning[WARNING] + +Resource based load balancing strategy is only supported for TSDB. For Pyroscope, +this strategy is not supported and when used, it will be defaulted to least-connection +strategy. + +::: + Taking Prometheus TSDB as an example, Prometheus advises to use local file system to store the data. This ensure performance and data integrity. However, storing data on local disk is not fault tolerant unless data is replicated elsewhere. There are cloud native @@ -77,12 +85,12 @@ then routing the request to either "hot" or "cold" instances of TSDB. ## Multi cluster support A single deployment of CEEMS load balancer is capable of loading balancing traffic between -different replicated TSDB instances of multiple clusters. Imagine there are two different +different replicated TSDB/Pyroscope instances of multiple clusters. Imagine there are two different clusters, one for SLURM and one for Openstack, in a DC. Slurm cluster has two dedicated -TSDB instances where data is replicated between them and the same for Openstack cluster. -Thus, in total, there are four TSDB instances, two for SLURM cluster and two for +TSDB/Pyroscope instances where data is replicated between them and the same for Openstack cluster. +Thus, in total, there are four TSDB/Pyroscope instances, two for SLURM cluster and two for Openstack cluster. A single instance of CEEMS load balancer can route the traffic -between these four different TSDB instances by targeting the correct cluster. +between these four different TSDB/Pyroscope instances by targeting the correct cluster. However, in the production with heavy traffic a single instance of CEEMS load balancer might not be a optimal solution. 
In that case, it is however possible to deploy a dedicated diff --git a/website/docs/components/metrics.md b/website/docs/components/metrics.md index 8b8197d..06622a6 100644 --- a/website/docs/components/metrics.md +++ b/website/docs/components/metrics.md @@ -4,6 +4,38 @@ sidebar_position: 4 # CEEMS Exporter Metrics +CEEMS exporter ships multiple collectors of which some are enabled by +default. + +## Enabled by default + +The following collectors are enabled by default + +- cpu +- meminfo +- rapl + +## Disabled by default + +The rest of the collectors and sub-collectors are disabled by default. Collectors +disabled by default are: + +- ipmi_dcmi +- emissions +- slurm +- libvirt + +Sub-collectors disabled by default are: + +- ebpf.io-metrics +- ebpf.network-metrics +- perf.hardware-events +- perf.software-events +- perf.hardware-cache-events +- rdma.stats + +## Metrics list + The following are the list of metrics exposed by CEEMS exporter along with the labels for each metric and its description. The first column shows the collector that metric belongs to. @@ -16,10 +48,10 @@ shows the collector that metric belongs to. | meminfo | ceems_meminfo_MemTotal_bytes | hostname | Total memory in the current host. As reported in `/proc/meminfo` | | meminfo | ceems_meminfo_MemFree_bytes | hostname | Total free memory in the current host. As reported in `/proc/meminfo` | | meminfo | ceems_meminfo_MemAvailable_bytes | hostname | Total available memory in the current host. 
As reported in `/proc/meminfo` | -| ipmi | ceems_ipmi_dcmi_current_watts | hostname | Current power consumption reported by IPMI DCMI | -| ipmi | ceems_ipmi_dcmi_avg_watts | hostname | Average power consumption reported by IPMI DCMI within sampling period | -| ipmi | ceems_ipmi_dcmi_min_watts | hostname | Minimum power consumption reported by IPMI DCMI within sampling period | -| ipmi | ceems_ipmi_dcmi_max_watts | hostname | Maximum power consumption reported by IPMI DCMI within sampling period | +| ipmi_dcmi | ceems_ipmi_dcmi_current_watts | hostname | Current power consumption reported by IPMI DCMI | +| ipmi_dcmi | ceems_ipmi_dcmi_avg_watts | hostname | Average power consumption reported by IPMI DCMI within sampling period | +| ipmi_dcmi | ceems_ipmi_dcmi_min_watts | hostname | Minimum power consumption reported by IPMI DCMI within sampling period | +| ipmi_dcmi | ceems_ipmi_dcmi_max_watts | hostname | Maximum power consumption reported by IPMI DCMI within sampling period | | rapl | ceems_rapl_package_joules_total | path, index | Current RAPL package energy value. Labels `index` and `path` gives info about package details. | | rapl | ceems_rapl_dram_joules_total | path, index | Current RAPL DRAM energy value. Labels `index` and `path` gives info about package details. | | rapl | ceems_rapl_core_joules_total | path, index | Current RAPL core energy value. Labels `index` and `path` gives info about package details. diff --git a/website/docs/configuration/ceems-lb.md b/website/docs/configuration/ceems-lb.md index fb3abb6..251b7b9 100644 --- a/website/docs/configuration/ceems-lb.md +++ b/website/docs/configuration/ceems-lb.md @@ -4,6 +4,13 @@ sidebar_position: 4 # CEEMS Load Balancer +CEEMS load balancer supports load balancing for TSDB and Pyroscope +servers. When both TSDB and Pyroscope backend servers are configured, CEEMS LB +will launch two different web servers listening on two different ports, one +for TSDB and one for Pyroscope. 
+ +## CEEMS Load Balancer Configuration + CEEMS Load Balancer configuration has one main section and two optional section. A basic skeleton of the configuration is as follows: @@ -28,22 +35,25 @@ A valid sample configuration file can be found in the [repo](https://github.com/mahendrapaipuri/ceems/blob/main/build/config/ceems_lb/ceems_lb.yml). -## CEEMS Load Balancer Configuration - A sample CEEMS LB config file is shown below: ```yaml - ceems_lb: strategy: resource-based backends: - id: slurm-0 tsdb_urls: - http://localhost:9090 + pyroscope_urls: + - http://localhost:4040 - id: slurm-1 tsdb_urls: - http://localhost:9090 + + - id: slurm-2 + pyroscope_urls: + - http://localhost:4040 ``` - `strategy`: Load balancing strategy. Besides classical `round-robin` and @@ -55,16 +65,44 @@ that has the data based on the time period in the query. that the `id` in the backend must be the same `id` used in the [Clusters Configuration](./ceems-api-server.md#clusters-configuration). This is how CEEMS LB will know which cluster to target. - - `backends.tsdb_urls`: A list of TSDB servers that scrape metrics from this + - `backends.tsdb_urls`: A list of TSDB servers that scrape metrics from the + cluster identified by `id`. + - `backends.pyroscope_urls`: A list of Pyroscope servers that store profiling data from the cluster identified by `id`. :::warning[WARNING] +`resource-based` strategy is only supported for TSDB and when used along with +Pyroscope, the load balancing strategy for Pyroscope servers will be defaulted +to `least-connection`. + CEEMS LB is meant to deploy in the same DMZ as the TSDB servers and hence, it does not support TLS for the backends. ::: +### CEEMS Load Balancer CLI configuration + +By default CEEMS LB servers listen at ports `9030` and `9040` when both +TSDB and Pyroscope backend servers are configured. If intended to use +custom ports, the CLI flag `--web.listen-address` must be repeated to set up +port for TSDB and Pyroscope backends. 
For instance, for the sample config shown +above, the CLI arguments to launch LB servers at custom ports will be: + +```bash +ceems_lb --config.file config.yml --web.listen-address ":8000" --web.listen-address ":9000" +``` + +This will launch TSDB load balancer listening at port `8000` and Pyroscope load +balancer listening at port `9000`. + +:::important[IMPORTANT] + +When both TSDB and Pyroscope backend servers are configured, the first listen +address is attributed to TSDB and second one to Pyroscope. + +::: + ### Matching `backends.id` with `clusters.id` #### Using custom header @@ -148,7 +186,7 @@ For instance, for `slurm-0` cluster the provisioned datasource config for Grafana will look as follows: ```yaml -- name: CEEMS-LB +- name: CEEMS-TSDB-LB type: prometheus access: proxy url: http://localhost:9030 @@ -164,10 +202,25 @@ config for Grafana will look as follows: secureJsonData: basicAuthPassword: httpHeaderValue1: slurm-0 - isDefault: true ``` -assuming CEEMS LB is running at port 9030 on the same host as Grafana. +assuming CEEMS LB is running at port 9030 on the same host as Grafana. Similarly, +for Pyroscope the provisioned config must look like: + +```yaml +- name: CEEMS-Pyro-LB + type: pyroscope + access: proxy + url: http://localhost:9040 + basicAuth: true + basicAuthUser: ceems + jsonData: + httpHeaderName1: X-Ceems-Cluster-Id + secureJsonData: + basicAuthPassword: + httpHeaderValue1: slurm-0 +``` + Notice that we set the header and value in `jsonData` and `secureJsonData`, respectively. This ensures that datasource will send the header with every request to CEEMS LB and then LB will redirect the query request @@ -201,6 +254,23 @@ CEEMS LB, the query label will take the precedence. ::: +Similarly for setting up this label on profiling data in Pyroscope, +it is necessary to use `external_labels` config parameter for Grafana +Alloy when exporting profiles to Pyroscope server. 
A sample config +for Grafana Alloy that pushes profiling data can be as follows: + +```river +pyroscope.write "monitoring" { + endpoint { + url = "http://pyroscope:4040" + } + + external_labels = { + "ceems_id" = "slurm-0", + } +} +``` + ## CEEMS API Server Configuration This is an optional config when provided will enforce access diff --git a/website/docs/configuration/config-reference.md b/website/docs/configuration/config-reference.md index caf4d37..108f729 100644 --- a/website/docs/configuration/config-reference.md +++ b/website/docs/configuration/config-reference.md @@ -828,19 +828,17 @@ A `backend_config` allows configuring backend TSDB servers for load balancer. # for compute unit ownership, CEEMS LB will use the ID to query for the compute # units of that cluster. # -# This identifier needs to be in the path parameter for requests to CEEMS LB -# to target correct cluster. For instance there are two different clusters, -# say `cluster-0` and `cluster-1`, that have different TSDBs configured. Using CEEMS +# This identifier needs to be set as header value for `X-Ceems-Cluster-Id` for +# requests to CEEMS LB to target correct cluster. For instance there are two different +# clusters, say cluster-0 and cluster-1, that have different TSDBs configured. Using CEEMS # LB we can load balance the traffic for these two clusters using a single CEEMS LB # deployement. However, we need to tell CEEMS LB which cluster to target for the -# incoming traffic. This is done via path parameter. +# incoming traffic. This is done via header. # -# If CEEMS LB is running at http://localhost:9030, then the `cluster-0` is reachable at -# `http://localhost:9030/cluster-0` and `cluster-1` at `http://localhost:9030/cluster-1`. -# Internally, CEEMS will strip the first part in the URL path, use it to identify -# cluster and proxy the rest of URL path to underlying TSDB backend. 
-# Thus, all the requests to `http://localhost:9030/cluster-0` will be load -# balanced across TSDB backends of `cluster-0`. +# The TSDB datasource of `cluster-0` must be configured in Grafana to send the header +# `X-Ceems-Cluster-Id` with value `cluster-0` in each request. CEEMS LB will inspect +# this header value and proxy the request to the correct TSDB in `cluster-0` based +# on the chosen LB strategy. # id: @@ -859,6 +857,22 @@ id: # tsdb_urls: [ - ] + +# List of Pyroscope servers for this cluster. Load balancing between these servers +# will be made based on the strategy chosen. +# +# TLS is not supported for backends. CEEMS LB supports TLS and TLS terminates +# at the LB and requests are proxied to backends on HTTP. +# +# LB and backend servers are meant to be in the same DMZ so that we do not need +# to encrypt communications. Backends however support basic auth and they can +# be configured in URL with usual syntax. +# +# An example of configuring the basic auth username and password with backend +# - http://alice:password@localhost:4040 +# +pyroscope_urls: + [ - ] ``` ## ``