docs: Update documentation for CEEMS LB
Signed-off-by: Mahendra Paipuri <[email protected]>
mahendrapaipuri committed Dec 30, 2024
1 parent 130c54d commit 87ee3af
Showing 10 changed files with 179 additions and 49 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -40,7 +40,7 @@ managers (SLURM, Openstack, k8s)
- Provides targets using [HTTP Discovery Component](https://grafana.com/docs/alloy/latest/reference/components/discovery/discovery.http/)
to [Grafana Alloy](https://grafana.com/docs/alloy/latest) to continuously profile compute units
- Realtime access to metrics *via* Grafana dashboards
- Access control to Prometheus datasource in Grafana
- Access control to Prometheus and Pyroscope datasources in Grafana
- Stores aggregated metrics in a separate DB that can be retained for a long time
- CEEMS apps are [capability aware](https://tbhaxor.com/understanding-linux-capabilities/)

4 changes: 2 additions & 2 deletions build/config/ceems_lb/ceems_lb.yml
@@ -12,15 +12,15 @@
#
---
ceems_lb:
# Load balancing strategy. Three possibilities
# Load balancing strategy. Two possibilities
#
# - round-robin
# - least-connection
#
# Round robin and least connection are classic strategies and are
# self-explanatory.
#
strategy: resource-based
strategy: round-robin

# List of backends for each cluster
#
3 changes: 3 additions & 0 deletions build/package/ceems_exporter/ceems_exporter.service
@@ -18,6 +18,9 @@ StartLimitInterval=0

ProtectHome=read-only

# CEEMS Exporter is capability aware, which means it drops all unnecessary capabilities based on
# runtime configuration. Thus, these capabilities will not be set on the actual process if
# the collectors that need them are not enabled.
AmbientCapabilities=CAP_SYS_PTRACE CAP_DAC_READ_SEARCH CAP_SETUID CAP_SETGID CAP_DAC_OVERRIDE CAP_BPF CAP_PERFMON CAP_SYS_RESOURCE
CapabilityBoundingSet=CAP_SYS_PTRACE CAP_DAC_READ_SEARCH CAP_SETUID CAP_SETGID CAP_DAC_OVERRIDE CAP_BPF CAP_PERFMON CAP_SYS_RESOURCE
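Since the exporter drops capabilities it does not need at runtime, operators who enable only a few collectors can also narrow the set granted by systemd up front. The drop-in below is a hypothetical sketch — which capabilities each collector actually needs is an assumption here; verify against the exporter documentation before trimming:

```ini
# /etc/systemd/system/ceems_exporter.service.d/capabilities.conf
# Hypothetical override for a deployment that only needs perf events and
# reading restricted files; adjust to the collectors you actually enable.
[Service]
# An empty assignment resets the lists inherited from the main unit file
AmbientCapabilities=
CapabilityBoundingSet=
AmbientCapabilities=CAP_PERFMON CAP_DAC_READ_SEARCH
CapabilityBoundingSet=CAP_PERFMON CAP_DAC_READ_SEARCH
```

After adding the drop-in, run `systemctl daemon-reload` and restart the service for the narrowed set to take effect.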

2 changes: 1 addition & 1 deletion pkg/lb/base/base.go
@@ -11,7 +11,7 @@ const CEEMSLoadBalancerAppName = "ceems_lb"
// CEEMSLoadBalancerApp is kingpin CLI app.
var CEEMSLoadBalancerApp = *kingpin.New(
CEEMSLoadBalancerAppName,
"Prometheus load balancer to query from different instances.",
"CEEMS load balancer for TSDB and Pyroscope servers with access control support.",
)

// Backend defines backend server.
13 changes: 8 additions & 5 deletions website/docs/00-introduction.md
@@ -31,28 +31,31 @@ managers (SLURM, Openstack, k8s)
- Provides targets using [HTTP Discovery Component](https://grafana.com/docs/alloy/latest/reference/components/discovery/discovery.http/)
to [Grafana Alloy](https://grafana.com/docs/alloy/latest) to continuously profile compute units
- Realtime access to metrics *via* Grafana dashboards
- Access control to Prometheus datasource in Grafana
- Access control to Prometheus and Pyroscope datasources in Grafana
- Stores aggregated metrics in a separate DB that can be retained for a long time
- CEEMS apps are [capability aware](https://tbhaxor.com/understanding-linux-capabilities/)

## Components

CEEMS provides a set of components that enable operators to monitor the consumption of
CEEMS provides a set of components that enable operators and end users to monitor the consumption of
resources of the compute units of different resource managers like SLURM, Openstack and
Kubernetes.

- CEEMS Prometheus exporter is capable of exporting compute unit metrics including energy
consumption, performance, IO and network metrics from different resource managers in a
unified manner.
unified manner. In addition, CEEMS exporter is capable of providing targets to
[Grafana Alloy](https://grafana.com/docs/alloy/latest/reference/components/discovery/discovery.http/)
for continuously profiling compute units using
[eBPF](https://grafana.com/docs/alloy/latest/reference/components/pyroscope/pyroscope.ebpf/).

- CEEMS API server can store the aggregate metrics and metadata of each compute unit
originating from different resource managers.

- CEEMS load balancer provides basic access control on TSDB so that compute unit metrics
- CEEMS load balancer provides basic access control on TSDB and Pyroscope so that compute unit metrics
from different projects/tenants/namespaces are isolated.

"Compute Unit" in the current context has a wider scope. It can be a batch job in HPC,
a VM in cloud, a pod in k8s, _etc_. The main objective of the stack is to quantify
a VM in cloud, a pod in k8s, *etc*. The main objective of the stack is to quantify
the energy consumed and estimate emissions by each "compute unit". The repository itself
does not provide any frontend apps to show dashboards and it is meant to be used along
with Grafana and Prometheus to show statistics to users.
2 changes: 1 addition & 1 deletion website/docs/02-objectives.md
@@ -1,6 +1,6 @@
# Objectives

The objectives of the current stack are two-fold:
The current stack has several objectives:

- For end users to be able to monitor their compute units in real time. Besides the
conventional metrics like CPU usage, memory usage, _etc_, the stack also exposes
44 changes: 26 additions & 18 deletions website/docs/components/ceems-lb.md
@@ -6,13 +6,13 @@ sidebar_position: 3

## Background

The motivation behind creating CEEMS load balancer component is that Prometheus TSDB
do not enforce any sort of access control over its metrics querying. This means once
a user has been given the permissions to query a Prometheus TSDB, they can query _any_
metrics stored in the TSDB.
The motivation behind creating the CEEMS load balancer component is that neither Prometheus TSDB
nor Grafana Pyroscope enforces any sort of access control over metrics/profiles querying.
This means once a user has been given the permissions to query a Prometheus TSDB/Grafana
Pyroscope server, they can query _any_ metrics/profiles stored in the server.

Generally, it is not necessary to expose TSDB to end users directly and it is done
using Grafana as Prometheus datasource. Dashboards that are exposed to the end users
Generally, it is not necessary to expose the TSDB/Pyroscope server to end users directly; instead it is exposed
through Grafana as a Prometheus/Pyroscope datasource. Dashboards that are exposed to the end users
need to have query access on the underlying
datasource that the dashboard uses. Although a regular user with
[`Viewer`](https://grafana.com/docs/grafana/latest/administration/roles-and-permissions/access-control/#basic-roles)
@@ -23,28 +23,28 @@ This effectively means, the user can make _any_ query to the underlying datasour
Prometheus, using the browser cookie that is set by Grafana auth. The consequence is that
the user can query the metrics of _any_ user or _any_ compute unit. A straightforward
solution to this problem is to create a Prometheus instance for each project/namespace.
However, this is not a scalable solution when they are thousands of projects/namespaces
exist.
However, this is not a scalable solution when there are thousands of projects/namespaces
in a given deployment.

This can pose a few issues in multi-tenant systems like HPC and cloud computing platforms.
Ideally, we do not want one user to be able to access the compute unit metrics of
other users. The CEEMS load balancer component has been created to address this issue.

CEEMS Load Balancer addresses this issue by acting as a gatekeeper that introspects the
query before deciding whether to proxy the request to TSDB or not. It means when a user
makes a TSDB query for a given compute unit, CEEMS load balancer will check if the user
query before deciding whether to proxy the request to TSDB/Pyroscope or not. This means that when a user
makes a TSDB/Pyroscope query for a given compute unit, CEEMS load balancer will check if the user
owns that compute unit by verifying with CEEMS API server.
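The gatekeeping flow can be sketched roughly as follows. This is an illustrative sketch only — the `uuid` label, the in-memory ownership map (standing in for a verification round trip to the CEEMS API server), and all names are assumptions, not the actual CEEMS implementation:

```python
import re

# Hypothetical ownership map standing in for a verification round trip to the
# CEEMS API server; user names and unit IDs are invented for illustration.
OWNED_UNITS = {"alice": {"job-1001", "job-1002"}, "bob": {"job-2001"}}

def extract_unit_ids(promql: str) -> set:
    """Pull compute unit IDs out of a PromQL matcher such as uuid=~"job-1001|job-1002"."""
    match = re.search(r'uuid=~?"([^"]+)"', promql)
    return set(match.group(1).split("|")) if match else set()

def authorize(user: str, promql: str) -> bool:
    """Proxy the query only if every referenced unit belongs to the requesting user."""
    units = extract_unit_ids(promql)
    return bool(units) and units <= OWNED_UNITS.get(user, set())

print(authorize("alice", 'ceems_cpu_usage{uuid=~"job-1001|job-1002"}'))  # True
print(authorize("bob", 'ceems_cpu_usage{uuid=~"job-1001"}'))             # False
```

A query that references no unit at all, or any unit the user does not own, is rejected rather than proxied — the same fail-closed behaviour the load balancer aims for.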

## Objectives

The main objectives of the CEEMS load balancer are two-fold:

- To provide access control on the TSDB so that compute units of each project/namespace
- To provide access control on the TSDB/Pyroscope so that compute units of each project/namespace
are only accessible to the members of that project/namespace
- To provide basic load balancing for replicated TSDB instances.
- To provide basic load balancing for replicated TSDB/Pyroscope instances.

Thus, CEEMS load balancer can be configured as Prometheus data source in Grafana and
the load balancer will take care of routing traffic to backend TSDB instances and at
Thus, CEEMS load balancer can be configured as Prometheus and Pyroscope data sources in Grafana and
the load balancer will take care of routing traffic to backend TSDB/Pyroscope instances and at
the same time enforcing access control.

## Load balancing
@@ -53,6 +53,14 @@ CEEMS load balancer supports classic load balancing strategies like round-robin
connection methods. Besides these two, it supports a resource-based strategy
based on retention time. Let's take a look at this strategy in detail.

:::warning[WARNING]

The resource-based load balancing strategy is only supported for TSDB. It is not
supported for Pyroscope; when used, Pyroscope load balancing falls back to the
least-connection strategy.

:::

Taking Prometheus TSDB as an example, Prometheus advises using the local file system to store
the data. This ensures performance and data integrity. However, storing data on local
disk is not fault tolerant unless the data is replicated elsewhere. There are cloud native
@@ -77,12 +85,12 @@ then routing the request to either "hot" or "cold" instances of TSDB.
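Under the stated assumptions (invented backend names and retention windows), the hot/cold routing decision can be sketched as:

```python
from datetime import datetime, timedelta, timezone

# Invented backends: a "hot" TSDB with short retention on fast local disk and
# a "cold" TSDB with long retention backed by remote/object storage.
BACKENDS = [
    {"url": "http://tsdb-hot:9090", "retention": timedelta(days=30)},
    {"url": "http://tsdb-cold:9090", "retention": timedelta(days=365)},
]

def pick_backend(query_start: datetime) -> str:
    """Route to the smallest retention window that still covers the query start."""
    age = datetime.now(timezone.utc) - query_start
    covering = [b for b in BACKENDS if age <= b["retention"]]
    if covering:
        return min(covering, key=lambda b: b["retention"])["url"]
    # Nothing covers the range fully: fall back to the longest retention
    return max(BACKENDS, key=lambda b: b["retention"])["url"]

print(pick_backend(datetime.now(timezone.utc) - timedelta(days=7)))   # http://tsdb-hot:9090
print(pick_backend(datetime.now(timezone.utc) - timedelta(days=90)))  # http://tsdb-cold:9090
```

Recent queries land on the fast "hot" instance while queries reaching back beyond its retention are routed to the "cold" one, which is the intent of the resource-based strategy.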
## Multi cluster support

A single deployment of CEEMS load balancer is capable of load balancing traffic between
different replicated TSDB instances of multiple clusters. Imagine there are two different
different replicated TSDB/Pyroscope instances of multiple clusters. Imagine there are two different
clusters, one for SLURM and one for Openstack, in a DC. The SLURM cluster has two dedicated
TSDB instances where data is replicated between them and the same for Openstack cluster.
Thus, in total, there are four TSDB instances, two for SLURM cluster and two for
TSDB/Pyroscope instances where data is replicated between them and the same for Openstack cluster.
Thus, in total, there are four TSDB/Pyroscope instances, two for SLURM cluster and two for
Openstack cluster. A single instance of CEEMS load balancer can route the traffic
between these four different TSDB instances by targeting the correct cluster.
between these four different TSDB/Pyroscope instances by targeting the correct cluster.
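The two-cluster layout described above could be expressed with a configuration along these lines — the IDs, hostnames, and ports are invented for illustration and must match the `clusters` configuration of the CEEMS API server:

```yaml
ceems_lb:
  strategy: round-robin
  backends:
    # SLURM cluster with two replicated TSDB/Pyroscope instances
    - id: slurm-0
      tsdb_urls:
        - http://slurm-tsdb-0:9090
        - http://slurm-tsdb-1:9090
      pyroscope_urls:
        - http://slurm-pyro-0:4040
        - http://slurm-pyro-1:4040
    # Openstack cluster with its own pair of instances
    - id: os-0
      tsdb_urls:
        - http://os-tsdb-0:9090
        - http://os-tsdb-1:9090
      pyroscope_urls:
        - http://os-pyro-0:4040
        - http://os-pyro-1:4040
```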

However, in production with heavy traffic, a single instance of CEEMS load balancer
might not be an optimal solution. In that case, it is possible to deploy a dedicated
40 changes: 36 additions & 4 deletions website/docs/components/metrics.md
@@ -4,6 +4,38 @@ sidebar_position: 4

# CEEMS Exporter Metrics

CEEMS exporter ships multiple collectors, some of which are enabled by
default.

## Enabled by default

The following collectors are enabled by default:

- cpu
- meminfo
- rapl

## Disabled by default

The rest of the collectors and sub-collectors are disabled by default. Collectors
disabled by default are:

- ipmi_dcmi
- emissions
- slurm
- libvirt

Sub-collectors disabled by default are:

- ebpf.io-metrics
- ebpf.network-metrics
- perf.hardware-events
- perf.software-events
- perf.hardware-cache-events
- rdma.stats

## Metrics list

The following is the list of metrics exposed by CEEMS exporter along
with the labels for each metric and its description. The first column
shows the collector that each metric belongs to.
@@ -16,10 +48,10 @@ shows the collector that metric belongs to.
| meminfo | ceems_meminfo_MemTotal_bytes | hostname | Total memory in the current host. As reported in `/proc/meminfo` |
| meminfo | ceems_meminfo_MemFree_bytes | hostname | Total free memory in the current host. As reported in `/proc/meminfo` |
| meminfo | ceems_meminfo_MemAvailable_bytes | hostname | Total available memory in the current host. As reported in `/proc/meminfo` |
| ipmi | ceems_ipmi_dcmi_current_watts | hostname | Current power consumption reported by IPMI DCMI |
| ipmi | ceems_ipmi_dcmi_avg_watts | hostname | Average power consumption reported by IPMI DCMI within sampling period |
| ipmi | ceems_ipmi_dcmi_min_watts | hostname | Minimum power consumption reported by IPMI DCMI within sampling period |
| ipmi | ceems_ipmi_dcmi_max_watts | hostname | Maximum power consumption reported by IPMI DCMI within sampling period |
| ipmi_dcmi | ceems_ipmi_dcmi_current_watts | hostname | Current power consumption reported by IPMI DCMI |
| ipmi_dcmi | ceems_ipmi_dcmi_avg_watts | hostname | Average power consumption reported by IPMI DCMI within sampling period |
| ipmi_dcmi | ceems_ipmi_dcmi_min_watts | hostname | Minimum power consumption reported by IPMI DCMI within sampling period |
| ipmi_dcmi | ceems_ipmi_dcmi_max_watts | hostname | Maximum power consumption reported by IPMI DCMI within sampling period |
| rapl | ceems_rapl_package_joules_total | path, index | Current RAPL package energy value. Labels `index` and `path` give info about package details. |
| rapl | ceems_rapl_dram_joules_total | path, index | Current RAPL DRAM energy value. Labels `index` and `path` give info about package details. |
| rapl | ceems_rapl_core_joules_total | path, index | Current RAPL core energy value. Labels `index` and `path` give info about package details. |
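To collect these metrics, Prometheus needs a scrape job pointing at the exporter. A minimal sketch, assuming the exporter listens on port `9010` (adjust to your actual `--web.listen-address`) and hypothetical target names:

```yaml
scrape_configs:
  - job_name: ceems
    static_configs:
      - targets:
          - compute-node-0:9010
          - compute-node-1:9010
```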
84 changes: 77 additions & 7 deletions website/docs/configuration/ceems-lb.md
@@ -4,6 +4,13 @@ sidebar_position: 4

# CEEMS Load Balancer

CEEMS load balancer supports load balancing for TSDB and Pyroscope
servers. When both TSDB and Pyroscope backend servers are configured, CEEMS LB
will launch two different web servers listening on two different ports, one
for TSDB and one for Pyroscope.

## CEEMS Load Balancer Configuration

CEEMS Load Balancer configuration has one main section and two optional
sections. A basic skeleton of the configuration is as follows:

@@ -28,22 +35,25 @@ A valid sample
configuration file can be found in the
[repo](https://github.com/mahendrapaipuri/ceems/blob/main/build/config/ceems_lb/ceems_lb.yml).

## CEEMS Load Balancer Configuration

A sample CEEMS LB config file is shown below:

```yaml
ceems_lb:
strategy: resource-based
backends:
- id: slurm-0
tsdb_urls:
- http://localhost:9090
pyroscope_urls:
- http://localhost:4040
- id: slurm-1
tsdb_urls:
- http://localhost:9090
- id: slurm-2
pyroscope_urls:
- http://localhost:4040
```

- `strategy`: Load balancing strategy. Besides classical `round-robin` and
@@ -55,16 +65,44 @@ that has the data based on the time period in the query.
that the `id` in the backend must be the same `id` used in the
[Clusters Configuration](./ceems-api-server.md#clusters-configuration). This
is how CEEMS LB will know which cluster to target.
- `backends.tsdb_urls`: A list of TSDB servers that scrape metrics from this
- `backends.tsdb_urls`: A list of TSDB servers that scrape metrics from the
cluster identified by `id`.
- `backends.pyroscope_urls`: A list of Pyroscope servers that store profiling data from the
cluster identified by `id`.

:::warning[WARNING]

The `resource-based` strategy is only supported for TSDB; when used along with
Pyroscope, the load balancing strategy for Pyroscope servers defaults
to `least-connection`.

CEEMS LB is meant to be deployed in the same DMZ as the TSDB servers and hence it
does not support TLS for the backends.

:::

### CEEMS Load Balancer CLI configuration

By default, CEEMS LB servers listen on ports `9030` and `9040` when both
TSDB and Pyroscope backend servers are configured. To use
custom ports, the CLI flag `--web.listen-address` must be repeated to set the
ports for the TSDB and Pyroscope backends. For instance, for the sample config shown
above, the CLI arguments to launch LB servers on custom ports are:

```bash
ceems_lb --config.file config.yml --web.listen-address ":8000" --web.listen-address ":9000"
```

This will launch the TSDB load balancer listening on port `8000` and the Pyroscope load
balancer listening on port `9000`.

:::important[IMPORTANT]

When both TSDB and Pyroscope backend servers are configured, the first listen
address is attributed to TSDB and the second one to Pyroscope.

:::

### Matching `backends.id` with `clusters.id`

#### Using custom header
@@ -148,7 +186,7 @@ For instance, for `slurm-0` cluster the provisioned datasource
config for Grafana will look as follows:

```yaml
- name: CEEMS-LB
- name: CEEMS-TSDB-LB
type: prometheus
access: proxy
url: http://localhost:9030
@@ -164,10 +202,25 @@ config for Grafana will look as follows:
secureJsonData:
basicAuthPassword: <ceems_lb_basic_auth_password>
httpHeaderValue1: slurm-0
isDefault: true
```

assuming CEEMS LB is running at port 9030 on the same host as Grafana.
assuming CEEMS LB is running at port 9030 on the same host as Grafana. Similarly,
for Pyroscope the provisioned config must look like:

```yaml
- name: CEEMS-Pyro-LB
type: pyroscope
access: proxy
url: http://localhost:9040
basicAuth: true
basicAuthUser: ceems
jsonData:
httpHeaderName1: X-Ceems-Cluster-Id
secureJsonData:
basicAuthPassword: <ceems_lb_basic_auth_password>
httpHeaderValue1: slurm-0
```

Notice that we set the header and value in `jsonData` and `secureJsonData`,
respectively. This ensures that the datasource will send the header with
every request to CEEMS LB, and the LB will then redirect the query request
@@ -201,6 +254,23 @@ CEEMS LB, the query label will take the precedence.

:::

Similarly, to set this label on profiling data in Pyroscope,
it is necessary to use the `external_labels` config parameter of Grafana
Alloy when exporting profiles to the Pyroscope server. A sample config
for Grafana Alloy that pushes profiling data is as follows:

```river
pyroscope.write "monitoring" {
endpoint {
url = "http://pyroscope:4040"
}
external_labels = {
"ceems_id" = "slurm-0",
}
}
```

## CEEMS API Server Configuration

This is an optional config which, when provided, will enforce access
