This tool lets OpenShift users run a watcher for Prometheus queries and define thresholds (in a YAML file) to observe the performance of an OpenShift cluster during performance testing. It could be generalized to run constantly against a cluster and alert you when the cluster looks unhealthy. It may sound like some of the other monitoring and alerting solutions, but it's supposed to be simple, scalable and user-friendly.
- Runs externally and can work with any Prometheus instance
- Can be extended to run queries against sources other than Prometheus, such as ElasticSearch, or simple OC CLI commands
- The history of each run can be stored in log files
                            ┌─────────────────────┐                             ┌───────────────────────────┐
                            │                     │                             │         OpenShift         │
                            │    Benchmark Job    │                             │                           │
                            │                     │                             │     ┌───────────────┐     │
                            │     (optional)      │                             │     │  Prometheus   │     │
                            └────────▲────────────┘                             │     │       ▲       │     │
                                     │                                          │     └───────┬───────┘     │  - At least one prometheus cluster info required
                       Ability to kill benchmark job                            │             │             │
                                     │                                          │             │             │
                                     │                                          └─────────────┼─────────────┘
                            ┌────────┴────────────┐                                           │
    ┌─────────────────┐     │                     │  Determines Url and Token                 │
    │                 │     │   Continuous Perf   ├───────────────────────────────────────────┘
    │  Slack Notifs.  ◄─────┤   Analysis - CPA    │  Runs Queries
    │                 │     │                     │
    │                 │     │                     │                              ┌──────────────────────────┐
    └─────────────────┘     └───────┬─────────────┘                              │                          │
                                    │                                            │                          │
                                    │  Requires Url and Token                    │  Prometheus - external   │
                                    └───────────────────────────────────────────►│                          │
                                       Runs Queries                              │                          │
                                                                                 └──────────────────────────┘
- Create oc cli connection to OpenShift/Kubernetes using Kubeconfig
- Determine Prometheus url, bearerToken for OpenShift
- If Prometheus url, bearerToken already included in the yaml, use that
- Create yaml format for queries, and expected outcomes (Use a struct to read that in)
- Spawn a goroutine to run queries and analyze results
- Spawn a goroutine to receive a notification when a query yields a "False" value
- Update to latest go and recompile
- Add CLI to the program
- Add a parameter to read different query files in config dir
- Add parameter for clearing/not-clearing screen
- Add Parameter for timeout
- Add a Makefile
- File logging the output
- Print output to screen even when logging enabled - simultaneously
- Let user decide query frequency
- Slack Notification
- Notify or take action (e.g. pause/kill benchmark jobs to preserve the cluster) when results don't match conditions
- Spawn goroutines to keep running queries and evaluating results to handle scale, e.g. when there is a very large number of queries in the yaml file, divide them up and run them concurrently
- If slack config is not set, it is ignored and no attempts will be made to notify via slack
- debug/verbose mode
- Enhance log files to include uuid/time
- Use env vars
- RFE: come up with a basic "cluster health" profile that anyone can use. Operator monitoring + some best practice monitors from the dittybopper dashboards
- Then build the binary using the Makefile: `make build`, or update your binary using `make update`. You can clean an existing binary with `make clean`, or do a clean and update/build using `make all`.
- Set the `KUBECONFIG` envvar, and make sure to review `config/queries.yaml`.
- You can then run the following command:
    ./bin/cpa -t 60s -h
    Usage: cpa [--noclrscr] [--queries QUERIES] [--query-frequency QUERY-FREQUENCY] [--timeout TIMEOUT] [--log-output] [--terminate-benchmark TERMINATE-BENCHMARK]

    Options:
      --noclrscr             Do not clear screen after each iteration. Clears screen by default. [default: false]
      --queries QUERIES, -q QUERIES
                             queries file to use [default: queries.yaml]
      --query-frequency QUERY-FREQUENCY, -f QUERY-FREQUENCY
                             How often do we run queries. You can pass values like 4h or 1h10m10s [default: 20s]
      --timeout TIMEOUT, -t TIMEOUT
                             Duration to run Continuous Performance Analysis. You can pass values like 4h or 1h10m10s [default: 4h]
      --log-output, -l       Output will be stored in a log file(cpa.log) in addition to stdout. [default: false]
      --terminate-benchmark TERMINATE-BENCHMARK, -k TERMINATE-BENCHMARK
                             When CPA is running in parallel with benchmark job, let CPA know to kill benchmark if any query fail. (E.g. -k <processID>) Helpful to preserve cluster for further analysis.
      --help, -h             display this help and exit