Commit

added expiry metrics and alerts (#121)
* added expiry metrics and alerts

Signed-off-by: raffaelespazzoli <[email protected]>

* addressed Andrew's comment

Signed-off-by: raffaelespazzoli <[email protected]>
raffaelespazzoli committed Feb 17, 2022
1 parent 3ae3a41 commit efc5a23
Showing 21 changed files with 292 additions and 46 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -26,4 +26,5 @@ testbin/*

bundle/
bundle.Dockerfile
charts/
charts/
config/local-development/tilt/replace-image.yaml
16 changes: 10 additions & 6 deletions Makefile
@@ -203,18 +203,22 @@ catalog-build: opm ## Build a catalog image.
catalog-push: ## Push a catalog image.
$(MAKE) docker-push IMG=$(CATALOG_IMG)

# Generate helm chart
helmchart: kustomize
mkdir -p ./charts/${OPERATOR_NAME}/templates
mkdir -p ./charts/${OPERATOR_NAME}/crds
repo=${OPERATOR_NAME} envsubst < ./config/local-development/tilt/env-replace-image.yaml > ./config/local-development/tilt/replace-image.yaml
$(KUSTOMIZE) build ./config/helmchart -o ./charts/${OPERATOR_NAME}/templates
sed -i 's/\([{}]\{2\}\)/{{ "\1" }}/g' ./charts/${OPERATOR_NAME}/templates/monitoring.coreos.com_v1_prometheusrule_${OPERATOR_NAME}-certificate-rule-alerts.yaml
sed -i 's/release-namespace/{{.Release.Namespace}}/' ./charts/${OPERATOR_NAME}/templates/*.yaml
rm ./charts/${OPERATOR_NAME}/templates/v1_namespace_release-namespace.yaml ./charts/${OPERATOR_NAME}/templates/apps_v1_deployment_${OPERATOR_NAME}-controller-manager.yaml
cp ./config/helmchart/templates/* ./charts/${OPERATOR_NAME}/templates
$(KUSTOMIZE) build ./config/helmchart | sed 's/namespace: system/namespace: {{ .Release.Namespace }}/' > ./charts/${OPERATOR_NAME}/templates/rbac.yaml
if [ -d "./config/crd" ]; then $(KUSTOMIZE) build ./config/crd > ./charts/${OPERATOR_NAME}/crds/crds.yaml; fi
version=${VERSION} envsubst < ./config/helmchart/Chart.yaml.tpl > ./charts/${OPERATOR_NAME}/Chart.yaml
version=${VERSION} image_repo=$${IMG%:*} envsubst < ./config/helmchart/values.yaml.tpl > ./charts/${OPERATOR_NAME}/values.yaml
sed -i '/^apiVersion: monitoring.coreos.com/i {{ if .Values.enableMonitoring }}' ./charts/${OPERATOR_NAME}/templates/rbac.yaml
echo {{ end }} >> ./charts/${OPERATOR_NAME}/templates/rbac.yaml
helm lint ./charts/${OPERATOR_NAME}
sed -i '1s/^/{{ if .Values.enableMonitoring }}/' ./charts/${OPERATOR_NAME}/templates/monitoring.coreos.com_v1_servicemonitor_${OPERATOR_NAME}-controller-manager-metrics-monitor.yaml
echo {{ end }} >> ./charts/${OPERATOR_NAME}/templates/monitoring.coreos.com_v1_servicemonitor_${OPERATOR_NAME}-controller-manager-metrics-monitor.yaml
sed -i '1s/^/{{ if .Values.enableMonitoring }}/' ./charts/${OPERATOR_NAME}/templates/monitoring.coreos.com_v1_prometheusrule_${OPERATOR_NAME}-certificate-rule-alerts.yaml
echo {{ end }} >> ./charts/${OPERATOR_NAME}/templates/monitoring.coreos.com_v1_prometheusrule_${OPERATOR_NAME}-certificate-rule-alerts.yaml
helm lint ./charts/${OPERATOR_NAME}
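
The `sed` expression `s/\([{}]\{2\}\)/{{ "\1" }}/g` in the `helmchart` target appears intended to stop Helm from interpreting the Prometheus alert templates (such as `{{ $labels.namespace }}`) as its own template actions: each pair of braces is wrapped in a Helm action that emits the braces as a literal string. A minimal Python sketch of the same substitution is shown below; the function name `escape_template_braces` is hypothetical and only illustrates the rewrite, it is not part of the repository.

```python
import re

# Matches any two consecutive brace characters, mirroring the
# sed bracket expression [{}]\{2\} used in the Makefile.
BRACE_PAIR = re.compile(r'([{}]{2})')


def escape_template_braces(text):
    """Wrap each brace pair in a Helm action that prints it literally.

    Hypothetical helper: reproduces the effect of
    sed 's/\\([{}]\\{2\\}\\)/{{ "\\1" }}/g' on a single string.
    """
    # Braces are literal in the replacement template; \1 is the matched pair.
    return BRACE_PAIR.sub(r'{{ "\1" }}', text)


if __name__ == "__main__":
    print(escape_template_braces("{{ $labels.name }}"))
```

When Helm later renders the chart, each `{{ "{{" }}` action evaluates to the literal string `{{`, so the rendered PrometheusRule should again contain the original Prometheus template.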

helmchart-repo: helmchart
mkdir -p ${HELM_REPO_DEST}/${OPERATOR_NAME}
28 changes: 21 additions & 7 deletions README.md
@@ -113,6 +113,17 @@ Here is an example of a certificate soon-to-expiry event:

![cert-expiry](media/cert-expiry.png)

In addition, the operator generates the following metrics for all TLS certificates:

| Metric Name | Description |
|:-:|:-:|
| certutils_certificate_issue_time | time at which the certificate was created, in seconds since January 1, 1970 UTC |
| certutils_certificate_expiry_time | time at which the certificate expires, in seconds since January 1, 1970 UTC |
| cert:validity_duration:sec | duration of the certificate validity in seconds |
| cert:time_to_expiration:sec | time left to expiration in seconds |

The operator also defines two alerts, which fire when a certificate has less than 15% and less than 5% of its lifetime remaining, respectively.
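
The relationship between the two raw metrics, the two derived series, and the alert thresholds can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the thresholds and alert names from `config/prometheus/rules.yaml` in this commit; the function names are hypothetical and the real evaluation is performed by Prometheus.

```python
WARNING_FRACTION = 0.15   # threshold for CertificateApproachingExpiration
CRITICAL_FRACTION = 0.05  # threshold for CertificateIsAboutToExpire


def remaining_fraction(issue_time, expiry_time, now):
    """Fraction of the certificate lifetime still remaining (all in seconds)."""
    validity_duration = expiry_time - issue_time  # cert:validity_duration:sec
    time_to_expiration = expiry_time - now        # cert:time_to_expiration:sec
    return time_to_expiration / validity_duration


def firing_alerts(issue_time, expiry_time, now):
    """Names of the alerts whose expression would be true at time `now`."""
    fraction = remaining_fraction(issue_time, expiry_time, now)
    alerts = []
    # The two rules are evaluated independently, so both can fire at once.
    if fraction < WARNING_FRACTION:
        alerts.append("CertificateApproachingExpiration")
    if fraction < CRITICAL_FRACTION:
        alerts.append("CertificateIsAboutToExpire")
    return alerts


if __name__ == "__main__":
    # A certificate valid for 1000 seconds, checked 900 seconds after issue.
    print(firing_alerts(issue_time=0, expiry_time=1000, now=900))
```

Because the warning rule has no lower bound, a certificate below the 5% threshold satisfies both expressions in this sketch.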

## CA Injection

[ValidatingWebhookConfiguration](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/), [MutatingWebhookConfiguration](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/), [CustomResourceDefinition](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) and [APIService](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/) types of objects (and possibly others in the future) need the master API process to connect to trusted servers to perform their function. In order to do so over an encrypted connection, a CA bundle needs to be configured. In these objects the CA bundle is passed as part of the CR and not as a secret, which is fine because CA bundles are public information. However, it may be difficult at deploy time to know what the correct CA bundle should be. Often the CA bundle needs to be discovered as a piece of information owned by some other object in the cluster.
@@ -187,7 +198,7 @@ It is recommended to deploy this operator via [`OperatorHub`](https://operatorhu
| amd64 | ✅ |
| arm64 | ✅ |
| ppc64le | ✅ |
| s390x | |
| s390x | |

### Deploying from OperatorHub

@@ -243,13 +254,16 @@ helm upgrade cert-utils-operator cert-utils-operator/cert-utils-operator

## Running the operator locally

> Note: this operator's build process is tested with [podman](https://podman.io/), but some of the build files (the Makefile specifically) use docker because they are generated automatically by operator-sdk. It is recommended to [remap the docker command to the podman command](https://developers.redhat.com/blog/2020/11/19/transitioning-from-docker-to-podman#transition_to_the_podman_cli).

```shell
make manifests
oc new-project cert-utils-operator-local
kustomize build ./config/local-development | oc apply -f - -n cert-utils-operator-local
export token=$(oc serviceaccounts get-token 'cert-utils-controller-manager' -n cert-utils-operator-local)
oc login --token ${token}
make run ENABLE_WEBHOOKS=false
export repo=raffaelespazzoli
docker login quay.io/$repo
oc new-project cert-utils-operator
oc project cert-utils-operator
oc label namespace cert-utils-operator openshift.io/cluster-monitoring="true"
envsubst < config/local-development/tilt/env-replace-image.yaml > config/local-development/tilt/replace-image.yaml
tilt up
```

### Test helm chart locally
25 changes: 25 additions & 0 deletions Tiltfile
@@ -0,0 +1,25 @@
# -*- mode: Python -*-

compile_cmd = 'CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o bin/manager main.go'
image = 'quay.io/' + os.environ['repo'] + '/cert-utils-operator'

local_resource(
'cert-utils-operator-compile',
compile_cmd,
deps=['./main.go','./api','./controllers'])


custom_build(
image,
'podman build -t $EXPECTED_REF --ignorefile ci.Dockerfile.dockerignore -f ./ci.Dockerfile . && podman push $EXPECTED_REF $EXPECTED_REF',
entrypoint=['/manager'],
deps=['./bin'],
live_update=[
sync('./bin/manager',"/manager"),
],
skips_local_docker=True,
)

allow_k8s_contexts(k8s_context())
k8s_yaml(kustomize('./config/local-development/tilt'))
k8s_resource('cert-utils-operator-controller-manager',resource_deps=['cert-utils-operator-compile'])
7 changes: 7 additions & 0 deletions ci.Dockerfile.dockerignore
@@ -0,0 +1,7 @@
api/
bundle/
config/
controllers/
examples/
hack/
test/
5 changes: 5 additions & 0 deletions config/default/kustomization.yaml
@@ -57,6 +57,11 @@ vars:
name: controller-manager-metrics
fieldref:
fieldpath: metadata.namespace
- name: ROLE_NAME
objref:
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
name: prometheus-k8s
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER' prefix.
#- name: CERTIFICATE_NAMESPACE # namespace of the certificate CR
# objref:
4 changes: 4 additions & 0 deletions config/helmchart/cert-manager-ca-injection.yaml
@@ -0,0 +1,4 @@
- op: add
  path: /metadata/annotations
  value:
    cert-manager.io/inject-ca-from: "{{ .Release.Namespace }}/webhook-server-cert"
42 changes: 13 additions & 29 deletions config/helmchart/kustomization.yaml
@@ -1,34 +1,18 @@
# Adds namespace to all resources.


# Value of this field is prepended to the
# names of all resources, e.g. a deployment named
# "wordpress" becomes "alices-wordpress".
# Note that it should also match with the prefix (text before '-') of the namespace
# field above.
namePrefix: cert-utils-operator-

# Labels to add to all resources and selectors.
#commonLabels:
# someName: someValue

resources:
- service-account.yaml
namespace: release-namespace

bases:
- ../rbac
- ../prometheus
- ../local-development/tilt

vars:
- name: METRICS_SERVICE_NAME
objref:
kind: Service
patchesJson6902:
- target:
group: admissionregistration.k8s.io
version: v1
name: controller-manager-metrics
- name: METRICS_SERVICE_NAMESPACE
objref:
kind: Service
kind: MutatingWebhookConfiguration
name: cert-utils-operator-mutating-webhook-configuration
path: ./cert-manager-ca-injection.yaml
- target:
group: admissionregistration.k8s.io
version: v1
name: controller-manager-metrics
fieldref:
fieldpath: metadata.namespace
kind: ValidatingWebhookConfiguration
name: cert-utils-operator-validating-webhook-configuration
path: ./cert-manager-ca-injection.yaml
8 changes: 8 additions & 0 deletions config/local-development/tilt/env-replace-image.yaml
@@ -0,0 +1,8 @@
- op: replace
  path: /spec/template/spec/containers/1/image
  value:
    quay.io/$repo/cert-utils-operator:latest
- op: add
  path: /spec/template/spec/containers/1/args/-
  value:
    --zap-devel=true
19 changes: 19 additions & 0 deletions config/local-development/tilt/kustomization.yaml
@@ -0,0 +1,19 @@
# Adds namespace to all resources.
namespace: cert-utils-operator

# Labels to add to all resources and selectors.
#commonLabels:
# someName: someValue

bases:
- ../../default
- ./service-account.yaml


patchesJson6902:
- target:
    group: apps
    version: v1
    kind: Deployment
    name: cert-utils-operator-controller-manager
  path: ./replace-image.yaml
15 changes: 15 additions & 0 deletions config/local-development/tilt/prometheus-role.yaml
@@ -0,0 +1,15 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - endpoints
  - pods
  - services
  verbs:
  - get
  - list
  - watch
12 changes: 12 additions & 0 deletions config/local-development/tilt/prometheus-rolebinding.yaml
@@ -0,0 +1,12 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring
@@ -2,4 +2,4 @@ apiVersion: v1
kind: ServiceAccount
metadata:
name: controller-manager
namespace: system
namespace: system
@@ -11,6 +11,7 @@ metadata:
description: Set of utilities for TLS certificates
operatorframework.io/suggested-namespace: cert-utils-operator
operators.openshift.io/infrastructure-features: '["Disconnected"]'
operatorframework.io/cluster-monitoring: "true"
repository: https://github.com/redhat-cop/cert-utils-operator
support: Best Effort
labels:
3 changes: 3 additions & 0 deletions config/prometheus/kustomization.yaml
@@ -1,5 +1,8 @@
resources:
- monitor.yaml
- role.yaml
- rolebinding.yaml
- rules.yaml

configurations:
- kustomizeconfig.yaml
4 changes: 3 additions & 1 deletion config/prometheus/kustomizeconfig.yaml
@@ -1,4 +1,6 @@
---
varReference:
- path: spec/endpoints/tlsConfig/serverName
kind: ServiceMonitor
kind: ServiceMonitor
- path: roleRef/name
kind: RoleBinding
16 changes: 16 additions & 0 deletions config/prometheus/role.yaml
@@ -0,0 +1,16 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: system
rules:
- apiGroups:
  - ""
  resources:
  - endpoints
  - pods
  - services
  verbs:
  - get
  - list
  - watch
13 changes: 13 additions & 0 deletions config/prometheus/rolebinding.yaml
@@ -0,0 +1,13 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s
  namespace: system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: $(ROLE_NAME)
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring
34 changes: 34 additions & 0 deletions config/prometheus/rules.yaml
@@ -0,0 +1,34 @@
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: certificate-rule-alerts
spec:
  groups:
  - name: cert-utils-operator-recording-rules
    rules:
    - record: cert:validity_duration:sec
      expr: certutils_certificate_expiry_time - certutils_certificate_issue_time
    - record: cert:time_to_expiration:sec
      expr: certutils_certificate_expiry_time - time()
  - name: cert-utils-operator-alerting-rules
    rules:
    - alert: CertificateApproachingExpiration
      annotations:
        message: >-
          Certificate {{ $labels.namespace }}/{{ $labels.name }} is at 85% of its lifetime
        summary: >-
          Certificate {{ $labels.namespace }}/{{ $labels.name }} is at 85% of its lifetime
      expr: |
        cert:time_to_expiration:sec/cert:validity_duration:sec < 0.15
      labels:
        severity: warning
    - alert: CertificateIsAboutToExpire
      annotations:
        message: >-
          Certificate {{ $labels.namespace }}/{{ $labels.name }} is at 95% of its lifetime
        summary: >-
          Certificate {{ $labels.namespace }}/{{ $labels.name }} is at 95% of its lifetime
      expr: >
        cert:time_to_expiration:sec/cert:validity_duration:sec < 0.05
      labels:
        severity: critical