feat(cosmos-operator-rpc-node): port local git chart changes #5

Merged 1 commit on Jan 7, 2025
44 changes: 44 additions & 0 deletions charts/cosmos-operator-rpc-node/README.md
@@ -0,0 +1,44 @@
# RPC node Helm Chart

A Helm chart for deploying a Cosmos SDK-based RPC node.
The chart utilizes [the Kubernetes cosmos-operator](https://github.com/strangelove-ventures/cosmos-operator).

## Overview

This chart deploys a complete RPC node including:

- CosmosFullNode (cosmos-operator's CRD)
- Ingress with configurable endpoints (RPC, gRPC, API, WS, etc.)
- Optional Prometheus monitoring for:
  - Default CometBFT metrics
  - Latest block height compared to a given public RPC endpoint
  - Endpoint monitoring for the deployed RPC endpoint

## Prerequisites

- Helm 3.2.0+
- Cosmos-operator
- Ingress Controller (nginx)
- Cert-manager (optional, for TLS)
- Prometheus blackbox exporter (optional, for monitoring)
- Prometheus json exporter (optional, for monitoring)

## Installation

1. Customize the values file as needed, or create a new one, then install the chart. The node name is taken from the Helm release name.

```bash
helm install <release-name> . \
--create-namespace \
--namespace <namespace> \
-f <values-file>
```

## Contributing

[Contributing guidelines](CONTRIBUTING.md)

## License

Apache 2.0
10 changes: 10 additions & 0 deletions charts/cosmos-operator-rpc-node/templates/NOTES.txt
@@ -0,0 +1,10 @@
Your HTTP Basic Auth credentials have been automatically generated.

Username: {{ randAlphaNum 8 }}
Password: {{ randAlphaNum 16 }}

These have been stored in the {{ .Release.Name }}-basic-auth-creds Kubernetes Secret.

You can find them by running:

kubectl -n {{ .Release.Namespace }} get secret {{ .Release.Name }}-basic-auth-creds -o jsonpath="{.data}" | jq -r 'to_entries[] | "\(.key): \(.value | @base64d)"'
4 changes: 2 additions & 2 deletions charts/cosmos-operator-rpc-node/templates/_helpers.tpl
@@ -4,8 +4,8 @@
 {{- if (index $context.Values.endpoints $endpointName).host }}
 {{- printf "%s" (index $context.Values.endpoints $endpointName).host }}
 {{- else if eq $endpointName "rpc" }}
-{{- printf "%s-%s.%s" $context.Release.Name $context.Values.blch.nodeType "tm.p2p.org" }}
+{{- printf "%s-%s.%s" $context.Release.Name $context.Values.blch.nodeType $context.Values.endpointsBaseDomain }}
 {{- else }}
-{{- printf "%s-%s-%s.%s" $context.Release.Name $context.Values.blch.nodeType $endpointName "tm.p2p.org" }}
+{{- printf "%s-%s-%s.%s" $context.Release.Name $context.Values.blch.nodeType $endpointName $context.Values.endpointsBaseDomain }}
 {{- end -}}
 {{- end -}}
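
The helper change above replaces the hard-coded `tm.p2p.org` suffix with `.Values.endpointsBaseDomain`, so default host names now depend on the values file. A minimal values sketch of the effect, assuming a release named `mychain`; the domain, the endpoint names, and the release name are hypothetical, and only the keys `endpointsBaseDomain`, `blch.nodeType`, and `endpoints.<name>.host` come from the template:

```yaml
# Hypothetical values; other per-endpoint fields are omitted.
endpointsBaseDomain: example.org
blch:
  nodeType: rpc
endpoints:
  rpc: {}                   # no host set -> mychain-rpc.example.org
  grpc: {}                  # no host set -> mychain-rpc-grpc.example.org
  api:
    host: api.example.net   # an explicit host takes precedence over the generated name
```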
32 changes: 32 additions & 0 deletions charts/cosmos-operator-rpc-node/templates/basic-auth-secret.yaml
@@ -0,0 +1,32 @@
{{- if .Values.basicAuth.enabled }}
{{- $existingSecret := lookup "v1" "Secret" .Release.Namespace (printf "%s-basic-auth-creds" .Release.Name) }}
{{- if not $existingSecret }}
{{- $username := randAlphaNum 8 }}
{{- $password := randAlphaNum 16 }}
---
apiVersion: v1
kind: Secret
metadata:
  name: {{ .Release.Name }}-basic-auth-creds
  annotations:
    "helm.sh/hook": pre-upgrade, pre-install
    "argocd.argoproj.io/hook": PreSync
    "argocd.argoproj.io/hook-delete-policy": HookFailed
type: Opaque
data:
  username: {{ $username | b64enc | quote }}
  password: {{ $password | b64enc | quote }}
---
apiVersion: v1
kind: Secret
metadata:
  name: {{ .Release.Name }}-basic-auth
  annotations:
    "helm.sh/hook": pre-upgrade, pre-install
    "argocd.argoproj.io/hook": PreSync
    "argocd.argoproj.io/hook-delete-policy": HookFailed
type: Opaque
data:
  auth: {{ htpasswd $username $password | b64enc | quote }}
{{- end }}
{{- end }}
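
A minimal values sketch for switching this on. Only the key `basicAuth.enabled` is taken from the template; nothing else is assumed. Because `lookup` returns an empty result when the chart is rendered with `helm template`, both Secrets appear in that output even if the credentials Secret already exists in the cluster.

```yaml
# Renders the two Secrets above on first install and, per the Ingress
# template in this PR, adds the nginx basic-auth annotations.
basicAuth:
  enabled: true
```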
23 changes: 19 additions & 4 deletions charts/cosmos-operator-rpc-node/templates/ingress-nlb.yaml
@@ -8,23 +8,38 @@ apiVersion: networking.k8s.io/v1
 kind: Ingress
 metadata:
   name: {{ default $name $val.ingressName }}-nlb
-  {{- if $val.additionalIngressAnnotations }}
   annotations:
-    {{ toYaml $val.additionalIngressAnnotations | nindent 4 }}
-  {{- end }}
+    {{ toYaml $val.additionalIngressAnnotations | nindent 4 }}
+    {{- if $.Values.basicAuth.enabled }}
+    nginx.ingress.kubernetes.io/auth-type: "basic"
+    nginx.ingress.kubernetes.io/auth-secret: "{{ $.Release.Name }}-basic-auth"
+    nginx.ingress.kubernetes.io/auth-realm: "Authentication Required - Login"
+    {{- end }}
 spec:
   ingressClassName: nginx-nlb
   rules:
     - host: {{ $host | quote }}
       http:
         paths:
+          {{- if kindIs "slice" $val.path }}
+          {{- range $val.path }}
+          - path: {{ .path }}
+            pathType: ImplementationSpecific
+            backend:
+              service:
+                name: {{ $.Release.Name }}-rpc
+                port:
+                  number: {{ .servicePort }}
+          {{- end }}
+          {{- else }}
           - path: {{ default "/" $val.path }}
             pathType: ImplementationSpecific
             backend:
               service:
                 name: {{ $.Release.Name }}-rpc
                 port:
-                  number: {{ $val.servicePort }}
+                  number: {{ $val.servicePort }}
+          {{- end }}
   tls:
     - hosts:
         - {{ default $host $val.tlsHost | quote }}
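
The `kindIs "slice"` branch above lets one endpoint expose several paths, each with its own service port, while a plain string (or unset) `path` keeps the single-path behaviour. A sketch of both shapes, assuming `$val` is an entry of `.Values.endpoints` (the loop itself is outside this hunk); the endpoint names, paths, and port numbers are illustrative only:

```yaml
endpoints:
  # Single-path form: path defaults to "/" and servicePort is read from the endpoint.
  rpc:
    servicePort: 26657
  # List form: one Ingress path per item; every item needs path and servicePort.
  api:
    path:
      - path: /rest
        servicePort: 1317
      - path: /grpc-web
        servicePort: 9091
```

In both forms the backend is the `<release>-rpc` Service; only the port differs per path.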
54 changes: 21 additions & 33 deletions charts/cosmos-operator-rpc-node/templates/prometheus-rules.yaml
@@ -11,32 +11,32 @@ spec:
   groups:
     - name: blockchain-alerts
       rules:
-        - record: chain_block_height_diff
-          expr: |
-            label_replace(chain_latest_block_height{cosmos_node="{{ .Release.Name }}", namespace="{{ .Release.Namespace }}"}, "cosmos_node", "$1", "pod", "")
-            - on (namespace) group_left(pod)
-            cometbft_consensus_height{pod=~"{{ .Release.Name }}-.?", namespace="{{ .Release.Namespace }}"}
-          labels:
-            cosmos_node: "{{ .Release.Name }}"
-            namespace: "{{ .Release.Namespace }}"
         - alert: BlockHeightDifferenceGrowing
           expr: |
-            chain_block_height_diff{cosmos_node="{{ .Release.Name }}", namespace="{{ .Release.Namespace }}"} > {{ .Values.monitoring.alerts.growingBlockHeightDifference }}
+            sum(chain_latest_block_height{cosmos_node="{{ .Release.Name }}", namespace="{{ .Release.Namespace }}", rpc_endpoint="{{ .Values.monitoring.publicRpcEndpoint }}"})
+            - sum(chain_latest_block_height{cosmos_node="{{ .Release.Name }}", namespace="{{ .Release.Namespace }}", rpc_endpoint="{{ include "host" (dict "context" $ "endpointName" "rpc") }}"})
+            > {{ .Values.monitoring.alerts.growingBlockHeightDifference }}
           for: 5m
           labels:
             severity: warning
+            namespace: {{ .Release.Namespace }}
+            pod: {{ .Release.Name }}
           annotations:
-            summary: "Block height difference is growing for chain {{ .Values.blch.id }}"
-            description: "{{ .Release.Name }} node for chain {{ .Values.blch.id }} in namespace {{ .Values.namespace }} is more than {{ .Values.monitoring.alerts.growingBlockHeightDifference }} blocks behind the public RPC endpoint."
+            summary: "Block height difference is growing for {{ .Release.Namespace }}/{{ .Release.Name }}"
+            description: "{{ .Release.Name }} node in namespace {{ .Release.Namespace }} is more than {{ .Values.monitoring.alerts.growingBlockHeightDifference }} blocks behind the public RPC endpoint."
         - alert: BlockHeightDifferenceCritical
           expr: |
-            chain_block_height_diff{cosmos_node="{{ .Release.Name }}", namespace="{{ .Release.Namespace }}"} > {{ .Values.monitoring.alerts.maximumBlockHeightDifference }}
+            sum(chain_latest_block_height{cosmos_node="{{ .Release.Name }}", namespace="{{ .Release.Namespace }}", rpc_endpoint="{{ .Values.monitoring.publicRpcEndpoint }}"})
+            - sum(chain_latest_block_height{cosmos_node="{{ .Release.Name }}", namespace="{{ .Release.Namespace }}", rpc_endpoint="{{ include "host" (dict "context" $ "endpointName" "rpc") }}"})
+            > {{ .Values.monitoring.alerts.maximumBlockHeightDifference }}
           for: 5m
           labels:
             severity: critical
+            namespace: {{ .Release.Namespace }}
+            pod: {{ .Release.Name }}
           annotations:
-            summary: "Block height difference too high for chain {{ .Values.blch.id }}"
-            description: "{{ .Release.Name }} node for chain {{ .Values.blch.id }} in namespace {{ .Values.namespace }} is more than {{ .Values.monitoring.alerts.maximumBlockHeightDifference }} blocks behind the public RPC endpoint."
+            summary: "Block height difference too high for {{ .Release.Namespace }}/{{ .Release.Name }}"
+            description: "{{ .Release.Name }} node in namespace {{ .Release.Namespace }} is more than {{ .Values.monitoring.alerts.maximumBlockHeightDifference }} blocks behind the public RPC endpoint."
         - alert: CometBFTPeersDrop
           expr: |
             (cometbft_p2p_peers{pod=~"{{ .Release.Name }}-.*", namespace="{{ .Release.Namespace }}"}
@@ -47,33 +47,21 @@ spec:
           labels:
             severity: info
           annotations:
-            summary: "CometBFT P2P Peers Drop for {{`$labels.chain_id`}}"
-            description: "The number of P2P peers for chain {{`$labels.chain_id`}} in {{`$labels.namespace`}}/{{`$labels.pod`}} has dropped by more than 25% over the last 5 minutes."
-        - alert: LowTxSuccessRate
-          expr: |
-            cometbft_consensus_total_txs{pod=~"{{ .Release.Name }}-.*", namespace="{{ .Release.Namespace }}"}
-            - cometbft_mempool_failed_txs{pod=~"{{ .Release.Name }}-.*", namespace="{{ .Release.Namespace }}"}
-            / cometbft_consensus_total_txs{pod=~"{{ .Release.Name }}-.*", namespace="{{ .Release.Namespace }}"}
-            * 100 < {{ .Values.monitoring.alerts.txSuccessRateThreshold }}
-          for: 5m
-          labels:
-            severity: warning
-          annotations:
-            summary: "High Failed TXs for {{`$labels.pod`}}"
-            description: "Transaction success rate is below the SLO for {{`$labels.chain_id`}} in {{`$labels.namespace`}} /{{`$labels.pod`}}."
-        - alert: RpcSvcDown
+            summary: "CometBFT P2P Peers Drop for {{ .Release.Name }} node in namespace {{ .Release.Namespace }}"
+            description: "The number of P2P peers for {{ .Release.Name }} node in namespace {{ .Release.Namespace }} has dropped by more than 25% over the last 5 minutes."
+        - alert: RpcEndpointDown
           expr: |
             probe_http_status_code{
-            rpc_svc="{{ .Release.Name }}-rpc.{{ .Release.Namespace }}"}
+            rpc_endpoint="{{ include "host" (dict "context" $ "endpointName" "rpc") }}"}
             < 200
             or
-            probe_http_status_code{rpc_svc="{{ .Release.Name }}-rpc.{{ .Release.Namespace }}"}
+            probe_http_status_code{rpc_endpoint="{{ include "host" (dict "context" $ "endpointName" "rpc") }}"}
             >= 300
           for: 5m
           labels:
             severity: critical
           annotations:
-            summary: "The RPC svc for {{ .Release.Name }}-rpc.{{ .Release.Namespace }} is down."
-            description: "Service {{ .Release.Name }}-rpc in namespace {{ .Release.Namespace }} has been down for the last 5 minutes."
+            summary: "The RPC endpoint {{ include "host" (dict "context" $ "endpointName" "rpc") }} is down."
+            description: "Endpoint for {{ .Release.Name }} node in namespace {{ .Release.Namespace }} has been down for the last 5 minutes."
 {{- end -}}
 {{- end -}}
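
The block-height alerts above subtract the node's own `chain_latest_block_height` series from the series for the public endpoint, selected by the `rpc_endpoint` label, and compare the result with two thresholds. A values sketch that would feed those expressions; the key names come from the template, while the endpoint and threshold numbers are hypothetical, and it is an assumption that the exporter configuration attaches a matching `rpc_endpoint` label to both series:

```yaml
monitoring:
  # Reference endpoint; must equal the rpc_endpoint label value on the public series.
  publicRpcEndpoint: rpc.example.org
  alerts:
    growingBlockHeightDifference: 10   # warning threshold, in blocks
    maximumBlockHeightDifference: 50   # critical threshold, in blocks
```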
@@ -0,0 +1,28 @@
{{- if .Values.publishSnapshot.enabled -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ .Release.Name }}-public-snapshot
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    nginx.ingress.kubernetes.io/rewrite-target: {{ .Values.publishSnapshot.pathPrefix }}/{{ .Values.blch.name }}/index.html
    nginx.ingress.kubernetes.io/upstream-vhost: {{ .Values.publishSnapshot.baseDomain }}
spec:
  ingressClassName: nginx-nlb
  rules:
    - host: {{ .Values.blch.name }}-{{ .Values.blch.network }}-{{ .Values.blch.nodeType }}-snapshots.{{ .Values.endpointsBaseDomain }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {{ .Release.Name }}-public-snapshot
                port:
                  number: 443
  tls:
    - hosts:
        - {{ .Values.blch.name }}-{{ .Values.blch.network }}-{{ .Values.blch.nodeType }}-snapshots.{{ .Values.endpointsBaseDomain }}
      secretName: {{ .Release.Name }}-public-snapshot-tls
{{- end -}}
@@ -0,0 +1,6 @@
{{- if .Values.publishSnapshot.enabled -}}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: snapshots-bucket-{{ .Release.Name }}
{{- end -}}
12 changes: 12 additions & 0 deletions charts/cosmos-operator-rpc-node/templates/public-snapshot-svc.yaml
@@ -0,0 +1,12 @@
{{- if .Values.publishSnapshot.enabled -}}
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}-public-snapshot
spec:
  type: ExternalName
  externalName: {{ .Values.publishSnapshot.baseDomain }}
  ports:
    - port: 443
      targetPort: 443
{{- end -}}
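
The three `publishSnapshot` templates work together: the Ingress serves a per-chain host, rewrites requests to an index page under `pathPrefix`, and proxies over HTTPS to the `ExternalName` Service that points at `baseDomain`. A values sketch showing how the pieces combine; the key names come from the templates, and every value is hypothetical:

```yaml
publishSnapshot:
  enabled: true
  baseDomain: snapshots.example.org   # ExternalName target and upstream vhost
  pathPrefix: /snapshots              # rewrite target becomes /snapshots/mychain/index.html
blch:
  name: mychain
  network: mainnet
  nodeType: rpc
endpointsBaseDomain: example.org
# Resulting Ingress host: mychain-mainnet-rpc-snapshots.example.org
```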
30 changes: 26 additions & 4 deletions charts/cosmos-operator-rpc-node/templates/rpc_node.yaml
@@ -13,7 +13,12 @@ metadata:
   {{ end }}

 spec:
+  {{ if and (eq (int .Values.replicas) 1) .Values.rollingUpdateEnabled }}
+  # Only one replica will be running, the second replica will be disabled and used only during rolling updates
+  replicas: 2
+  {{ else}}
   replicas: {{ .Values.replicas }}
+  {{ end }}
   {{ if .Values.maxUnavailable }}
   strategy:
     maxUnavailable: {{ .Values.maxUnavailable }}
@@ -36,6 +41,13 @@ spec:
         overrides: |-
           {{ .Values.blch.appOverrides | nindent 8 }}
       {{- end }}
+    {{ if .Values.blch.dataDir }}
+    dataDir: {{ .Values.blch.dataDir }}
+    {{ end }}
+    {{ if .Values.blch.startCmd }}
+    startCmd:
+      {{ toYaml .Values.blch.startCmd | nindent 6 }}
+    {{ end }}
     network: {{ .Values.blch.network }}
     chainID: {{ .Values.blch.id }}
     binary: {{ .Values.blch.binary }}
@@ -57,6 +69,12 @@ spec:
     {{ if .Values.blch.addrbookURL }}
     addrbookURL: {{ .Values.blch.addrbookURL }}
     {{ end }}
+    {{ if .Values.blch.snapshotScript }}
+    snapshotScript: {{ toYaml .Values.blch.snapshotScript | nindent 6 }}
+    {{ end }}
+    {{ if .Values.blch.genesisScript }}
+    genesisScript: {{ toYaml .Values.blch.genesisScript | nindent 6 }}
+    {{ end }}
     {{ if .Values.blch.config }}
     config:
       {{ if .Values.blch.config.seeds }}
@@ -77,8 +95,8 @@ spec:
     image: "{{ .Values.image }}:{{ .Values.imageTag }}"
     {{ if .Values.resources }}
     resources:
-      {{- toYaml .Values.resources | nindent 6 }}
-    {{- end }}
+      {{ toYaml .Values.resources | nindent 6 }}
+    {{ end }}
     {{ if .Values.nodeSelectorLabel }}
     nodeSelector:
       {{ toYaml .Values.nodeSelectorLabel | nindent 6 }}
@@ -122,10 +140,14 @@ spec:
     {{ if .Values.priorityClassName }}
     priorityClassName: {{ .Values.priorityClassName }}
     {{ end }}
-  {{ if .Values.additionalServiceConfig }}
   service:
-    {{ toYaml .Values.additionalServiceConfig | nindent 4 }}
+    {{ if .Values.additionalServiceConfig }}
+    {{ toYaml .Values.additionalServiceConfig | nindent 6 }}
     {{ end }}
+    {{ if .Values.service.publishSvcDuringSync }}
+    rpcTemplate:
+      publishNotReadyAddresses: true
+    {{ end }}
   volumeClaimTemplate:
     resources:
       requests:
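
The `rpc_node.yaml` hunks above add several optional values. A sketch of a values file that exercises the new branches; the key names are taken from the template, the concrete values are hypothetical, and the comments describe only what the template lines shown in this diff render:

```yaml
replicas: 1
rollingUpdateEnabled: true     # with replicas: 1, the template renders replicas: 2
blch:
  dataDir: data                # hypothetical value; rendered as dataDir only when set
service:
  publishSvcDuringSync: true   # renders rpcTemplate.publishNotReadyAddresses: true
```

With `rollingUpdateEnabled` unset or `replicas` greater than 1, the template falls back to `replicas: {{ .Values.replicas }}`.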