Showing 6 changed files with 136 additions and 75 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# Defragmenting Data | ||
For dense Kubernetes clusters, `etcd` can suffer from poor performance if the keyspace grows too large and exceeds the space quota. Periodically maintain and defragment `etcd` to free up space in the data store. See the [etcd maintenance guide](https://etcd.io/docs/v3.5/op-guide/maintenance/) for details. | ||
|
||
Monitor the `etcd` metrics in Prometheus and defragment when required. Otherwise, `etcd` can raise a cluster-wide alarm that puts the cluster into a maintenance mode in which it accepts only key reads and deletes. | ||
|
||
To keep track of defragmentation requirements, monitor these key metrics: | ||
|
||
- `etcd_server_quota_backend_bytes`: the current quota limit | ||
- `etcd_mvcc_db_total_size_in_use_in_bytes`: the actual database usage after a history compaction | ||
- `etcd_mvcc_db_total_size_in_bytes`: the database size, including free space waiting for defragmentation | ||
|
||
You can also determine whether defragmentation is needed by checking how much space, in MiB, it would free. Use the following PromQL expression: | ||
|
||
- `(etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes)/1024/1024` | ||
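The same arithmetic can be reproduced outside Prometheus, for example on raw byte counts taken from a dashboard. The following is a minimal sketch; `reclaimable_mib` is a hypothetical helper that is not part of `defrag.sh`, and the sample values are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of defrag.sh): mirrors the PromQL expression
# (total_size_in_bytes - total_size_in_use_in_bytes) / 1024 / 1024.
# Bash arithmetic is integer-only, so the result is truncated to whole MiB.
reclaimable_mib() {
  local total_bytes=$1   # sample of etcd_mvcc_db_total_size_in_bytes
  local in_use_bytes=$2  # sample of etcd_mvcc_db_total_size_in_use_in_bytes
  echo $(( (total_bytes - in_use_bytes) / 1024 / 1024 ))
}

# Illustrative values: 500 MiB on disk, 200 MiB in use.
reclaimable_mib 524288000 209715200   # prints 300
```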
|
||
Defragmentation is an expensive operation, so run it as infrequently as possible. On the other hand, you must make sure that no `etcd` member exceeds the storage quota. The Kubernetes project recommends that when you perform defragmentation, you use a tool such as [etcd-defrag](https://github.com/ahrtr/etcd-defrag). | ||
|
||
The `defrag.sh` script creates and schedules a job that periodically defragments data on a `kamaji-etcd` instance. The script generates a Kubernetes CronJob manifest and applies it to the specified namespace. Make sure you set the defragmentation criteria according to the needs of your environment. | ||
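The generated CronJob passes the rule `dbQuotaUsage > 0.8 || dbSize - dbSizeInUse > 200*1024*1024` to `etcd-defrag`. To reason about that rule before changing it, the sketch below evaluates the same condition on sample numbers. It assumes that `dbQuotaUsage` is the ratio of the database size to the quota; `should_defrag` is a hypothetical helper, not part of `defrag.sh` or `etcd-defrag`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: evaluates the default defrag rule on sample byte counts.
# Assumption: dbQuotaUsage = dbSize / dbQuota.
should_defrag() {
  local db_size=$1 db_size_in_use=$2 db_quota=$3
  local free_threshold=$(( 200 * 1024 * 1024 ))  # 200 MiB of reclaimable space

  # Bash has no floating point, so "dbSize / dbQuota > 0.8" is rewritten as
  # "dbSize * 10 > dbQuota * 8".
  if (( db_size * 10 > db_quota * 8 )) || (( db_size - db_size_in_use > free_threshold )); then
    return 0
  fi
  return 1
}

# Illustrative values: 500 MiB on disk, 200 MiB in use, 2 GiB quota.
if should_defrag 524288000 209715200 2147483648; then
  echo "defragmentation would run"
fi
```

With these sample values the quota usage is below 80%, but 300 MiB is reclaimable, so the second half of the rule triggers.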
|
||
|
||
## Usage | ||
To run the script, use the following command: | ||
|
||
```bash | ||
./defrag.sh [-e etcd_name] [-s etcd_service] [-n etcd_namespace] [-j schedule] | ||
``` | ||
|
||
## Parameters | ||
|
||
- `-e etcd_name`: Name of the etcd StatefulSet (default: `kamaji-etcd`) | ||
- `-s etcd_service`: Name of the etcd service (default: `kamaji-etcd`) | ||
- `-n etcd_namespace`: Namespace of the etcd StatefulSet (default: `kamaji-system`) | ||
- `-j schedule`: Cron schedule for the defrag job (default: `"0 0 * * *"`, which means daily at midnight) | ||
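The `-e`, `-s`, and `-n` values together determine the endpoints that the CronJob contacts. The sketch below shows how the per-member DNS names are composed, following the pattern in the generated manifest, which lists members `0` to `2` on port 2379. `build_endpoints` and its optional replica-count argument are hypothetical and not part of `defrag.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: builds the comma-separated endpoint list from the
# StatefulSet name, service name, and namespace.
# Assumption: members are numbered from 0 and listen on port 2379.
build_endpoints() {
  local etcd_name=$1 etcd_service=$2 etcd_namespace=$3
  local replicas=${4:-3}  # the generated manifest hard-codes three members
  local endpoints="" i

  for (( i = 0; i < replicas; i++ )); do
    # Prepend a comma only when the list already has an entry.
    endpoints+="${endpoints:+,}https://${etcd_name}-${i}.${etcd_service}.${etcd_namespace}.svc.cluster.local:2379"
  done
  echo "$endpoints"
}

build_endpoints kamaji-etcd kamaji-etcd kamaji-system
```

If your `kamaji-etcd` instance runs a different number of members, the endpoint list in the generated manifest needs to be adjusted accordingly.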
|
||
## Example | ||
|
||
To run the script with custom parameters: | ||
|
||
```bash | ||
./defrag.sh -e kamaji-etcd -s kamaji-etcd -n kamaji-system -j "14 9 * * 1-5" | ||
``` | ||
This will create a Kubernetes CronJob manifest with the specified parameters and apply it to the cluster. | ||
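The schedule in this example, `14 9 * * 1-5`, runs the job at 09:14 from Monday to Friday. Because a malformed schedule is only rejected when the manifest is applied, a quick local check can help. The sketch below only counts fields; `has_five_fields` is a hypothetical helper, not part of `defrag.sh`, and it does not validate the values or recognize macros such as `@daily`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: checks that a cron schedule has exactly five
# whitespace-separated fields (minute, hour, day of month, month, day of week).
has_five_fields() {
  local schedule=$1
  local -a fields
  # read -a splits on whitespace without expanding "*" as a glob.
  read -r -a fields <<< "$schedule"
  (( ${#fields[@]} == 5 ))
}

if has_five_fields "14 9 * * 1-5"; then
  echo "schedule has five fields"
fi
```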
|
||
## Debug mode | ||
To run the script in debug mode, set the environment variable `DEBUG` to `1` before invoking it: | ||
|
||
```bash | ||
export DEBUG=1 | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
#!/bin/bash | ||
|
||
# Enable command tracing when DEBUG=1 (an unset DEBUG is treated as 0) | ||
if [ "${DEBUG:-0}" = "1" ]; then | ||
  set -x | ||
fi | ||
# Exit on errors and unset variables, and fail if any command in a pipeline fails | ||
set -eu -o pipefail | ||
|
||
# Default values for the parameters | ||
ETCD_NAME="kamaji-etcd" | ||
ETCD_SERVICE="kamaji-etcd" | ||
ETCD_NAMESPACE="kamaji-system" | ||
SCHEDULE="0 0 * * *" # every day at midnight | ||
|
||
# Parse script parameters | ||
while getopts "e:s:n:j:" opt; do | ||
  case ${opt} in | ||
    e ) ETCD_NAME=$OPTARG ;; | ||
    s ) ETCD_SERVICE=$OPTARG ;; | ||
    n ) ETCD_NAMESPACE=$OPTARG ;; | ||
    j ) SCHEDULE=$OPTARG ;; | ||
    # Unknown option or missing argument: print usage to stderr and stop | ||
    \? ) echo "Usage: ./defrag.sh [-e etcd_name] [-s etcd_service] [-n etcd_namespace] [-j schedule]" >&2 | ||
         exit 1 ;; | ||
  esac | ||
done | ||
|
||
# Write the CronJob manifest that defragments etcd to <etcd_name>-defrag-job.yaml | ||
create_defrag_cronjob() { | ||
  local etcd_name=$1       # name of the etcd StatefulSet | ||
  local etcd_service=$2    # name of the etcd service | ||
  local etcd_namespace=$3  # namespace of the etcd StatefulSet | ||
  local schedule=$4        # cron schedule for the defrag job | ||
|
||
  cat <<EOF > "${etcd_name}-defrag-job.yaml" | ||
apiVersion: batch/v1 | ||
kind: CronJob | ||
metadata: | ||
  name: ${etcd_name}-defrag-job | ||
  namespace: $etcd_namespace | ||
spec: | ||
  schedule: "$schedule" # Use the provided schedule | ||
  jobTemplate: | ||
    spec: | ||
      template: | ||
        spec: | ||
          containers: | ||
          - name: etcd-defrag | ||
            image: ghcr.io/ahrtr/etcd-defrag:v0.15.0 # Replace the tag with the latest released version. | ||
            args: | ||
            - --endpoints=https://${etcd_name}-0.${etcd_service}.${etcd_namespace}.svc.cluster.local:2379,https://${etcd_name}-1.${etcd_service}.${etcd_namespace}.svc.cluster.local:2379,https://${etcd_name}-2.${etcd_service}.${etcd_namespace}.svc.cluster.local:2379 | ||
            - --cacert=/opt/certs/ca/ca.crt | ||
            - --cert=/opt/certs/root-client-certs/tls.crt | ||
            - --key=/opt/certs/root-client-certs/tls.key | ||
            - --cluster | ||
            - --defrag-rule | ||
            - "dbQuotaUsage > 0.8 || dbSize - dbSizeInUse > 200*1024*1024" | ||
            volumeMounts: | ||
            - mountPath: /opt/certs/root-client-certs | ||
              name: root-client-certs | ||
            - mountPath: /opt/certs/ca | ||
              name: certs | ||
          restartPolicy: OnFailure | ||
          securityContext: | ||
            runAsUser: 0 | ||
          volumes: | ||
          - name: root-client-certs | ||
            secret: | ||
              secretName: ${etcd_name}-root-client-certs | ||
          - name: certs | ||
            secret: | ||
              secretName: ${etcd_name}-certs | ||
EOF | ||
} | ||
|
||
# Generate the defrag CronJob manifest and apply it to the cluster | ||
main() { | ||
  # Create and apply defrag CronJob | ||
  create_defrag_cronjob "$ETCD_NAME" "$ETCD_SERVICE" "$ETCD_NAMESPACE" "$SCHEDULE" | ||
  kubectl apply -f "${ETCD_NAME}-defrag-job.yaml" | ||
} | ||
|
||
# Execute the main script | ||
main |