snapshotEngine: DigitalOcean complete migration #586

Merged Aug 15, 2023 (66 commits)
0f28979
add value to skip snap web generation
orcutt989 Mar 17, 2023
b096ac4
add configurable value for s3 bucket
orcutt989 Mar 17, 2023
8f89b18
web build condition on domain name instead
orcutt989 Mar 20, 2023
2766bf1
Merge branch 'skip-snapshot-website' into user-configurable-s3-bucket
orcutt989 Mar 20, 2023
256c82c
Merge branch 'master' into user-configurable-s3-bucket
orcutt989 Mar 31, 2023
c0c4b34
add secret and configurable s3 bucket override
orcutt989 Mar 31, 2023
2ad25e5
switch name and mountpath to match format
orcutt989 Mar 31, 2023
7bf9506
update secret name and use in zip and upload job
orcutt989 Mar 31, 2023
4bbbaf5
use export instead of temp var
orcutt989 Apr 3, 2023
2f7871a
secret name change
orcutt989 Apr 5, 2023
d9b0371
expect correct names on secret volume mount
orcutt989 Apr 5, 2023
5a4a1ef
correct path to secret mount
orcutt989 Apr 5, 2023
fe9ff5c
rework credential override to provide logs and error messages
orcutt989 Apr 5, 2023
a6c0238
use double quotes for early expansion
orcutt989 Apr 5, 2023
3522607
remove variable checking since we are feeding in files
orcutt989 Apr 5, 2023
1d71833
bug: container is gone so we cant delete a volume
orcutt989 Apr 5, 2023
251c257
show commands for debug
orcutt989 Apr 5, 2023
51a67ad
wrong default s3 bucket var
orcutt989 Apr 5, 2023
4d887c1
turn of tar output for debug
orcutt989 Apr 10, 2023
5a7731d
undo command verbosity
orcutt989 Apr 11, 2023
a494823
Verbose variables
orcutt989 Apr 11, 2023
a756585
Enable interactive for alias to work
orcutt989 Apr 11, 2023
9c78800
More useful alias message and rm debug messages
orcutt989 Apr 11, 2023
f840167
Need space after !
orcutt989 Apr 11, 2023
9f1b437
expand aliases instead of interactive
orcutt989 Apr 11, 2023
0dfd178
add public-read and move index.html
orcutt989 Apr 11, 2023
70b8fb5
Website redirects stay in AWS
orcutt989 Apr 13, 2023
218024a
Set alias only for filesystem artifact upload
orcutt989 Apr 14, 2023
256c7e3
rolling redirects working
orcutt989 Apr 25, 2023
d70adb8
fix volume indexing
orcutt989 Apr 26, 2023
83aae95
helpful messages
orcutt989 Apr 26, 2023
36e071d
Useful comments for new indexing format
orcutt989 Apr 28, 2023
846eb82
Omit alias functionality in lieu of variable parameters
orcutt989 Apr 28, 2023
3981732
Fix rolling tarball filename
orcutt989 Apr 28, 2023
cc8d24e
configmap needs fqdn
orcutt989 Apr 28, 2023
d73aa61
cdn isnt working so we're using bucket url
orcutt989 Apr 28, 2023
1a3032c
unsilence lz4 logs
orcutt989 Apr 28, 2023
21152dd
wrong aws bucket name
orcutt989 May 1, 2023
8e58f43
get all snapshot metadata from do spaces
orcutt989 May 25, 2023
618381b
upload metadatas to alt s3 bucket
orcutt989 May 25, 2023
3536081
fix metadata related to website build
orcutt989 May 26, 2023
d2a2b4b
initial commit demo functionality
orcutt989 Jun 7, 2023
2f5daa0
put redirects back
orcutt989 Jun 7, 2023
943bd34
remove merged files
orcutt989 Jun 8, 2023
e061183
update zip and upload commands for dual creds
orcutt989 Jun 8, 2023
f8431ce
sleep for debug
orcutt989 Jun 8, 2023
3a2264e
allow override of storage class for scratch volumes
orcutt989 Jun 8, 2023
921403e
use storage class as set
orcutt989 Jun 8, 2023
8fa7a52
Container-running OS will not resolve localhost
orcutt989 Jun 9, 2023
c018181
Remove infinite sleep from debugging
orcutt989 Jun 9, 2023
e0c76be
Empty-Commit to trigger CI test
orcutt989 Jul 6, 2023
2996cfc
Merge branch 'master' into do-only
orcutt989 Jul 6, 2023
213f9f0
bucket name change to do space
orcutt989 Jul 6, 2023
05e1186
Merge branch 'master' into do-only
orcutt989 Jul 6, 2023
39592b3
rm fqdn from cm
orcutt989 Jul 7, 2023
baa7216
increase warmer timeout
orcutt989 Jul 7, 2023
f01fee9
increase timeout after artifact job create
orcutt989 Jul 7, 2023
6711700
DO rate limits snapshots per 10m
orcutt989 Jul 11, 2023
c6dc4a1
sleep between creation for rate limiting
orcutt989 Jul 11, 2023
76cfb63
need different command for site upload
orcutt989 Jul 11, 2023
f634c30
block snapshot until node ready
orcutt989 Jul 11, 2023
cae3370
pause scheduler if node not ready
orcutt989 Jul 11, 2023
2d2bf4a
add sleep for cpu usage reduction
orcutt989 Jul 11, 2023
4a3ec3a
fix busy waits and document why
orcutt989 Jul 11, 2023
ac01d8c
fix busy wait on job and more better comments
orcutt989 Jul 11, 2023
d705ded
Merge branch 'master' into do-only
orcutt989 Aug 15, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -15,5 +15,6 @@ build

# Ignore mkchain generated files
*_values.yaml
*-values.yaml

charts/tezos/charts
71 changes: 43 additions & 28 deletions charts/snapshotEngine/scripts/snapshot-warmer.sh
@@ -27,6 +27,7 @@ delete_old_volumesnapshots() {
local max_snapshots="${2##max_snapshots=}"

while [ "$(getNumberOfSnapshots readyToUse=true --selector="$selector")" -gt "$max_snapshots" ]; do
sleep 5
NUMBER_OF_SNAPSHOTS=$(getNumberOfSnapshots readyToUse=true --selector="$selector")
printf "%s Number of snapshots with selector '$selector' is too high at $NUMBER_OF_SNAPSHOTS. Deleting 1.\n" "$(timestamp)"
SNAPSHOTS=$(getSnapshotNames readyToUse=true --selector="$selector")
@@ -37,31 +38,31 @@ delete_old_volumesnapshots() {
done
}

delete_stuck_volumesnapshots() {
snapshot_list=$(kubectl get volumesnapshots -o jsonpath="{.items[*].metadata.name}")
arr=(`echo ${snapshot_list}`);
for snapshot_name in "${arr[@]}"; do
snapshot_creation_time_iso8601=$(kubectl get volumesnapshots $snapshot_name -o jsonpath='{.metadata.creationTimestamp}')
snapshot_creation_time_without_offset=${snapshot_creation_time_iso8601::-1}
snapshot_creation_time_unix=$(date -ud "$(echo $snapshot_creation_time_without_offset | sed 's/T/ /')" +%s)
current_date_unix=$(date -u +%s)
snapshot_age_minutes=$(( (current_date_unix - snapshot_creation_time_unix) / 60 ))
# Snapshots should never be older than 6 minutes
# If they are then there's a problem on AWS' end and the snapshot needs to be deleted.
if [ $snapshot_age_minutes -ge 6 ]; then
printf "%s Snasphot %s is %s minutes old. It must be stuck. Attempting to delete...\n" "$(timestamp)" "$snapshot_name" "$snapshot_age_minutes"
err=$(kubectl delete volumesnapshots $snapshot_name 2>&1 > /dev/null)
if [ $? -ne 0 ]; then
printf "%s ERROR##### Unable to delete stuck snapshot %s .\n" "$(timestamp)" "$snapshot_name"
printf "%s Error was: \"%s\"\n" "$(timestamp)" "$err"
sleep 10
exit 1
else
printf "%s Successfully deleted stuck snapshot %s! \n" "$(timestamp)" "$snapshot_name"
fi
fi
done
}
# delete_stuck_volumesnapshots() {
# snapshot_list=$(kubectl get volumesnapshots -o jsonpath="{.items[*].metadata.name}")
# arr=(`echo ${snapshot_list}`);
# for snapshot_name in "${arr[@]}"; do
# snapshot_creation_time_iso8601=$(kubectl get volumesnapshots $snapshot_name -o jsonpath='{.metadata.creationTimestamp}')
# snapshot_creation_time_without_offset=${snapshot_creation_time_iso8601::-1}
# snapshot_creation_time_unix=$(date -ud "$(echo $snapshot_creation_time_without_offset | sed 's/T/ /')" +%s)
# current_date_unix=$(date -u +%s)
# snapshot_age_minutes=$(( (current_date_unix - snapshot_creation_time_unix) / 60 ))
# # Snapshots should never be older than 6 minutes
# # If they are then there's a problem on AWS' end and the snapshot needs to be deleted.
# if [ $snapshot_age_minutes -ge 6 ]; then
# printf "%s Snasphot %s is %s minutes old. It must be stuck. Attempting to delete...\n" "$(timestamp)" "$snapshot_name" "$snapshot_age_minutes"
# err=$(kubectl delete volumesnapshots $snapshot_name 2>&1 > /dev/null)
# if [ $? -ne 0 ]; then
# printf "%s ERROR##### Unable to delete stuck snapshot %s .\n" "$(timestamp)" "$snapshot_name"
# printf "%s Error was: \"%s\"\n" "$(timestamp)" "$err"
# sleep 10
# exit 1
# else
# printf "%s Successfully deleted stuck snapshot %s! \n" "$(timestamp)" "$snapshot_name"
# fi
# fi
# done
# }

HISTORY_MODE="$(echo "$NODE_CONFIG" | jq -r ".history_mode")"
TARGET_VOLUME="$(echo "$NODE_CONFIG" | jq ".target_volume")"
@@ -83,12 +84,23 @@ yq e -i '.spec.volumeSnapshotClassName=strenv(VOLUME_SNAPSHOT_CLASS)' createVolu

while true; do

# Pause if nodes are not ready
until [ "$(kubectl get pods -n "${NAMESPACE}" -o 'jsonpath={..status.conditions[?(@.type=="Ready")].status}' -l appType=octez-node -l node_class_history_mode="${HISTORY_MODE}")" = "True" ]; do
printf "%s Tezos node is not ready for snapshot. Check node pod logs. \n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
until [ "$(kubectl get pods -n "${NAMESPACE}" -o 'jsonpath={..status.conditions[?(@.type=="Ready")].status}' -l appType=octez-node -l node_class_history_mode="${HISTORY_MODE}")" = "True" ]; do
sleep 1m # without sleep, this loop is a "busy wait". this sleep vastly reduces CPU usage while we wait for node
if [ "$(kubectl get pods -n "${NAMESPACE}" -o 'jsonpath={..status.conditions[?(@.type=="Ready")].status}' -l appType=octez-node -l node_class_history_mode="${HISTORY_MODE}")" = "True" ]; then
break
fi
done
done

# Remove unlabeled snapshots
delete_old_volumesnapshots selector='!history_mode' max_snapshots=0
# Maintain 4 snapshots of a certain history mode
delete_old_volumesnapshots selector="history_mode=$HISTORY_MODE" max_snapshots=4
# Check for and delete old stuck snapshots
delete_stuck_volumesnapshots
# delete_stuck_volumesnapshots

if ! [ "$(getSnapshotNames readyToUse=false -l history_mode="${HISTORY_MODE}")" ]; then
# EBS Snapshot name based on current time and date
@@ -113,7 +125,7 @@ while true; do
while [ "$(getSnapshotNames readyToUse=false -l history_mode="${HISTORY_MODE}")" ]; do
printf "%s Snapshot is still creating...\n" "$(timestamp)"
sleep 10
delete_stuck_volumesnapshots
# delete_stuck_volumesnapshots
done
end_time=$(date +%s)
elapsed=$((end_time - start_time))
@@ -122,6 +134,9 @@ while true; do
else
printf "%s Snapshot already in progress...\n" "$(timestamp)"
sleep 10
delete_stuck_volumesnapshots
# delete_stuck_volumesnapshots
fi

printf "%s Sleeping for 10m due to Digital Ocean rate limit.\n" "$(timestamp)"
sleep 10m
done
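
Note on the readiness gate added above: the nested until loops poll the octez-node pod and sleep one minute per pass so the wait does not spin the CPU. A minimal alternative sketch using kubectl's built-in wait, with the same labels as the script (the timeout value is an assumption; the script's own loop waits indefinitely and logs as it goes):

# Sketch only: an equivalent readiness gate; the 60m timeout is an assumption.
kubectl wait pod \
  --namespace "${NAMESPACE}" \
  -l appType=octez-node,node_class_history_mode="${HISTORY_MODE}" \
  --for=condition=Ready \
  --timeout=60m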
2 changes: 1 addition & 1 deletion charts/snapshotEngine/templates/configmap.yaml
@@ -15,7 +15,7 @@ data:
SCHEMA_URL: {{ $.Values.schemaUrl }}
S3_BUCKET: {{ $.Values.s3BucketOverride }}
CLOUD_PROVIDER: {{ $.Values.cloudProvider }}
FQDN: {{ $.Values.fqdn }}
STORAGE_CLASS: {{$.Values.volumeSnapClass }}
kind: ConfigMap
metadata:
name: snapshot-configmap
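
The ConfigMap drops FQDN (removed from the chart in commit 39592b3, "rm fqdn from cm") and adds STORAGE_CLASS, populated from the volumeSnapClass value. As an illustration only, the relevant values might be supplied at install time roughly as below; the release name and the placeholder value strings are assumptions, and only the value keys come from this template:

# Sketch only: release name and value strings are placeholders.
helm upgrade --install snapshot-engine charts/snapshotEngine \
  --namespace "${NAMESPACE}" \
  --set cloudProvider=digitalocean \
  --set s3BucketOverride=example-space \
  --set volumeSnapClass=do-block-storage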
33 changes: 20 additions & 13 deletions snapshotEngine/mainJob.yaml
@@ -53,17 +53,18 @@ spec:

# These loops wait on the RPC to come online and prevent log from printing same line
# over and over and over again. This prints one line and waits for the RPC to come online for a clean log.
until wget -qO- http://localhost:8732/chains/main/blocks/head/header >/dev/null 2>&1; do
until wget -qO- http://127.0.0.1:8732/chains/main/blocks/head/header >/dev/null 2>&1; do
printf "%s Waiting for node RPC to come online.\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
until wget -qO- http://localhost:8732/chains/main/blocks/head/header >/dev/null 2>&1; do
if wget -qO- http://localhost:8732/chains/main/blocks/head/header >/dev/null 2>&1; then
until wget -qO- http://127.0.0.1:8732/chains/main/blocks/head/header >/dev/null 2>&1; do
sleep 1m # without sleep, this loop is a "busy wait". this sleep vastly reduces CPU usage while we wait for rpc
if wget -qO- http://127.0.0.1:8732/chains/main/blocks/head/header >/dev/null 2>&1; then
break
fi
done
done

# If somehow we skip the above waiting loop, this kills the job if the RPC is not online.
if ! wget -qO- http://localhost:8732/chains/main/blocks/head/header >/dev/null 2>&1; then
if ! wget -qO- http://127.0.0.1:8732/chains/main/blocks/head/header >/dev/null 2>&1; then
printf "%s RPC is not online! Exiting...\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
exit 1

@@ -76,15 +77,15 @@ spec:

# Tezos devs have advised us that it is safer to target HEAD~2 for rolling artifacts.
else
HEAD_BLOCK=$(wget -qO- http://localhost:8732/chains/main/blocks/head/header | sed -E 's/.*"hash":"?([^,"]*)"?.*/\1/')
HEAD_BLOCK=$(wget -qO- http://127.0.0.1:8732/chains/main/blocks/head/header | sed -E 's/.*"hash":"?([^,"]*)"?.*/\1/')
TARGET="${HEAD_BLOCK}~2"
fi

# Get BLOCK_HASH from RPC
wget -qO- http://localhost:8732/chains/main/blocks/"${TARGET}"/header | sed -E 's/.*"hash":"?([^,"]*)"?.*/\1/' > /"${HISTORY_MODE}"-snapshot-cache-volume/BLOCK_HASH
wget -qO- http://127.0.0.1:8732/chains/main/blocks/"${TARGET}"/header | sed -E 's/.*"hash":"?([^,"]*)"?.*/\1/' > /"${HISTORY_MODE}"-snapshot-cache-volume/BLOCK_HASH

# Get BLOCK_HEIGHT from RPC
wget -qO- http://localhost:8732/chains/main/blocks/"${TARGET}"/header | sed -E 's/.*"level":"?([^,"]*)"?.*/\1/' > /"${HISTORY_MODE}"-snapshot-cache-volume/BLOCK_HEIGHT
wget -qO- http://127.0.0.1:8732/chains/main/blocks/"${TARGET}"/header | sed -E 's/.*"level":"?([^,"]*)"?.*/\1/' > /"${HISTORY_MODE}"-snapshot-cache-volume/BLOCK_HEIGHT

# We need to check if the block is finalized for archive nodes since we aren't getting
# validation by a Tezos snapshot like our rolling tarball. We are just zipping up the data dir from an archive node.
@@ -117,13 +118,13 @@ spec:
fi

# Get BLOCK_TIMESTAMP from RPC
wget -qO- http://localhost:8732/chains/main/blocks/head/header | sed -E 's/.*"timestamp":"?([^,"]*)"?.*/\1/' > /"${HISTORY_MODE}"-snapshot-cache-volume/BLOCK_TIMESTAMP
wget -qO- http://127.0.0.1:8732/chains/main/blocks/head/header | sed -E 's/.*"timestamp":"?([^,"]*)"?.*/\1/' > /"${HISTORY_MODE}"-snapshot-cache-volume/BLOCK_TIMESTAMP

# Old version string
/usr/local/bin/octez-node --version > /"${HISTORY_MODE}"-snapshot-cache-volume/TEZOS_VERSION

# Get new version object from RPC
wget -qO- http://localhost:8732/version > /"${HISTORY_MODE}"-snapshot-cache-volume/TEZOS_RPC_VERSION_INFO
wget -qO- http://127.0.0.1:8732/version > /"${HISTORY_MODE}"-snapshot-cache-volume/TEZOS_RPC_VERSION_INFO

# Print variables for debug
printf "%s BLOCK_HASH is...$(cat /"${HISTORY_MODE}"-snapshot-cache-volume/BLOCK_HASH))\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
@@ -225,8 +226,10 @@ spec:
name: snapshot-cache-volume
- mountPath: /rolling-tarball-restore
name: rolling-tarball-restore
- mountPath: /cloud-provider
name: cloud-provider
- mountPath: /aws-secrets
name: aws-secrets
- mountPath: /do-secrets
name: do-secrets
env:
- name: HISTORY_MODE
value: ""
@@ -244,8 +247,12 @@ spec:
- name: rolling-tarball-restore
persistentVolumeClaim:
claimName: rolling-tarball-restore
- name: cloud-provider
- name: aws-secrets
secret:
secretName: cloud-provider
secretName: aws-secrets
optional: true
- name: do-secrets
secret:
secretName: do-secrets
optional: true
backoffLimit: 0
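
The single cloud-provider secret mount is replaced with two optional mounts, aws-secrets at /aws-secrets and do-secrets at /do-secrets, so the zip-and-upload job can carry credentials for both providers during the migration. A minimal sketch of creating those secrets (only the secret names come from this manifest; the key and file names are assumptions):

# Sketch only: the key/file names are assumptions, not taken from this PR.
kubectl create secret generic aws-secrets \
  --namespace "${NAMESPACE}" \
  --from-file=credentials=./aws-credentials
kubectl create secret generic do-secrets \
  --namespace "${NAMESPACE}" \
  --from-file=credentials=./do-credentials
# Both volumes are declared optional: true, so the job still starts if one secret is absent.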
2 changes: 1 addition & 1 deletion snapshotEngine/scratchVolume.yaml
@@ -4,7 +4,7 @@ metadata:
name: snapshot-cache-volume
namespace: ""
spec:
storageClassName: ebs-sc
storageClassName: do-block-storage
accessModes:
- ReadWriteOnce
resources:
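
The scratch volume now requests DigitalOcean's do-block-storage class instead of the EBS class ebs-sc, and the class can also be overridden via STORAGE_CLASS, as snapshot-maker.sh below shows. A quick sanity check that the class exists on the target cluster might be:

kubectl get storageclass do-block-storage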
65 changes: 38 additions & 27 deletions snapshotEngine/snapshot-maker.sh
@@ -4,12 +4,6 @@ cd /

ZIP_AND_UPLOAD_JOB_NAME=zip-and-upload-"${HISTORY_MODE}"

# Pause if nodes are not ready
while [ "$(kubectl get pods -n "${NAMESPACE}" -o 'jsonpath={..status.conditions[?(@.type=="Ready")].status}' -l appType=octez-node -l node_class_history_mode="${HISTORY_MODE}")" = "False" ]; do
printf "%s Tezos node is not ready for snapshot. Check node pod logs. \n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
sleep 30
done

# Delete zip-and-upload job
if kubectl get job "${ZIP_AND_UPLOAD_JOB_NAME}"; then
printf "%s Old zip-and-upload job exits. Attempting to delete.\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
@@ -26,27 +20,30 @@ fi
if [ "${HISTORY_MODE}" = rolling ]; then
if [ "$(kubectl get pvc rolling-tarball-restore)" ]; then
printf "%s PVC Exists.\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
sleep 5
kubectl delete pvc rolling-tarball-restore
sleep 5
fi
fi

if [ "$(kubectl get pvc "${HISTORY_MODE}"-snapshot-cache-volume)" ]; then
printf "%s PVC Exists.\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
sleep 5
kubectl delete pvc "${HISTORY_MODE}"-snapshot-cache-volume
sleep 5
fi

if [ "$(kubectl get pvc "${HISTORY_MODE}"-snap-volume)" ]; then
printf "%s PVC Exists.\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
sleep 5
kubectl delete pvc "${HISTORY_MODE}"-snap-volume
sleep 5
fi

while [ "$(kubectl get volumesnapshots -o jsonpath='{.items[?(.status.readyToUse==false)].metadata.name}' --namespace "${NAMESPACE}" -l history_mode="${HISTORY_MODE}")" ]; do
printf "%s Snapshot already in progress...\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
sleep 10
done
# while [ "$(kubectl get volumesnapshots -o jsonpath='{.items[?(.status.readyToUse==false)].metadata.name}' --namespace "${NAMESPACE}" -l history_mode="${HISTORY_MODE}")" ]; do
# printf "%s Snapshot already in progress...\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
# sleep 10
# done

printf "%s EBS Snapshot finished!\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"

@@ -60,6 +57,11 @@ printf "%s Creating scratch volume for artifact processing...\n" "$(date "+%Y-%m
# Set namespace for both "${HISTORY_MODE}"-snapshot-cache-volume
NAMESPACE="${NAMESPACE}" yq e -i '.metadata.namespace=strenv(NAMESPACE)' scratchVolume.yaml

# Set storage class for scratch volume yaml
STORAGE_CLASS="${STORAGE_CLASS}" yq e -i '.spec.storageClassName=strenv(STORAGE_CLASS)' scratchVolume.yaml

sleep 5

# Create "${HISTORY_MODE}"-snapshot-cache-volume
printf "%s Creating PVC ${HISTORY_MODE}-snapshot-cache-volume.\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
NAME="${HISTORY_MODE}-snapshot-cache-volume" yq e -i '.metadata.name=strenv(NAME)' scratchVolume.yaml
@@ -73,6 +75,7 @@ printf "%s PVC %s created.\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")" "${HISTORY_MOD


if [ "${HISTORY_MODE}" = rolling ]; then
sleep 5
# Create rolling-tarball-restore
printf "%s Creating PVC rolling-tarball-restore..\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
NAME="rolling-tarball-restore" yq e -i '.metadata.name=strenv(NAME)' scratchVolume.yaml
@@ -87,6 +90,9 @@ fi
## Snapshot volume namespace
NAMESPACE="${NAMESPACE}" yq e -i '.metadata.namespace=strenv(NAMESPACE)' volumeFromSnap.yaml

# Set storageclass for restored volume
STORAGE_CLASS="${STORAGE_CLASS}" yq e -i '.spec.storageClassName=strenv(STORAGE_CLASS)' volumeFromSnap.yaml

## Snapshot volume name
VOLUME_NAME="${HISTORY_MODE}-snap-volume"
VOLUME_NAME="${VOLUME_NAME}" yq e -i '.metadata.name=strenv(VOLUME_NAME)' volumeFromSnap.yaml
@@ -111,6 +117,8 @@ printf "%s We're rounding up and adding 20%% , volume size will be %sGB.\n" "$(d

RESTORE_VOLUME_SIZE="${RESTORE_VOLUME_SIZE}Gi" yq e -i '.spec.resources.requests.storage=strenv(RESTORE_VOLUME_SIZE)' volumeFromSnap.yaml

sleep 5

printf "%s Creating volume from snapshot ${NEWEST_SNAPSHOT}.\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
if ! kubectl apply -f volumeFromSnap.yaml
then
@@ -175,22 +183,22 @@ if [ "${HISTORY_MODE}" = archive ]; then
yq eval -i "del(.spec.template.spec.containers[0].volumeMounts[2])" mainJob.yaml
fi

# Switch alternate cloud provider secret name based on actual cloud provider
if [[ -n "${CLOUD_PROVIDER}" ]]; then
# Need to account for dynamic volumes removed above. For example if not rolling node then rolling volume is deleted.
SECRET_NAME="${NAMESPACE}-secret"
# Index of zip-and-upload container changes depending on if rolling job or archive job
NUM_CONTAINERS=$(yq e '.spec.template.spec.containers | length' mainJob.yaml)
# Index of mounts also changes depending on history mode
NUM_CONTAINER_MOUNTS=$(yq e ".spec.template.spec.containers[$(( NUM_CONTAINERS - 1 ))].volumeMounts | length" mainJob.yaml )
# Secret volume mount is last item in list of volumeMounts for the zip and upload container
SECRET_NAME="${SECRET_NAME}" yq e -i ".spec.template.spec.containers[$(( NUM_CONTAINERS - 1 ))].volumeMounts[$(( NUM_CONTAINER_MOUNTS - 1 ))].name=strenv(SECRET_NAME)" mainJob.yaml
# Index of job volumes change depending on history mode
NUM_JOB_VOLUMES=$(yq e '.spec.template.spec.volumes | length' mainJob.yaml )
# Setting job secret volume to value set by workflow
SECRET_NAME="${SECRET_NAME}" yq e -i ".spec.template.spec.volumes[$(( NUM_JOB_VOLUMES - 1 ))].name=strenv(SECRET_NAME)" mainJob.yaml
SECRET_NAME="${SECRET_NAME}" yq e -i ".spec.template.spec.volumes[$(( NUM_JOB_VOLUMES - 1 ))].secret.secretName=strenv(SECRET_NAME)" mainJob.yaml
fi
# # Switch alternate cloud provider secret name based on actual cloud provider
# if [[ -n "${CLOUD_PROVIDER}" ]]; then
# # Need to account for dynamic volumes removed above. For example if not rolling node then rolling volume is deleted.
# SECRET_NAME="${NAMESPACE}-secret"
# # Index of zip-and-upload container changes depending on if rolling job or archive job
# NUM_CONTAINERS=$(yq e '.spec.template.spec.containers | length' mainJob.yaml)
# # Index of mounts also changes depending on history mode
# NUM_CONTAINER_MOUNTS=$(yq e ".spec.template.spec.containers[$(( NUM_CONTAINERS - 1 ))].volumeMounts | length" mainJob.yaml )
# # Secret volume mount is last item in list of volumeMounts for the zip and upload container
# SECRET_NAME="${SECRET_NAME}" yq e -i ".spec.template.spec.containers[$(( NUM_CONTAINERS - 1 ))].volumeMounts[$(( NUM_CONTAINER_MOUNTS - 1 ))].name=strenv(SECRET_NAME)" mainJob.yaml
# # Index of job volumes change depending on history mode
# NUM_JOB_VOLUMES=$(yq e '.spec.template.spec.volumes | length' mainJob.yaml )
# # Setting job secret volume to value set by workflow
# SECRET_NAME="${SECRET_NAME}" yq e -i ".spec.template.spec.volumes[$(( NUM_JOB_VOLUMES - 1 ))].name=strenv(SECRET_NAME)" mainJob.yaml
# SECRET_NAME="${SECRET_NAME}" yq e -i ".spec.template.spec.volumes[$(( NUM_JOB_VOLUMES - 1 ))].secret.secretName=strenv(SECRET_NAME)" mainJob.yaml
# fi

# Service account to be used by entire zip-and-upload job.
SERVICE_ACCOUNT="${SERVICE_ACCOUNT}" yq e -i '.spec.template.spec.serviceAccountName=strenv(SERVICE_ACCOUNT)' mainJob.yaml
@@ -204,12 +212,13 @@ then
exit 1
fi

sleep 5
sleep 20

# Wait for snapshotting job to complete
while [ "$(kubectl get jobs "zip-and-upload-${HISTORY_MODE}" --namespace "${NAMESPACE}" -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}')" != "True" ]; do
printf "%s Waiting for zip-and-upload job to complete.\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
while [ "$(kubectl get jobs "zip-and-upload-${HISTORY_MODE}" --namespace "${NAMESPACE}" -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}')" != "True" ]; do
sleep 1m # without sleep, this loop is a "busy wait". this sleep vastly reduces CPU usage while we wait for job
if [ "$(kubectl get pod -l job-name=zip-and-upload-"${HISTORY_MODE}" --namespace="${NAMESPACE}"| grep -i -e error -e evicted -e pending)" ] || \
[ "$(kubectl get jobs "zip-and-upload-${HISTORY_MODE}" --namespace="${NAMESPACE}" -o jsonpath='{.status.conditions[?(@.type=="Failed")].type}')" ] ; then
printf "%s Zip-and-upload job failed. This job will end and a new snapshot will be taken.\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
@@ -226,5 +235,7 @@ if ! [ "$(kubectl get jobs "zip-and-upload-${HISTORY_MODE}" --namespace "${NAMES
fi

printf "%s Deleting temporary snapshot volume.\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
sleep 5
kubectl delete -f volumeFromSnap.yaml | while IFS= read -r line; do printf '%s %s\n' "$(date "+%Y-%m-%d %H:%M:%S" "$@")" "$line"; done
sleep 5
kubectl delete job snapshot-maker --namespace "${NAMESPACE}"
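
One detail worth calling out from the restored-volume sizing above: snapshot-maker.sh adds 20% headroom and rounds up to a whole GB before patching volumeFromSnap.yaml. The arithmetic itself falls outside the hunks shown in this diff; a minimal sketch of that rounding-plus-headroom calculation (the variable name and its input source are assumptions) could be:

# Sketch only: SNAPSHOT_SIZE_GB and where it comes from are assumptions.
SNAPSHOT_SIZE_GB=100                                            # size of the data being restored, in GB
RESTORE_VOLUME_SIZE=$(( (SNAPSHOT_SIZE_GB * 120 + 99) / 100 ))  # +20% headroom, rounded up to a whole GB
echo "Restored volume will be ${RESTORE_VOLUME_SIZE}Gi"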