snapshotEngine: DigitalOcean complete migration #586

Merged: 66 commits on Aug 15, 2023

@orcutt989 commented Jun 7, 2023

This completes the migration of the snapshotEngine to work with DigitalOcean.

Changes

  • Converted busy waits to waits that sleep for 1 minute between checks. This reduced CPU time for idle pods by 30x-60x, from about 60% of a core to 1%-2% of a core.
  • DigitalOcean has a snapshot timeout of 10 minutes, so the Snapshot Warmer now only creates a snapshot and then waits 10 minutes. It still maintains 5 snapshots, but it no longer deletes "stuck" snapshots.
  • For some reason, localhost does not work from the Tezos container when trying to reach the RPC, so the address was changed to 127.0.0.1.
  • Both AWS and DigitalOcean credentials are stored as Kubernetes secrets.
  • Storage Class has been changed from AWS's ebs-sc to DigitalOcean's do-block-storage.
  • Snapshot Maker pod no longer waits for a node to be ready, as this logic was moved to both the Snapshot Scheduler pod and the Snapshot Warmer pods. Previously, snapshots could be taken even if a node was not synced; that is no longer possible.
  • Sleeps were added to creation and deletion commands, as DigitalOcean's API timeouts are stricter than AWS's.
  • Snapshot Maker no longer waits for a snapshot to finish; it uses the most recent snapshot, since snapshots are near-instantaneous on DigitalOcean volumes.
  • Alternate cloud provider functionality was removed as we are currently supporting web resources in AWS and Kubernetes and artifacts on DigitalOcean.
  • Artifact metadata is pushed to DigitalOcean Spaces instead of AWS S3.
  • Jekyll build output was quieted, as it fills the logs with information that is not useful when troubleshooting. When Jekyll fails, another message before or after the failure is more helpful anyway.
  • Cloud provider credential switching logic was changed to default to DigitalOcean, and to use AWS if AWS credentials are provided.
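
As a rough illustration of the first change above (busy waits replaced by sleeping waits), here is a minimal Python sketch. It is not the code from this PR; the function name `wait_until` and its parameters are hypothetical, and only the 1-minute interval comes from the description above.

```python
import time
from typing import Callable, Optional


def wait_until(
    condition: Callable[[], bool],
    interval_seconds: float = 60.0,          # 1 minute, as in the change list
    timeout_seconds: Optional[float] = None,  # None means wait indefinitely
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `condition`, sleeping between checks instead of spinning.

    Returns True once `condition()` is true, or False if the optional
    timeout elapses first. `sleep` and `clock` are injectable so the
    loop can be exercised without real delays.
    """
    deadline = None if timeout_seconds is None else clock() + timeout_seconds
    while True:
        if condition():
            return True
        if deadline is not None and clock() >= deadline:
            return False
        # The process is descheduled here, so an idle pod uses almost no CPU.
        sleep(interval_seconds)
```

A caller would pass its own readiness check, for example `wait_until(node_is_synced)`, where `node_is_synced` is a placeholder for whatever check the pod performs. A busy wait would call the check in a tight loop with no sleep, which is what keeps a core occupied while idle.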

This branch is currently deployed and running via https://github.com/oxheadalpha/xtz-shots-infra

@orcutt989 changed the title from "Do only" to "snapshotEngine: DigitalOcean complete migration" Jul 6, 2023
@orcutt989 marked this pull request as ready for review July 12, 2023 18:16

@nicolasochem left a comment

I didn't test the code, but I scrolled through it. There is commented-out code, but I understand we may want to keep it for the record.

This branch has been deployed on xtz-shots for ~1 month and seems to be consistently producing artifacts on DigitalOcean. We should merge & release it. Thanks.

@orcutt989 merged commit 7e31daf into master Aug 15, 2023
20 checks passed
@orcutt989 deleted the do-only branch September 21, 2023 19:15