docs: add High Availability information for s3gw and Longhorn #845
Conversation
Fixes: https://github.com/aquarist-labs/s3gw/issues/841 Signed-off-by: Giuseppe Baccini <[email protected]>
Compare: 1fd7299 to 126b7a0
> s3gw can reasonably protect against; that's all undefined behavior and "restore
> from backup" time.

I think we can drop the "that's all undefined behavior ..." part, or maybe replace it with just "In this case, restoring from backup might be the only option."
> The *Active/Standby* model claims the following characteristics:

s/claims/offers/
> This has the advantage of being schedulable, so it can happen at times of low load
> if these exist.
>
> When any of these scenarios should happen, Kubernetes restarts the s3gw pod and we

s/should happen/happens/
> writing this), does not automatically restart a pod attached to a RWO volume
> in the event that the node running it suffers a failure.

I'd write "in case the node it is running on suffers from a failure".
> Currently, Kubernetes ([1.28](https://kubernetes.io/releases/) at the time of
> writing this), does not automatically restart a pod attached to a RWO volume
> in the event that the node running it suffers a failure.
> Reasons behind this behavior is that workloads, such as RWO volumes require

Add a comma between "RWO volumes" and "require"?
> Failures affecting these kind of workloads risk data loss and/or corruption
> if nodes (and the workloads running on them) are wrongly assumed to be dead.
> For this reason it is crucial to know that the node has reached a safe state
> before initiating recovery of the workload.

s/recovery of the workload/workload recovery/
> Longhorn offers the option to perform a [Pod Deletion Policy][pod-deletion-policy]
> when a node should go down unexpectedly.

I'd write instead: "offers the option to define a Pod Deletion Policy when the node goes down unexpectedly".
> This means that Longhorn will force delete StatefulSet/Deployment terminating pods

You mean "force-delete" or something? Or maybe "forcefully delete"?
> on nodes that are down to release Longhorn volumes so that Kubernetes
> can spin up replacement pods.
>
> Anyway, when employing this mitigation, the user must be aware that assuming a node

Maybe drop the "Anyway" and start with "When employing ..."?
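As a concrete illustration of the mitigation discussed in this thread, the Pod Deletion Policy is exposed as a Longhorn setting. A minimal sketch follows; the setting name, value, and CRD API version are assumptions based on Longhorn's settings reference, not taken from the docs under review:

```yaml
# Sketch: Longhorn Setting resource (in the longhorn-system namespace)
# controlling whether pods on a down node are force-deleted so that
# Kubernetes can reschedule them onto healthy nodes.
# Setting name, value, and apiVersion assumed from Longhorn's settings reference.
apiVersion: longhorn.io/v1beta2
kind: Setting
metadata:
  name: pod-deletion-policy-when-node-is-down
  namespace: longhorn-system
value: delete-both-statefulset-and-deployment-pod
```

Other values (e.g. a "do nothing" default) leave terminating pods stuck until the node returns, which is the behavior the quoted paragraph warns about.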
> The s3gw and the Longhorn team is currently investigating some
> [hypotheses of solutions][longhorn-issue-1]
> to address this problem at its roots.

s/roots/root/