docs: add High Availability information for s3gw and Longhorn #845

giubacc · 2023-11-28T13:51:34Z

Add High Availability information for s3gw and Longhorn.

Fixes: https://github.com/aquarist-labs/s3gw/issues/841

Checklist before requesting a review

I have performed a self-review of my code.
If it is a core feature, I have added thorough tests.
CHANGELOG.md has been updated should there be relevant changes in this PR.

docs/high-availability.md

Fixes: https://github.com/aquarist-labs/s3gw/issues/841
Signed-off-by: Giuseppe Baccini <[email protected]>

jecluis · 2023-11-29T14:16:22Z

docs/high-availability.md

+s3gw can reasonably protect against; that's all undefined behavior and "restore
+from backup" time.


I think we can drop the "that's all undefined behavior ..." part, or maybe replace it with just "In this case, restoring from backup might be the only option."

jecluis · 2023-11-29T14:17:19Z

docs/high-availability.md

+s3gw can reasonably protect against; that's all undefined behavior and "restore
+from backup" time.
+
+The *Active/Standby* model claims the following characteristics:


s/claims/offers/

jecluis · 2023-11-29T14:18:33Z

docs/high-availability.md

+  This has the advantage of being schedulable, so it can happen at times of low load
+  if these exist.
+
+When any of these scenarios should happen, Kubernetes restarts the s3gw pod and we


s/should happen/happens/

jecluis · 2023-11-29T14:19:45Z

docs/high-availability.md

+writing this), does not automatically restart a pod attached to a RWO volume
+in the event that the node running it suffers a failure.


I'd write "in case the node it is running on suffers from a failure"

jecluis · 2023-11-29T14:20:17Z

docs/high-availability.md

+Currently, Kubernetes ([1.28](https://kubernetes.io/releases/) at the time of
+writing this), does not automatically restart a pod attached to a RWO volume
+in the event that the node running it suffers a failure.
+Reasons behind this behavior is that workloads, such as RWO volumes require


add a comma between "RWO volumes" and "require"?

jecluis · 2023-11-29T14:21:01Z

docs/high-availability.md

+Failures affecting these kind of workloads risk data loss and/or corruption
+if nodes (and the workloads running on them) are wrongly assumed to be dead.
+For this reason it is crucial to know that the node has reached a safe state
+before initiating recovery of the workload.


s/recovery of the workload/workload recovery/

jecluis · 2023-11-29T14:22:30Z

docs/high-availability.md

+Longhorn offers the option to perform a [Pod Deletion Policy][pod-deletion-policy]
+when a node should go down unexpectedly.


I'd write instead: "offers the option to define a Pod Deletion Policy when the node goes down unexpectedly"

jecluis · 2023-11-29T14:23:06Z

docs/high-availability.md

+
+Longhorn offers the option to perform a [Pod Deletion Policy][pod-deletion-policy]
+when a node should go down unexpectedly.
+This means that Longhorn will force delete StatefulSet/Deployment terminating pods


Do you mean "force-delete" or something? Or maybe "forcefully delete"?

jecluis · 2023-11-29T14:23:50Z

docs/high-availability.md

+on nodes that are down to release Longhorn volumes so that Kubernetes
+can spin up replacement pods.
+
+Anyway, when employing this mitigation, the user must be aware that assuming a node


Maybe drop the "Anyway" and start with "When employing ..."?

jecluis · 2023-11-29T14:24:15Z

docs/high-availability.md

+
+The s3gw and the Longhorn team is currently investigating some
+[hypotheses of solutions][longhorn-issue-1]
+to address this problem at its roots.


s/roots/root/

giubacc self-assigned this Nov 28, 2023

giubacc added the kind/documentation (Improvements or additions to documentation) and priority/0 (Needs to go into the next release or force a patch) labels Nov 28, 2023

giubacc added this to the v0.24.0 milestone Nov 28, 2023

giubacc requested review from tserong, jecluis and m-ildefons November 28, 2023 13:52

tserong reviewed Nov 29, 2023

docs: add High Availability information for s3gw and Longhorn

126b7a0

Fixes: https://github.com/aquarist-labs/s3gw/issues/841
Signed-off-by: Giuseppe Baccini <[email protected]>

giubacc force-pushed the docs-current-HA-s3gw-LH branch from 1fd7299 to 126b7a0 Compare November 29, 2023 14:13

giubacc marked this pull request as ready for review November 29, 2023 14:17

jecluis requested changes Nov 29, 2023

jecluis removed this from the v0.24.0 milestone Mar 21, 2024

jecluis closed this Apr 1, 2024
