volume leak during deployment rollout #733
Comments
The whole registration process is also problematic because of #729. It may also be a scalability problem (there is only a single instance of the controller), although we have no evidence of that because we haven't done scale testing. I'm currently leaning towards solving this problem by deploying the external-provisioner alongside each node driver instance and removing the central controller entirely. This was originally proposed in kubernetes-csi/external-provisioner#367 but wasn't finished. What was missing in that PR was support for immediate binding. With immediate binding, all external-provisioner instances need to agree among themselves on which one of them should create the volume. I was proposing leader election for that (kubernetes-csi/external-provisioner#367 (comment)), but that will need further thought.
I think another, quicker solution is to persist the controller state (the known volume details) in a config map.
The controller currently doesn't have access to a config map. Adding one would introduce a dependency on Kubernetes into the CO-agnostic part of PMEM-CSI. Even with a config map, getting this right in all cases will be tricky. Consider the naive approach:
If the controller dies at any point during this flow, the volume may leak. A lot of effort went into external-provisioner to prevent such leaks; we would have to duplicate all of that.
This is a valid alternative solution, but I wouldn't call it quick.
The upstream work on enabling de-centralized deployment of external-provisioner is tracked in kubernetes-csi/external-provisioner#487
Not only when it dies - error handling also currently isn't sufficient to prevent volume leaks, see issue #823
By putting external-provisioner onto each node and letting it provision volumes directly on the node, we can remove the controller/node communication part in PMEM-CSI. This solves various issues in that part (race conditions that led to volume leaks) and simplifies the deployment (no need for two-way TLS certificates anymore). The webhooks check for capacity by discovering the PMEM-CSI node pods and retrieving metrics data from them via the normal metrics support. The combination of node drivers from 0.8 with a controller from 0.9 is harmless (no volume leaked) but can no longer create new volumes. Existing volumes on the nodes are still usable. Combining a controller from 0.8 with node drivers from 0.9 is more problematic because the old controller will cause volume leaks when volumes are deleted (intel#733). If this is a problem, then the old StatefulSet can be deleted manually before upgrading.
By putting external-provisioner onto each node and letting it provision volumes directly on the node, we can remove the controller/node communication part in PMEM-CSI. This solves various issues in that part (race conditions that led to volume leaks) and simplifies the deployment (no need for two-way TLS certificates anymore). The webhooks check for capacity by discovering the PMEM-CSI node pods and retrieving metrics data from them via the normal metrics support. The combination of node drivers from 0.8 with a controller from 0.9 is harmless (no volume leaked) but can no longer create new volumes. Existing volumes on the nodes are still usable. Combining a controller from 0.8 with node drivers from 0.9 is more problematic because the old controller will cause volume leaks when volumes are deleted (intel#733). If this is a problem, then the old StatefulSet can be deleted manually before upgrading. The operator and tests will be updated in separate commits.
Fixed by PR #838.
The controller is designed such that it collects information about volumes from nodes as the nodes register themselves. This implies that the controller cannot know about existing volumes for nodes that haven't registered (yet).
This leads to the following problem:
=> volume leak
This problem was triggered by the new version skew tests, which restart the driver while volumes exist and then perform some operations on those volumes (including removal) right after the driver deployment comes up again.
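To make the registration dependency more concrete, here is a minimal Go sketch of a controller that only learns about volumes when nodes register. It is not the PMEM-CSI code: the `Controller` type, its methods, and the IDs `vol-1` and `node-1` are all hypothetical, and the sequence in `main` is just one plausible way the leak could happen, not necessarily the exact one hit by the tests. The only outside fact it relies on is that CSI `DeleteVolume` is idempotent, so a request for an unknown volume is answered with success.

```go
package main

import "fmt"

// Controller is a hypothetical stand-in for a central controller that
// learns about existing volumes only from node registration.
type Controller struct {
	// volumes maps volume ID to the ID of the node that reported it.
	volumes map[string]string
}

// NewController returns a controller with no knowledge of any volume,
// which is the state right after a (re)start.
func NewController() *Controller {
	return &Controller{volumes: map[string]string{}}
}

// RegisterNode records the volumes that a node reports about itself.
func (c *Controller) RegisterNode(nodeID string, volumeIDs []string) {
	for _, id := range volumeIDs {
		c.volumes[id] = nodeID
	}
}

// DeleteVolume returns the node that has to remove the volume and true
// if the request must be forwarded. An unknown volume is treated as
// already deleted (assumption: this mirrors CSI idempotency), so the
// caller sees success although nothing was forwarded.
func (c *Controller) DeleteVolume(volumeID string) (string, bool) {
	nodeID, ok := c.volumes[volumeID]
	if !ok {
		return "", false
	}
	delete(c.volumes, volumeID)
	return nodeID, true
}

func main() {
	// The volume survived the driver restart on its node.
	onNode := map[string]bool{"vol-1": true}

	// The restarted controller starts without any volume information.
	c := NewController()

	// The delete request arrives before node-1 has registered again.
	if _, forwarded := c.DeleteVolume("vol-1"); forwarded {
		delete(onNode, "vol-1")
	}

	// Registration comes too late: the delete was already acknowledged.
	c.RegisterNode("node-1", []string{"vol-1"})

	fmt.Println("leaked:", onNode["vol-1"]) // prints "leaked: true"
}
```

If the registration happens before the delete request, the same code forwards the request to `node-1` and nothing leaks, which is why the problem only shows up in the window right after a restart.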