[cinder-csi-plugin] Add finalizer to VolumeSnapshot when creating a PVC from the VolumeSnapshot #2369
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing the approve command in a comment.
Hi @jeffyjf. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply accordingly. Once the patch is verified, the new status will be reflected by the label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
As mentioned in the issue, I do like the concept of adding a finalizer to the snapshot for each volume which requires it. However, I can't think of another instance where finalizers are used this way, so it might be worth some effort to discover if there are any problems with potentially very big lists of finalizers. Imagine, for example, some CI system which uses a 'golden image' extremely heavily and has 1,000 volumes using it. Would that be ok?
However, I don't think we can safely add the finalizer asynchronously. A finalizer would typically be added synchronously before resource creation. It's also worth noting that the resource in this case would be the PV rather than the PVC.
Could we add code which adds and persists the snapshot finalizer immediately before creating the PV?
We would also need to add code to remove the finalizer in the deletion flow of the PV strictly before removing the PV finalizer.
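A minimal sketch of that synchronous ordering, assuming a component with access to the external-snapshotter clientset; the finalizer name and helper names below are hypothetical and not part of this PR. An Update is used for brevity; a Patch would be safer against concurrent writers, as noted later in the review.

// Hypothetical helpers illustrating the ordering suggested above: persist a
// finalizer on the source VolumeSnapshot before the PV is created, and remove
// it in the PV deletion flow strictly before the PV's own finalizer is removed.
package snapshotprotection

import (
	"context"

	snapclient "github.com/kubernetes-csi/external-snapshotter/client/v6/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// Hypothetical finalizer name, not defined by this PR.
const snapshotInUseFinalizer = "cinder.csi.openstack.org/volume-in-use"

// addSnapshotFinalizer would be called before creating the PV that is
// provisioned from the snapshot.
func addSnapshotFinalizer(ctx context.Context, c snapclient.Interface, ns, name string) error {
	vs, err := c.SnapshotV1().VolumeSnapshots(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if controllerutil.AddFinalizer(vs, snapshotInUseFinalizer) {
		_, err = c.SnapshotV1().VolumeSnapshots(ns).Update(ctx, vs, metav1.UpdateOptions{})
	}
	return err
}

// removeSnapshotFinalizer would be called in the PV deletion flow, strictly
// before the PV finalizer itself is removed.
func removeSnapshotFinalizer(ctx context.Context, c snapclient.Interface, ns, name string) error {
	vs, err := c.SnapshotV1().VolumeSnapshots(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if controllerutil.RemoveFinalizer(vs, snapshotInUseFinalizer) {
		_, err = c.SnapshotV1().VolumeSnapshots(ns).Update(ctx, vs, metav1.UpdateOptions{})
	}
	return err
}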
Looks very WIP right now, so probably not worth actually running tests, but /ok-to-test. Note that tests won't run automatically while the PR is marked WIP, but you can run them manually on demand when you're ready.
Hi @mdbooth, thanks very much for your comments, but I have a question.
I am a newbie. In my opinion, this cannot be implemented with just a watcher; we would need to add a webhook to do this, right? If that's not right, do you have any other suggestions?
In the past few days, I tried a few methods to add the finalizer synchronously before creating the PV:
I think the first method is more graceful and appropriate. But the first method requires the external-provisioner sidecar to provide more of the necessary information. So I think we should submit a KEP to the CSI spec so that we can get enough information about the VolumeSnapshot in the CreateVolume function to resolve this issue appropriately.
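For context, a rough sketch (using the standard CSI spec Go bindings, not code from this PR) of why CreateVolume alone cannot do this today: the request only carries the backend snapshot ID, not the name or namespace of the VolumeSnapshot object that the finalizer would have to be attached to, which is the information the external-provisioner change is meant to supply.

// Sketch only: with the stock CSI bindings, a driver provisioning from a
// snapshot can extract just the CSI snapshot ID (e.g. a Cinder snapshot UUID),
// not a reference to the Kubernetes VolumeSnapshot object.
package main

import (
	"fmt"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

func snapshotIDFromRequest(req *csi.CreateVolumeRequest) string {
	if src := req.GetVolumeContentSource(); src != nil {
		if snap := src.GetSnapshot(); snap != nil {
			return snap.GetSnapshotId()
		}
	}
	return ""
}

func main() {
	req := &csi.CreateVolumeRequest{
		Name: "pvc-from-snapshot-example",
		VolumeContentSource: &csi.VolumeContentSource{
			Type: &csi.VolumeContentSource_Snapshot{
				Snapshot: &csi.VolumeContentSource_SnapshotSource{SnapshotId: "example-snapshot-uuid"},
			},
		},
	}
	// Prints only the backend snapshot ID; the VolumeSnapshot name/namespace
	// is not available here.
	fmt.Println(snapshotIDFromRequest(req))
}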
Force-pushed from ee04af8 to 22e3bf6.
Force-pushed from 22e3bf6 to 2ff7f36.
I've already committed a PR to external-provisioner so that we can add the finalizer to the VolumeSnapshot synchronously in the future. But for now, let's continue to complete this imperfect solution to fix issue #2294 and stop the resource leak in production environments asap.
Force-pushed from 2ff7f36 to d49caa6.
Hey, what is the problem with adding a finalizer asynchronously? We did that all the time in Kuryr and it worked totally well when we made sure we were doing it by patching the resource properly.
Force-pushed from d49caa6 to 9c2d979.
/hold cancel
Thanks for the PR. I thought our PR relies on kubernetes-csi/external-provisioner#1070.
Yep, I am not sure when kubernetes-csi/external-provisioner#1070 can be merged and released. This PR can be used temporarily to hold back the resource leak.
There may just be some potential problems, but I have no concrete example. Maybe in an extremely busy environment, the resource may be deleted before the finalizer is successfully added.
Force-pushed from e1d4650 to 00d7a31.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@jeffyjf can you rebase on latest master?
Of course, I will rebase it later on.
Force-pushed from 00d7a31 to ebc7baf.
@jeffyjf: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@jeffyjf can you fix the unit tests?
Of course, I'll fix it asap.
Force-pushed from ebc7baf to 4c412ca.
This patch implements a PVC watcher. When the watcher receives a PVC ADDED event and the PVC has a VolumeSnapshot-kind dataSource, a finalizer is added to the VolumeSnapshot. The finalizer is removed when the watcher receives the PVC's DELETED event. NOTE: Adding the finalizer asynchronously via a watcher seems unsafe and might result in some potential issues. After PR [1] is merged and released, we need to refactor this and implement a synchronous way to add the finalizer. [1] kubernetes-csi/external-provisioner#1070
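A simplified sketch of how such a PVC watch could be wired up with client-go (not the PR's exact code; the finalizer handling on the matched VolumeSnapshot is elided here and shown in the review excerpts further down):

// Simplified sketch, not the PR's exact code: watch all PVCs and react only to
// those whose dataSource references a VolumeSnapshot.
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/klog/v2"
)

func watchPVCs(ctx context.Context) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	w, err := clientset.CoreV1().PersistentVolumeClaims(metav1.NamespaceAll).Watch(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for event := range w.ResultChan() {
		pvc, ok := event.Object.(*corev1.PersistentVolumeClaim)
		if !ok || pvc.Spec.DataSource == nil || pvc.Spec.DataSource.Kind != "VolumeSnapshot" {
			continue
		}
		switch event.Type {
		case watch.Added:
			// Here the finalizer would be added to the referenced VolumeSnapshot.
			klog.V(5).InfoS("PVC created from snapshot", "PersistentVolumeClaim", pvc.Name, "VolumeSnapshot", pvc.Spec.DataSource.Name)
		case watch.Deleted:
			// Here the finalizer would be removed again.
			klog.V(5).InfoS("Snapshot-backed PVC deleted", "PersistentVolumeClaim", pvc.Name, "VolumeSnapshot", pvc.Spec.DataSource.Name)
		}
	}
	return nil
}

func main() {
	_ = watchPVCs(context.Background())
}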
Force-pushed from 4c412ca to 616d1b0.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This feels like it should be a proper K8s controller and not just a simplified version of a watcher.
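A hedged sketch of the informer-based shape this comment is hinting at; the handler bodies and resync period are placeholders. Unlike a bare watch, a shared informer relists periodically, so a missed ADDED or DELETED event is eventually reconciled.

// Sketch of an informer-based controller instead of a bare watch; handler
// bodies are placeholders, not the PR's code. The periodic resync means a
// missed event is corrected on the next relist.
package main

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func runPVCInformer(ctx context.Context) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	// The resync period here is arbitrary; it bounds how long a missed event
	// can go unnoticed.
	factory := informers.NewSharedInformerFactory(clientset, 10*time.Minute)
	pvcInformer := factory.Core().V1().PersistentVolumeClaims().Informer()
	pvcInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if pvc, ok := obj.(*corev1.PersistentVolumeClaim); ok {
				_ = pvc // ensure the finalizer exists on the referenced VolumeSnapshot
			}
		},
		UpdateFunc: func(_, obj interface{}) {
			// Re-checked on every resync, so a missed ADDED event still converges.
		},
		DeleteFunc: func(obj interface{}) {
			// Release the finalizer once no PVC references the snapshot anymore.
		},
	})
	factory.Start(ctx.Done())
	cache.WaitForCacheSync(ctx.Done(), pvcInformer.HasSynced)
	<-ctx.Done()
	return nil
}

func main() {
	_ = runPVCInformer(context.Background())
}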
snap "github.com/kubernetes-csi/external-snapshotter/client/v6/clientset/versioned" | ||
"golang.org/x/net/context" | ||
corev1 "k8s.io/api/core/v1" | ||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
"k8s.io/apimachinery/pkg/watch" | ||
"k8s.io/client-go/kubernetes" | ||
"k8s.io/client-go/rest" | ||
"k8s.io/klog/v2" | ||
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil" |
Looks like this is mixing K8s and non-K8s imports. No biggie.
if event.Type == watch.Added {
	klog.V(5).InfoS("Receive ADDED event try to add finalizer to VolumeSnapshot", "PersistentVolumeClaim", pvc.Name, "VolumeSnapshot", snapshotObj.Name)
	update = controllerutil.AddFinalizer(snapshotObj, finalizer)
}
Uhm, what if we missed an ADDED event somehow, but then get the MODIFIED one? I believe we should rather look for ADDED/MODIFIED and check whether the finalizer is there and, if it's not, add it.
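Roughly what that suggestion could look like in the handler above, as a sketch reusing the excerpt's variables (not a proposed final diff); it relies on AddFinalizer being a no-op when the finalizer is already present:

if event.Type == watch.Added || event.Type == watch.Modified {
	klog.V(5).InfoS("Ensuring finalizer on VolumeSnapshot", "PersistentVolumeClaim", pvc.Name, "VolumeSnapshot", snapshotObj.Name)
	// AddFinalizer returns true only when the finalizer was missing, so a
	// MODIFIED event seen after a missed ADDED event still converges, and no
	// extra update is issued when the finalizer is already there.
	update = controllerutil.AddFinalizer(snapshotObj, finalizer)
}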
if event.Type == watch.Deleted {
	klog.V(5).InfoS("Receive DELETED event try to remove finalizer from VolumeSnapshot", "PersistentVolumeClaim", pvc.Name, "VolumeSnapshot", snapshotObj.Name)
	update = controllerutil.RemoveFinalizer(snapshotObj, finalizer)
}
What if you've missed a deletion event here? Are we stuck with a snapshot?
	update = controllerutil.RemoveFinalizer(snapshotObj, finalizer)
}
if update {
	_, err := cw.snapClient.SnapshotV1().VolumeSnapshots(pvc.Namespace).Update(context.Background(), snapshotObj, metav1.UpdateOptions{})
This should be a Patch, otherwise we might end up replacing changes made by some other controller.
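For illustration, a hedged sketch of that Patch-based variant, reusing the excerpt's variables and assuming fmt and k8s.io/apimachinery/pkg/types are imported; it appends to metadata.finalizers with a JSON patch instead of writing back the whole object:

// Sketch only: append the finalizer with a JSON patch so changes made
// concurrently by other controllers are not overwritten. If the snapshot has
// no finalizers yet, the list must first be created with
// {"op": "add", "path": "/metadata/finalizers", "value": []}.
patch := []byte(fmt.Sprintf(`[{"op": "add", "path": "/metadata/finalizers/-", "value": %q}]`, finalizer))
_, err := cw.snapClient.SnapshotV1().VolumeSnapshots(pvc.Namespace).Patch(
	context.Background(), snapshotObj.Name, types.JSONPatchType, patch, metav1.PatchOptions{})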
@@ -65,6 +69,7 @@ func main() {

	cmd.PersistentFlags().StringVar(&cluster, "cluster", "", "The identifier of the cluster that the plugin is running in.")
	cmd.PersistentFlags().StringVar(&httpEndpoint, "http-endpoint", "", "The TCP network address where the HTTP server for providing metrics for diagnostics, will listen (example: `:8080`). The default is empty string, which means the server is disabled.")
	cmd.PersistentFlags().BoolVar(&vsDeletionProtectionEnabled, "enable-vs-deletion-protection", false, "If set, The VS will be added a finalizer while a PVC created base on it, ensure the VS couldn't be deleted before the PVC be deleted.")
"VS" is not widely used; I think you mean virtual server?
	watcher watch.Interface
}

func (cw *controllerWatcher) run() {
I am not sure, but this seems to me like a general feature rather than something specific to CPO? E.g., every cloud has this problem?
Should this be part of the general snapshot machinery, e.g. handled by the external snapshot controller?
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
What this PR does / why we need it:
This PR implements a PVC watcher. When the watcher receives a PVC ADDED event and the PVC has a VolumeSnapshot-kind dataSource, a finalizer is added to the VolumeSnapshot. The finalizer is removed when the watcher receives the PVC's DELETED event. This can be used to protect the VolumeSnapshot, preventing it from being deleted while any PersistentVolume is still based on it.
Which issue this PR fixes (if applicable):
fixes #2294
Special notes for reviewers:
Adding the finalizer asynchronously via a watcher seems unsafe and might result in some potential issues. After PR [1] is merged and released, we need to refactor this and implement a synchronous way to add the finalizer.
[1] kubernetes-csi/external-provisioner#1070
Release note: