dryrun before update as reconciliation option #358
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #358 +/- ##
==========================================
- Coverage 65.41% 65.40% -0.02%
==========================================
Files 35 35
Lines 3843 3853 +10
==========================================
+ Hits 2514 2520 +6
- Misses 1132 1134 +2
- Partials 197 199 +2
```go
// by dry-running the updates before
DryRunFirst *bool

GetOptions []client.GetOption
```
Not being used right now, but I think that having them in place makes it easy for someone in the future to make use of them.
I think this change may open up for more of #354 in the future. It aims to protect performance and, at the same time, be conservative about the reconciliation processes affected by #356. I'm afraid it may backfire on both goals.
For every kind of resource the operator reconciles whose state mutations could go under the radar of the function that compares existing vs. desired, causing it to mistakenly miss a no-op case and perform an unnecessary update, I would expect a dry-run first. If the operator watches those kinds of resources and, directly or indirectly, loops back to reconciling them again, then we're subject to another endless loop at any time.
Unfortunately, it's hard to control all possible factors that could cause the resource's state to mutate after it is created or updated. Defaults enforced via OAS are one; a mutating webhook is another.
Mutating webhooks can be almost completely transparent. One could be set up without any acknowledgement from the owner of the resource or from the provider of the CRD the resource is an instance of.
I personally feel safer with the dry-run ensuring that the state (and therefore the generation/resourceVersion) will not change than I am concerned about the extra request to the API server at this point.
I disagree.
Take into account that the mutator has the existing object read from the cluster, and this object carries all the mutations that happened to it. So, the information is there.
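For illustration, a minimal sketch of that point, assuming a mutator signature along the lines of the base reconciler's mutate functions (the exact signature and argument order in the repo may differ): `existing` is the object read from the cluster, so it already carries every default and webhook mutation, and the mutator can compare only the fields the operator owns instead of the whole object.

```go
package example

import (
	"fmt"
	"reflect"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// configMapMutator is an illustrative mutator: it only compares and copies the
// field the operator owns (.data), so server-side additions elsewhere on
// `existing` never trigger an update.
func configMapMutator(desired, existing client.Object) (bool, error) {
	existingCM, ok := existing.(*corev1.ConfigMap)
	if !ok {
		return false, fmt.Errorf("%T is not a *corev1.ConfigMap", existing)
	}
	desiredCM, ok := desired.(*corev1.ConfigMap)
	if !ok {
		return false, fmt.Errorf("%T is not a *corev1.ConfigMap", desired)
	}

	update := false
	if !reflect.DeepEqual(existingCM.Data, desiredCM.Data) {
		existingCM.Data = desiredCM.Data
		update = true
	}
	return update, nil
}
```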
fd772fe to 2c55208 (compare)
```go
// DryRunFirst sets the "dry run first" option to "true", ensuring all
// validation, mutations, etc. run first without persisting the change to storage.
var DryRunFirst = dryRunFirst{}

type dryRunFirst struct{}

// ApplyToReconcile applies this configuration to the given reconcile options.
func (dryRunFirst) ApplyToReconcile(opts *ReconcileOptions) {
	opts.DryRunFirst = ptr.To(true)
}

var _ ReconcileOption = &dryRunFirst{}

// ReconcileOption adds some customization for the reconciliation process.
type ReconcileOption interface {
	// ApplyToReconcile applies this configuration to the given reconcile options.
	ApplyToReconcile(*ReconcileOptions)
}
```
Just a matter of preference, no action needed, but I usually find it easier to read when types and interfaces are defined before they are used.
```go
// validation, mutations etc first without persisting the change to storage.
var DryRunFirst = dryRunFirst{}

type dryRunFirst struct{}
```
Maybe we could define `dryRunFirst` as a bool to make it explicit? We could still attach methods to it. Or are we planning to expand its props?
It does not need to be a bool. When you add the `ReconcileOption` interface implementation called `DryRunFirst`, you are adding the following configuration option: `opts.DryRunFirst = ptr.To(true)`. The bool field exists in the `ReconcileOptions` struct.
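For illustration, a minimal sketch of the bool-typed alternative being asked about (hypothetical, not what the PR implements), reusing the `ReconcileOptions`/`ReconcileOption` names from the diff and the `k8s.io/utils/ptr` helper:

```go
// Hypothetical alternative: a named bool type that still satisfies ReconcileOption.
type dryRunFirst bool

// DryRunFirst is the value callers would pass as a reconcile option.
var DryRunFirst = dryRunFirst(true)

// ApplyToReconcile applies this configuration to the given reconcile options.
func (d dryRunFirst) ApplyToReconcile(opts *ReconcileOptions) {
	opts.DryRunFirst = ptr.To(bool(d))
}

var _ ReconcileOption = dryRunFirst(true)
```

Either shape works; the empty struct simply signals that the option carries no value of its own, while the bool makes the value explicit at the cost of a slightly odd `dryRunFirst(true)` conversion.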
2c55208 to 8e8d31c (compare)
After the meeting call, I am more convinced that the dry run update is no way to go at all. Maybe only for detecting mutating webhooks. Besides the extra traffic load and the alternative approach of semantic comparison, there is the issue of comparing things the controller does not control. Let me put a simple example to illustrate:

```
int desired = D;
int existing = E;
int desiredMutated = mutate(desired);
if (desiredMutated != existing) {
    existing = desiredMutated;
}
```

This is basically what our controller does with the dry run update. And this defeats the purpose of the controller, which is to have the existing state match the desired one. So, I do not think this is even good for the authconfig use case. The authconfig use case should be fixed by implementing the defaults.
Another way to solve the endless loop without the dry run update is to add an annotation on the reconciled resource, i.e. the authconfig object, with the generation version (the one that should only change on an update of the spec).

The reconciliation would not update the resource when the generation version does not change, thus breaking the endless loop.

For example, the generation version of the authpolicy is 1. The controller updates the authconfig from the authpolicy spec and adds the annotation with the version 1. When there is a reconciliation event, if the annotated generation is equal to the authpolicy's generation, the authconfig is not updated even though the specs do not match.

This would prevent any healing from the controller when an external agent, a mutating webhook or somebody doing a manual update, changes the resource.
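A minimal sketch of that annotation idea (the annotation key and helper names are hypothetical; nothing like this exists in the repo today):

```go
package example

import (
	"strconv"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Hypothetical annotation key used to record which policy generation was last reconciled.
const observedPolicyGeneration = "kuadrant.io/observed-authpolicy-generation"

// needsUpdate returns false when the reconciled resource (e.g. the authconfig)
// was already written for the current generation of the policy, breaking the
// loop even if the specs no longer match.
func needsUpdate(policy, reconciled metav1.Object) bool {
	observed, ok := reconciled.GetAnnotations()[observedPolicyGeneration]
	return !ok || observed != strconv.FormatInt(policy.GetGeneration(), 10)
}

// stampObservedGeneration records, on the reconciled resource, the policy
// generation the controller acted on when it performs an update.
func stampObservedGeneration(policy, reconciled metav1.Object) {
	annotations := reconciled.GetAnnotations()
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[observedPolicyGeneration] = strconv.FormatInt(policy.GetGeneration(), 10)
	reconciled.SetAnnotations(annotations)
}
```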
I propose putting a pin on this one for now, while we research different ways to work on the current implementation and tackle the issue of the overhead caused by too many dry-runs post #356.

Right now, with the ubiquitous dry-running of all updates before any blind reconciliation of existing vs. desired state every time, Kuadrant Operator's implemented behaviour is to immediately give up on the fight against any additional authorised piece that disagrees with the operator regarding the desired state of those watched resources, thus letting the state determined by those other components always win. This is not much of a problem for defaults enforced at the level of the OAS, because these values can still be set by the operator and will prevail naturally over the defaults. As for mutating webhooks that modify the state of the resources after being requested by our operator, if the cluster administrator granted those services enough permissions to act on the same resources managed by our operator, the operator will concede rather than keep fighting back.

An example of the above can be easily demonstrated by setting a mutating webhook that changes the value of a field originally set by the operator. A few steps to reproduce it, using Istio's EnvoyFilter CR as an example:
With the changes proposed in the PR, the above will trigger an endless reconciliation fight between Kuadrant Operator and the OPA Gatekeeper mutating webhook (executed by the Gatekeeper controller), where the former wants to enforce a timeout of 1s for the connection between the gateway and Limitador, and the latter wants to change that to 2s. Roughly every few seconds (depending on the sleep call set at step 1), you should see Kuadrant Operator trying to reconcile the EnvoyFilter CR, followed by the Gatekeeper controller patching it back. Without the changes proposed in this PR:
A possible way to improve this flow in the future could be trying to make the operator aware of the conflict and somehow flag it so it can be solved from the outside. This, of course, opens up other issues, such as what state to enforce in the meantime. Should all policies be invalidated? Another possible enhancement involves relying more on caching in the operator, so it does not have to perform the dry-run every time, but keeps track of changes previously honoured instead and avoids asking the API server. cc @chirino
@eguzki, the state of the reconciled resource is computed out of multiple original resources, rather than only the policy (such as the Gateway and the HTTPRoute). Should we store the generations of all these origins of information in the annotations of the reconciled resource as well?
IMO,
Right, a simple annotation does not fix the issue for all use cases. Only for simple use cases.
This is an approach which can be valid for some use cases. I am struggling to see such a use case, but I reckon it can be valid. The operator gives up and always delegates the ownership to external agents. For these particular cases, it is not.

However, certainly, there are more common use cases where the controller needs the spec to be the desired one. And having the dry run update is breaking this, opening the door to deployment and functionality issues. So, with the ubiquitous dry-running of all updates we are applying the same approach for all. This PR specifically calls for having a reconciliation option for that. If you want the authconfig to be
The problem is you cannot tell, right? That
One common use case is "create only" mutators, where objects are created and never updated afterwards. The mutators are plain "return false" methods. For these cases, the API server is being hit for no reason.
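A sketch of what such a "create only" mutator looks like (same assumed mutator signature as in the earlier sketch): it never requests an update, so a dry-run performed before an update that will never happen is pure extra load on the API server.

```go
// createOnlyMutator never requests an update: the object is created once and
// then left alone, whatever its current state in the cluster.
func createOnlyMutator(desired, existing client.Object) (bool, error) {
	return false, nil
}
```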
That's like two agents authorized by the cluster admin racing for updates on the same fields. It's a conflict. One of them must give up or be removed. The cluster admin does not have the context to know if mutating some resource may or may not break something.

Regarding authpolicies, what if some external controller (kind of a virus) is updating some OPA policy defined in the authpolicy on the authconfig object? This is a very big security issue. Shouldn't the kuadrant controller try to enforce the desired OPA policy?

Moreover, if the authpolicy controller allows external updates of the authconfig, why watch for updates on the authconfig? Removing the watcher, the endless loop issue is gone. The purpose of watching the authconfig resource is to ensure the value of some fields (or the entire spec).
You have to keep in mind that both agents were authorised, as you said it yourself. That in itself invalidates the example of the malicious software, which by definition is not authorised. Apart from the fact that it could just as easily target the source of truth directly, the one kind of resource that you have to be more permissive about.

What you are asking for is that the operator tries to protect the cluster admin against themselves, by allowing itself to go into an infinite loop, completely wastefully, on the grounds that the "cluster admin does not have the context to know." Well, if the cluster admin does not know, they should not authorise two agents acting on the same resource. This is a very simple RBAC rule: "on namespace X, only Kuadrant Operator can touch EnvoyFilters, WasmPlugins, AuthorizationPolicies; across the cluster, only Kuadrant Operator can touch AuthConfigs, Limitadors." This is up to the cluster admin to decide. The Kuadrant Operator cannot be more powerful than cluster admins!

This will only create rigidness. Take the example of the hard-coded connection timeout. If Limitador fails to respond within 1s, first of all, Kuadrant sucks, but worse, it is unfixable. Somebody needs to change that line of code, cut a new release, build a new image, and, in the meantime, the system is unusable. There's no workaround.
This is a very good point, which I can agree with. If the Operator will not perform an update anyway (arguably something to reconsider), then it should not execute the dry-run for those cases either.
I think this discussion is healthy and lets us all learn and share concerns and vision. If somebody disagrees, shout out and I will stop writing text walls. That being said, my next 5 cents:
Of course not. Nobody is asking that, just as nobody is asking the cluster admin to have knowledge in the scope of the kuadrant operator.
This does not cover all use cases.
First, the dry run update does not entirely protect from infinite loops. An additional controller updating the same fields leads to an infinite loop, regardless of running dry run updates or not. A mutating webhook doing dynamic updates is also a source of infinite loops, regardless of running dry run updates or not. Think about a mutating webhook adding a timestamp in a field. The dry run update is a very weak approach that can leak issues now and in the future. IMO, the dry run update can only be valid for defaults, and only if we, as developers, are lazy and do not want to implement the defaults, which are known whether they belong to an owned CRD or not. And yes, they need to be maintained.
On the contrary, this PR calls for freedom. The title literally says "dryrun before update as reconciliation option". Each controller can decide what to do. Rigidness is when all controllers are obliged to run dry run updates no matter what.

Furthermore, the rigidness is a call each controller makes to manage the trade-off between fully controlled resources and create-only resources. Taking your example, if the connection timeout should be updated externally because it depends on infra and other factors, the call should be having the operator not reconcile that field (semantic comparison jumping in here). It should not be some mutating webhook.

The limitador operator makes exactly this call for the replicas field: https://github.com/Kuadrant/limitador-operator/blob/main/controllers/limitador_controller.go#L170-L172. If the replicas are set in the CR, the operator is rigid. The deployment must have that value and, if somebody changes it, the operator will revert the change. The operator is opinionated about the replicas when the field is set in the Limitador CR. However, if the replicas field is not set in the Limitador CR, the operator is not opinionated about the number of replicas in the deployment. Anyone (with permissions) can go and change the deployment replicas, and the operator will not revert the change. That external controller can be a HorizontalPodAutoscaler controller replacing the replication controller.

The whole point is that being rigid or not is a call each controller makes. But the controller should be able to make that call. With the dry run update the controller can no longer ensure a value in the resource. I am going to change the envoy filter controller to allow the connection timeout to be externally updated ;)
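A simplified sketch of that call (not the linked limitador-operator code verbatim; see the source for the real implementation): the operator is only opinionated about replicas when the field is set in the CR.

```go
package example

import appsv1 "k8s.io/api/apps/v1"

// desiredReplicas returns the replicas the deployment should have and whether
// an update is needed. replicasFromCR is nil when the Limitador CR does not
// set the field.
func desiredReplicas(replicasFromCR *int32, existing *appsv1.Deployment) (*int32, bool) {
	if replicasFromCR == nil {
		// Not opinionated: keep whatever is in the cluster, e.g. the value an
		// HorizontalPodAutoscaler decided.
		return existing.Spec.Replicas, false
	}
	// Opinionated: enforce the CR value, reverting any external change.
	changed := existing.Spec.Replicas == nil || *existing.Spec.Replicas != *replicasFromCR
	return replicasFromCR, changed
}
```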
To be clear, I meant rigidness for one who uses Kuadrant, not for us (the developers of it). I think that's undebatable.
To be rigid or not is the controller's call. Try to update the deployment associated with the Gateway API Gateway resource (automatic deployment). Istio made a call: they do not accept updates on the deployment (at least not in the command field). And they do not accept updates on the command line because you could break the service otherwise. On the other hand, they are not rigid about annotations. You can add annotations on the Gateway resource and those will be propagated to the deployment, the service and so on. You, as a user, find some controller's behaviour rigid? Request a change in their code or behaviour. Do not deploy a mutating webhook or a controller to update fields you do not own, even if you have permissions to do that.
The apiserver and mutating webhooks have more authority over the final state of a resource than controllers, since they have ultimate control over what resources look like when they are admitted into the system. From a controller's point of view there is no difference between the API server and a mutating webhook. A controller can NEVER win against them. So if a controller does detect that the apiserver has changed its update (we check the result of the update call to see if the API server mutated it), we should accept that as the final desired state, since we can't win against the API server. I'm 99% sure (can someone double check me?) that the update call results can't contain an update from a different controller racing against our controller's update. We don't need to try to mitigate the reconcile loop that can occur from multiple controllers stepping on each other's changes. It is only the data race with the apiserver that concerns me.
For updates, the API server requires a baseline from which to create a delta. So the API server requires the resourceVersion from which you want to update. If the current resource version in the etcd database is not the one in the update request, the API server rejects the update. For patch updates (server-side updates), there is no need for this: the request is itself the delta. Race conditions are handled by the API server instead of being delegated to the client as for the regular updates. Does this answer your doubt?
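For illustration, the standard client-go / controller-runtime pattern for exactly this (a generic sketch, not Kuadrant code): the update carries the resourceVersion of the object that was read, and a conflict means re-reading before retrying.

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func updateConfigMapData(ctx context.Context, cl client.Client, key client.ObjectKey, data map[string]string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		existing := &corev1.ConfigMap{}
		// Re-read on every attempt so the update carries the latest resourceVersion.
		if err := cl.Get(ctx, key, existing); err != nil {
			return err
		}
		existing.Data = data
		// Rejected with a Conflict error if someone else updated the object in between.
		return cl.Update(ctx, existing)
	})
}
```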
@eguzki that means that the controller looking at the result of the update call will NEVER see a change due to a different controller doing an update of the same resource, right?
The controller will not see an update coming from another controller in the result of its own update. But it will know that it lost the race and that it needs to re-read the resource before applying its changes to the latest version. That's regarding racing with another controller. A mutating webhook is somewhat different. As you said, the mutating webhook is kind of an associate of the API server; they work together side by side. Provided you win the race against another controller (or you are the only one in the cluster), hence the resourceVersion provided is the latest one, any changes done by the API server (defaults) or by any mutating webhook will be seen in the object after the update. It is basically what the dry run update does.
RBAC might be the solution to avoiding controller/controller data races. We should not try to mitigate this type of data race since there is no clear way to solve this problem. But the scenario of the apiserver/mutating webhook modifying your update is something we could cope with. Like you said, it's similar to what the dry run was doing. Perhaps we don't need to dry run at all. We should just analyse the result of the update to see if it was changed from what we sent. If it was, we should cache the update result in some way so that we know it is the final desired state after defaults/webhooks are applied.
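A rough sketch of that idea (the function name and the cache are hypothetical): controller-runtime writes the API server's response back into the object passed to Update, so the controller can detect that defaults or webhooks changed what it sent and remember the admitted result for future comparisons instead of dry-running again.

```go
package example

import (
	"context"

	"k8s.io/apimachinery/pkg/api/equality"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func updateAndRemember(ctx context.Context, cl client.Client, desired client.Object, admitted map[client.ObjectKey]client.Object) error {
	sent := desired.DeepCopyObject().(client.Object) // snapshot of what we asked for
	if err := cl.Update(ctx, desired); err != nil {
		return err
	}
	// `desired` now holds the object as admitted, including webhook mutations.
	// Ignore fields the server always rewrites before comparing; a real
	// implementation would compare only the fields the controller owns.
	sent.SetResourceVersion(desired.GetResourceVersion())
	sent.SetGeneration(desired.GetGeneration())
	sent.SetManagedFields(desired.GetManagedFields())
	if !equality.Semantic.DeepEqual(sent, desired) {
		// Defaults or webhooks changed our update: remember the admitted state
		// so the next reconciliation does not fight it.
		admitted[client.ObjectKeyFromObject(desired)] = desired.DeepCopyObject().(client.Object)
	}
	return nil
}
```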
Reading over this thread I get the feeling that no matter what approach we take, we can never fully solve the problem, as the cluster admin can always grant more permissions to something else. With that in mind, what metrics do we expose that indicate we have detected an infinite loop? And based on the metrics, do we give recommendations on what alerts should be created to help the admin monitor the state of the services on their cluster? As the Authconfig (#354) is currently the only one causing the infinite loop, that is the problem we should be trying to solve. If/when more resources start to cause loops, then we should address those problems. At that point we would have more examples of what causes the issue and could make a more informed decision on the best mitigation steps.
No longer makes sense after refactoring the logic to the state-of-the-world controller. Related: #916
What
A DryRun update operation for all updates can cause too much load on the API server. Having the "Dry Run Update First" as an option to be passed to the base reconciler may mitigate the issue, at least for some resources. This PR applies the "DryRun Update First" option only to the reconciliation of the authconfig, because it is the only one having server-side defaults (kubebuilder defaults) and a mutating webhook involved.
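A sketch of what the call site could look like with this option, assuming the base reconciler's resource-reconcile entry point accepts variadic reconcile options after this PR (the exact method signature, import aliases, object and mutator names below are illustrative, not taken from the diff):

```go
// Only the AuthConfig reconciliation opts in to the extra dry-run request.
err := r.ReconcileResource(ctx, &authorinoapi.AuthConfig{}, desiredAuthConfig, authConfigMutator, reconcilers.DryRunFirst)

// Other resources keep the default behaviour: no dry-run before the update.
err = r.ReconcileResource(ctx, &istioapi.EnvoyFilter{}, desiredEnvoyFilter, envoyFilterMutator)
```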