feat: enable drift detection + takeover experience in work applier #950
base: main
//
// This check is done on the Work object scope, and is primarily added to address the case
// where duplicate objects might appear in a Fleet resource envelope and lead to unexpected
// behaviors. Duplication is a non-issue without Fleet resource envelopes, as the Fleet hub
nit: This statement is not true since we apply override policies.
Hi Ryan! Sorry for the confusion; this comment is trying to clarify that, at this moment, Fleet does not check enveloped objects for duplication (e.g., it is possible to place two objects with the same GVK/namespace/name in an envelope, possibly with different specs); one definition will overwrite the other.
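To make the duplicate-detection idea concrete, here is a minimal Go sketch of a Work-scoped check, assuming each manifest can be reduced to a GVK/namespace/name key. The type `workResourceIdentifier` and the function `findDuplicates` are hypothetical; the diff's `checked[wriStr] = true` suggests a similar map keyed by an identifier string, but the PR's actual handling (for example, which occurrence wins or how duplicates are reported) may differ.

```go
package main

import "fmt"

// workResourceIdentifier is a hypothetical, simplified identifier for a
// manifest in a Work object (GVK + namespace + name).
type workResourceIdentifier struct {
	Group, Version, Kind string
	Namespace, Name      string
}

// String renders the identifier as a map key, e.g. "apps/v1/Deployment/default/web".
func (w workResourceIdentifier) String() string {
	return fmt.Sprintf("%s/%s/%s/%s/%s", w.Group, w.Version, w.Kind, w.Namespace, w.Name)
}

// findDuplicates returns the ordinals of manifests whose identifier was
// already seen earlier in the list. In this sketch the first occurrence is
// kept and every later one is flagged (an assumption, not confirmed by the PR).
func findDuplicates(ids []workResourceIdentifier) []int {
	checked := make(map[string]bool, len(ids))
	var dups []int
	for i, id := range ids {
		key := id.String()
		if checked[key] {
			dups = append(dups, i)
			continue
		}
		checked[key] = true
	}
	return dups
}

func main() {
	ids := []workResourceIdentifier{
		{"apps", "v1", "Deployment", "default", "web"},
		{"", "v1", "ConfigMap", "default", "cfg"},
		// Same GVK/namespace/name as the first entry, possibly with a different spec.
		{"apps", "v1", "Deployment", "default", "web"},
	}
	fmt.Println(findDuplicates(ids)) // [2]
}
```

Without such a check, the later definition would silently overwrite the earlier one when both are applied, which is the behavior described in the reply above.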
checked[wriStr] = true

// Prepare the manifest conditions for the write-ahead process.
manifestCondForWA := prepareManifestCondForWA(wriStr, bundle.id, work.Generation, existingManifestCondQIdx, work.Status.ManifestConditions)
What does "WA" stand for?
Hi Ryan! It's for the write-ahead process.
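For readers unfamiliar with the term, the sketch below illustrates one plausible reading of the write-ahead step (an assumption, not confirmed in this thread): before applying anything, the applier prepares a status entry per manifest, carrying over any conditions already recorded for the same identifier. This is a simplified, hypothetical version; the types `manifestCondition` and `condition` stand in for the real Fleet status types, and unlike the call in the diff it omits the `work.Generation` argument, whose role is not described here.

```go
package main

import "fmt"

// condition is a hypothetical, trimmed-down status condition.
type condition struct {
	Type               string
	ObservedGeneration int64
}

// manifestCondition is a hypothetical stand-in for one entry of
// Work.Status.ManifestConditions.
type manifestCondition struct {
	Identifier string
	Ordinal    int
	Conditions []condition
}

// prepareManifestCondForWA (simplified, hypothetical signature) builds the
// status entry written ahead of applying a manifest. Existing conditions for
// the same identifier are copied over unchanged (keeping their own
// ObservedGeneration); otherwise a fresh entry with no conditions is returned.
func prepareManifestCondForWA(
	wriStr string,
	ordinal int,
	existingIdx map[string]int,
	existing []manifestCondition,
) manifestCondition {
	cond := manifestCondition{Identifier: wriStr, Ordinal: ordinal}
	if i, ok := existingIdx[wriStr]; ok {
		// Copy the slice so the new entry does not alias the old status.
		cond.Conditions = append([]condition(nil), existing[i].Conditions...)
	}
	return cond
}

func main() {
	existing := []manifestCondition{{
		Identifier: "apps/v1/Deployment/default/web",
		Conditions: []condition{{Type: "Applied", ObservedGeneration: 3}},
	}}
	idx := map[string]int{existing[0].Identifier: 0}
	a := prepareManifestCondForWA("apps/v1/Deployment/default/web", 0, idx, existing)
	b := prepareManifestCondForWA("/v1/ConfigMap/default/cfg", 1, idx, existing)
	fmt.Println(len(a.Conditions), len(b.Conditions)) // 1 0
}
```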
klog.ErrorS(err, "Failed to decode the manifest", "ordinal", pieces, "work", klog.KObj(work))
bundle.applyErr = fmt.Errorf("failed to decode manifest: %w", err)
bundle.applyResTyp = ManifestProcessingApplyResultTypeDecodingErred
return
This won't return any error to the caller, so some bundles will end up with no GVR/manifestObj. However, those fields are used extensively as pointers in the rest of the controller logic. Could this lead to a nil-pointer panic?
I see that there are checks like "bundle.applyErr != nil" in some places, but I am not sure they cover all cases. Maybe we can add a step after preProcessManifests to remove those bundles?
Hi Ryan! Yeah, at this moment the applier skips the processing step for any bundle that has failed pre-processing. As for removing such bundles, one complication is that we need to report these manifests (undecodable or malformed) to users; if we remove them from the bundles before processing, the status-refreshing step would need extra logic to re-incorporate them, which I fear could also be error-prone.
Would you prefer if I:
a) reorder the bundles after pre-processing so that the ones that failed pre-processing are moved to the back, then slice them off before processing; or
b) check for a nil GVR/manifest object right before processing and return an unexpected error?
}

inMemberClusterObjLastAppliedManifestObjHash := inMemberClusterObj.GetAnnotations()[fleetv1beta1.ManifestHashAnnotation]
return manifestObjHash == inMemberClusterObjLastAppliedManifestObjHash, nil
why check if the manifest matches?
Hi Ryan! If the manifest itself has been updated (i.e., a new version of the manifest has become available), there is no need to do drift detection anymore; we will simply apply the new version.
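The check can be illustrated with a small Go sketch, assuming the last-applied manifest hash is a hex-encoded SHA-256 digest of the manifest's JSON, stored in an annotation on the member cluster object. The annotation key and helper names below are hypothetical (the PR uses `fleetv1beta1.ManifestHashAnnotation`, whose value and hashing scheme are not shown). Drift detection only makes sense when the hashes match; a mismatch means a new manifest version is available and is simply applied.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// manifestHashAnnotation is a hypothetical annotation key for this sketch.
const manifestHashAnnotation = "example.com/last-applied-manifest-hash"

// hashManifest returns a hex-encoded SHA-256 digest of the manifest's JSON
// form. json.Marshal sorts map keys, so equal maps hash identically.
func hashManifest(manifest map[string]interface{}) (string, error) {
	b, err := json.Marshal(manifest)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// manifestUnchanged reports whether the manifest in the Work matches the one
// last applied to the member cluster object. A missing annotation counts as
// "changed", mirroring the string comparison in the diff.
func manifestUnchanged(manifest map[string]interface{}, inMemberClusterAnnotations map[string]string) (bool, error) {
	h, err := hashManifest(manifest)
	if err != nil {
		return false, err
	}
	return h == inMemberClusterAnnotations[manifestHashAnnotation], nil
}

func main() {
	v1 := map[string]interface{}{"kind": "ConfigMap", "data": map[string]interface{}{"k": "v1"}}
	v2 := map[string]interface{}{"kind": "ConfigMap", "data": map[string]interface{}{"k": "v2"}}
	h1, _ := hashManifest(v1)
	annotations := map[string]string{manifestHashAnnotation: h1}
	same, _ := manifestUnchanged(v1, annotations)    // run drift detection
	changed, _ := manifestUnchanged(v2, annotations) // apply the new version instead
	fmt.Println(same, changed)                       // true false
}
```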
@@ -0,0 +1,490 @@
/*
Please make sure that the previous tests and integration tests still work.
Will do. They will be sent in a separate PR, if that's OK. Some minor adjustments are needed and are almost complete (e.g., the original test suites do not distinguish between hub and member environments, and there are some new/updated conditions).
Description of your changes
This PR includes a new implementation of the work applier that enables the drift detection + takeover experience.
I have:
Run make reviewable to ensure this PR is ready for review.

How has this code been tested
Special notes for your reviewer
Additional tests will be submitted separately.