-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: preserve NVIPAM CM after upgrade #672
Conversation
/retest-nic_operator_helm |
25e892f
to
6dc56e7
Compare
@@ -46,6 +49,10 @@ func migrate(ctx context.Context, log logr.Logger, c client.Client) error { | |||
// not critical for the operator operation, safer to ignore | |||
log.V(consts.LogLevelWarning).Info("ignore error during whereabouts CronJob removal") | |||
} | |||
if err := removeStateLabelFromNVIpamConfigMap(ctx, log, c); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to return error if we ignore it? We've got logging inside removeStateLabelFromNVIpamConfigMap so no need to return an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed the same pattern as removeWhereaboutsIPReconcileCronJob
.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep it as is. removeStateLabelFromNVIpamConfigMap
function do the action and an error can happen during this action - it is fair to return the error from the function. It is up to caller to decide what to do with the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this should not block the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I have one more thing. It is pretty safe to run removeWhereaboutsIPReconcileCronJob
in a "best effort" mode and ignore any error if it happens (we will have an unneeded Job in the cluster, but this will not break anything). But with removeStateLabelFromNVIpamConfigMap
, we are in a bit different situation - if we fail to remove the label, we will probably lose all nv-ipam allocations. Maybe we should crash in this case and rely on k8s to restart the Pod? WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, returning error to main.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove 'request changes' label since we've got the same patter in other part of code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the PR on my env. I tried to upgrade from 23.7 to the version from this PR. All nv-ipam allocations were preserved. I added one comment about the error handling for annotation removal call - maybe we should react on an error
/retest-all |
6dc56e7
to
c78b5f5
Compare
/retest-image_scan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. It could be merged once CI pass
deployment/network-operator/templates/mellanox.com_v1alpha1_nicclusterpolicy_cr.yaml
Show resolved
Hide resolved
pkg/migrate/migrate.go
Outdated
log.V(consts.LogLevelError).Error(err, "fail to get NVIPAM configmap") | ||
return err | ||
} | ||
log.V(consts.LogLevelDebug).Info("clear State label from NVIPAM configmap") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we first check the label exists ? or will patch op suceed if label is not there ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a check before the patch
In order to allow migration from config map to IP Pool CRs in NV IPAM, the config map needs to be kept. - Removing the state label will make sure it is not deleted as a stale object. - Remove the helm hook annotation on NiClusterPolicy CR template, that causes delete and recreate of CR and the delete-cr hook. Note that this will prevent CR validation in install stage. - Make sure to stop existing Network Operator before updating CR, so that it does not delete the config map since config is removed. Signed-off-by: Fred Rolland <[email protected]>
c78b5f5
to
fc6aca4
Compare
In order to allow migration from config map to IP Pool CRs in NV IPAM, the config map needs to be kept.