-
Notifications
You must be signed in to change notification settings - Fork 544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more logging around catalog source sync #3414
base: master
Are you sure you want to change the base?
Add more logging around catalog source sync #3414
Conversation
a5ec49a
to
e13881e
Compare
logger = logger.WithField("step", "syncRegistryServer") | ||
logger.Info("registry server sync start") | ||
defer func() { | ||
logger.WithError(syncError).WithField("continueSync", continueSync).Info("registry server sync end") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens with WithError(syncError)
if syncError
is nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will come out as <nil>
. Here's an example:
time="2024-10-18T11:53:19Z" level=info msg="registry server sync end" catalogsource.name=operatorhubio-catalog catalogsource.namespace=operator-lifecycle-manager continueSync=false error="<nil>" health=false id=PYOCO step=syncRegistryServer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at it now, I'm wondering if we should use WithField("syncError") just to have more fidelity with the code and be more helpful for the next person that has to do a live manual debug XD
@@ -897,6 +897,11 @@ func (o *Operator) handleCatSrcDeletion(obj interface{}) { | |||
} | |||
|
|||
func validateSourceType(logger *logrus.Entry, in *v1alpha1.CatalogSource) (out *v1alpha1.CatalogSource, continueSync bool, _ error) { | |||
logger = logger.WithField("step", "validateSourceType") | |||
logger.Info("validate source type start") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in practice you'll find method book-ending is extremely chatty - on the order of MiB/s logging rate during high churn, especially when the system is just verifying correctness of current state and not changing. Not sure if you really require it. The logs for error cases and outcomes are very useful and will not emit if "nothing" is going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah - I tend to agree. The verbosity is high for low utility. I'll cull most of the non-error logs. At least in these paths, anyway. Wondering if just leaving pre-return log would still be a good idea to get signal that it went into the method - even if only for the methods higher up the stack....?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okie - see if that sparks joy, or if you reckon it's still too verbose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try it out and see! It might still be too chatty to be useful, but you can't know without running an end-to-end test.
var updateLabel bool | ||
// Get the catalog source's config map | ||
configMap, err := o.lister.CoreV1().ConfigMapLister().ConfigMaps(in.GetNamespace()).Get(in.Spec.ConfigMap) | ||
// Attempt to look up the CM via api call if there is a cache miss | ||
if apierrors.IsNotFound(err) { | ||
logger.Info("configmap not found in cache - attempting with live client") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n.b. this is not a safe thing to do, as you no longer operate against a consistent view of cluster state that existed in one point in time (your lister is behind - but you got an event for this configmap? how?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. Thanks for calling that out. I'll poke at it to see if we can just re-enqueue and let the next iteration pick it up from the cache, rather than trying to jump ahead of it. Though the code here can get so convoluted its hard to know what taking this jenga piece out will do to the tower XD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, how's Bill treating you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a wild and whacky time! Best of luck with this Jenga :)
Signed-off-by: Per Goncalves da Silva <[email protected]>
d6c2abb
to
b48b163
Compare
Description of the change:
A recent live debugging session revealed that the catalog source reconciliation loop is under logged. This PR adds more logging. Generally, I've tried to:
Motivation for the change:
Architectural changes:
Testing remarks:
Reviewer Checklist
/doc
[FLAKE]
are truly flaky and have an issue