Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more logging around catalog source sync #3414

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

perdasilva
Copy link
Collaborator

@perdasilva perdasilva commented Oct 18, 2024

Description of the change:

A recent live debugging session revealed that the catalog source reconciliation loop is under logged. This PR adds more logging. Generally, I've tried to:

  • log start and end of method calls
  • log errors as close as possible to source
  • call out steps in the methods

Motivation for the change:

Architectural changes:

Testing remarks:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

@perdasilva perdasilva force-pushed the perdasilva/catsrc/logging branch 3 times, most recently from a5ec49a to e13881e Compare October 18, 2024 11:26
logger = logger.WithField("step", "syncRegistryServer")
logger.Info("registry server sync start")
defer func() {
logger.WithError(syncError).WithField("continueSync", continueSync).Info("registry server sync end")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with WithError(syncError) if syncError is nil?

Copy link
Collaborator Author

@perdasilva perdasilva Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will come out as <nil>. Here's an example:

time="2024-10-18T11:53:19Z" level=info msg="registry server sync end" catalogsource.name=operatorhubio-catalog catalogsource.namespace=operator-lifecycle-manager continueSync=false error="<nil>" health=false id=PYOCO step=syncRegistryServer

Copy link
Collaborator Author

@perdasilva perdasilva Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at it now, I'm wondering if we should use WithField("syncError") just to have more fidelity with the code and be more helpful for the next person that has to do a live manual debug XD

@@ -897,6 +897,11 @@ func (o *Operator) handleCatSrcDeletion(obj interface{}) {
}

func validateSourceType(logger *logrus.Entry, in *v1alpha1.CatalogSource) (out *v1alpha1.CatalogSource, continueSync bool, _ error) {
logger = logger.WithField("step", "validateSourceType")
logger.Info("validate source type start")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in practice you'll find method book-ending is extremely chatty - on the order of MiB/s logging rate during high churn, especially when the system is just verifying correctness of current state and not changing. Not sure if you really require it. The logs for error cases and outcomes are very useful and will not emit if "nothing" is going on.

Copy link
Collaborator Author

@perdasilva perdasilva Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah - I tend to agree. The verbosity is high for low utility. I'll cull most of the non-error logs. At least in these paths, anyway. Wondering if just leaving pre-return log would still be a good idea to get signal that it went into the method - even if only for the methods higher up the stack....?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okie - see if that sparks joy, or if you reckon it's still too verbose

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try it out and see! It might still be too chatty to be useful, but you can't know without running an end-to-end test.

var updateLabel bool
// Get the catalog source's config map
configMap, err := o.lister.CoreV1().ConfigMapLister().ConfigMaps(in.GetNamespace()).Get(in.Spec.ConfigMap)
// Attempt to look up the CM via api call if there is a cache miss
if apierrors.IsNotFound(err) {
logger.Info("configmap not found in cache - attempting with live client")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n.b. this is not a safe thing to do, as you no longer operate against a consistent view of cluster state that existed in one point in time (your lister is behind - but you got an event for this configmap? how?)

Copy link
Collaborator Author

@perdasilva perdasilva Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. Thanks for calling that out. I'll poke at it to see if we can just re-enqueue and let the next iteration pick it up from the cache, rather than trying to jump ahead of it. Though the code here can get so convoluted its hard to know what taking this jenga piece out will do to the tower XD

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, how's Bill treating you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a wild and whacky time! Best of luck with this Jenga :)

Signed-off-by: Per Goncalves da Silva <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants