Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: error logging when failed to get exit code #24121

Closed
wants to merge 1 commit into from

Conversation

pb82
Copy link

@pb82 pb82 commented Oct 1, 2024

The error logging when podman fails to read the exit code from the last event seems wrong: it's referring to err which is handled previously and should always be nil in that line. I think we should log eventsErr instead.

Does this PR introduce a user-facing change?

Fixes logging when podman fails to read the exit code

Copy link
Contributor

openshift-ci bot commented Oct 1, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pb82
Once this PR has been reviewed and has the lgtm label, please assign jakecorrenti for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -972,7 +972,7 @@ func (ic *ContainerEngine) ContainerRun(ctx context.Context, opts entities.Conta
// Wait for all events to be read
mutex.Lock()
if eventsErr != nil || lastEvent == nil {
logrus.Errorf("Cannot get exit code: %v", err)
logrus.Errorf("Cannot get exit code: %v, last event: %v", eventsErr, lastEvent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new message doesn't make much sense, now you are just logging

"Cannot get exit code: <nil>, last event: <nil>"

in case where the last event is nil and no error

In general it is not clear how do you reproduce this? Overall we made many changes to how Wait() works recently so I really do not see any point of trying reading events when wait failed so IMO it would be much better to get rid of this event code instead of trying to fix.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, but at least it's logging eventsErr if it is not nil. I agree the case were both are nil produces a non-helpful message. Maybe we log only if eventsErr is not nil?

I'm not familiar with this code, but we're currently reproducing it by running a large number of jobs in Ansible Tower. At some point a job fails with Cannot get exit code: <nil> and we would like to be able to see the real error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no real error, the condition is || which means we log when lastEvent == nil even when there was no error. What happens here is that we can read events but it will simply not contain our wanted exited events as such event the event is nil.

What podman version are you using? Any chance you can test with podman from main?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're running podman 4.9.4. Will check if it's possible to test with the latest one.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tried latest podman (5.2.2), still seeing the same error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

podman (5.2.2)

This has not most of the wait fixes, you need to wait for 5.3 or use podman from main.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 10, 2024
@openshift-merge-robot
Copy link
Collaborator

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@Luap99
Copy link
Member

Luap99 commented Oct 10, 2024

This can be closed with #24208 being merged

@Luap99 Luap99 closed this Oct 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants