-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: error logging when failed to get exit code #24121
Conversation
Signed-off-by: Peter Braun <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: pb82 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -972,7 +972,7 @@ func (ic *ContainerEngine) ContainerRun(ctx context.Context, opts entities.Conta | |||
// Wait for all events to be read | |||
mutex.Lock() | |||
if eventsErr != nil || lastEvent == nil { | |||
logrus.Errorf("Cannot get exit code: %v", err) | |||
logrus.Errorf("Cannot get exit code: %v, last event: %v", eventsErr, lastEvent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new message doesn't make much sense, now you are just logging
"Cannot get exit code: <nil>, last event: <nil>"
in case where the last event is nil and no error
In general it is not clear how do you reproduce this? Overall we made many changes to how Wait() works recently so I really do not see any point of trying reading events when wait failed so IMO it would be much better to get rid of this event code instead of trying to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, but at least it's logging eventsErr
if it is not nil. I agree the case were both are nil produces a non-helpful message. Maybe we log only if eventsErr
is not nil?
I'm not familiar with this code, but we're currently reproducing it by running a large number of jobs in Ansible Tower. At some point a job fails with Cannot get exit code: <nil>
and we would like to be able to see the real error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no real error, the condition is ||
which means we log when lastEvent == nil even when there was no error. What happens here is that we can read events but it will simply not contain our wanted exited events as such event the event is nil.
What podman version are you using? Any chance you can test with podman from main?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're running podman 4.9.4. Will check if it's possible to test with the latest one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We tried latest podman (5.2.2), still seeing the same error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
podman (5.2.2)
This has not most of the wait fixes, you need to wait for 5.3 or use podman from main.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
This can be closed with #24208 being merged |
The error logging when podman fails to read the exit code from the last event seems wrong: it's referring to
err
which is handled previously and should always be nil in that line. I think we should logeventsErr
instead.Does this PR introduce a user-facing change?