fix: error logging when failed to get exit code #24121

pb82 · 2024-10-01T10:24:14Z

The error logging when podman fails to read the exit code from the last event seems wrong: it refers to err, which has already been handled and should always be nil at that line. I think we should log eventsErr instead.

Does this PR introduce a user-facing change?

Fixes logging when podman fails to read the exit code

Signed-off-by: Peter Braun <[email protected]>

openshift-ci · 2024-10-01T10:24:22Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pb82
Once this PR has been reviewed and has the lgtm label, please assign jakecorrenti for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Luap99 · 2024-10-01T11:20:37Z

pkg/domain/infra/tunnel/containers.go

@@ -972,7 +972,7 @@ func (ic *ContainerEngine) ContainerRun(ctx context.Context, opts entities.Conta
 	// Wait for all events to be read
 	mutex.Lock()
 	if eventsErr != nil || lastEvent == nil {
-		logrus.Errorf("Cannot get exit code: %v", err)
+		logrus.Errorf("Cannot get exit code: %v, last event: %v", eventsErr, lastEvent)


The new message doesn't make much sense: when the last event is nil and there is no error, you are now just logging

"Cannot get exit code: <nil>, last event: <nil>"

In general, it is not clear how you reproduce this. We recently made many changes to how Wait() works, so I really do not see any point in trying to read events when wait failed. IMO it would be much better to get rid of this event code instead of trying to fix it.
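To make the concern above concrete, here is a minimal, self-contained sketch of what the proposed format string prints. The event type and the exitCodeMessage helper are hypothetical stand-ins for illustration only; they are not the types or functions used in pkg/domain/infra/tunnel/containers.go.

```go
package main

import (
	"errors"
	"fmt"
)

// event is a hypothetical stand-in for the event type read in ContainerRun.
type event struct {
	Status string
}

// exitCodeMessage formats its arguments with the same format string as the
// proposed log line. The helper itself is illustrative, not podman code.
func exitCodeMessage(eventsErr error, lastEvent *event) string {
	return fmt.Sprintf("Cannot get exit code: %v, last event: %v", eventsErr, lastEvent)
}

func main() {
	// No error and no event: both values print as <nil>.
	fmt.Println(exitCodeMessage(nil, nil))
	// An events error but no event: only the error carries information.
	fmt.Println(exitCodeMessage(errors.New("stream closed"), nil))
}
```

With both arguments nil, the message carries no more information than the original one did.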

Correct, but at least it logs eventsErr if it is not nil. I agree the case where both are nil produces an unhelpful message. Maybe we should log only if eventsErr is not nil?

I'm not familiar with this code, but we're currently reproducing it by running a large number of jobs in Ansible Tower. At some point a job fails with "Cannot get exit code: <nil>", and we would like to be able to see the real error.

There is no real error. The condition uses ||, which means we log when lastEvent == nil even when there was no error. What happens here is that we can read events, but they simply do not contain the exited event we want, so the last event is nil.
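As a rough illustration of the two cases hidden behind the || condition, the sketch below reports them separately. This is only one possible shape and an assumption on my part, not what the project adopted. The event type, the exitCodeFailure helper, its message wording, and the "exited" status value are all hypothetical placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// event is a hypothetical stand-in for the event type read in ContainerRun.
type event struct {
	Status string
}

// exitCodeFailure explains why no exit code could be determined, or returns
// "" when an event is available. Name and messages are illustrative only.
func exitCodeFailure(eventsErr error, lastEvent *event) string {
	switch {
	case eventsErr != nil:
		// Reading the event stream itself failed.
		return fmt.Sprintf("Cannot get exit code: reading events failed: %v", eventsErr)
	case lastEvent == nil:
		// Events were readable, but the wanted one never arrived.
		return "Cannot get exit code: no matching event was received"
	default:
		return ""
	}
}

func main() {
	fmt.Println(exitCodeFailure(errors.New("connection reset"), nil))
	fmt.Println(exitCodeFailure(nil, nil))
	fmt.Printf("%q\n", exitCodeFailure(nil, &event{Status: "exited"}))
}
```

Splitting the cases this way would at least tell a user whether the event stream failed or the expected event was simply missing.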

What podman version are you using? Any chance you can test with podman from main?

We're running podman 4.9.4. Will check if it's possible to test with the latest one.

We tried latest podman (5.2.2), still seeing the same error.

podman (5.2.2)

This does not have most of the wait fixes; you need to wait for 5.3 or use podman from main.

openshift-merge-robot · 2024-10-10T00:57:45Z

PR needs rebase.


Luap99 · 2024-10-10T09:19:26Z

This can be closed with #24208 being merged

fix: error logging when failed to get exit code

1982a87

Signed-off-by: Peter Braun <[email protected]>

openshift-ci bot added the release-note label Oct 1, 2024

Luap99 reviewed Oct 1, 2024


openshift-merge-robot added the needs-rebase label Oct 10, 2024

Luap99 closed this Oct 10, 2024


pb82 commented Oct 1, 2024

openshift-ci bot commented Oct 1, 2024

Luap99 Oct 1, 2024

pb82 Oct 1, 2024

Luap99 Oct 1, 2024

pb82 Oct 1, 2024

pb82 Oct 8, 2024

Luap99 Oct 8, 2024

openshift-merge-robot commented Oct 10, 2024

Luap99 commented Oct 10, 2024
