Invalid connection error should retry #13578

Open
4 tasks done
shuangkun opened this issue Sep 9, 2024 · 0 comments · May be fixed by #13580
@shuangkun
Member

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened? What did you expect to happen?

My large workflow failed when it encountered an "invalid connection" error. My cluster has otherwise been working fine, so this is an occasional problem.
We should retry when offloading the workflow encounters this error, instead of failing the workflow.

Version(s)

latest

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

time="2024-09-09T13:24:51.702Z" level=warning msg="Non-transient error: invalid connection"
time="2024-09-09T13:24:51.702Z" level=warning msg="Failed to dehydrate: workflow is longer than maximum allowed size. compressed size 1113230 > maxSize 1048576Tried to offload but encountered error: invalid connection" namespace=default workflow=large-workflow-t696s
time="2024-09-09T13:24:51.703Z" level=info msg="cleaning up pod" action=deletePod key=default/large-workflow-t696s-sleep-545703870/deletePod
time="2024-09-09T13:24:51.705Z" level=info msg="Updated phase Running -> Error" namespace=default workflow=large-workflow-t696s
time="2024-09-09T13:24:51.705Z" level=info msg="Updated message  -> workflow is longer than maximum allowed size. compressed size 1113230 > maxSize 1048576Tried to offload but encountered error: invalid connection" namespace=default workflow=large-workflow-t696s

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Logs from your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

@shuangkun self-assigned this Sep 9, 2024