k8s: improve logging and debugging #432

Open
mdonadoni opened this issue Feb 15, 2024 · 0 comments

It's currently very hard to understand what goes wrong when a workflow gets stuck in the "running" phase.

Let's improve the logging, in particular in the job monitor, so that the logs make clear:

  • What events are coming from the cluster (e.g. pod evicted)
  • What actions are being taken (e.g. storing logs, setting as failed, skipping as job is still running)
  • Why these actions are being taken (e.g. cause of failure)

Some additional ideas:

  • Make sure that the id used in reana-run-job-<id> is the same as the job's id (no need for two different identifiers!)
  • If multiple ids are used to identify the same job, always print them together in the logs
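
To illustrate the points above, here is a minimal sketch of what a single job monitor log line could contain: both ids, the cluster event, the action taken, and the reason. It uses only Python's standard `logging` module. The function names, the field names (`job_id`, `backend_job_id`) and the logger name are hypothetical, chosen for illustration only; they are not taken from the existing code.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("job_monitor")


def format_job_log(job_id, backend_job_id, event, action, reason):
    """Build one log line naming both ids, the event, the action and the reason."""
    return (
        f"job_id={job_id} backend_job_id={backend_job_id} "
        f"event={event!r} action={action!r} reason={reason!r}"
    )


def log_job_decision(job_id, backend_job_id, event, action, reason):
    """Log the decision taken for a job and return the message that was logged."""
    message = format_job_log(job_id, backend_job_id, event, action, reason)
    logger.info(message)
    return message
```

Keeping every field on one line means that searching the logs for either id returns the event, the action and its cause together.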
Projects
Status: Backlog