LogPipeline AgentHealthy status is flaky at agent rollout #1545

Closed
a-thaler opened this issue Oct 22, 2024 · 1 comment
Assignees
Labels
area/logs LogPipeline kind/bug Categorizes issue or PR as related to a bug.
Milestone

Comments

@a-thaler
Collaborator

Description

I observed a Fluent Bit rollout that caused an AgentNotReady reason for a single data point. It was preceded by one data point of RolloutInProgress. Because the rollout status lasted only one data point, I assume no timeout occurred.

The related log in the manager:

{"level":"ERROR","timestamp":"2024-10-21T04:39:31Z","caller":"commonstatus/checker.go:76","message":"Failed to probe agent - set condition as not healthy","controller":"logpipeline","controllerGroup":"telemetry.kyma-project.io","controllerKind":"LogPipeline","LogPipeline":{"name":"cls"},"namespace":"","name":"cls","reconcileID":"5f882c60-d56f-4113-8986-c4a111b0ca54","error":"Pod has failed: "}

The manager normally reads the pod's status.message field, which appears to be unset here. Otherwise the log would show some text after the prefix instead of the bare "Pod has failed: ".

Feedback from @rakesh-garimella:
The API has two fields that can be set, Reason and Message, and both are optional. Until now I was only setting Message. I will also add Reason to get more information; maybe that one is set.

[types.go](https://github.com/kubernetes/api/blob/master/core/v1/types.go)
    Message string `json:"message,omitempty" protobuf:"bytes,3,opt,name=message"`
    // A brief CamelCase message indicating details about why the pod is in this state.
    // e.g. 'Evicted'
    // +optional
    Reason string `json:"reason,omitempty" protobuf:"bytes,4,opt,name=reason"`

Printing all conditions would probably also be useful. I need to think about how to incorporate this in the code.
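To make the idea concrete, here is a minimal sketch of an error text that stays informative when Message is empty, by also reporting Reason and every condition. It is not the actual fix in the manager: `podStatus`, `podCondition`, and `describeFailedPod` are hypothetical names, and the structs are simplified local stand-ins for the corresponding Kubernetes `core/v1` types so that the example compiles without extra dependencies.

```go
package main

import (
	"fmt"
	"strings"
)

// podCondition and podStatus are simplified, hypothetical stand-ins for the
// Kubernetes core/v1 PodCondition and PodStatus types. Only the fields
// discussed in this issue are kept.
type podCondition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

type podStatus struct {
	Reason     string
	Message    string
	Conditions []podCondition
}

// describeFailedPod (hypothetical helper) builds the error text from Reason,
// Message, and all conditions, skipping the optional fields that are empty.
func describeFailedPod(s podStatus) string {
	parts := []string{}
	if s.Reason != "" {
		parts = append(parts, "reason="+s.Reason)
	}
	if s.Message != "" {
		parts = append(parts, "message="+s.Message)
	}
	for _, c := range s.Conditions {
		parts = append(parts, fmt.Sprintf("condition %s=%s (reason=%q, message=%q)",
			c.Type, c.Status, c.Reason, c.Message))
	}
	if len(parts) == 0 {
		// Make the "nothing was reported" case explicit instead of
		// emitting a bare prefix.
		return "Pod has failed: no reason, message, or conditions reported"
	}
	return "Pod has failed: " + strings.Join(parts, "; ")
}

func main() {
	// Example values are made up for illustration.
	status := podStatus{
		Reason:     "Evicted",
		Conditions: []podCondition{{Type: "Ready", Status: "False", Reason: "PodFailed"}},
	}
	fmt.Println(describeFailedPod(status))
}
```

With a status where all three inputs are empty, the helper returns an explicit "nothing reported" text rather than the trailing-blank message seen in the log above.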

Expected result

Actual result

Steps to reproduce

Troubleshooting

Release Notes


@skhalash
Collaborator

The problem is fixed, but we need to add an E2E test to make sure it won't happen in the future: #1566

@a-thaler a-thaler added this to the 1.27.0 milestone Oct 25, 2024