bug: fluent-bit process sometimes hangs again after being restarted #1407
Comments
I think this issue might be caused by Fluent Bit itself. I will try to reproduce and test it.
Sure. I've been testing a livenessProbe as a workaround to restart the pod when this happens, but I'm not sure yet whether it works. Here is a log from when the issue happens:
Nothing happens after that. The process is still running in the pod, but logs are not collected. I didn't think to check whether the server is still responsive, but I will if I see it happen again. Ultimately, I think this issue could be closed and moved to fluent-bit's repo. Perhaps this shouldn't be fixed in fluent-operator at all, since any "fix" there would just be a workaround.
I've added some more info in a comment on an existing fluent-bit issue: fluent/fluent-bit#9354. @wenchajun @benjaminhuo, what is your opinion on this? Should a workaround for this problem be added to fluent-operator once again, or should we wait until it is resolved in fluent-bit (which may take an uncertain amount of time)?
Describe the issue
Some time ago, fluentbit-watcher was reworked to use the hot-reload feature (commit 90d364b).
That rework also removed the SIGKILL call that was sent when the process hung, so the issue I originally reported in #510 has been reintroduced.
Ideally this would be fixed in fluent-bit itself (and I will report it there as well once I have investigated the problem in more depth and can reproduce it consistently), but in the meantime I think it would be great to reintroduce handling for these situations in fluent-operator.
To Reproduce
No clear steps to reproduce. It seems to happen when fluent-bit is restarted many times in a row, but not always.
Expected behavior
Fluent-bit is restarted and keeps collecting logs.
Your Environment
How did you install fluent operator?
No response
Additional context
Keeping this as a reminder to get back to it after 18.11 or so.