You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Executing a hot reload with from a sidecar container in the same pod as fluent-bit with curl -X POST http://localhost:2020/api/v2/reload -d '{}' gets stuck and if you try to reload again you get {"reload":"in progress","status":-2}
In the fluent-bit configuration we can see this and its stuck like this forever:
[engine] caught signal (SIGHUP)
[ info] reloading instance pid=1 tid=0x7f20c800ee40
[ info] [reload] stop everything of the old context
[ warn] [engine] service will shutdown when all remaining tasks are flushed
[ info] [input] pausing input-kube-pods-logs
[ info] [input] pausing input-kube-node-logs
[ info] [input] pausing cpu.2
[ info] [engine] service has stopped (0 pending tasks)
[ info] [input] pausing input-kube-pods-logs
[ info] [input] pausing input-kube-node-logs
[ info] [input] pausing cpu.2
Normally when the reload works the lines that come after this should be:
Execute curl -X POST http://localhost:2020/api/v2/reload -d '{}' multiple times until the reload hangs like I mentioned above
Expected behavior
The reload should fail/continue but not hang.
Your Environment
Version used: FluentBit 3.1.7, Linux docker on Amazon EKS cluster node
Additional context
I am running a sidecar next to my fluent-bit container that dynamically creates a filter with throttling configuration that is per container on the same node as fluent-bit(fluent-bit runs as daemonset in my kubernetes cluster).
Everytime theres a change to the filter configuration the sidecar performs a hot reload with curl -X POST http://localhost:2020/api/v2/reload -d '{}'
Is there maybe an option to verify when the reload fails and be able to retry it because if I try to run the reload again it just says it's in progress forever.
Also is there a way to monitor or know that it happens? Since health check shows as ok, I only noticed that happened because I stopped receiving logs.
Would be nice if there was a way to know about it and maybe restart the fluent-bit container or perform some sort of forced reload.
The text was updated successfully, but these errors were encountered:
Seems like reproducing this issue is not as easy as I thought but I have some more insight:
I managed to find all the pods where this issue happens by querying this in prometheus:
increase(fluentbit_uptime[5m]) == 0
I noticed that the pods that stop sending logs also have the fluentbit_uptime metric stop increasing
I looked at the resource consumption of the fluent-bit container and the memory in more than half of them is around 20-30mb memory usage where my requests is 64mb and limits is 256mb
As for CPU resource consumption it was pretty much the same (not fully used).
I checked how many hot reloads back to back it took for it to fail and it seems like it was 2 reloads in 30s increments but in some it was even 2mins.
In one of them I noticed it took just 1 reload to make it stop working and both cpu/memory were low in the container.
I tried reproducing it spamming the hot reload curl in multiple different kind of pods with different scenarios:
container with a lot of pods in the same node
container with pods that write a lot of logs
container with high cpu/memory usage
I didn’t find any of these criterias to affect if the hot reload will fail even when spamming the hot reload curl nonstop without any wait.
Not really sure what else it could be but it’s a tough one 😛
Bug Report
Executing a hot reload with from a sidecar container in the same pod as fluent-bit with
curl -X POST http://localhost:2020/api/v2/reload -d '{}'
gets stuck and if you try to reload again you get{"reload":"in progress","status":-2}
In the fluent-bit configuration we can see this and its stuck like this forever:
Normally when the reload works the lines that come after this should be:
To Reproduce
Create fluent-bit.yaml:
curl -X POST http://localhost:2020/api/v2/reload -d '{}'
multiple times until the reload hangs like I mentioned aboveExpected behavior
The reload should fail/continue but not hang.
Your Environment
Additional context
I am running a sidecar next to my fluent-bit container that dynamically creates a filter with throttling configuration that is per container on the same node as fluent-bit(fluent-bit runs as daemonset in my kubernetes cluster).
Everytime theres a change to the filter configuration the sidecar performs a hot reload with
curl -X POST http://localhost:2020/api/v2/reload -d '{}'
Is there maybe an option to verify when the reload fails and be able to retry it because if I try to run the reload again it just says it's in progress forever.
Also is there a way to monitor or know that it happens? Since health check shows as ok, I only noticed that happened because I stopped receiving logs.
Would be nice if there was a way to know about it and maybe restart the fluent-bit container or perform some sort of forced reload.
The text was updated successfully, but these errors were encountered: