Description
Bug Report
Executing a hot reload from a sidecar container in the same pod as fluent-bit with curl -X POST http://localhost:2020/api/v2/reload -d '{}'
gets stuck, and if you try to reload again you get {"reload":"in progress","status":-2}
In the fluent-bit logs we can see the following, and it stays stuck like this forever:
[engine] caught signal (SIGHUP)
[ info] reloading instance pid=1 tid=0x7f20c800ee40
[ info] [reload] stop everything of the old context
[ warn] [engine] service will shutdown when all remaining tasks are flushed
[ info] [input] pausing input-kube-pods-logs
[ info] [input] pausing input-kube-node-logs
[ info] [input] pausing cpu.2
[ info] [engine] service has stopped (0 pending tasks)
[ info] [input] pausing input-kube-pods-logs
[ info] [input] pausing input-kube-node-logs
[ info] [input] pausing cpu.2
Normally, when the reload works, the lines that come after this should be:
[input] pausing input-kube-node-logs
[input] pausing cpu.2
[output:forward:forward-to-fluentd] thread worker #0 stopping...
[output:forward:forward-to-fluentd] thread worker #0 stopped
[output:forward:forward-to-fluentd] thread worker #1 stopping...
[output:forward:forward-to-fluentd] thread worker #1 stopped
[input:tail:input-kube-node-logs] inotify_fs_remove(): inode=3146021 watch_fd=1
[input:tail:input-kube-node-logs] inotify_fs_remove(): inode=3145852 watch_fd=2
[input:tail:input-kube-node-logs] inotify_fs_remove(): inode=3145851 watch_fd=3
[input:tail:input-kube-node-logs] inotify_fs_remove(): inode=3146020 watch_fd=4
[reload] start everything
To Reproduce
Create fluent-bit.yaml:
service:
  flush: 5
  daemon: off
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  health_check: on
  hot_reload: on
  hc_errors_count: 5
  hc_retry_failure_count: 5
  hc_period: 5
  log_level: info
  parsers_file: /fluent-bit/etc/parsers.conf
pipeline:
  inputs:
    - name: cpu
      tag: temp_cpu
  filters:
    - name: throttle
      match: kubernetes.var.log.containers.example1-*
      alias: example1
      rate: 999999
      window: 999999
      interval: 999999s
  outputs:
    - name: forward
      alias: forward-to-fluentd
      match: kubernetes.*
      upstream: /fluent-bit/etc/upstream.conf
      port: 24224
      retry_limit: false
includes:
  - /fluent-bit/filters/*.yaml
  - /fluent-bit/config/filters.yaml
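The output section references an upstream file whose contents are not included here; for context, a minimal upstream.conf in the classic upstream format would look roughly like this (the node name and host are placeholders, not the real values):

[UPSTREAM]
    name  forward-balancing

[NODE]
    name  fluentd-node-1
    host  fluentd.logging.svc.cluster.local
    port  24224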
Steps to reproduce the problem:
- Start the container
- Execute
  curl -X POST http://localhost:2020/api/v2/reload -d '{}'
  multiple times until the reload hangs as described above (see the loop sketched below)
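For repeated triggering, a minimal loop like this works; the 2-second interval is arbitrary and only meant to make the hang show up faster:

while true; do
  curl -s -X POST http://localhost:2020/api/v2/reload -d '{}'
  echo
  sleep 2
done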
Expected behavior
The reload should fail/continue but not hang.
Your Environment
- Version used: Fluent Bit 3.1.7, Linux Docker on an Amazon EKS cluster node
Additional context
I am running a sidecar next to my fluent-bit container that dynamically creates a throttle filter configuration per container running on the same node as fluent-bit (fluent-bit runs as a DaemonSet in my Kubernetes cluster).
Every time there is a change to the filter configuration, the sidecar performs a hot reload with
curl -X POST http://localhost:2020/api/v2/reload -d '{}'
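A simplified sketch of that flow: the sidecar writes a per-container throttle filter into one of the included paths from the config above, then calls the reload endpoint. The file name and throttle values here are illustrative, assuming the included files use the same YAML pipeline format:

# write/update a per-container throttle filter into an included path
cat > /fluent-bit/filters/example1-throttle.yaml <<'EOF'
pipeline:
  filters:
    - name: throttle
      match: kubernetes.var.log.containers.example1-*
      alias: example1
      rate: 100
      window: 300
      interval: 1s
EOF

# ask fluent-bit to pick up the new filter
curl -X POST http://localhost:2020/api/v2/reload -d '{}'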
Is there an option to verify when a reload has failed and to retry it? If I try to run the reload again, it just says it is in progress forever.
Also, is there a way to monitor for or detect this situation? The health check still reports OK, so I only noticed it because I stopped receiving logs.
It would be nice to have a way to know about it and then restart the fluent-bit container or perform some sort of forced reload.
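As a possible interim workaround, if I read the hot reload API correctly, a GET on /api/v2/reload returns a hot_reload_count, so the sidecar could compare the counter before and after the POST and treat a non-incrementing counter as a stuck reload. A rough sketch (the 30-second grace period is arbitrary, and the restart step depends on the pod setup):

# counter before the reload (assumes GET /api/v2/reload returns {"hot_reload_count":N})
before=$(curl -s http://localhost:2020/api/v2/reload)
curl -s -X POST http://localhost:2020/api/v2/reload -d '{}'
sleep 30  # grace period for the reload to finish
after=$(curl -s http://localhost:2020/api/v2/reload)
if [ "$after" = "$before" ]; then
  echo "hot reload appears stuck (counter did not change)" >&2
  # restart logic goes here, e.g. fail a liveness probe or signal the fluent-bit process
  # (signaling across containers needs shareProcessNamespace: true on the pod)
fi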