
Hot reload stuck in progress after pausing inputs #9354

Closed as not planned
@yuvalhnoga

Description


Bug Report

Executing a hot reload from a sidecar container in the same pod as fluent-bit with curl -X POST http://localhost:2020/api/v2/reload -d '{}' gets stuck, and if you try to reload again you get {"reload":"in progress","status":-2}.

In the fluent-bit logs we can see the following, and it stays stuck like this forever:

[engine] caught signal (SIGHUP)
[ info] reloading instance pid=1 tid=0x7f20c800ee40
[ info] [reload] stop everything of the old context
[ warn] [engine] service will shutdown when all remaining tasks are flushed
[ info] [input] pausing input-kube-pods-logs
[ info] [input] pausing input-kube-node-logs
[ info] [input] pausing cpu.2
[ info] [engine] service has stopped (0 pending tasks)
[ info] [input] pausing input-kube-pods-logs
[ info] [input] pausing input-kube-node-logs
[ info] [input] pausing cpu.2

Normally, when the reload works, the lines that follow should be:

[input] pausing input-kube-node-logs
[input] pausing cpu.2
[output:forward:forward-to-fluentd] thread worker #0 stopping...
[output:forward:forward-to-fluentd] thread worker #0 stopped
[output:forward:forward-to-fluentd] thread worker #1 stopping...
[output:forward:forward-to-fluentd] thread worker #1 stopped
[input:tail:input-kube-node-logs] inotify_fs_remove(): inode=3146021 watch_fd=1
[input:tail:input-kube-node-logs] inotify_fs_remove(): inode=3145852 watch_fd=2
[input:tail:input-kube-node-logs] inotify_fs_remove(): inode=3145851 watch_fd=3
[input:tail:input-kube-node-logs] inotify_fs_remove(): inode=3146020 watch_fd=4
[reload] start everything

To Reproduce

Create fluent-bit.yaml:

service:
    flush: 5
    daemon: off
    http_server: on
    http_listen: 0.0.0.0
    http_port: 2020
    health_check: on
    hot_reload: on
    hc_errors_count: 5
    hc_retry_failure_count: 5
    hc_period: 5
    log_level: info
    parsers_file: /fluent-bit/etc/parsers.conf
    
pipeline:
  inputs:
    - name: cpu
      tag: temp_cpu
  filters:
    - name: throttle
      match: kubernetes.var.log.containers.example1-*
      alias: example1
      rate: 999999
      window: 999999
      interval: 999999s
  outputs:
    - name: forward
      alias: forward-to-fluentd
      match: kubernetes.*
      upstream: /fluent-bit/etc/upstream.conf
      port: 24224
      retry_limit: false
      
includes:
    - /fluent-bit/filters/*.yaml
    - /fluent-bit/config/filters.yaml
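The referenced /fluent-bit/etc/upstream.conf is not included in this report; for reproduction, a minimal file in Fluent Bit's documented upstream-servers format should work (the node name and host below are hypothetical):

# /fluent-bit/etc/upstream.conf — placeholder upstream definition
[UPSTREAM]
    name    forward-balancing
[NODE]
    name    fluentd-node-1
    host    fluentd.example.local
    port    24224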
Steps to reproduce the problem:
  • Start the container
  • Execute curl -X POST http://localhost:2020/api/v2/reload -d '{}' repeatedly until the reload hangs as described above (a loop sketch follows this list)
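
A minimal loop for the last step, assuming only the reload endpoint already shown in this report; the iteration count and sleep are arbitrary:

# Fire repeated hot reloads and print each response; the bug shows up when
# a response switches to {"reload":"in progress","status":-2} and never recovers.
for i in $(seq 1 20); do
    curl -s -X POST http://localhost:2020/api/v2/reload -d '{}'
    echo
    sleep 1
done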

Expected behavior
The reload should either fail or complete, but not hang.

Your Environment

  • Version used: Fluent Bit 3.1.7, Linux Docker on an Amazon EKS cluster node

Additional context
I am running a sidecar next to my fluent-bit container that dynamically creates a filter with per-container throttling configuration for containers on the same node as fluent-bit (fluent-bit runs as a DaemonSet in my Kubernetes cluster).
Every time there is a change to the filter configuration, the sidecar performs a hot reload with:
curl -X POST http://localhost:2020/api/v2/reload -d '{}'

Is there an option to verify when a reload fails and to retry it? If I run the reload again, it just reports that it is in progress forever.
Also, is there a way to monitor or detect that this has happened? The health check still reports OK, so I only noticed the problem because I stopped receiving logs.
It would be nice to have a way to detect this and then restart the fluent-bit container or perform some kind of forced reload.
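
As a workaround for detection, a sidecar could compare the hot_reload_count reported by the HTTP API before and after triggering a reload; if it never increments, the reload is stuck. A minimal sketch, assuming the documented GET /api/v2/reload endpoint that returns {"hot_reload_count": N}; the polling interval and timeout are arbitrary:

#!/bin/sh
# Trigger a hot reload, then poll hot_reload_count to confirm it completed.
# A non-zero exit could be wired into a liveness probe to restart the container.
count() {
    curl -s http://localhost:2020/api/v2/reload |
        sed -n 's/.*"hot_reload_count":\([0-9]*\).*/\1/p'
}
before=$(count)
curl -s -X POST http://localhost:2020/api/v2/reload -d '{}' > /dev/null
for i in $(seq 1 30); do
    sleep 2
    after=$(count)
    if [ "${after:-0}" -gt "${before:-0}" ]; then
        echo "reload completed (count: ${before:-0} -> $after)"
        exit 0
    fi
done
echo "reload did not finish within 60s; assuming it is stuck" >&2
exit 1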
