
Hot reload stuck in progress after pausing inputs #9354

Closed as not planned
@yuvalhnoga

Description


Bug Report

Executing a hot reload from a sidecar container in the same pod as fluent-bit with curl -X POST http://localhost:2020/api/v2/reload -d '{}' gets stuck, and if you try to reload again you get {"reload":"in progress","status":-2}.

In the fluent-bit logs we can see the following, and it stays stuck like this forever:

[engine] caught signal (SIGHUP)
[ info] reloading instance pid=1 tid=0x7f20c800ee40
[ info] [reload] stop everything of the old context
[ warn] [engine] service will shutdown when all remaining tasks are flushed
[ info] [input] pausing input-kube-pods-logs
[ info] [input] pausing input-kube-node-logs
[ info] [input] pausing cpu.2
[ info] [engine] service has stopped (0 pending tasks)
[ info] [input] pausing input-kube-pods-logs
[ info] [input] pausing input-kube-node-logs
[ info] [input] pausing cpu.2

Normally, when the reload works, the lines that follow should be:

[input] pausing input-kube-node-logs
[input] pausing cpu.2
[output:forward:forward-to-fluentd] thread worker #0 stopping...
[output:forward:forward-to-fluentd] thread worker #0 stopped
[output:forward:forward-to-fluentd] thread worker #1 stopping...
[output:forward:forward-to-fluentd] thread worker #1 stopped
[input:tail:input-kube-node-logs] inotify_fs_remove(): inode=3146021 watch_fd=1
[input:tail:input-kube-node-logs] inotify_fs_remove(): inode=3145852 watch_fd=2
[input:tail:input-kube-node-logs] inotify_fs_remove(): inode=3145851 watch_fd=3
[input:tail:input-kube-node-logs] inotify_fs_remove(): inode=3146020 watch_fd=4
[reload] start everything

To Reproduce

Create fluent-bit.yaml:

service:
    flush: 5
    daemon: off
    http_server: on
    http_listen: 0.0.0.0
    http_port: 2020
    health_check: on
    hot_reload: on
    hc_errors_count: 5
    hc_retry_failure_count: 5
    hc_period: 5
    log_level: info
    parsers_file: /fluent-bit/etc/parsers.conf
    
pipeline:
  inputs:
    - name: cpu
      tag: temp_cpu
  filters:
    - name: throttle
      match: kubernetes.var.log.containers.example1-*
      alias: example1
      rate: 999999
      window: 999999
      interval: 999999s
  outputs:
    - name: forward
      alias: forward-to-fluentd
      match: kubernetes.*
      upstream: /fluent-bit/etc/upstream.conf
      port: 24224
      retry_limit: false
      
includes:
    - /fluent-bit/filters/*.yaml
    - /fluent-bit/config/filters.yaml
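The referenced /fluent-bit/etc/upstream.conf is not included in this report; for reproduction, a minimal file in Fluent Bit's documented upstream-servers format should work (the node name and host below are hypothetical):

# /fluent-bit/etc/upstream.conf — placeholder upstream definition
[UPSTREAM]
    name    forward-balancing
[NODE]
    name    fluentd-node-1
    host    fluentd.example.local
    port    24224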
Steps to reproduce the problem:
  • Start the container
  • Execute curl -X POST http://localhost:2020/api/v2/reload -d '{}' repeatedly until the reload hangs as described above (a loop sketch follows this list)
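
A minimal loop for the last step, assuming only the reload endpoint already shown in this report; the iteration count and sleep are arbitrary:

# Fire repeated hot reloads and print each response; the bug shows up when
# a response switches to {"reload":"in progress","status":-2} and never recovers.
for i in $(seq 1 20); do
    curl -s -X POST http://localhost:2020/api/v2/reload -d '{}'
    echo
    sleep 1
done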

Expected behavior
The reload should either fail or complete, but not hang.

Your Environment

  • Version used: Fluent Bit 3.1.7, Linux Docker on an Amazon EKS cluster node

Additional context
I am running a sidecar next to my fluent-bit container that dynamically creates a filter with per-container throttling configuration for containers on the same node as fluent-bit (fluent-bit runs as a DaemonSet in my Kubernetes cluster).
Every time there is a change to the filter configuration, the sidecar performs a hot reload with:
curl -X POST http://localhost:2020/api/v2/reload -d '{}'

Is there an option to verify when a reload fails and to retry it? If I run the reload again, it just reports that it is in progress forever.
Also, is there a way to monitor or detect that this has happened? The health check still reports OK, so I only noticed the problem because I stopped receiving logs.
It would be nice to have a way to detect this and then restart the fluent-bit container or perform some kind of forced reload.
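
As a workaround for detection, a sidecar could compare the hot_reload_count reported by the HTTP API before and after triggering a reload; if it never increments, the reload is stuck. A minimal sketch, assuming the documented GET /api/v2/reload endpoint that returns {"hot_reload_count": N}; the polling interval and timeout are arbitrary:

#!/bin/sh
# Trigger a hot reload, then poll hot_reload_count to confirm it completed.
# A non-zero exit could be wired into a liveness probe to restart the container.
count() {
    curl -s http://localhost:2020/api/v2/reload |
        sed -n 's/.*"hot_reload_count":\([0-9]*\).*/\1/p'
}
before=$(count)
curl -s -X POST http://localhost:2020/api/v2/reload -d '{}' > /dev/null
for i in $(seq 1 30); do
    sleep 2
    after=$(count)
    if [ "${after:-0}" -gt "${before:-0}" ]; then
        echo "reload completed (count: ${before:-0} -> $after)"
        exit 0
    fi
done
echo "reload did not finish within 60s; assuming it is stuck" >&2
exit 1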
