Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter_log_to_metrics: Add timer callback for emitting metrics #9251

Merged
merged 1 commit into from
Sep 16, 2024

Conversation

drbugfinder-work
Copy link
Contributor

@drbugfinder-work drbugfinder-work commented Aug 20, 2024

This PR will change the log_to_metrics filter to use a timer based metric inject and not directly inject metrics on every incoming log record anymore. This will lower the overall load and memory consumption especially in high-volume and high-cardinality situations.

cc @leonardo-albertovich @edsiper

Fixes #9189

Testing
Before we can approve your change; please submit the following in a comment:

  • [N/A] Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@drbugfinder-work
Copy link
Contributor Author

drbugfinder-work commented Aug 20, 2024

root@fb-build:~/test# valgrind --leak-check=full -s ./fluent-bit-3-1-6-fix -c fb-sink.conf 
==100534== Memcheck, a memory error detector
==100534== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==100534== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==100534== Command: ./fluent-bit-3-1-6-fix -c fb-sink.conf
==100534== 
Fluent Bit v3.1.6
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  __  
|  ___| |                | |   | ___ (_) |         |____ |/  | 
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`| | 
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \ | | 
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /_| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)___/

[2024/08/20 09:28:04] [ info] Configuration:
[2024/08/20 09:28:04] [ info]  flush time     | 1.000000 seconds
[2024/08/20 09:28:04] [ info]  grace          | 5 seconds
[2024/08/20 09:28:04] [ info]  daemon         | 0
[2024/08/20 09:28:04] [ info] ___________
[2024/08/20 09:28:04] [ info]  inputs:
[2024/08/20 09:28:04] [ info]      fluentbit_metrics
[2024/08/20 09:28:04] [ info]      forward
[2024/08/20 09:28:04] [ info] ___________
[2024/08/20 09:28:04] [ info]  filters:
[2024/08/20 09:28:04] [ info]      log_to_metrics.0
[2024/08/20 09:28:04] [ info] ___________
[2024/08/20 09:28:04] [ info]  outputs:
[2024/08/20 09:28:04] [ info]      prometheus_exporter.0
[2024/08/20 09:28:04] [ info] ___________
[2024/08/20 09:28:04] [ info]  collectors:
[2024/08/20 09:28:04] [ info] [fluent bit] version=3.1.6, commit=bda73d8850, pid=100534
[2024/08/20 09:28:04] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2024/08/20 09:28:04] [ info] [storage] ver=1.5.2, type=memory+filesystem, sync=full, checksum=on, max_chunks_up=128
[2024/08/20 09:28:04] [ info] [storage] backlog input plugin: storage_backlog.2
[2024/08/20 09:28:04] [ info] [cmetrics] version=0.9.4
[2024/08/20 09:28:04] [ info] [ctraces ] version=0.5.5
[2024/08/20 09:28:04] [ info] [input:fluentbit_metrics:fluentbit_metrics.0] initializing
[2024/08/20 09:28:04] [ info] [input:fluentbit_metrics:fluentbit_metrics.0] storage_strategy='memory' (memory only)
[2024/08/20 09:28:04] [debug] [fluentbit_metrics:fluentbit_metrics.0] created event channels: read=21 write=22
[2024/08/20 09:28:04] [ info] [input:forward:forward.1] initializing
[2024/08/20 09:28:04] [ info] [input:forward:forward.1] storage_strategy='filesystem' (memory + filesystem)
[2024/08/20 09:28:04] [debug] [forward:forward.1] created event channels: read=23 write=24
[2024/08/20 09:28:04] [debug] [in_fw] Listen='0.0.0.0' TCP_Port=24224
[2024/08/20 09:28:04] [debug] [downstream] listening on 0.0.0.0:24224
[2024/08/20 09:28:04] [ info] [input:forward:forward.1] listening on 0.0.0.0:24224
[2024/08/20 09:28:04] [ info] [input:storage_backlog:storage_backlog.2] initializing
[2024/08/20 09:28:04] [ info] [input:storage_backlog:storage_backlog.2] storage_strategy='memory' (memory only)
[2024/08/20 09:28:04] [debug] [storage_backlog:storage_backlog.2] created event channels: read=26 write=27
[2024/08/20 09:28:04] [ info] [input:storage_backlog:storage_backlog.2] queue memory limit: 4.8M
[2024/08/20 09:28:04] [ info] [input:emitter:emitter_for_log_metrics_filter_messages_forwarded_total] initializing
[2024/08/20 09:28:04] [ info] [input:emitter:emitter_for_log_metrics_filter_messages_forwarded_total] storage_strategy='memory' (memory only)
[2024/08/20 09:28:04] [debug] [emitter:emitter_for_log_metrics_filter_messages_forwarded_total] created event channels: read=28 write=29
[2024/08/20 09:28:04] [debug] [filter:log_to_metrics:log_metrics_filter_messages_forwarded_total] Creating metric timer with frequency 250ms
[2024/08/20 09:28:04] [debug] [prometheus_exporter:prometheus_exporter.0] created event channels: read=31 write=32
[2024/08/20 09:28:05] [ info] [output:prometheus_exporter:prometheus_exporter.0] listening iface=0.0.0.0 tcp_port=2023
[2024/08/20 09:28:05] [debug] [router] match rule fluentbit_metrics.0:prometheus_exporter.0
[2024/08/20 09:28:05] [debug] [router] match rule forward.1:prometheus_exporter.0
[2024/08/20 09:28:05] [debug] [router] match rule storage_backlog.2:prometheus_exporter.0
[2024/08/20 09:28:05] [debug] [router] match rule emitter.3:prometheus_exporter.0
[2024/08/20 09:28:05] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2024
[2024/08/20 09:28:05] [ info] [sp] stream processor started
[2024/08/20 09:28:07] [debug] [task] created task=0x52cc3e0 id=0 OK
[2024/08/20 09:28:07] [debug] [out flush] cb_destroy coro_id=0
[2024/08/20 09:28:07] [debug] [task] destroy task=0x52cc3e0 (task_id=0)
[2024/08/20 09:28:09] [debug] [task] created task=0x9a7eb60 id=0 OK
[2024/08/20 09:28:09] [debug] [out flush] cb_destroy coro_id=1
[2024/08/20 09:28:09] [debug] [task] destroy task=0x9a7eb60 (task_id=0)
^C[2024/08/20 09:28:10] [engine] caught signal (SIGINT)
[2024/08/20 09:28:10] [ warn] [engine] service will shutdown in max 5 seconds
[2024/08/20 09:28:10] [ info] [input] pausing fluentbit_metrics.0
[2024/08/20 09:28:10] [ info] [input] pausing forward.1
[2024/08/20 09:28:10] [ info] [input] pausing storage_backlog.2
[2024/08/20 09:28:10] [ info] [input] pausing emitter_for_log_metrics_filter_messages_forwarded_total
[2024/08/20 09:28:10] [debug] [filter:log_to_metrics:log_metrics_filter_messages_forwarded_total] Stopping callback timer
[2024/08/20 09:28:10] [ info] [engine] service has stopped (0 pending tasks)
[2024/08/20 09:28:10] [ info] [input] pausing fluentbit_metrics.0
[2024/08/20 09:28:10] [ info] [input] pausing forward.1
[2024/08/20 09:28:10] [ info] [input] pausing storage_backlog.2
[2024/08/20 09:28:10] [ info] [input] pausing emitter_for_log_metrics_filter_messages_forwarded_total
[2024/08/20 09:28:10] [debug] [filter:log_to_metrics:log_metrics_filter_messages_forwarded_total] Destroying callback timer
==100534== 
==100534== HEAP SUMMARY:
==100534==     in use at exit: 0 bytes in 0 blocks
==100534==   total heap usage: 31,365 allocs, 31,365 frees, 57,260,387 bytes allocated
==100534== 
==100534== All heap blocks were freed -- no leaks are possible
==100534== 
==100534== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Copy link
Collaborator

@leonardo-albertovich leonardo-albertovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of adding the option to the plugin to keep a metrics context and periodically flush it but I don't think imposing it as the new default behavior is acceptable.

Please restore the original "immediate flush" code and modify your code to select between immediate and delayed flushing (maybe by defaulting the value of timer_interval_ms to 0 and interpretting 0 as immediate?).

Also, please rename timer_interval_ms to something more intuitive like flush_interval and use the FLB_CONFIG_MAP_TIME type instead (which automatically interprets units and makes it easier for the user).

@drbugfinder-work
Copy link
Contributor Author

drbugfinder-work commented Aug 20, 2024

Hi @leonardo-albertovich!
Thanks for your quick feedback! I just implemented the changes. I did not use the FLB_CONFIG_MAP_TIME, as I want the option for sub-seconds. (Or should I go back to my initial interval_ms/flush_ms - like done in filter_multiline?

FLB_CONFIG_MAP_INT, "flush_ms", "2000",
)

Can you please check the changes? Thank you very much!

Copy link
Collaborator

@leonardo-albertovich leonardo-albertovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with these minimal changes it should be ready to be merged.
I just want to be clear that I'm not familiar with this plugin, my review is based on the code I'm seeing and the intended outcome.

Thanks a lot Doctor!

plugins/filter_log_to_metrics/log_to_metrics.c Outdated Show resolved Hide resolved
plugins/filter_log_to_metrics/log_to_metrics.c Outdated Show resolved Hide resolved
plugins/filter_log_to_metrics/log_to_metrics.c Outdated Show resolved Hide resolved
plugins/filter_log_to_metrics/log_to_metrics.c Outdated Show resolved Hide resolved
This commit will change the log_to_metrics filter to use a timer based
metric inject and not directly inject metrics on every incoming log
record anymore. This will lower the overall load and memory consumption
 especially in high-volume and high-cardinality situations.

Signed-off-by: Richard Treu <[email protected]>
@drbugfinder-work
Copy link
Contributor Author

I think with these minimal changes it should be ready to be merged. I just want to be clear that I'm not familiar with this plugin, my review is based on the code I'm seeing and the intended outcome.

Thank you very much @leonardo-albertovich!

@edsiper edsiper added this to the Fluent Bit v3.2.0 milestone Aug 20, 2024
@drbugfinder-work
Copy link
Contributor Author

Docs PR: fluent/fluent-bit-docs#1440

@edsiper edsiper merged commit 724940e into fluent:master Sep 16, 2024
43 checks passed
@edsiper
Copy link
Member

edsiper commented Sep 16, 2024

(cc: docs @lecaros )

@drbugfinder-work drbugfinder-work deleted the log_to_metrics_timer branch September 17, 2024 06:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory leak in in_emitter mem_buf when appending metric
3 participants