
Filter grep Exclude does not work properly #9753

Open
msoszka opened this issue Dec 19, 2024 · 2 comments

Comments

@msoszka

msoszka commented Dec 19, 2024

Bug Report

Describe the bug
According to:

https://docs.fluentbit.io/manual/pipeline/filters/grep

It's possible to use a regex in the grep filter, but even though the regex correctly matches the phrase on Rubular, the record is still sent to Elasticsearch.

To Reproduce

  • Rubular link if applicable:

https://rubular.com/r/SgsgprT8Ndnk6d (qmailnospam parser)
https://rubular.com/r/TjcZTQ7T0iY3K7 (qmailspam parser)

  • Example log message if applicable:
Dec 19 13:27:58 mail01 qmail-scanner[25988]: SA:SPAM-REJECTED:RC:0(1.2.3.4):SA:1(139.0/6.8): 2.136989 9429 [email protected] [email protected] Some_random_email_subject <[email protected]> mail01173461127680125988-unpacked:9429
  • Steps to reproduce the problem:

Create two inputs on the same log file, each with a different DB file and tag:

[INPUT]
    Name tail
    Path /var/log/mail.log
    Parser qmailnospam
    Tag logs.qmail
    DB /etc/fluent-bit/mail.db

[INPUT]
    Name tail
    Path /var/log/mail.log
    Parser qmailspam
    Tag logs.spam
    DB /etc/fluent-bit/mail-spam.db

Apply filters:

[FILTER]
    Name grep
    Match logs.qmail
    Exclude log /SA:SPAM\-\D+\:RC:/

[FILTER]
    Name grep
    Match logs.spam
    Exclude log /Clear\:RC\:0/

[FILTER]
    Name grep
    Match logs.*
    Exclude log /Clear\:RC\:1/

[FILTER]
    Name parser
    Parser qmailnospam
    Match logs.qmail
    Key_Name log
    Preserve_Key True
    Reserve_data True

[FILTER]
    Name parser
    Parser qmailspam
    Match logs.spam
    Key_Name log
    Preserve_Key True
    Reserve_data True

As a result, I see the same log line in Elasticsearch twice, once with sascore and once without, because it is parsed two times. However, Exclude log /SA:SPAM\-\D+\:RC:/ should exclude this line from logs.qmail, so only the record with sascore should be sent to ES.

Following https://stackoverflow.com/questions/58032099/exclude-pattern-on-a-grep-filter-on-fluent-bit-does-not-seem-to-be-working, I've tried the Exclude pattern wrapped in //, without the slashes, and other variations.
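To rule out the patterns themselves, the Exclude regexes can be checked against the sample line outside Fluent Bit. This is a minimal Python sketch, assuming the surrounding /.../ are delimiters rather than part of the pattern. Python's `re` is not the Onigmo engine Fluent Bit uses, but these three patterns rely only on syntax both engines treat the same way.

```python
import re

# Sample line from the report (addresses appear exactly as on the issue page).
SAMPLE = (
    "Dec 19 13:27:58 mail01 qmail-scanner[25988]: "
    "SA:SPAM-REJECTED:RC:0(1.2.3.4):SA:1(139.0/6.8): 2.136989 9429 "
    "[email protected] [email protected] Some_random_email_subject "
    "<[email protected]> mail01173461127680125988-unpacked:9429"
)

# Exclude patterns from the grep filters, keyed by their Match value,
# with the /.../ delimiters removed (an assumption about how they are read).
EXCLUDES = {
    "logs.qmail": r"SA:SPAM\-\D+\:RC:",
    "logs.spam": r"Clear\:RC\:0",
    "logs.*": r"Clear\:RC\:1",
}

def check(line):
    """Return {match value: True if that Exclude regex matches the line}."""
    return {tag: re.search(pattern, line) is not None
            for tag, pattern in EXCLUDES.items()}

if __name__ == "__main__":
    for tag, matched in check(SAMPLE).items():
        print(f"{tag:12} would_exclude={matched}")
```

Only the logs.qmail pattern matches this line, which agrees with the Rubular result: the regex itself does match.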

Expected behavior

Exclude should drop records whose log field matches the regex.

Your Environment

  • Version used: fluent-bit 3.2.2, elasticsearch 8.15, kibana 8.15
  • Environment name and version (e.g. Kubernetes? What version?): fluent-bit is installed standalone on the host
  • Operating System and version: Debian 10
@patrick-stephens
Contributor

It's hard to tell without sample data and the examples.
My suggestion is to debug this and provide a reproducer using dummy input and stdout with sample data showing actual vs expected output.
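Along the lines of this suggestion, a small Python helper can generate an `[INPUT]` section for the `dummy` plugin that replays the reported line, so it can be pushed through the same filters and inspected with a `stdout` output. It assumes the `dummy` plugin's `Dummy` parameter accepts a single JSON record; the line is stored under `log`, the key the `tail` input uses for unparsed lines, and the tag is taken from the report.

```python
import json

# Raw line from /var/log/mail.log, copied from the report.
LINE = (
    "Dec 19 13:27:58 mail01 qmail-scanner[25988]: "
    "SA:SPAM-REJECTED:RC:0(1.2.3.4):SA:1(139.0/6.8): 2.136989 9429 "
    "[email protected] [email protected] Some_random_email_subject "
    "<[email protected]> mail01173461127680125988-unpacked:9429"
)

def dummy_input(line, tag="logs.qmail", key="log"):
    """Return an [INPUT] section that replays `line` through the dummy plugin.

    json.dumps handles quoting and escaping, so the record stays on one line.
    """
    record = json.dumps({key: line})
    return "\n".join([
        "[INPUT]",
        "    Name  dummy",
        f"    Tag   {tag}",
        f"    Dummy {record}",
    ])

print(dummy_input(LINE))
```

Replacing the two `tail` inputs with this section (and one with `tag="logs.spam"`) keeps the filter chain unchanged while removing the file and DB state from the test.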
Some tips here: https://chronosphere.io/learn/fluent-bit-tips-tricks/

@msoszka
Author

msoszka commented Dec 20, 2024

An example line from /var/log/mail.log:

Dec 20 13:34:11 mail01 qmail-scanner[26036]: SA:SPAM-REJECTED:RC:0(1.1.1.1):SA:1(141.0/6.8): 2.116756 15072 [email protected] [email protected] There_is_some_mail_subject <[email protected]> mail01173469804980126036-unpacked:15072

This produces two records in the Fluent Bit output:

Dec 20 13:34:11 mail01 fluent-bit[31378]: {"create":{"_index":"qmail-logs"}}
Dec 20 13:34:11 mail01 fluent-bit[31378]: {"@timestamp":"2024-12-20T13:34:11.000Z","server":"mail01","ip":"1.1.1.1","sascore":"141.0/6.8","from":"[email protected]","to":"[email protected]","subject":"There_is_some_mail_subject","Status":"SPAM"}

Dec 20 13:34:11 mail01 fluent-bit[31378]: {"create":{"_index":"qmail-logs"}}
Dec 20 13:34:11 mail01 fluent-bit[31378]: {"@timestamp":"2024-12-20T13:34:11.000Z","server":"mail01","ip":"1.1.1.1","from":"[email protected]","to":"[email protected]","subject":"There_is_some_mail_subject"}
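The two records can be compared field by field. This minimal Python sketch works on the JSON copied from the output above and reports which keys appear in only one record and whether a `log` field survived. With this data the only differences are `sascore` and `Status`, and neither record carries `log`, even though every grep and parser filter in this configuration is keyed on `log`.

```python
import json

# The two document bodies from the es output, copied verbatim.
RECORDS = [
    '{"@timestamp":"2024-12-20T13:34:11.000Z","server":"mail01","ip":"1.1.1.1",'
    '"sascore":"141.0/6.8","from":"[email protected]","to":"[email protected]",'
    '"subject":"There_is_some_mail_subject","Status":"SPAM"}',
    '{"@timestamp":"2024-12-20T13:34:11.000Z","server":"mail01","ip":"1.1.1.1",'
    '"from":"[email protected]","to":"[email protected]",'
    '"subject":"There_is_some_mail_subject"}',
]

def diff_keys(a, b):
    """Keys present in only one record, plus whether either still has `log`."""
    ra, rb = json.loads(a), json.loads(b)
    return {
        "only_first": sorted(ra.keys() - rb.keys()),
        "only_second": sorted(rb.keys() - ra.keys()),
        "log_present": "log" in ra or "log" in rb,
    }

print(diff_keys(*RECORDS))
```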

My parsers look like this:

[PARSER]
    Name  qmailnospam
    Format  regex
    Regex ^(?<customdate>\w+\s+\d+\s+\d+:\d+:\d+)\s+(?<server>[\w.-]+)\s+qmail-scanner\[\d+\]:.+:RC:0\((?<ip>[\d.]+)\)(?::SA:0\((?<sascore>-?[\d.]+\/[\d.]+)\))?.+?(?<from>[^ ]*@[^ ]*)\s+(?<to>[^ ]*@[^ ]*)\s+(?<subject>.+?)\s+<
    Time_Key  customdate
    Time_Format  %b %d %H:%M:%S
    Key_Name log

[PARSER]
    Name  qmailspam
    Format  regex
    Regex ^(?<customdate>\w+\s+\d+\s+\d+:\d+:\d+)\s+(?<server>[\w.-]+)\s+qmail-scanner\[\d+\]:.+SA:\D+RC:0\((?<ip>[\d.]+)\)(?::SA:1\((?<sascore>-?[\d.]+\/[\d.]+)\))?.+?(?<from>[^ ]*@[^ ]*)\s+(?<to>[^ ]*@[^ ]*)\s+(?<subject>.+?)\s+<
    Time_Key  customdate
    Time_Format  %b %d %H:%M:%S
    Key_Name log
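Both parsers can be checked against the same spam line to see what each tag produces. This is a minimal Python sketch with two assumptions: Onigmo's `(?<name>...)` groups are rewritten to Python's `(?P<name>...)` (the rest of these two patterns is accepted by `re` as written), and because the report's addresses are redacted, hypothetical `sender@example.com` / `rcpt@example.com` / `msgid@example.com` placeholders stand in for them. Both patterns match: qmailnospam without a sascore and qmailspam with `141.0/6.8`, which lines up with the two records shown earlier.

```python
import re

# Parser regexes copied from the [PARSER] sections (Onigmo syntax).
PARSERS = {
    "qmailnospam": r"^(?<customdate>\w+\s+\d+\s+\d+:\d+:\d+)\s+(?<server>[\w.-]+)\s+qmail-scanner\[\d+\]:.+:RC:0\((?<ip>[\d.]+)\)(?::SA:0\((?<sascore>-?[\d.]+\/[\d.]+)\))?.+?(?<from>[^ ]*@[^ ]*)\s+(?<to>[^ ]*@[^ ]*)\s+(?<subject>.+?)\s+<",
    "qmailspam": r"^(?<customdate>\w+\s+\d+\s+\d+:\d+:\d+)\s+(?<server>[\w.-]+)\s+qmail-scanner\[\d+\]:.+SA:\D+RC:0\((?<ip>[\d.]+)\)(?::SA:1\((?<sascore>-?[\d.]+\/[\d.]+)\))?.+?(?<from>[^ ]*@[^ ]*)\s+(?<to>[^ ]*@[^ ]*)\s+(?<subject>.+?)\s+<",
}

# Sample line from the report; the three addresses are hypothetical
# placeholders because the originals are redacted on the issue page.
LINE = (
    "Dec 20 13:34:11 mail01 qmail-scanner[26036]: "
    "SA:SPAM-REJECTED:RC:0(1.1.1.1):SA:1(141.0/6.8): 2.116756 15072 "
    "sender@example.com rcpt@example.com There_is_some_mail_subject "
    "<msgid@example.com> mail01173469804980126036-unpacked:15072"
)

def to_python(pattern):
    """Rewrite Onigmo named groups (?<name>...) into Python's (?P<name>...)."""
    return re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", pattern)

def parse_all(line):
    """Apply every parser; map parser name -> captured fields (None if no match)."""
    results = {}
    for name, pattern in PARSERS.items():
        m = re.search(to_python(pattern), line)
        results[name] = m.groupdict() if m else None
    return results

for name, fields in parse_all(LINE).items():
    print(name, fields)
```

Since the spam line satisfies both regexes, each tail input emits a record for it unless a filter drops it first.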

and filters:

[FILTER]
    Name grep
    Match logs.qmail
    Exclude log /SA:SPAM\-\D+\:RC:/

[FILTER]
    Name grep
    Match logs.spam
    Exclude log /Clear\:RC\:0/

[FILTER]
    Name grep
    Match logs.*
    Exclude log /Clear\:RC\:1/

[FILTER]
    Name grep
    Match logs.*
    Exclude log /<>/

[FILTER]
    Name modify
    Match logs.spam
    Add Status SPAM

[FILTER]
    Name parser
    Parser qmailnospam
    Match logs.qmail
    Key_Name log
    Preserve_Key True
    Reserve_data True

[FILTER]
    Name parser
    Parser qmailspam
    Match logs.spam
    Key_Name log
    Preserve_Key True
    Reserve_data True

So the parser and modify filters are applied correctly (Status: SPAM is added), but Exclude does not seem to be applied when it should be
(/SA:SPAM\-\D+\:RC:/ is a valid regex for SA:SPAM-REJECTED).
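In the reported output neither record contains a `log` field, and both tail inputs already set `Parser`. If the records reach the grep filters pre-parsed, a rule keyed on `log` has nothing to test. A simplified Python model of one Exclude rule illustrates that difference; how grep treats a missing key is an assumption here, not something established in this thread, and `grep_exclude` is a hypothetical helper, not Fluent Bit code.

```python
import re

def grep_exclude(records, key, pattern):
    """Simplified model of one grep `Exclude <key> <pattern>` rule.

    Assumption: a record is dropped only when `key` is present and its value
    matches `pattern`; a record without `key` is passed through unchanged.
    """
    regex = re.compile(pattern)
    kept = []
    for record in records:
        value = record.get(key)
        if isinstance(value, str) and regex.search(value):
            continue  # the rule matched, so the record is excluded
        kept.append(record)
    return kept

# Sample line from the report.
LINE = (
    "Dec 20 13:34:11 mail01 qmail-scanner[26036]: "
    "SA:SPAM-REJECTED:RC:0(1.1.1.1):SA:1(141.0/6.8): 2.116756 15072 "
    "[email protected] [email protected] There_is_some_mail_subject "
    "<[email protected]> mail01173469804980126036-unpacked:15072"
)
RULE = r"SA:SPAM\-\D+\:RC:"

# The raw line under `log`, as tail stores it when no parser is set.
raw = {"log": LINE}
# The same line already split into fields (a subset of the reported output).
parsed = {"server": "mail01", "ip": "1.1.1.1", "subject": "There_is_some_mail_subject"}

print(len(grep_exclude([raw], "log", RULE)))     # 0: the raw record is excluded
print(len(grep_exclude([parsed], "log", RULE)))  # 1: no `log` key, so it survives
```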

Finally, my OUTPUT looks like this:

[OUTPUT]
    Name es
    Match logs.*
    Host somewhere
    Port 9200
    Index qmail-logs
    Suppress_Type_Name On
