incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string) #55

Open
himanshigpta opened this issue Nov 1, 2020 · 0 comments


Problem

I'm getting the error below while shipping logs to Elasticsearch (ES) via td-agent 1.11.1:

2020-11-01 17:11:42 +0530 [error]: #0 incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)
  2020-11-01 17:11:42 +0530 [error]: #0 suppressed same stacktrace
2020-11-01 17:11:42 +0530 [error]: #0 incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/parser_regexp.rb:50:in `match'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/parser_regexp.rb:50:in `parse'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluent-plugin-grok-parser-2.6.1/lib/fluent/plugin/parser_multiline_grok.rb:21:in `block in parse'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluent-plugin-grok-parser-2.6.1/lib/fluent/plugin/parser_multiline_grok.rb:20:in `each'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluent-plugin-grok-parser-2.6.1/lib/fluent/plugin/parser_multiline_grok.rb:20:in `parse'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:546:in `block in parse_multilines'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:544:in `each'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:544:in `parse_multilines'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:469:in `call'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:469:in `receive_lines'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:845:in `block in handle_notify'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:877:in `with_io'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:825:in `handle_notify'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:808:in `block in on_notify'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:808:in `synchronize'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:808:in `on_notify'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:653:in `on_notify'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:325:in `block in setup_watcher'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:596:in `on_timer'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run_once'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-11-01 17:11:43 +0530 [error]: #0 incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)
  2020-11-01 17:11:43 +0530 [error]: #0 suppressed same stacktrace

I've added the char_encoding parameter described at https://github.com/repeatedly/fluent-plugin-record-modifier#char_encoding, as recommended in the FAQ at https://docs.fluentd.org/quickstart/faq, but the issue persists.
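For reference, the same exception class can be reproduced in plain Ruby, outside Fluentd. The stack trace above shows the error raised in parser_regexp.rb while in_tail is parsing lines, which happens before any <filter> stage sees the record. The following is a minimal sketch, not the plugin's code: `PATTERN`, `try_match`, and the sample strings are made up for illustration, and it assumes the regexp's encoding is fixed to UTF-8 while the input line is tagged ASCII-8BIT and contains bytes above 0x7F.

```ruby
# Hypothetical pattern: the non-ASCII literal fixes the regexp's encoding to UTF-8.
PATTERN = Regexp.new("caf\u00e9")

# Hypothetical helper: reports how a match attempt ends for a given line.
def try_match(pattern, line)
  pattern.match(line) ? :matched : :no_match
rescue Encoding::CompatibilityError
  :incompatible
end

ascii_bytes = "plain ascii".b   # ASCII-8BIT, 7-bit bytes only
raw_bytes   = "caf\u00e9".b     # ASCII-8BIT, contains bytes >= 0x80
relabeled   = raw_bytes.dup.force_encoding(Encoding::UTF_8)

puts try_match(PATTERN, ascii_bytes) # 7-bit input is compatible: no_match
puts try_match(PATTERN, raw_bytes)   # raises internally: incompatible
puts try_match(PATTERN, relabeled)   # same bytes, tagged UTF-8: matched
```

In this sketch the binary string only fails once it contains non-ASCII bytes, and relabeling the same bytes as UTF-8 lets the match proceed. in_tail documents an `encoding` parameter for labeling the lines it reads; whether setting it resolves this report is untested.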

...

Steps to replicate

Provide example config and message

# encoding: utf-8
<source>
  @type tail
  path /var/log/messages
  pos_file /etc/td-agent/new_var_log_msg_grok.log.pos
  #time_format %Y-%m-%dT%H:%M:%S.%NZ
  time_format %b %dT%H:%M:%SZ
  tag var.msg
  <parse>
    @type multiline_grok
    <grok>
     pattern %{SYSLOGTIMESTAMP:time}%{SPACE}%{HOSTNAME:hostname}%{SPACE}%{GREEDYDATA:service_name}:%{GREEDYDATA:log_message}
    </grok>
  </parse>
</source>

<filter var.msg>
    @type record_modifier
     <record>
     hostname "#{Socket.gethostname}"
     formatted_time ${Time.at(time).iso8601(3)}
     char_encoding utf-8
     char_encoding utf-8:euc-jp
     </record>
</filter>

<match var.msg>
  @type elasticsearch
#  type_name "_doc"
  hosts redacted:9200
  scheme "https"
  ssl_version TLSv1_2
  ssl_verify false
  ca_file "/etc/td-agent/cert.crt"
  user redacted
  password redacted
  reload_connections false
  reconnect_on_error true
  reload_on_failure true
  log_es_400_reason false
  logstash_prefix messages_logs
  logstash_format true
  logstash_dateformat %V
  index_name "messages_logs"
  type_name "fluentd"
  include_timestamp true
  <buffer>
    @type file
    path /etc/td-agent/messages/buffers
    chunk_limit_size 1M
    flush_interval 5s
    retry_forever false
    retry_max_times 3
    retry_wait 10
    retry_max_interval 300
    flush_thread_count 8
  </buffer>
</match>


Expected Behavior or What you need to ask

The same config works fine on most servers, even without the char_encoding parameter. The same td-agent version should behave the same way across servers with the same configuration, and the error should go away after adding the encoding parameter.
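One possible explanation for the difference between servers, stated here as an assumption rather than a confirmed cause, is that only some hosts write non-ASCII bytes into /var/log/messages. A small sketch for checking that is below; `non_ascii_lines` is a hypothetical helper and the sample log text is invented.

```ruby
require "stringio"

# Hypothetical helper: returns [line_number, line] pairs for lines that
# contain at least one byte outside 7-bit ASCII.
def non_ascii_lines(io)
  io.each_line.with_index(1).filter_map do |line, number|
    [number, line] unless line.ascii_only?
  end
end

# Invented sample standing in for a log file read in binary mode.
sample = StringIO.new(
  "Nov  1 17:11:42 host svc: ok\nNov  1 17:11:43 host svc: caf\xC3\xA9\n".b
)

non_ascii_lines(sample).each do |number, line|
  puts "#{number}: #{line.inspect}"
end
```

Against a real file the helper could be called as `File.open("/var/log/messages", "rb") { |f| non_ascii_lines(f) }`. An empty result on the working servers and a non-empty one on the failing server would be consistent with the assumption above.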
...

Using Fluentd and ES plugin versions

  • OS version
    Red Hat Enterprise Linux Server release 7.9 (Maipo)

  • Fluentd v0.12 or v0.14/v1.0

    td-agent --version

    td-agent 1.11.1

  • ES plugin 3.x.y/2.x.y or 1.x.y

    • paste result of fluent-gem list, td-agent-gem list or your Gemfile.lock
  td-agent-gem list

*** LOCAL GEMS ***

addressable (2.7.0)
async (1.26.2)
async-http (0.52.4)
async-io (1.30.0)
async-pool (0.3.2)
aws-eventstream (1.1.0)
aws-partitions (1.337.0)
aws-sdk-core (3.102.1)
aws-sdk-kms (1.35.0)
aws-sdk-s3 (1.72.0)
aws-sdk-sqs (1.29.0)
aws-sigv4 (1.2.1)
benchmark (default: 0.1.0)
bigdecimal (default: 2.0.0)
bundler (2.1.4)
cgi (default: 0.1.0)
concurrent-ruby (1.1.6)
console (1.8.2)
cool.io (1.6.0)
csv (default: 3.1.2)
date (default: 3.0.0)
delegate (default: 0.1.0)
did_you_mean (default: 1.4.0)
digest-crc (0.6.1)
elasticsearch (7.8.0)
elasticsearch-api (7.8.0)
elasticsearch-transport (7.8.0)
elasticsearch-xpack (7.9.0)
etc (default: 1.1.0)
excon (0.75.0)
faraday (1.0.1)
fcntl (default: 1.0.0)
ffi (1.13.1)
fiddle (default: 1.0.0)
fileutils (default: 1.4.1)
fluent-config-regexp-type (1.0.0)
fluent-logger (0.8.2)
fluent-plugin-concat (2.4.0)
fluent-plugin-elasticsearch (4.1.1, 4.0.9)
fluent-plugin-grok-parser (2.6.1)
fluent-plugin-kafka (0.13.0)
fluent-plugin-prometheus (1.8.0)
fluent-plugin-prometheus_pushgateway (0.0.2)
fluent-plugin-record-modifier (2.1.0)
fluent-plugin-rewrite-tag-filter (2.3.0)
fluent-plugin-s3 (1.3.3)
fluent-plugin-systemd (1.0.2)
fluent-plugin-td (1.1.0)
fluent-plugin-td-monitoring (1.0.0)
fluent-plugin-webhdfs (1.2.5)
fluentd (1.11.1)
forwardable (default: 1.3.1)
getoptlong (default: 0.1.0)
hirb (0.7.3)
http_parser.rb (0.6.0)
httpclient (2.8.2.4)
io-console (default: 0.5.6)
ipaddr (default: 1.2.2)
ipaddress (0.8.3)
irb (default: 1.2.3)
jmespath (1.4.0)
json (default: 2.3.0)
logger (default: 1.4.2)
ltsv (0.1.2)
matrix (default: 0.2.0)
mini_portile2 (2.5.0)
minitest (5.13.0)
mixlib-cli (1.7.0)
mixlib-config (2.2.3)
mixlib-log (1.7.1)
mixlib-shellout (2.2.7)
msgpack (1.3.3)
multi_json (1.14.1)
multipart-post (2.1.1)
mutex_m (default: 0.1.0)
net-pop (default: 0.1.0)
net-smtp (default: 0.1.0)
net-telnet (0.2.0)
nio4r (2.5.2)
nokogiri (1.11.0.rc2)
observer (default: 0.1.0)
ohai (6.20.0)
oj (3.10.6)
open3 (default: 0.1.0)
openssl (default: 2.1.2)
ostruct (default: 0.2.0)
parallel (1.19.2)
power_assert (1.1.7)
prime (default: 0.1.1)
prometheus-client (0.9.0)
protocol-hpack (1.4.2)
protocol-http (0.20.0)
protocol-http1 (0.13.0)
protocol-http2 (0.14.0)
pstore (default: 0.1.0)
psych (default: 3.1.0)
public_suffix (4.0.5)
quantile (0.2.1)
racc (default: 1.4.16)
rake (13.0.1)
rdkafka (0.8.0)
rdoc (default: 6.2.1)
readline (default: 0.0.2)
readline-ext (default: 0.1.0)
reline (default: 0.1.3)
rexml (default: 3.2.3)
rss (default: 0.2.8)
ruby-kafka (1.1.0)
ruby-progressbar (1.10.1)
rubyzip (1.3.0)
sdbm (default: 1.0.0)
serverengine (2.2.1)
sigdump (0.2.4)
singleton (default: 0.1.0)
stringio (default: 0.1.0)
strptime (0.2.4)
strscan (default: 1.0.3)
systemd-journal (1.3.3)
systemu (2.5.2)
td (0.16.9)
td-client (1.0.7)
td-logger (0.3.27)
test-unit (3.3.4)
timeout (default: 0.1.0)
timers (4.3.0)
tracer (default: 0.1.0)
tzinfo (2.0.2)
tzinfo-data (1.2020.1)
uri (default: 0.10.0)
webhdfs (0.9.0)
webrick (default: 1.6.0)
xmlrpc (0.3.0)
yajl-ruby (1.4.1)
yaml (default: 0.1.0)
zip-zip (0.3)
zlib (default: 1.1.0)
  • ES version (optional)
    7.5.1