
Doesn't properly handle non-ASCII UTF-8 characters in GELF input #54

Open
@mikaelstaldal

Description

  • Version: Logstash 5.3.2
  • Operating System: Ubuntu 16.04
  • Config File:
input { gelf { } }
output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => json_lines }
}
  • Sample Data:
{"version":"1.1","host":"udp-zlib","timestamp":1493384075.248,"level":6,"_thread":"main","_logger":"HelloWorld","_additionalField1":"constant value","_additionalField2":"foo bar","_bar":"BAR","_foo":"FOO","short_message":"Hello, world! åäö 1"}

When sending the above GELF message (compressed with ZLIB, over UDP) from Log4j 2.8.2 to Logstash, the non-ASCII characters in short_message get garbled. It ends up like this in Elasticsearch:

{
  "source_host": "127.0.0.1",
  "level": 6,
  "logger": "HelloWorld",
  "foo": "FOO",
  "thread": "main",
  "message": "Hello, world! Ã¥Ã¤Ã¶ 1",
  "version": "1.1",
  "bar": "BAR",
  "@timestamp": "2017-04-28T10:28:06.263Z",
  "host": "udp-zlib",
  "@version": "1",
  "additionalField1": "constant value",
  "additionalField2": "foo bar"
}

It seems that the Logstash GELF input doesn't decode the GELF message as UTF-8, as it should according to the GELF spec.
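To illustrate the suspected cause (a standalone JVM sketch of the symptom, not the plugin's actual code): if the inflated bytes are decoded with a single-byte charset, or the platform default, instead of UTF-8, the result is exactly the garbled form stored in Elasticsearch.

import java.nio.charset.StandardCharsets;

public class CharsetMismatch {
    public static void main(String[] args) {
        // UTF-8 bytes of the original short_message value.
        byte[] utf8 = "Hello, world! åäö 1".getBytes(StandardCharsets.UTF_8);

        // Decoding as UTF-8, as the GELF spec mandates, preserves the text.
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
        // -> Hello, world! åäö 1

        // Decoding as a single-byte charset (e.g. ISO-8859-1) produces the
        // garbled form seen in Elasticsearch.
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1));
        // -> Hello, world! Ã¥Ã¤Ã¶ 1
    }
}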
