This repository has been archived by the owner on Apr 2, 2024. It is now read-only.

New sandbox decoders for Microsoft IIS and ULS log formats #1607

Open: wants to merge 7 commits into base: dev
127 changes: 127 additions & 0 deletions sandbox/lua/decoders/iis.lua
@@ -0,0 +1,127 @@
-- This Source Code Form is subject to the terms of the Mozilla Public
-- License, v. 2.0. If a copy of the MPL was not distributed with this
-- file, You can obtain one at http://mozilla.org/MPL/2.0/.

--[[[hekad]
Contributor

Looks like a [hekad] accidentally got pasted in here.


Parses IIS logs based on the Microsoft IIS log formats. This decoder has been tested with IIS versions 7 and 8.


Config:

iis_version_7 = true
The default configuration assumes the IIS version 8 log format.
For version 7 and similar formats, set the decoder config variable iis_version_7 to true.

Contributor

Need to document the payload_keep setting.

*Example Heka Configuration*

.. code-block:: ini


Contributor

That [hekad] from above belongs here. :)

share_dir="C:\\heka-agent\\heka\\share\\heka"
base_dir = "C:\\var\\cache\\hekad"
Contributor

These work, but for Windows paths you might prefer using single quotes, which indicate literal strings that don't require escaping, as described in the TOML spec.


[IISLogs]
type = "LogstreamerInput"
log_directory = "F:\\Web_Logs"
file_match = '(?P<dir>\w+)(?P<s>\S+)u_ex(?P<Index>\d+)\.log'
differentiator = ["dir"]
priority = ["Index"]
decoder = "IISDecoder"

[IISDecoder]
type = "SandboxDecoder"
script_type = "lua"
filename = "lua_decoders\\iis.lua"

[IISDecoder.config]
payload_keep = true
iis_version_7 = true
tz = "UTC"

*Example Heka Message*
Contributor

Heading is here for an example message of what this decoder would generate, but the example itself is missing.


--]]

local dt = require "date_time"
local l = require 'lpeg'
l.locale(l)

local sp = l.space
num = l.digit^1 / tonumber
Contributor

Any reason for num to not be local?

Author

No, it can be local. Will change it.


function extract_quote(openp,endp)
    openp = l.P(openp)
    endp = endp and l.P(endp) or openp
    local upto_endp = (1 - endp)^1
    return openp * l.C(upto_endp) * endp
end
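For reviewers less familiar with LPeg: extract_quote matches openp, captures characters up to the first endp, and then consumes endp. Below is a hypothetical plain-Lua analogue for literal delimiters (extract_between is an illustrative name, not part of Heka or this PR). Its allow_empty flag mirrors the one difference between the two copies of this helper in the PR: (1 - endp)^1 here requires a non-empty capture, while sharepoint_uls.lua uses (1 - endp)^0, which allows an empty one.

```lua
-- Hypothetical plain-Lua analogue of extract_quote(openp, endp) for literal
-- delimiters: openp must appear at pos, everything up to the next endp is
-- captured, endp is consumed, and the position after endp is returned.
-- allow_empty = false mirrors (1 - endp)^1 in iis.lua; true mirrors the
-- (1 - endp)^0 used in sharepoint_uls.lua.
local function extract_between(s, pos, openp, endp, allow_empty)
    endp = endp or openp
    -- The opening delimiter must sit exactly at pos (an empty openp always matches).
    if s:sub(pos, pos + #openp - 1) ~= openp then
        return nil
    end
    local start = pos + #openp
    -- Plain find (4th argument true), so delimiters are not Lua patterns.
    local e1, e2 = s:find(endp, start, true)
    if not e1 then
        return nil  -- no closing delimiter: the LPeg version fails here too
    end
    if e1 == start and not allow_empty then
        return nil  -- empty capture rejected, like (1 - endp)^1
    end
    return s:sub(start, e1 - 1), e2 + 1
end
```

Chaining calls with the returned position walks a line field by field, much as the concatenated LPeg patterns do.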

local sp = l.space

local timestamp = l.Cg(dt.build_strftime_grammar("%Y-%m-%d %H:%M:%S") / dt.time_to_ns, "timestamp")
local host_ip = l.Cg(extract_quote(" ", " "), "host_ip")
local cs_method = l.Cg(extract_quote("", " "), "cs_method")
local cs_uri_stem = l.Cg(extract_quote("", " "), "cs_uri_stem")
local cs_uri_query = l.Cg(extract_quote("", " "), "cs_uri_query")
local port = l.Cg(extract_quote("", " "), "port")
local cs_username = l.Cg(extract_quote("", " "), "cs_username")
local client_ip = l.Cg(extract_quote("", " "), "client_ip")
local cs_user_agent = l.Cg(extract_quote("", " "), "cs_user_agent")
local cs_referer = l.Cg(extract_quote("", " "), "cs_referer")
local status = l.Cg(num, "status")
local substatus = l.Cg(extract_quote(" ", " "), "substatus")
local win32_status = l.Cg(extract_quote("", " "), "win32_status")
local time_taken = l.Cg(num, "time_taken")
local version_8 = timestamp * host_ip * cs_method * cs_uri_stem * cs_uri_query * port * cs_username * client_ip * cs_user_agent * cs_referer * status * substatus * win32_status * time_taken
local version_7 = timestamp * host_ip * cs_method * cs_uri_stem * cs_uri_query * port * cs_username * client_ip * cs_user_agent * status * substatus * win32_status * time_taken
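The version_8 and version_7 sequences above differ only in cs_referer, which the version 7 layout omits. As a quick way to check that field order against sample lines, here is a rough plain-Lua sketch using string functions rather than LPeg (split_iis_line, V8_FIELDS and V7_FIELDS are hypothetical names). Like the grammar, it assumes no field contains a literal space; IIS normally encodes spaces in the User-Agent as "+".

```lua
-- Hypothetical sketch: name the whitespace-separated tokens of an IIS W3C log
-- line in the same order as the version_8 / version_7 grammars above.
-- Plain Lua string functions are used instead of LPeg; values stay strings.
local V8_FIELDS = {
    "host_ip", "cs_method", "cs_uri_stem", "cs_uri_query", "port",
    "cs_username", "client_ip", "cs_user_agent", "cs_referer",
    "status", "substatus", "win32_status", "time_taken",
}

-- The version 7 layout is the same list without cs_referer.
local V7_FIELDS = {}
for _, name in ipairs(V8_FIELDS) do
    if name ~= "cs_referer" then
        V7_FIELDS[#V7_FIELDS + 1] = name
    end
end

local function split_iis_line(line, iis_version_7)
    local names = iis_version_7 and V7_FIELDS or V8_FIELDS
    local tokens = {}
    for token in line:gmatch("%S+") do
        tokens[#tokens + 1] = token
    end
    -- Date and time are the first two tokens; reject lines of the wrong width.
    if #tokens ~= #names + 2 then
        return nil
    end
    local fields = { timestamp = tokens[1] .. " " .. tokens[2] }
    for i, name in ipairs(names) do
        fields[name] = tokens[i + 2]
    end
    return fields
end
```

Selecting the name list from the iis_version_7 flag mirrors how the decoder picks l.Ct(version_7) or l.Ct(version_8) at load time.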

grammar = l.Ct(version_8)
Contributor

Same here, I don't see any reason for grammar to be global, and variable references will be resolved more quickly if it's local.



local iis_version = read_config("iis_version_7")

if iis_version then
    grammar = l.Ct(version_7)
end

local payload_keep = read_config("payload_keep")


function process_message()

    local msg = {
        Timestamp = nil,
        Payload = nil,
        Hostname = nil,
        Fields = nil,
        Type = nil
    }
Contributor

We'll get less GC churn if we define msg outside the process_message function, like so:

local msg = {
    Timestamp = nil,
    Payload = nil,
    Hostname = nil,
    Type = "iis",
    Fields = nil,
}
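A runnable sketch of that suggestion, with hypothetical stubs standing in for read_message and inject_message so it runs under plain Lua (inside Heka the sandbox provides the real functions). The table is allocated once; every per-message key, Payload included, is assigned on each call so nothing from the previous message can leak through. The Fields value is a placeholder for the grammar:match result.

```lua
-- Hypothetical stand-ins for the Heka sandbox API so the sketch runs under
-- plain Lua; inside Heka, read_message and inject_message are provided.
local current = { Payload = "", Hostname = "" }
local injected = {}
local function read_message(name)
    return current[name]
end
local function inject_message(m)
    -- Record a shallow copy, because the same table is reused on the next call.
    local copy = {}
    for k, v in pairs(m) do
        copy[k] = v
    end
    injected[#injected + 1] = copy
end

local payload_keep = false

-- Allocated once at load time; Type never changes, so it is set here.
local msg = {
    Timestamp = nil,
    Payload = nil,
    Hostname = nil,
    Type = "iis",
    Fields = nil,
}

local function process_message()
    local data = read_message("Payload")
    -- Overwrite every per-message key on each call.
    msg.Timestamp = 0                    -- placeholder for fields.timestamp
    msg.Hostname = string.lower(read_message("Hostname"))
    msg.Fields = { raw_length = #data }  -- placeholder for grammar:match(data)
    msg.Payload = payload_keep and data or nil
    inject_message(msg)
    return 0
end
```

Because Type is fixed at load time, the msg.Type assignment inside process_message is no longer needed.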


    local data = read_message("Payload")
    local host = read_message("Hostname")
    local fields = grammar:match(data)

    if not fields then
        return -1
    end
    msg.Type = "iis"
Contributor

Initializing as above lets us remove this line.

    msg.Timestamp = fields.timestamp
    msg.Hostname = string.lower(host)
    fields.timestamp = nil
    msg.Fields = fields
    if msg.Fields.cs_username == "-" then
        msg.Fields.cs_username = ""
    end
    if msg.Fields.cs_uri_query == "-" then
        msg.Fields.cs_uri_query = ""
    end

    if payload_keep then
        msg.Payload = data
    end

    inject_message(msg)
    return 0
end
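IIS writes "-" for any empty W3C field, not only cs_username and cs_uri_query. If that normalization were wanted for more fields, a small helper could replace the explicit checks in process_message; blank_dashes is a hypothetical name and this is only a sketch. Given a list of names it touches just those fields; without one it blanks every field whose value is exactly "-". Numeric captures such as status are never equal to "-", so they are left alone.

```lua
-- Hypothetical helper: replace "-" (the W3C placeholder for an empty value)
-- with "" either in the listed fields only, or in every field when no list
-- is given. Modifying existing keys during pairs() traversal is allowed.
local function blank_dashes(fields, names)
    if names then
        for _, name in ipairs(names) do
            if fields[name] == "-" then
                fields[name] = ""
            end
        end
    else
        for name, value in pairs(fields) do
            if value == "-" then
                fields[name] = ""
            end
        end
    end
    return fields
end
```

blank_dashes(fields, {"cs_username", "cs_uri_query"}) would reproduce the current behaviour; calling it without a list would also blank fields such as cs_referer, which changes the decoder's output.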
99 changes: 99 additions & 0 deletions sandbox/lua/decoders/sharepoint_uls.lua
@@ -0,0 +1,99 @@
-- This Source Code Form is subject to the terms of the Mozilla Public
-- License, v. 2.0. If a copy of the MPL was not distributed with this
-- file, You can obtain one at http://mozilla.org/MPL/2.0/.

--[[

Parses Microsoft SharePoint ULS logs based on the ULS log format.


Contributor

Should add a note about the payload_keep setting.

*Example Heka Configuration*

.. code-block:: ini

[hekad]
share_dir="C:\\heka-agent\\heka\\share\\heka"
base_dir = "C:\\var\\cache\\hekad"

[SharePointULSLogs]
type = "LogstreamerInput"
log_directory = "F:\\Trace_log"
file_match = '(?P<first>\w+)-(?P<second>\S+)-(?P<Year>\d{4})(?P<Month>\d{2})(?P<Day>\d{2})-(?P<time>\d+).log'
priority = ["Year","Month","Day","time"]
decoder = "SharePointDecoder"

[SharePointDecoder]
type = "SandboxDecoder"
script_type = "lua"
filename = "lua_decoders\\sharepoint_uls.lua"

[SharePointDecoder.config]
payload_keep = true
tz = "Local"

*Example Heka Message*
Contributor

Same re: missing example message.


--]]

local dt = require "date_time"
local l = require 'lpeg'
l.locale(l)

local sp = l.space
num = l.digit^1 / tonumber
Contributor

Same here re: local num.


function extract_quote(openp,endp)
    openp = l.P(openp)
    endp = endp and l.P(endp) or openp
    local upto_endp = (1 - endp)^0
    return openp * l.C(upto_endp) * endp
end

local sp = l.space

local datetime = dt.build_strftime_grammar("%m/%d/%Y %H:%M:%S") / dt.time_to_ns * "." * l.R("09","*/")^0
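The datetime pattern converts the MM/DD/YYYY HH:MM:SS part to nanoseconds with dt.time_to_ns and then skips the "." plus any trailing digits or "*" (ULS marks continuation entries with "*" after the timestamp, which is presumably why "*" falls inside that l.R range). As a hedged illustration of the arithmetic only, here is a plain-Lua conversion that treats the timestamp as UTC and ignores the tz setting. It uses Howard Hinnant's days_from_civil algorithm and Lua 5.3+ integer division (//), so it is a standalone sketch rather than sandbox code; uls_time_to_ns is a hypothetical name.

```lua
-- Hypothetical sketch (Lua 5.3+): ULS timestamp -> nanoseconds since the Unix
-- epoch, treating the wall-clock time as UTC. As in the LPeg grammar, the
-- fractional seconds and any trailing "*" are not used.

-- Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's
-- days_from_civil). Lua's // floors, so negative years need no adjustment.
local function days_from_civil(y, m, d)
    if m <= 2 then
        y = y - 1
    end
    local era = y // 400
    local yoe = y - era * 400                            -- year of era, 0..399
    local mp = (m + 9) % 12                              -- March = 0 ... February = 11
    local doy = (153 * mp + 2) // 5 + d - 1              -- day of year, 0..365
    local doe = yoe * 365 + yoe // 4 - yoe // 100 + doy  -- day of era, 0..146096
    return era * 146097 + doe - 719468
end

local function uls_time_to_ns(s)
    local mo, d, y, h, mi, sec =
        s:match("^(%d%d)/(%d%d)/(%d%d%d%d) (%d%d):(%d%d):(%d%d)")
    if not mo then
        return nil
    end
    local days = days_from_civil(tonumber(y), tonumber(mo), tonumber(d))
    local secs = days * 86400 + tonumber(h) * 3600 + tonumber(mi) * 60 + tonumber(sec)
    return secs * 1000000000
end
```

With tz = "Local", as in the example configuration, the real decoder would additionally shift this value by the agent's UTC offset.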
local process = l.Cg(extract_quote(sp^1, "\t"), "Process")
local t_id = l.Cg(extract_quote("", "\t"), "TID")
local area = l.Cg(extract_quote("", "\t"), "Area")
local category = l.Cg(extract_quote("", "\t"), "Category")
local event_id = l.Cg(extract_quote("", "\t"), "EventID")
local level = l.Cg(extract_quote("", "\t"), "Level")
local message = l.Cg(extract_quote("", "\t"), "Message")
local correlation = l.Cg(l.P(1)^0, "Correlation")

local request = l.Cg(datetime,"DateTime") * process * t_id * area * category * event_id * level * message * correlation
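The request pattern expects a timestamp, whitespace, seven tab-terminated fields (Process, TID, Area, Category, EventID, Level, Message) and the correlation id as the remainder of the line. A hypothetical plain-Lua split with the same layout can help when checking sample lines without LPeg; split_uls_line is an illustrative name, and the sample line in the check is made up.

```lua
-- Hypothetical plain-Lua sketch of the ULS layout the request pattern expects.
-- Fields are returned as strings; the timestamp is not converted.
local ULS_FIELDS = { "Process", "TID", "Area", "Category", "EventID", "Level", "Message" }

local function split_uls_line(line)
    -- MM/DD/YYYY HH:MM:SS, a ".", then digits and/or "*", then whitespace
    -- (spaces and tabs, like sp^1 in the grammar) before the Process field.
    local ts, rest = line:match("^(%d%d/%d%d/%d%d%d%d %d%d:%d%d:%d%d%.[%d%*]*)%s+(.*)$")
    if not ts then
        return nil
    end
    local fields = { DateTime = ts }
    local pos = 1
    for _, name in ipairs(ULS_FIELDS) do
        -- Each of these fields runs up to (not including) the next tab.
        local tab = rest:find("\t", pos, true)
        if not tab then
            return nil
        end
        fields[name] = rest:sub(pos, tab - 1)
        pos = tab + 1
    end
    -- Whatever follows the last tab is the correlation id.
    fields.Correlation = rest:sub(pos)
    return fields
end
```

As in the grammar, padding spaces that ULS writes before a tab stay part of the field value.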

grammar = l.Ct(request)
Contributor

grammar can also be local.

Contributor

extract_quote should be local too


local payload_keep = read_config("payload_keep")

function process_message()

    local msg = {
Contributor

Define this outside process_message and the same memory space will be reused for every message, less allocation and GC.

        Timestamp = nil,
        Payload = nil,
        Hostname = nil,
        Fields = nil,
        Type = nil
    }

    local data = read_message("Payload")
    local host = read_message("Hostname")
    local fields = grammar:match(data)

    if not fields then
        return -1
    end
    msg.Type = "uls"
    msg.Timestamp = fields.DateTime
    msg.Hostname = string.lower(host)
    fields.DateTime = nil
    msg.Fields = fields

    if payload_keep then
        msg.Payload = data
    end

    inject_message(msg)
    return 0
end