Vector config for better metrics collection performance #10881
-
Hello. I'm testing Vector as a replacement for promtail and ran into performance problems with a straightforward configuration. Logs are collected from nginx in JSON format and sent to Loki at roughly 1500 lines per second. A sample log line:

```json
{
  "msec": "1642071602.041",
  "bytes_sent": "14372",
  "request_uri": "/some-url",
  "http_host": "some.host.example",
  "server_name": "some.server",
  "server_port": "443",
  "request_time": "0.001",
  "request_method": "GET",
  "cdn_upstream": "someupstream",
  "upstream_addr": "127.0.0.1:8081",
  "upstream_status": "200",
  "upstream_cache_status": "EXPIRED"
}
```
My promtail config:

```yaml
server:
  http_listen_port: 30200
  grpc_listen_address: 127.0.0.1
  grpc_listen_port: 30201

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: var_log
    pipeline_stages:
      - json:
          expressions:
            timestamp: msec
            request_time: request_time
            bytes_sent: bytes_sent
            status: status
            upstream_status: upstream_status
            upstream_addr: upstream_addr
            cdn_upstream: cdn_upstream
            upstream_cache_status: upstream_cache_status
      - regex:
          source: timestamp
          expression: "(?P<timestamp>[0-9]+)"
      - timestamp:
          source: timestamp
          format: Unix
          action_on_failure: fudge
      - labels:
          cdn_upstream:
          status:
          upstream_addr:
          upstream_status:
          upstream_cache_status:
      - metrics:
          nginx_bytes_sent:
            type: Counter
            description: "total bytes sent"
            prefix: promtail_custom_
            max_idle_duration: 10m
            source: bytes_sent
            config:
              action: add
          nginx_request_time:
            type: Histogram
            description: "request time ms"
            prefix: promtail_custom_
            max_idle_duration: 10m
            source: request_time
            config:
              buckets: [0.050, 0.100, 0.500, 0.800, 1.0, 1.5, 2.0, 5.0]
          nginx_request:
            type: Counter
            description: "request"
            prefix: promtail_custom_
            max_idle_duration: 10m
            config:
              match_all: true
              action: inc
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx_access_log
          host: myhost
          agent: promtail
          __path__: /var/log/nginx/*_json.log
```

My straightforward Vector config:

```toml
[sources.var_json_log]
type = "file"
read_from = "end"
include = [ "/var/log/nginx/*_json.log" ]
[transforms.parser]
type = "remap"
inputs = [ "var_json_log", "data_json_log" ]
source = """
. |= parse_json!(string!(.message))
del(.message)
"""
[transforms.meter]
type = "log_to_metric"
inputs = [ "parser" ]
[[transforms.meter.metrics]]
field = "bytes_sent"
name = "bytes_sent"
type = "counter"
increment_by_value = true
namespace = "vector_custom_nginx"
[transforms.meter.metrics.tags]
cdn_upstream = "{{ cdn_upstream }}"
status = "{{ status }}"
upstream_addr = "{{ upstream_addr }}"
upstream_cache_status = "{{ upstream_cache_status }}"
upstream_status = "{{ upstream_status }}"
[[transforms.meter.metrics]]
field = "request_time"
name = "request_time"
type = "histogram"
increment_by_value = true
namespace = "vector_custom_nginx"
[transforms.meter.metrics.tags]
cdn_upstream = "{{ cdn_upstream }}"
status = "{{ status }}"
upstream_addr = "{{ upstream_addr }}"
upstream_cache_status = "{{ upstream_cache_status }}"
upstream_status = "{{ upstream_status }}"
[[transforms.meter.metrics]]
field = "msec"
name = "request"
type = "counter"
namespace = "vector_custom_nginx"
[transforms.meter.metrics.tags]
cdn_upstream = "{{ cdn_upstream }}"
status = "{{ status }}"
upstream_addr = "{{ upstream_addr }}"
upstream_cache_status = "{{ upstream_cache_status }}"
upstream_status = "{{ upstream_status }}"
[sinks.loki]
type = "loki"
inputs = [ "parser" ]
endpoint = "http://loki:3100"
encoding = "json"
[sinks.loki.labels]
cdn_upstream = "{{ cdn_upstream }}"
status = "{{ status }}"
upstream_addr = "{{ upstream_addr }}"
upstream_cache_status = "{{ upstream_cache_status }}"
upstream_status = "{{ upstream_status }}"
[sinks.prometheus]
type = "prometheus_exporter"
inputs = [ "meter" ]
address = "0.0.0.0:30201"
buckets = [0.050, 0.100, 0.500, 0.800, 1.0, 1.5, 2.0, 5.0]
```

Because of the performance problems with the straightforward configuration above, I ended up with the following workaround: events are bucketed into 10-second windows with a reduce transform, and the request_time distribution is built in a lua transform:

```toml
[transforms.parser]
type = "remap"
inputs = [ "var_json_log", "data_json_log" ]
drop_on_abort = true
drop_on_error = true
reroute_dropped = true
source = """
# Bucket events into 10-second windows so each reduce group covers at most 10 s.
._let_me_flush = floor(to_float(to_unix_timestamp!(.timestamp)) / 10 ?? 0)
. |= object!(parse_json!(string!(.message)))
del(.message)
# Helper fields consumed by the log_to_metric transform below.
._i = 1
._bytes_sent = to_int(.bytes_sent) ?? 0
._request_time = to_float(.request_time) ?? 0.0
"""
[transforms.reducer]
type = "reduce"
inputs = ["parser"]
expire_after_ms = 8000
group_by = [
"_let_me_flush",
"cdn_upstream",
"file",
"status",
"upstream_addr",
"upstream_cache_status",
"upstream_status"
]
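# Collect every request_time in the window into an array; the lua transform
# below turns that array into a distribution metric.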
[transforms.reducer.merge_strategies]
_request_time = "array"
[transforms.histogram]
type = "lua"
version = "2"
inputs = [ "reducer" ]
[transforms.histogram.hooks]
process = """
function (event, emit)
  -- Count how many times each request_time value occurred in this window.
  local freq = {}
  for _, v in ipairs(event.log._request_time) do
    freq[v] = (freq[v] or 0) + 1
  end
  -- Turn the frequency table into parallel values/sample_rates arrays.
  local sample_rates = {}
  local values = {}
  for k, v in pairs(freq) do
    table.insert(sample_rates, v)
    table.insert(values, k + 0.0)
  end
  -- Emit one incremental distribution metric per reduced group.
  emit({
    metric = {
      name = "request_time",
      namespace = "vector_custom_nginx",
      tags = {
        cdn_upstream = event.log.cdn_upstream,
        file = event.log.file,
        status = event.log.status,
        upstream_addr = event.log.upstream_addr,
        upstream_cache_status = event.log.upstream_cache_status,
        upstream_status = event.log.upstream_status
      },
      timestamp = event.log.timestamp,
      kind = "incremental",
      distribution = {
        sample_rates = sample_rates,
        values = values,
        statistic = "histogram"
      }
    }
  })
end
"""
[transforms.meter]
type = "log_to_metric"
inputs = [ "reducer" ]
[[transforms.meter.metrics]]
field = "_bytes_sent"
name = "bytes_sent"
type = "counter"
increment_by_value = true
namespace = "vector_custom_nginx"
[transforms.meter.metrics.tags]
cdn_upstream = "{{ cdn_upstream }}"
file = "{{ file }}"
status = "{{ status }}"
upstream_addr = "{{ upstream_addr }}"
upstream_cache_status = "{{ upstream_cache_status }}"
upstream_status = "{{ upstream_status }}"
[[transforms.meter.metrics]]
field = "_i"
name = "request"
type = "counter"
increment_by_value = true
namespace = "vector_custom_nginx"
[transforms.meter.metrics.tags]
cdn_upstream = "{{ cdn_upstream }}"
file = "{{ file }}"
status = "{{ status }}"
upstream_addr = "{{ upstream_addr }}"
upstream_cache_status = "{{ upstream_cache_status }}"
upstream_status = "{{ upstream_status }}"
```
Replies: 3 comments
-
Any news here?
-
In answer to the questions: the solution you have here is a pretty good one. You could potentially do the whole thing in the lua transform, since that transform could maintain the histogram in global memory and only emit once every 10 seconds, but that could get complex.

However, the main question is why this is actually having a significant impact on performance. The processing required to reduce the events and generate the histogram shouldn't be less than what the prometheus exporter already does.

It is possible you are running into issue #10635, which has now been fixed. If so, it would be worth trying your original config with the latest version of Vector: either v0.19.2 or v0.20, both due to be released later today.
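Roughly, such a transform could look like the sketch below. Everything in it is illustrative rather than taken from the configs above: the `aggregate` component name, the reduced tag set, and the 10-second interval are assumptions, and only counters are shown; a histogram could be accumulated in the same module-level state and emitted as a distribution in the same flush handler.

```toml
# Hypothetical sketch: accumulate per-tag counters in Lua state and emit
# aggregated metrics from a timer instead of one metric per log event.
[transforms.aggregate]
type = "lua"
version = "2"
inputs = [ "parser" ]
source = """
  -- Module-level state shared by the process hook and the timer handler.
  counts = {}

  function on_event(event, emit)
    local key = tostring(event.log.status) .. "|" .. tostring(event.log.upstream_addr)
    local entry = counts[key] or {
      requests = 0, bytes = 0,
      status = event.log.status, upstream_addr = event.log.upstream_addr
    }
    entry.requests = entry.requests + 1
    entry.bytes = entry.bytes + (tonumber(event.log.bytes_sent) or 0)
    counts[key] = entry
    -- Nothing is emitted here; the timer below flushes the aggregated metrics.
  end

  function flush(emit)
    for _, entry in pairs(counts) do
      local tags = { status = entry.status, upstream_addr = entry.upstream_addr }
      emit({
        metric = {
          name = "request",
          namespace = "vector_custom_nginx",
          timestamp = os.date("!*t"),
          kind = "incremental",
          tags = tags,
          counter = { value = entry.requests }
        }
      })
      emit({
        metric = {
          name = "bytes_sent",
          namespace = "vector_custom_nginx",
          timestamp = os.date("!*t"),
          kind = "incremental",
          tags = tags,
          counter = { value = entry.bytes }
        }
      })
    end
    counts = {}
  end
"""

[transforms.aggregate.hooks]
process = "on_event"

[[transforms.aggregate.timers]]
handler = "flush"
interval_seconds = 10
```

The prometheus_exporter sink would then take `aggregate` as its input instead of `meter`.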
-
Could be solved with this PR (still open): #8749