-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(codec,sources): influxdb line protcol decoder #19637
feat(codec,sources): influxdb line protcol decoder #19637
Conversation
a85c402
to
7d57649
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @MichaHoffmann, thank you for this PR! I know this is a late response but this fell through the cracks. Left a few comments.
} | ||
} | ||
|
||
#[cfg(test)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test per metric type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on this comment, it should be okay to use the gauge data type for Influx's 'measurement,' right? Since we don't have encoding for InfluxDB output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we treat other metric types as an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pront Sorry to get back to you. Do we need that? The format for Influx line protocol is like this and doesn't specify any metric types: cpu,host=A,region=west usage_system=64i,usage_user=10i 1590488773254420000
. We're just parsing it into the gauge metric type. Since the input is always like this, does the metric type matter? I fixed the formatting issue. In the upcoming PR, I can add support for InfluxDB encoding so we can specify the metric type tag and support all metric types. Is it okay to add encoding for InfluxDB line protocol?
Also, please add a changelog fragment here: https://github.com/vectordotdev/vector/tree/master/changelog.d |
Hello! Just stumbled onto this PR. I am very interested into Influx line protocol support for vector and looking at the issues list others are too. @MichaHoffmann Do you have a timeline when the outstanding comments could be resolved? Appreciate the work done so far. |
Ah I didnt continue to work on it because I had the impression that we cannot use kafka as datasource since it forces the type to be "log". Ever since I have changed employers and since this was work related, I didnt think about it much anymore. I dont want to squat this issue either, if someone wants to pick it up, thats would be perfect. |
8d03b7b
to
89b6361
Compare
@pront added the changelog, please review this PR, thanks |
89b6361
to
7431061
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this contribution!
Head branch was pushed to by a user without write access
7431061
to
b929192
Compare
Head branch was pushed to by a user without write access
b929192
to
f607357
Compare
@sebinsunny thanks for iterating on this. A small request to try to avoid force pushing. It makes viewing the change history more difficult. |
(the commits will all be squashed during merge) |
Head branch was pushed to by a user without write access
@jszwedko Sorry about this. The pipeline was failing due to missing docs, but it has been fixed now. Hope pipeline will be green now. |
👍 re-running CI checks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit: Decodes the raw bytes as an [Influxdb Line Protocol][influxdb] message.
@@ -108,6 +108,11 @@ base: components: sources: amqp: configuration: { | |||
[gelf]: https://docs.graylog.org/docs/gelf | |||
[implementation]: https://github.com/Graylog2/go-gelf/blob/v2/gelf/reader.go | |||
""" | |||
influxdb: """ | |||
Decodes the raw bytes as a [Influxdb Line Protocol][influxdb] message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decodes the raw bytes as a [Influxdb Line Protocol][influxdb] message. | |
Decodes the raw bytes as an [Influxdb Line Protocol][influxdb] message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed the typo
b3ecbde
to
29a8e7c
Compare
forced merge to resolve conflict |
Signed-off-by: Michael Hoffmann <[email protected]> Co-authored-by: Sebin Sunny <[email protected]> * Changelog Signed-off-by: Sebin Sunny <[email protected]>
Signed-off-by: Sebin Sunny <[email protected]>
Signed-off-by: Sebin Sunny <[email protected]>
Head branch was pushed to by a user without write access
29a8e7c
to
6302a7b
Compare
Fixed the conflicts correctly. Earlier, the old reference for DeserializerConfig::Gelf(_) was added in DeserializerConfig::Bytes, which caused the test failure. I have now corrected the conflict. |
Regression Detector ResultsRun ID: f0ecfe7a-d839-4b8d-ade3-2b580a904964 Metrics dashboard Baseline: 787e1ec Performance changes are noted in the perf column of each table:
Significant changes in experiment optimization goalsConfidence level: 90.00%
|
perf | experiment | goal | Δ mean % | Δ mean % CI | links |
---|---|---|---|---|---|
❌ | file_to_blackhole | egress throughput | -7.73 | [-14.39, -1.08] |
Fine details of change detection per experiment
perf | experiment | goal | Δ mean % | Δ mean % CI | links |
---|---|---|---|---|---|
➖ | otlp_http_to_blackhole | ingress throughput | +2.90 | [+2.72, +3.07] | |
➖ | socket_to_socket_blackhole | ingress throughput | +1.78 | [+1.71, +1.85] | |
➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | +1.02 | [+0.80, +1.23] | |
➖ | otlp_grpc_to_blackhole | ingress throughput | +0.93 | [+0.81, +1.05] | |
➖ | http_to_s3 | ingress throughput | +0.26 | [-0.00, +0.53] | |
➖ | datadog_agent_remap_datadog_logs | ingress throughput | +0.25 | [+0.05, +0.45] | |
➖ | syslog_splunk_hec_logs | ingress throughput | +0.06 | [-0.03, +0.15] | |
➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | +0.00 | [-0.09, +0.10] | |
➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | -0.02 | [-0.12, +0.08] | |
➖ | http_to_http_noack | ingress throughput | -0.02 | [-0.14, +0.10] | |
➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | -0.03 | [-0.12, +0.06] | |
➖ | http_to_http_json | ingress throughput | -0.12 | [-0.19, -0.06] | |
➖ | datadog_agent_remap_blackhole | ingress throughput | -0.39 | [-0.51, -0.28] | |
➖ | syslog_loki | ingress throughput | -0.74 | [-0.82, -0.67] | |
➖ | http_elasticsearch | ingress throughput | -0.96 | [-1.19, -0.73] | |
➖ | http_to_http_acks | ingress throughput | -1.22 | [-2.53, +0.10] | |
➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | -1.33 | [-1.51, -1.16] | |
➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | -1.51 | [-1.61, -1.42] | |
➖ | fluent_elasticsearch | ingress throughput | -1.63 | [-2.14, -1.13] | |
➖ | http_text_to_http_json | ingress throughput | -2.02 | [-2.17, -1.88] | |
➖ | syslog_log2metric_humio_metrics | ingress throughput | -2.05 | [-2.18, -1.91] | |
➖ | datadog_agent_remap_blackhole_acks | ingress throughput | -2.94 | [-3.05, -2.84] | |
➖ | syslog_humio_logs | ingress throughput | -3.06 | [-3.18, -2.94] | |
➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | -3.06 | [-3.17, -2.95] | |
❌ | splunk_hec_route_s3 | ingress throughput | -5.57 | [-5.88, -5.26] | |
❌ | file_to_blackhole | egress throughput | -7.73 | [-14.39, -1.08] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
Regression Detector ResultsRun ID: 3d125c8b-0cd2-4ec8-b31f-7357020832ba Metrics dashboard Baseline: 56bd0dd Performance changes are noted in the perf column of each table:
No significant changes in experiment optimization goalsConfidence level: 90.00% There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
|
perf | experiment | goal | Δ mean % | Δ mean % CI | links |
---|---|---|---|---|---|
❌ | file_to_blackhole | egress throughput | -14.31 | [-20.24, -8.38] |
Fine details of change detection per experiment
perf | experiment | goal | Δ mean % | Δ mean % CI | links |
---|---|---|---|---|---|
➖ | splunk_hec_route_s3 | ingress throughput | +3.06 | [+2.75, +3.37] | |
➖ | syslog_splunk_hec_logs | ingress throughput | +3.05 | [+2.92, +3.19] | |
➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | +1.57 | [+1.46, +1.67] | |
➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | +1.25 | [+1.15, +1.34] | |
➖ | datadog_agent_remap_datadog_logs | ingress throughput | +1.15 | [+0.92, +1.38] | |
➖ | http_elasticsearch | ingress throughput | +0.92 | [+0.73, +1.12] | |
➖ | http_text_to_http_json | ingress throughput | +0.79 | [+0.67, +0.91] | |
➖ | http_to_s3 | ingress throughput | +0.79 | [+0.53, +1.05] | |
➖ | fluent_elasticsearch | ingress throughput | +0.78 | [+0.28, +1.29] | |
➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | +0.06 | [-0.02, +0.14] | |
➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | +0.01 | [-0.09, +0.11] | |
➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | +0.01 | [-0.12, +0.13] | |
➖ | http_to_http_noack | ingress throughput | -0.08 | [-0.19, +0.04] | |
➖ | syslog_log2metric_humio_metrics | ingress throughput | -0.08 | [-0.26, +0.10] | |
➖ | http_to_http_json | ingress throughput | -0.14 | [-0.22, -0.07] | |
➖ | otlp_grpc_to_blackhole | ingress throughput | -0.69 | [-0.81, -0.56] | |
➖ | http_to_http_acks | ingress throughput | -0.75 | [-2.07, +0.56] | |
➖ | otlp_http_to_blackhole | ingress throughput | -0.81 | [-0.97, -0.65] | |
➖ | datadog_agent_remap_blackhole_acks | ingress throughput | -0.90 | [-1.00, -0.79] | |
➖ | datadog_agent_remap_blackhole | ingress throughput | -0.97 | [-1.07, -0.87] | |
➖ | syslog_loki | ingress throughput | -1.35 | [-1.43, -1.28] | |
➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | -1.38 | [-1.55, -1.20] | |
➖ | syslog_humio_logs | ingress throughput | -2.23 | [-2.35, -2.11] | |
➖ | socket_to_socket_blackhole | ingress throughput | -2.73 | [-2.80, -2.66] | |
➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | -3.20 | [-3.37, -3.03] | |
❌ | file_to_blackhole | egress throughput | -14.31 | [-20.24, -8.38] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
* feat(codec,sources): influxdb line protocol decoder Signed-off-by: Michael Hoffmann <[email protected]> Co-authored-by: Sebin Sunny <[email protected]> * Changelog Signed-off-by: Sebin Sunny <[email protected]> * chore(docs): update doc about InfluxDB Decoder Signed-off-by: Sebin Sunny <[email protected]> * chore(docs): fix typo about InfluxDB Decoder Signed-off-by: Sebin Sunny <[email protected]> --------- Signed-off-by: Sebin Sunny <[email protected]> Co-authored-by: Sebin Sunny <[email protected]>
* feat(codec,sources): influxdb line protocol decoder Signed-off-by: Michael Hoffmann <[email protected]> Co-authored-by: Sebin Sunny <[email protected]> * Changelog Signed-off-by: Sebin Sunny <[email protected]> * chore(docs): update doc about InfluxDB Decoder Signed-off-by: Sebin Sunny <[email protected]> * chore(docs): fix typo about InfluxDB Decoder Signed-off-by: Sebin Sunny <[email protected]> --------- Signed-off-by: Sebin Sunny <[email protected]> Co-authored-by: Sebin Sunny <[email protected]>
address #11337