Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(codec,sources): influxdb line protcol decoder #19637

Merged

Conversation

MichaHoffmann
Copy link
Contributor

@MichaHoffmann MichaHoffmann commented Jan 17, 2024

address #11337

@MichaHoffmann MichaHoffmann force-pushed the mhoffm-influx-line-protocol-decoder branch from a85c402 to 7d57649 Compare January 17, 2024 20:29
@MichaHoffmann MichaHoffmann changed the title WIP: influxdb line protcol decoder feat(codec,sources): influxdb line protcol decoder Jan 17, 2024
@MichaHoffmann MichaHoffmann marked this pull request as ready for review January 18, 2024 08:06
@jszwedko jszwedko requested a review from pront February 6, 2024 18:09
Copy link
Member

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @MichaHoffmann, thank you for this PR! I know this is a late response but this fell through the cracks. Left a few comments.

lib/codecs/src/decoding/format/influxdb.rs Show resolved Hide resolved
}
}

#[cfg(test)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test per metric type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on this comment, it should be okay to use the gauge data type for Influx's 'measurement,' right? Since we don't have encoding for InfluxDB output.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we treat other metric types as an error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pront Sorry to get back to you. Do we need that? The format for Influx line protocol is like this and doesn't specify any metric types: cpu,host=A,region=west usage_system=64i,usage_user=10i 1590488773254420000. We're just parsing it into the gauge metric type. Since the input is always like this, does the metric type matter? I fixed the formatting issue. In the upcoming PR, I can add support for InfluxDB encoding so we can specify the metric type tag and support all metric types. Is it okay to add encoding for InfluxDB line protocol?

@pront
Copy link
Member

pront commented Feb 12, 2024

Also, please add a changelog fragment here: https://github.com/vectordotdev/vector/tree/master/changelog.d

@RedForza
Copy link

RedForza commented Mar 20, 2024

Hello! Just stumbled onto this PR. I am very interested into Influx line protocol support for vector and looking at the issues list others are too.

@MichaHoffmann Do you have a timeline when the outstanding comments could be resolved? Appreciate the work done so far.

@RedForza RedForza mentioned this pull request Jul 2, 2024
@MichaHoffmann
Copy link
Contributor Author

Hello! Just stumbled onto this PR. I am very interested into Influx line protocol support for vector and looking at the issues list others are too.

@MichaHoffmann Do you have a timeline when the outstanding comments could be resolved? Appreciate the work done so far.

Ah I didnt continue to work on it because I had the impression that we cannot use kafka as datasource since it forces the type to be "log". Ever since I have changed employers and since this was work related, I didnt think about it much anymore. I dont want to squat this issue either, if someone wants to pick it up, thats would be perfect.

@sebinsunny sebinsunny force-pushed the mhoffm-influx-line-protocol-decoder branch 6 times, most recently from 8d03b7b to 89b6361 Compare July 6, 2024 12:10
@sebinsunny
Copy link
Contributor

Also, please add a changelog fragment here: https://github.com/vectordotdev/vector/tree/master/changelog.d

@pront added the changelog, please review this PR, thanks

@pront pront self-requested a review July 8, 2024 14:24
@sebinsunny sebinsunny force-pushed the mhoffm-influx-line-protocol-decoder branch from 89b6361 to 7431061 Compare July 16, 2024 23:34
Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution!

@jszwedko jszwedko enabled auto-merge July 17, 2024 16:12
auto-merge was automatically disabled July 18, 2024 00:05

Head branch was pushed to by a user without write access

@sebinsunny sebinsunny force-pushed the mhoffm-influx-line-protocol-decoder branch from 7431061 to b929192 Compare July 18, 2024 00:05
@jszwedko jszwedko enabled auto-merge July 18, 2024 14:10
auto-merge was automatically disabled July 19, 2024 02:32

Head branch was pushed to by a user without write access

@sebinsunny sebinsunny force-pushed the mhoffm-influx-line-protocol-decoder branch from b929192 to f607357 Compare July 19, 2024 02:32
@jszwedko
Copy link
Member

@sebinsunny thanks for iterating on this. A small request to try to avoid force pushing. It makes viewing the change history more difficult.

@jszwedko jszwedko enabled auto-merge July 19, 2024 13:30
@jszwedko
Copy link
Member

(the commits will all be squashed during merge)

auto-merge was automatically disabled July 20, 2024 05:25

Head branch was pushed to by a user without write access

@sebinsunny sebinsunny requested review from a team as code owners July 20, 2024 05:25
@github-actions github-actions bot added the domain: external docs Anything related to Vector's external, public documentation label Jul 20, 2024
@sebinsunny
Copy link
Contributor

sebinsunny commented Jul 20, 2024

A small request to try to avoid force pushing. It makes viewing the change history more difficult.

@jszwedko Sorry about this. The pipeline was failing due to missing docs, but it has been fixed now. Hope pipeline will be green now.

@pront
Copy link
Member

pront commented Jul 22, 2024

A small request to try to avoid force pushing. It makes viewing the change history more difficult.

@jszwedko Sorry about this. The pipeline was failing due to missing docs, but it has been fixed now. Hope pipeline will be green now.

👍 re-running CI checks

Copy link
Contributor

@estherk15 estherk15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit: Decodes the raw bytes as an [Influxdb Line Protocol][influxdb] message.

@@ -108,6 +108,11 @@ base: components: sources: amqp: configuration: {
[gelf]: https://docs.graylog.org/docs/gelf
[implementation]: https://github.com/Graylog2/go-gelf/blob/v2/gelf/reader.go
"""
influxdb: """
Decodes the raw bytes as a [Influxdb Line Protocol][influxdb] message.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Decodes the raw bytes as a [Influxdb Line Protocol][influxdb] message.
Decodes the raw bytes as an [Influxdb Line Protocol][influxdb] message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed the typo

@sebinsunny sebinsunny force-pushed the mhoffm-influx-line-protocol-decoder branch from b3ecbde to 29a8e7c Compare July 23, 2024 22:10
@sebinsunny
Copy link
Contributor

sebinsunny commented Jul 23, 2024

forced merge to resolve conflict

@jszwedko jszwedko enabled auto-merge July 23, 2024 22:19
MichaHoffmann and others added 3 commits July 24, 2024 08:48
Signed-off-by: Michael Hoffmann <[email protected]>
Co-authored-by: Sebin Sunny <[email protected]>

* Changelog

Signed-off-by: Sebin Sunny <[email protected]>
auto-merge was automatically disabled July 23, 2024 22:54

Head branch was pushed to by a user without write access

@sebinsunny sebinsunny force-pushed the mhoffm-influx-line-protocol-decoder branch from 29a8e7c to 6302a7b Compare July 23, 2024 22:54
@sebinsunny
Copy link
Contributor

Fixed the conflicts correctly. Earlier, the old reference for DeserializerConfig::Gelf(_) was added in DeserializerConfig::Bytes, which caused the test failure. I have now corrected the conflict.

@jszwedko jszwedko enabled auto-merge July 24, 2024 00:29
@jszwedko jszwedko added this pull request to the merge queue Jul 24, 2024
Copy link

Regression Detector Results

Run ID: f0ecfe7a-d839-4b8d-ade3-2b580a904964 Metrics dashboard

Baseline: 787e1ec
Comparison: edbe7ca

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

Significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

perf experiment goal Δ mean % Δ mean % CI links
splunk_hec_route_s3 ingress throughput -5.57 [-5.88, -5.26]

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf experiment goal Δ mean % Δ mean % CI links
file_to_blackhole egress throughput -7.73 [-14.39, -1.08]

Fine details of change detection per experiment

perf experiment goal Δ mean % Δ mean % CI links
otlp_http_to_blackhole ingress throughput +2.90 [+2.72, +3.07]
socket_to_socket_blackhole ingress throughput +1.78 [+1.71, +1.85]
datadog_agent_remap_datadog_logs_acks ingress throughput +1.02 [+0.80, +1.23]
otlp_grpc_to_blackhole ingress throughput +0.93 [+0.81, +1.05]
http_to_s3 ingress throughput +0.26 [-0.00, +0.53]
datadog_agent_remap_datadog_logs ingress throughput +0.25 [+0.05, +0.45]
syslog_splunk_hec_logs ingress throughput +0.06 [-0.03, +0.15]
splunk_hec_to_splunk_hec_logs_acks ingress throughput +0.00 [-0.09, +0.10]
splunk_hec_to_splunk_hec_logs_noack ingress throughput -0.02 [-0.12, +0.08]
http_to_http_noack ingress throughput -0.02 [-0.14, +0.10]
splunk_hec_indexer_ack_blackhole ingress throughput -0.03 [-0.12, +0.06]
http_to_http_json ingress throughput -0.12 [-0.19, -0.06]
datadog_agent_remap_blackhole ingress throughput -0.39 [-0.51, -0.28]
syslog_loki ingress throughput -0.74 [-0.82, -0.67]
http_elasticsearch ingress throughput -0.96 [-1.19, -0.73]
http_to_http_acks ingress throughput -1.22 [-2.53, +0.10]
syslog_regex_logs2metric_ddmetrics ingress throughput -1.33 [-1.51, -1.16]
syslog_log2metric_tag_cardinality_limit_blackhole ingress throughput -1.51 [-1.61, -1.42]
fluent_elasticsearch ingress throughput -1.63 [-2.14, -1.13]
http_text_to_http_json ingress throughput -2.02 [-2.17, -1.88]
syslog_log2metric_humio_metrics ingress throughput -2.05 [-2.18, -1.91]
datadog_agent_remap_blackhole_acks ingress throughput -2.94 [-3.05, -2.84]
syslog_humio_logs ingress throughput -3.06 [-3.18, -2.94]
syslog_log2metric_splunk_hec_metrics ingress throughput -3.06 [-3.17, -2.95]
splunk_hec_route_s3 ingress throughput -5.57 [-5.88, -5.26]
file_to_blackhole egress throughput -7.73 [-14.39, -1.08]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 24, 2024
@jszwedko jszwedko added this pull request to the merge queue Jul 24, 2024
Copy link

Regression Detector Results

Run ID: 3d125c8b-0cd2-4ec8-b31f-7357020832ba Metrics dashboard

Baseline: 56bd0dd
Comparison: 3fbb66c

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf experiment goal Δ mean % Δ mean % CI links
file_to_blackhole egress throughput -14.31 [-20.24, -8.38]

Fine details of change detection per experiment

perf experiment goal Δ mean % Δ mean % CI links
splunk_hec_route_s3 ingress throughput +3.06 [+2.75, +3.37]
syslog_splunk_hec_logs ingress throughput +3.05 [+2.92, +3.19]
syslog_log2metric_splunk_hec_metrics ingress throughput +1.57 [+1.46, +1.67]
syslog_log2metric_tag_cardinality_limit_blackhole ingress throughput +1.25 [+1.15, +1.34]
datadog_agent_remap_datadog_logs ingress throughput +1.15 [+0.92, +1.38]
http_elasticsearch ingress throughput +0.92 [+0.73, +1.12]
http_text_to_http_json ingress throughput +0.79 [+0.67, +0.91]
http_to_s3 ingress throughput +0.79 [+0.53, +1.05]
fluent_elasticsearch ingress throughput +0.78 [+0.28, +1.29]
splunk_hec_indexer_ack_blackhole ingress throughput +0.06 [-0.02, +0.14]
splunk_hec_to_splunk_hec_logs_noack ingress throughput +0.01 [-0.09, +0.11]
splunk_hec_to_splunk_hec_logs_acks ingress throughput +0.01 [-0.12, +0.13]
http_to_http_noack ingress throughput -0.08 [-0.19, +0.04]
syslog_log2metric_humio_metrics ingress throughput -0.08 [-0.26, +0.10]
http_to_http_json ingress throughput -0.14 [-0.22, -0.07]
otlp_grpc_to_blackhole ingress throughput -0.69 [-0.81, -0.56]
http_to_http_acks ingress throughput -0.75 [-2.07, +0.56]
otlp_http_to_blackhole ingress throughput -0.81 [-0.97, -0.65]
datadog_agent_remap_blackhole_acks ingress throughput -0.90 [-1.00, -0.79]
datadog_agent_remap_blackhole ingress throughput -0.97 [-1.07, -0.87]
syslog_loki ingress throughput -1.35 [-1.43, -1.28]
datadog_agent_remap_datadog_logs_acks ingress throughput -1.38 [-1.55, -1.20]
syslog_humio_logs ingress throughput -2.23 [-2.35, -2.11]
socket_to_socket_blackhole ingress throughput -2.73 [-2.80, -2.66]
syslog_regex_logs2metric_ddmetrics ingress throughput -3.20 [-3.37, -3.03]
file_to_blackhole egress throughput -14.31 [-20.24, -8.38]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".

Merged via the queue into vectordotdev:master with commit 3fbb66c Jul 24, 2024
49 checks passed
ym pushed a commit to ym/vector that referenced this pull request Aug 18, 2024
* feat(codec,sources): influxdb line protocol decoder

Signed-off-by: Michael Hoffmann <[email protected]>
Co-authored-by: Sebin Sunny <[email protected]>

* Changelog

Signed-off-by: Sebin Sunny <[email protected]>

* chore(docs): update doc about InfluxDB Decoder

Signed-off-by: Sebin Sunny <[email protected]>

* chore(docs): fix typo about InfluxDB Decoder

Signed-off-by: Sebin Sunny <[email protected]>

---------

Signed-off-by: Sebin Sunny <[email protected]>
Co-authored-by: Sebin Sunny <[email protected]>
AndrooTheChen pushed a commit to discord/vector that referenced this pull request Sep 23, 2024
* feat(codec,sources): influxdb line protocol decoder

Signed-off-by: Michael Hoffmann <[email protected]>
Co-authored-by: Sebin Sunny <[email protected]>

* Changelog

Signed-off-by: Sebin Sunny <[email protected]>

* chore(docs): update doc about InfluxDB Decoder

Signed-off-by: Sebin Sunny <[email protected]>

* chore(docs): fix typo about InfluxDB Decoder

Signed-off-by: Sebin Sunny <[email protected]>

---------

Signed-off-by: Sebin Sunny <[email protected]>
Co-authored-by: Sebin Sunny <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: external docs Anything related to Vector's external, public documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants