chore(config): Add option to turn missing env vars in config into an error #19393

bruceg · 2023-12-15T02:23:36Z

Using missing environment variables in configurations without a default is a common cause of broken configurations. This starts the process of turning them into a hard error by allowing users to opt into the new behavior.

jszwedko

My suggestion would be that we start the deprecation process in the next release, v0.35.0, rather than have it hanging around longer in the intermediate state.

Requesting changes for the documentation.

jszwedko · 2023-12-15T13:21:28Z

docs/DEPRECATIONS.md


 ## To be deprecated

+- v0.37.0 strict_env_vars Add deprecation warning for missing environment variable interpolation.


Suggested change

- v0.37.0 strict_env_vars Add deprecation warning for missing environment variable interpolation.

- v0.35.0 strict_env_vars Add deprecation warning for missing environment variable interpolation.

I think you could deprecate it in the next release if you wanted to. This would mean adding it to the upgrade guide.

Fixed in 93dff07

jszwedko · 2023-12-15T13:25:05Z

src/cli.rs

+    /// warning, which will result in a failure to load any such configuration file. This defaults
+    /// to false, but that default is deprecated and will be changed to strict in future versions.
+    #[arg(long, env = "VECTOR_STRICT_ENV_VARS", default_value = "false")]
+    pub strict_env_vars: bool,


I think this is missing from the manually maintained website/cue/reference/cli.cue

Fixed in c9abff9

hhromic · 2023-12-15T14:46:12Z

The functionality to enforce required env variables is already implemented (I contributed this in the past).
Documentation (see "required variables"): https://vector.dev/docs/reference/configuration/#environment-variables

This PR and the feature it implements would then essentially make all environment variables act as required, and the syntax ${ENVVAR:?err} and ${ENVVAR?err} would be obsolete/redundant?

jszwedko · 2023-12-15T14:55:37Z

This PR and the feature it implements would then essentially make all environment variables act as required, and the syntax ${ENVVAR:?err} and ${ENVVAR?err} would be obsolete/redundant?

I think this would still be useful syntax for people that want to customize the error messages.

But yes, the intention is to no longer silently (well mostly silently, we do output a warning) replace environment variables with an empty string when they are undefined since this seems to have been a common footgun for people.
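To make the behavior change concrete, here is a minimal sketch of the difference between the warning mode and the strict mode for bare `${NAME}` references. It is not Vector's actual implementation: the function name `interpolate`, its signature, and the message text are hypothetical, and it deliberately ignores the `:-` and `:?` forms.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not Vector's real code): replace every bare `${NAME}`
/// in `input` with its value from `vars`.
///
/// - Lenient mode (`strict == false`): a missing variable becomes an empty
///   string and a message is appended to `warnings`.
/// - Strict mode (`strict == true`): the names of all missing variables are
///   returned as an error and no output is produced.
pub fn interpolate(
    input: &str,
    vars: &HashMap<String, String>,
    strict: bool,
    warnings: &mut Vec<String>,
) -> Result<String, Vec<String>> {
    let mut output = String::new();
    let mut missing = Vec::new();
    let mut rest = input;

    while let Some(start) = rest.find("${") {
        // Copy the literal text that precedes the reference.
        output.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let Some(end) = after.find('}') else {
            // Unterminated reference: copy it through unchanged.
            output.push_str(&rest[start..]);
            rest = "";
            break;
        };
        let name = &after[..end];
        match vars.get(name) {
            Some(value) => output.push_str(value),
            // Missing: contributes nothing to the output (the "empty string").
            None => missing.push(name.to_string()),
        }
        rest = &after[end + 1..];
    }
    output.push_str(rest);

    if missing.is_empty() {
        Ok(output)
    } else if strict {
        Err(missing)
    } else {
        warnings.extend(
            missing
                .iter()
                .map(|name| format!("environment variable `{name}` is not set")),
        );
        Ok(output)
    }
}
```

With only `HOST` defined, `address = "${HOST}:${PORT}"` loads as `address = "localhost:"` plus one warning in lenient mode, while strict mode rejects the input and names `PORT` as the culprit.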

hhromic · 2023-12-15T15:03:06Z

Yes, I can see how non-strict interpolation can shoot users in the foot indeed. That was one of the reasons I contributed the syntax to error out in the first place back then :)

Since the extended syntax was implemented back then, we have replaced all variable interpolation in our pipelines to either be ${ENVVAR:-default-value} or ${ENVVAR:?error message}. In this way a blank interpolation is never performed and appropriate default values are used where variables are not strictly required.

But I do agree that mistakes can be made and developers could forget to use :- or :?. In these cases this PR seems like a useful safety protection indeed.

Thanks for clarifying the direction of this feature and that we can keep using the previous syntax! :)
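As an illustration of how the extended forms discussed above relate to the new option, the sketch below evaluates the inside of a single `${...}` expression. It is an assumption-laden example rather than Vector's code: `expand` is a hypothetical name, and the treatment of the leading `:` (empty counts as missing) follows the usual shell convention, which may differ in detail from Vector's behavior.

```rust
use std::collections::HashMap;

/// Hypothetical helper: evaluate the text between `${` and `}`.
///
/// Supported forms (shell-style semantics, assumed here):
///   NAME            value, or empty string (lenient) / error (strict) if unset
///   NAME-default    value, or `default` if unset
///   NAME:-default   value, or `default` if unset or empty
///   NAME?message    value, or an error carrying `message` if unset
///   NAME:?message   value, or an error carrying `message` if unset or empty
pub fn expand(
    expr: &str,
    vars: &HashMap<String, String>,
    strict: bool,
) -> Result<String, String> {
    // The name is the leading run of [A-Za-z0-9_]; the rest is the operator.
    let split = expr
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(expr.len());
    let (name, op) = expr.split_at(split);
    if name.is_empty() {
        return Err(format!("missing variable name in `${{{expr}}}`"));
    }

    // A leading ':' makes the operator treat an empty value like an unset one.
    let (colon, op) = match op.strip_prefix(':') {
        Some(stripped) => (true, stripped),
        None => (false, op),
    };
    let usable = vars.get(name).filter(|v| !(colon && v.is_empty()));

    if let Some(default) = op.strip_prefix('-') {
        Ok(usable.cloned().unwrap_or_else(|| default.to_string()))
    } else if let Some(message) = op.strip_prefix('?') {
        usable.cloned().ok_or_else(|| format!("{name}: {message}"))
    } else if colon || !op.is_empty() {
        Err(format!("unsupported expression `${{{expr}}}`"))
    } else if strict {
        // Bare reference under the strict option: missing is a hard error.
        usable
            .cloned()
            .ok_or_else(|| format!("{name}: environment variable is not set"))
    } else {
        // Bare reference in lenient mode: missing becomes an empty string.
        Ok(usable.cloned().unwrap_or_default())
    }
}
```

Only the bare form depends on the `strict` argument in this sketch. The `-` and `?` forms behave the same in both modes, which is why pipelines that already use them everywhere should be unaffected by the stricter default.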

datadog-vectordotdev · 2023-12-21T21:37:57Z

Datadog Report

Branch report: OPW-85-add-strict-env-vars-option
Commit report: 7cda1ec

✅ vector: 0 Failed, 0 New Flaky, 2075 Passed, 0 Skipped, 1m 23.06s Wall Time

jszwedko

Great!

github-actions · 2024-01-03T02:44:19Z

Regression Detector Results

Run ID: 2d8c5e9a-0c62-459e-a420-17b603871f5e
Baseline: 42ad075
Comparison: 7796a79
Total CPUs: 7

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

Significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

perf	experiment	goal	Δ mean %	Δ mean % CI
❌	otlp_http_to_blackhole	ingress throughput	-5.47	[-5.60, -5.34]

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	http_elasticsearch	ingress throughput	+2.90	[+2.83, +2.97]
➖	splunk_hec_route_s3	ingress throughput	+2.25	[+1.73, +2.78]
➖	http_to_http_acks	ingress throughput	+0.82	[-0.50, +2.13]
➖	http_text_to_http_json	ingress throughput	+0.23	[+0.11, +0.34]
➖	http_to_http_noack	ingress throughput	+0.12	[+0.03, +0.20]
➖	http_to_http_json	ingress throughput	+0.03	[-0.04, +0.10]
➖	splunk_hec_indexer_ack_blackhole	ingress throughput	+0.00	[-0.14, +0.14]
➖	splunk_hec_to_splunk_hec_logs_acks	ingress throughput	+0.00	[-0.14, +0.14]
➖	syslog_log2metric_tag_cardinality_limit_blackhole	ingress throughput	-0.01	[-0.13, +0.11]
➖	splunk_hec_to_splunk_hec_logs_noack	ingress throughput	-0.05	[-0.16, +0.07]
➖	syslog_splunk_hec_logs	ingress throughput	-0.08	[-0.17, +0.01]
➖	file_to_blackhole	egress throughput	-0.08	[-2.61, +2.46]
➖	enterprise_http_to_http	ingress throughput	-0.17	[-0.25, -0.09]
➖	http_to_s3	ingress throughput	-0.26	[-0.53, +0.02]
➖	syslog_humio_logs	ingress throughput	-0.32	[-0.42, -0.23]
➖	datadog_agent_remap_blackhole_acks	ingress throughput	-0.38	[-0.48, -0.28]
➖	datadog_agent_remap_blackhole	ingress throughput	-0.38	[-0.53, -0.24]
➖	datadog_agent_remap_datadog_logs	ingress throughput	-0.38	[-0.48, -0.29]
➖	syslog_log2metric_splunk_hec_metrics	ingress throughput	-0.50	[-0.65, -0.35]
➖	syslog_log2metric_humio_metrics	ingress throughput	-0.62	[-0.77, -0.48]
➖	syslog_loki	ingress throughput	-0.63	[-0.70, -0.57]
➖	datadog_agent_remap_datadog_logs_acks	ingress throughput	-0.67	[-0.76, -0.59]
➖	otlp_grpc_to_blackhole	ingress throughput	-0.76	[-0.85, -0.66]
➖	syslog_regex_logs2metric_ddmetrics	ingress throughput	-0.96	[-1.07, -0.85]
➖	fluent_elasticsearch	ingress throughput	-1.36	[-1.85, -0.88]
➖	socket_to_socket_blackhole	ingress throughput	-3.20	[-3.27, -3.13]
❌	otlp_http_to_blackhole	ingress throughput	-5.47	[-5.60, -5.34]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
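The three criteria above amount to a small predicate. The sketch below restates them in code as a reading aid. The struct and function names are hypothetical and are not taken from the regression detector itself, and the `erratic` flag is assumed to be `false` for the rows shown in this report, since the report does not list it.

```rust
/// One row of the detector's table (field names are illustrative).
pub struct Experiment {
    pub delta_mean_pct: f64,
    pub ci_low: f64,
    pub ci_high: f64,
    pub erratic: bool,
}

/// Returns true when all three criteria from the explanation hold:
/// the effect is at least `tolerance_pct` in magnitude, the confidence
/// interval excludes zero, and the experiment is not marked erratic.
pub fn is_regression(e: &Experiment, tolerance_pct: f64) -> bool {
    let big_enough = e.delta_mean_pct.abs() >= tolerance_pct;
    let ci_excludes_zero = e.ci_low > 0.0 || e.ci_high < 0.0;
    big_enough && ci_excludes_zero && !e.erratic
}
```

Applied to the table above with a tolerance of 5.00, `otlp_http_to_blackhole` (-5.47, CI [-5.60, -5.34]) is flagged, `socket_to_socket_blackhole` (-3.20, CI [-3.27, -3.13]) is not because the effect is too small, and `file_to_blackhole` (-0.08, CI [-2.61, +2.46]) is not because its interval contains zero.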

github-actions · 2024-01-03T06:17:54Z

Regression Detector Results

Run ID: 703e8bca-86f1-4dce-86ee-7d5e177939cb
Baseline: d73fd6f
Comparison: c37ccdc
Total CPUs: 7

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

Significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

perf	experiment	goal	Δ mean %	Δ mean % CI
❌	file_to_blackhole	egress throughput	-28.68	[-31.29, -26.07]

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	socket_to_socket_blackhole	ingress throughput	+1.18	[+1.09, +1.26]
➖	otlp_http_to_blackhole	ingress throughput	+0.82	[+0.68, +0.97]
➖	datadog_agent_remap_datadog_logs_acks	ingress throughput	+0.71	[+0.62, +0.79]
➖	http_to_http_acks	ingress throughput	+0.58	[-0.73, +1.89]
➖	fluent_elasticsearch	ingress throughput	+0.43	[-0.05, +0.91]
➖	syslog_log2metric_splunk_hec_metrics	ingress throughput	+0.42	[+0.29, +0.55]
➖	syslog_loki	ingress throughput	+0.24	[+0.18, +0.30]
➖	http_to_http_noack	ingress throughput	+0.19	[+0.10, +0.28]
➖	syslog_log2metric_tag_cardinality_limit_blackhole	ingress throughput	+0.16	[+0.04, +0.28]
➖	otlp_grpc_to_blackhole	ingress throughput	+0.05	[-0.04, +0.14]
➖	http_to_http_json	ingress throughput	+0.03	[-0.04, +0.10]
➖	splunk_hec_indexer_ack_blackhole	ingress throughput	-0.00	[-0.14, +0.14]
➖	splunk_hec_to_splunk_hec_logs_acks	ingress throughput	-0.00	[-0.15, +0.14]
➖	splunk_hec_to_splunk_hec_logs_noack	ingress throughput	-0.03	[-0.15, +0.08]
➖	enterprise_http_to_http	ingress throughput	-0.16	[-0.25, -0.07]
➖	datadog_agent_remap_blackhole	ingress throughput	-0.23	[-0.33, -0.14]
➖	syslog_splunk_hec_logs	ingress throughput	-0.28	[-0.34, -0.21]
➖	http_text_to_http_json	ingress throughput	-0.29	[-0.41, -0.16]
➖	http_to_s3	ingress throughput	-0.37	[-0.65, -0.09]
➖	syslog_humio_logs	ingress throughput	-0.50	[-0.60, -0.40]
➖	http_elasticsearch	ingress throughput	-0.63	[-0.69, -0.57]
➖	datadog_agent_remap_blackhole_acks	ingress throughput	-0.77	[-0.87, -0.67]
➖	datadog_agent_remap_datadog_logs	ingress throughput	-1.22	[-1.32, -1.12]
➖	splunk_hec_route_s3	ingress throughput	-1.35	[-1.86, -0.84]
➖	syslog_log2metric_humio_metrics	ingress throughput	-2.17	[-2.31, -2.03]
➖	syslog_regex_logs2metric_ddmetrics	ingress throughput	-2.76	[-2.88, -2.64]
❌	file_to_blackhole	egress throughput	-28.68	[-31.29, -26.07]

github-actions · 2024-01-03T06:36:56Z

Regression Detector Results

Run ID: 6507db6a-610a-4898-9d04-17c12fde3d23
Baseline: 0f29afa
Comparison: f466177
Total CPUs: 7

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	otlp_http_to_blackhole	ingress throughput	+1.70	[+1.55, +1.85]
➖	file_to_blackhole	egress throughput	+1.61	[-0.86, +4.08]
➖	otlp_grpc_to_blackhole	ingress throughput	+1.01	[+0.91, +1.11]
➖	datadog_agent_remap_datadog_logs_acks	ingress throughput	+0.86	[+0.78, +0.93]
➖	syslog_log2metric_splunk_hec_metrics	ingress throughput	+0.83	[+0.68, +0.97]
➖	fluent_elasticsearch	ingress throughput	+0.68	[+0.20, +1.15]
➖	socket_to_socket_blackhole	ingress throughput	+0.52	[+0.43, +0.62]
➖	datadog_agent_remap_blackhole_acks	ingress throughput	+0.49	[+0.39, +0.58]
➖	http_text_to_http_json	ingress throughput	+0.47	[+0.35, +0.59]
➖	syslog_log2metric_humio_metrics	ingress throughput	+0.30	[+0.16, +0.44]
➖	datadog_agent_remap_blackhole	ingress throughput	+0.20	[+0.12, +0.28]
➖	http_to_http_noack	ingress throughput	+0.11	[+0.02, +0.20]
➖	http_to_http_json	ingress throughput	+0.04	[-0.03, +0.11]
➖	datadog_agent_remap_datadog_logs	ingress throughput	+0.01	[-0.08, +0.11]
➖	splunk_hec_to_splunk_hec_logs_acks	ingress throughput	+0.00	[-0.15, +0.15]
➖	splunk_hec_indexer_ack_blackhole	ingress throughput	-0.00	[-0.14, +0.14]
➖	splunk_hec_to_splunk_hec_logs_noack	ingress throughput	-0.05	[-0.17, +0.06]
➖	enterprise_http_to_http	ingress throughput	-0.07	[-0.12, -0.01]
➖	http_to_s3	ingress throughput	-0.37	[-0.64, -0.09]
➖	syslog_log2metric_tag_cardinality_limit_blackhole	ingress throughput	-0.52	[-0.64, -0.40]
➖	http_elasticsearch	ingress throughput	-0.60	[-0.67, -0.54]
➖	http_to_http_acks	ingress throughput	-0.78	[-2.08, +0.53]
➖	syslog_loki	ingress throughput	-0.91	[-0.95, -0.87]
➖	syslog_humio_logs	ingress throughput	-1.00	[-1.10, -0.89]
➖	syslog_splunk_hec_logs	ingress throughput	-1.98	[-2.04, -1.92]
➖	syslog_regex_logs2metric_ddmetrics	ingress throughput	-2.07	[-2.24, -1.90]
➖	splunk_hec_route_s3	ingress throughput	-2.27	[-2.78, -1.75]

bruceg added the labels domain: cli (Anything related to Vector's CLI), domain: config (Anything related to configuring Vector), and domain: core (Anything related to core crates i.e. vector-core, core-common, etc) Dec 15, 2023

bruceg requested review from jszwedko and a team December 15, 2023 02:23

bruceg requested a review from a team as a code owner December 15, 2023 02:23

github-actions bot removed the domain: core label (Anything related to core crates i.e. vector-core, core-common, etc) Dec 15, 2023

bruceg force-pushed the OPW-85-add-strict-env-vars-option branch from c8c6795 to ffc2ab5 Compare December 15, 2023 02:45

jszwedko requested changes Dec 15, 2023

bruceg added 2 commits December 21, 2023 15:08

Immediately deprecate warning mode

93dff07

Add new CLI option to manually-managed docs

c9abff9

bruceg requested a review from a team as a code owner December 21, 2023 21:12

github-actions bot added the domain: external docs label (Anything related to Vector's external, public documentation) Dec 21, 2023

drichards-87 approved these changes Dec 21, 2023

bruceg requested a review from jszwedko December 21, 2023 21:20

jszwedko approved these changes Dec 22, 2023

Merge remote-tracking branch 'origin/master' into OPW-85-add-strict-e…

f1bc70e

…nv-vars-option

bruceg added this pull request to the merge queue Jan 2, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 3, 2024

bruceg added this pull request to the merge queue Jan 3, 2024

Merged via the queue into master with commit f466177 Jan 3, 2024
41 checks passed

bruceg deleted the OPW-85-add-strict-env-vars-option branch January 3, 2024 06:45
