`sample` to optionally sample logs randomly
#21393
Labels
transform: sample
Anything `sample` transform related
type: feature
A value-adding code addition that introduces new functionality.
Use Cases
With reference to this code block:
vector/src/transforms/sample/transform.rs
Lines 78 to 99 in edb2242
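The referenced lines are not reproduced in this issue. As a rough illustration only, and not the actual Vector source, the sketch below assumes the sampler is a plain modulo counter shared by all inputs. `IncrementalSampler` and `kept_per_service` are hypothetical names, and the interleaving of a "big" and a "small" service is contrived to show one way a low-volume stream can end up with no sampled events.

```rust
/// Hypothetical counter-based sampler: keeps one event out of every `rate`.
/// This is an assumption about the behavior, not code copied from Vector.
struct IncrementalSampler {
    rate: u64,
    count: u64,
}

impl IncrementalSampler {
    fn new(rate: u64) -> Self {
        Self { rate, count: 0 }
    }

    /// Returns true when the current event should be kept.
    fn keep(&mut self) -> bool {
        let keep = self.count == 0;
        self.count = (self.count + 1) % self.rate;
        keep
    }
}

/// Feeds 100 interleaved events through one shared sampler and returns
/// (events kept from "big", events kept from "small").
fn kept_per_service() -> (u32, u32) {
    let mut sampler = IncrementalSampler::new(10);
    let (mut big, mut small) = (0, 0);
    for i in 0..100u64 {
        // Contrived input: every tenth event (i = 9, 19, ...) comes from
        // the low-volume service, all others from the high-volume one.
        let from_small = i % 10 == 9;
        if sampler.keep() {
            if from_small { small += 1 } else { big += 1 }
        }
    }
    (big, small)
}

fn main() {
    let (big, small) = kept_per_service();
    println!("kept from big: {big}, kept from small: {small}");
}
```

With this particular interleaving the shared counter only ever fires on events from the high-volume service, so the low-volume service contributes nothing to the sampled output.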
`sample` currently uses a deterministic incremental method over the entire volume of input events to determine whether to discard an individual event. This means that a single `sample` component cannot handle several streams of events, especially if they have vastly differing volumes, since the largest input stream will overwhelm the others.
We would like to use a single `sample` component for every service's logs to keep startup times low, which means `sample` would have to sample the logs randomly, independently of each other.
Attempted Solutions
We've essentially implemented the aforementioned random sampling using a `remap` component that assigns each log a `to_be_dropped` attribute based on `if random_float(0.0, 1.0) > (1.0/sample_rate)`, followed by a `filter` with condition `to_bool!(to_be_dropped) == false`.
Proposal
Add a `mode` option, an enumeration that defaults to `incremental` (the current behavior) and also accepts `random` for the previously described behavior.
References
No response
Version
vector 0.40.0 (x86_64-apple-darwin 1167aa9 2024-07-29 15:08:44.028365803)
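As a rough sketch of the `mode` option proposed above, the following shows how an enumeration defaulting to `incremental` could sit next to a `random` variant. Everything here is hypothetical: `Mode`, `Sampler`, and the small `XorShift64` generator are illustrative names, not Vector's API, and the xorshift generator only stands in for a real random source so the example needs no external crates. The random branch mirrors the comparison used in the workaround, keeping an event when a uniform draw falls below `1.0 / rate`.

```rust
/// Hypothetical sampling modes mirroring the proposed `mode` option.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
enum Mode {
    /// Counter-based behavior, kept as the default.
    #[default]
    Incremental,
    /// Each event is kept independently with probability 1/rate.
    Random,
}

/// Tiny xorshift generator, used only to keep the sketch dependency-free.
struct XorShift64(u64);

impl XorShift64 {
    /// Returns a float in [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        // Build the float from the top 53 bits of the state.
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical sampler that switches behavior on `mode`.
struct Sampler {
    mode: Mode,
    rate: u64,
    count: u64,
    rng: XorShift64,
}

impl Sampler {
    fn new(mode: Mode, rate: u64, seed: u64) -> Self {
        // A xorshift state must be non-zero, so clamp the seed to at least 1.
        Self { mode, rate, count: 0, rng: XorShift64(seed.max(1)) }
    }

    /// Returns true when the current event should be kept.
    fn keep(&mut self) -> bool {
        match self.mode {
            Mode::Incremental => {
                let keep = self.count == 0;
                self.count = (self.count + 1) % self.rate;
                keep
            }
            Mode::Random => self.rng.next_f64() < 1.0 / self.rate as f64,
        }
    }
}

fn main() {
    let mut sampler = Sampler::new(Mode::Random, 10, 42);
    let kept = (0..100_000).filter(|_| sampler.keep()).count();
    println!("kept {kept} of 100000 events in random mode");
}
```

Because the random branch decides each event on its own, the share of events kept from any one service should not depend on how that service's events interleave with the others.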