Sampler Configuration Setting to Enable Stratified Sampling #20921
Labels
transform: sample
Anything `sample` transform related
type: feature
A value-adding code addition that introduces new functionality.
Use Cases
In many cases, I'd like to sample logs from different services at the same rate; however, not every service generates the same volume of logs. As a result, there is no way to guarantee a uniform distribution of logs into the sampler, so the rate cannot be applied consistently across services. The current sampler implementation is relatively straightforward: it maintains a count of events and, using the modulus, emits a log whenever the count modulo the specified rate evaluates to zero.
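The counting-and-modulus behaviour described above can be sketched roughly as follows (a simplification for illustration, not Vector's actual code). Note the single shared counter: every input stream advances the same count, so a chatty service skews sampling for quieter ones:

```rust
/// Sketch of a simple modulus-based sampler with one shared counter.
struct Sampler {
    rate: u64,
    count: u64,
}

impl Sampler {
    fn new(rate: u64) -> Self {
        Self { rate, count: 0 }
    }

    /// Emit when the running count modulo `rate` is zero.
    fn sample(&mut self) -> bool {
        let emit = self.count % self.rate == 0;
        self.count += 1;
        emit
    }
}

fn main() {
    let mut s = Sampler::new(10);
    // Across 100 events, exactly 1 in 10 is emitted overall --
    // regardless of which service each event came from.
    let emitted = (0..100).filter(|_| s.sample()).count();
    println!("emitted {emitted} of 100 events");
}
```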
The workaround is to create a separate sampler for each service (or input stream), but that means adding many different files that all have essentially the same configuration.
What would be ideal is a configuration with an optional `segment_by` key (or some other name), such that logs with a unique value for the referenced field maintain their own count and are sampled independently of logs with different values.

Attempted Solutions
No response
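To make the use case above concrete, the proposed option might look like this in a transform configuration (the `segment_by` key is hypothetical and does not exist in Vector today; `type`, `inputs`, and `rate` are the existing `sample` transform fields):

```yaml
transforms:
  sample_per_service:
    type: sample
    inputs: ["parse_logs"]
    rate: 10                # emit 1 in 10 events
    segment_by: "service"   # hypothetical: keep an independent count per value of this field
```

With this, a service producing 100,000 events and one producing 100 would each be sampled at 1 in 10, rather than sharing a single counter.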
Proposal
I am not a Rust expert, but in my view this could essentially be implemented with a hashmap, where each unique value of `segment_by` is a key and its event count is the value. When incrementing a count, logic similar to the existing implementation could check whether the count for the given key modulo the `rate` evaluates to zero (i.e. the modulo math described earlier).
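A minimal sketch of that hashmap idea, assuming the `segment_by` value has already been extracted from each event as a string (the names and structure here are mine, not Vector's):

```rust
use std::collections::HashMap;

/// Sketch of per-segment sampling: each distinct `segment_by` value
/// keeps its own counter, so every segment is sampled at the same rate
/// regardless of its volume.
struct SegmentedSampler {
    rate: u64,
    counts: HashMap<String, u64>,
}

impl SegmentedSampler {
    fn new(rate: u64) -> Self {
        Self { rate, counts: HashMap::new() }
    }

    /// Returns true if the event should be emitted: look up (or create)
    /// the counter for this segment, emit when count % rate == 0, then
    /// increment.
    fn sample(&mut self, segment: &str) -> bool {
        let count = self.counts.entry(segment.to_string()).or_insert(0);
        let emit = *count % self.rate == 0;
        *count += 1;
        emit
    }
}

fn main() {
    let mut sampler = SegmentedSampler::new(10);
    // A chatty service and a quiet one are sampled independently:
    let chatty = (0..100).filter(|_| sampler.sample("service-a")).count();
    let quiet = (0..10).filter(|_| sampler.sample("service-b")).count();
    println!("service-a emitted {chatty} of 100, service-b emitted {quiet} of 10");
}
```

One open design question with this approach is memory growth: the map gains an entry per unique field value, so high-cardinality fields would likely need some eviction or size cap.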
Version
0.39.0