This is a translation of TensorFlow's quantile helper class. It computes approximate quantiles with error-bound guarantees for weighted data sets. The implementation adapts techniques from the following papers:
- (2001) Space-efficient online computation of quantile summaries.
- (2004) Power-conserving computation of order-statistics over sensor networks.
- (2007) A fast algorithm for approximate quantiles in high speed data streams.
- (2016) XGBoost: A Scalable Tree Boosting System.
- Maintain an in-memory multi-level quantile summary in a way that guarantees a maximum approximation error of `eps * W` per bucket, where `W` is the total weight across all points in the input dataset.
- Two base operations are defined: `MERGE` and `COMPRESS`. `MERGE` combines two summaries while guaranteeing `epsNew = max(eps1, eps2)`; `COMPRESS` compresses a summary to `b + 1` elements while guaranteeing `epsNew = epsOld + 1/b` (both are sketched in code after this list). Ideally, `b * sizeof(summary entry)` should be small enough to fit in an average CPU L2 cache.
- To distribute this algorithm while maintaining the error bounds, the worker-computed summaries must have no more than `eps / h` error, where `h` is the height of the distributed computation graph; `h = 2` for a MapReduce job with no combiner.
- We mainly want to max out I/O bandwidth by ensuring we are not compute-bound while using a reasonable amount of RAM.
- Compute: `O(n * log(1/eps * log(eps * n)))`.
- Memory: `O(1/eps * log^2(eps * n))` for one worker streaming through the entire dataset.
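To make the rank-bound bookkeeping behind the `eps * W` guarantee and the `MERGE` operation concrete, here is a minimal sketch. The original class's target language and API are not shown in this document, so the Go code, type names, and field names below are illustrative assumptions in the style of the GK (2001) summaries, not the actual implementation:

```go
package quantiles

// Entry is one element of a weighted quantile summary (field names are
// assumptions for illustration). MinRank/MaxRank bound the cumulative
// weight at this value; the gaps between consecutive bounds are what
// the eps * W guarantee constrains.
type Entry struct {
	Value   float64
	Weight  float64 // total weight merged into this entry
	MinRank float64 // lower bound on cumulative weight strictly below Value
	MaxRank float64 // upper bound on cumulative weight up to and including Value
}

// Summary is a value-sorted, rank-annotated sketch of a weighted stream.
type Summary struct {
	Entries []Entry
}

// TotalWeight returns W, the total weight the summary covers.
func (s *Summary) TotalWeight() float64 {
	if len(s.Entries) == 0 {
		return 0
	}
	return s.Entries[len(s.Entries)-1].MaxRank
}

// Merge combines two summaries with error max(eps1, eps2): each entry's
// rank bounds stay valid once shifted by the rank mass that the *other*
// summary contributes below it (MinRank) and at-or-below it (MaxRank).
func Merge(a, b Summary) Summary {
	merged := make([]Entry, 0, len(a.Entries)+len(b.Entries))
	i, j := 0, 0
	for i < len(a.Entries) || j < len(b.Entries) {
		var e Entry
		var other []Entry
		var k int
		if j >= len(b.Entries) ||
			(i < len(a.Entries) && a.Entries[i].Value <= b.Entries[j].Value) {
			e, other, k = a.Entries[i], b.Entries, j
			i++
		} else {
			e, other, k = b.Entries[j], a.Entries, i
			j++
		}
		if k > 0 {
			e.MinRank += other[k-1].MinRank
		}
		if k < len(other) {
			// The next entry on the other side may still fall below e in
			// the merged order, so add its upper bound minus its own weight.
			e.MaxRank += other[k].MaxRank - other[k].Weight
		} else if n := len(other); n > 0 {
			e.MaxRank += other[n-1].MaxRank // the whole other side is below e
		}
		merged = append(merged, e)
	}
	return Summary{Entries: merged}
}
```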
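`COMPRESS` can be sketched the same way, reusing the `Entry` and `Summary` types above. The selection rule here (keep both endpoints, plus one entry near each target rank `k*W/b`) is a simple illustrative choice, not necessarily the rule the translated class uses:

```go
// Compress shrinks the summary to at most b+1 entries. Between two
// surviving entries at most roughly W/b of rank mass is dropped, which
// is where the additional 1/b term in epsNew = epsOld + 1/b comes from.
func (s *Summary) Compress(b int) {
	if b < 1 || len(s.Entries) <= b+1 {
		return
	}
	w := s.TotalWeight()
	mid := func(e Entry) float64 { return (e.MinRank + e.MaxRank) / 2 }
	kept := []Entry{s.Entries[0]}
	j := 0
	for k := 1; k < b; k++ {
		target := float64(k) * w / float64(b)
		// Advance to the last interior entry whose midpoint rank
		// does not exceed the target rank.
		for j+1 < len(s.Entries)-1 && mid(s.Entries[j+1]) <= target {
			j++
		}
		if cand := s.Entries[j]; cand.Value > kept[len(kept)-1].Value {
			kept = append(kept, cand)
		}
	}
	kept = append(kept, s.Entries[len(s.Entries)-1])
	s.Entries = kept
}
```

The L2-cache remark above is about this `b`: compression repeatedly scans a `b + 1`-entry buffer, so keeping `b * sizeof(summary entry)` cache-resident keeps the algorithm I/O-bound rather than compute-bound.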
An epsilon value of zero would make the algorithm extremely inefficient and is therefore disallowed (the sketch below includes the corresponding guard).
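As a concrete illustration of both the `eps / h` budgeting and the `eps > 0` requirement, here is a hedged sketch; the helper name `DistributedEps` is an assumption for illustration, not part of the original class:

```go
package quantiles

import "fmt"

// DistributedEps returns the error budget each worker-level summary may
// use so that, after h levels of merging, the final summary still meets
// the requested eps. For a MapReduce job with no combiner, h = 2.
// (Hypothetical helper; the name and signature are assumptions.)
func DistributedEps(eps float64, h int) (float64, error) {
	if eps <= 0 {
		// eps == 0 is disallowed: memory grows like
		// O(1/eps * log^2(eps * n)), so the summary would degenerate
		// into retaining nearly every distinct input.
		return 0, fmt.Errorf("quantiles: eps must be > 0, got %g", eps)
	}
	if h < 1 {
		return 0, fmt.Errorf("quantiles: graph height must be >= 1, got %d", h)
	}
	return eps / float64(h), nil
}
```

For example, a final target of `eps = 0.01` on a two-level MapReduce job gives each mapper a budget of `0.005`.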
TODO:

- Implement an online estimator that does not require finalizing the stream
- Add proper documentation
- Benchmark
- Add serialization