Implementation of Taily algorithm as described by Aly et al. in the 2013 paper "Taily: shard selection using the tail of score distributions."

pisa-engine/taily

This library implements the Taily algorithm as described by Aly et al. in the 2013 paper "Taily: shard selection using the tail of score distributions."

Disclaimer

At this early stage of development, the library interface is subject to change. If you rely on it now, I advise pinning a specific git tag.

Installation

taily is a header-only library. For now, copy the include/taily.hpp file into your project and include it.

CMake and Conan support to come...

Dependencies

The library compiles with GCC >= 4.9 and Clang >= 4, and it requires C++14. The only other dependency is the Boost.Math library, which is used for the Gamma distribution.

Usage

Chances are you will only need to call one function that scores all shards with respect to one query:

std::vector<double> score_shards(
    const Query_Statistics& global_stats,
    const std::vector<Query_Statistics>& shard_stats,
    const int ntop)

global_stats contains statistics for the entire index, the shard_stats vector holds the statistics of the individual shards, and ntop is the parameter of Taily: the number of top results for which a score threshold will be estimated.

Query_Statistics is a simple structure that contains the collection size and a vector of Feature_Statistics whose length equals the number of query terms.

struct Query_Statistics {
    std::vector<Feature_Statistics> term_stats;
    int size;
};

Each element of term_stats contains the values needed for computations:

struct Feature_Statistics {
    double expected_value;
    double variance;
    int frequency;

    template<typename FeatureRange>
    static Feature_Statistics from_features(const FeatureRange& features);

    template<typename ForwardIterator>
    static Feature_Statistics from_features(ForwardIterator first, ForwardIterator last);
};

Generating and Writing Features

If you want to use this library for storing features as well, you can use the from_features() helper functions to compute the statistics:

const std::vector<double>& features = fetch_or_generate_features(term);
auto stats = Feature_Statistics::from_features(features);

or

double* features = fetch_or_generate_features(term);
auto stats = Feature_Statistics::from_features(features, features + len);

The first overload takes any forward range, such as std::vector or std::array, for which std::begin() and std::end() return forward iterators over doubles. The second takes a pair of such iterators.
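
To make the two calling conventions concrete, here is a self-contained sketch of what such a pair of helpers might compute. It is an assumption, not the library's implementation: Stats_Sketch and stats_from_features are hypothetical names, the variance is taken to be the population variance, and frequency is taken to be the number of features in the range.

```cpp
#include <iterator>
#include <vector>

// Hypothetical stand-in mirroring the fields of Feature_Statistics.
struct Stats_Sketch {
    double expected_value;
    double variance;
    int frequency;
};

// Iterator overload: computes the mean, the population variance, and the
// number of features in [first, last) using two passes over the range.
template <typename ForwardIterator>
Stats_Sketch stats_from_features(ForwardIterator first, ForwardIterator last) {
    const auto count = std::distance(first, last);
    if (count == 0) {
        return Stats_Sketch{0.0, 0.0, 0};
    }
    double sum = 0.0;
    for (auto it = first; it != last; ++it) {
        sum += *it;
    }
    const double mean = sum / count;
    double squared_deviations = 0.0;
    for (auto it = first; it != last; ++it) {
        const double deviation = *it - mean;
        squared_deviations += deviation * deviation;
    }
    return Stats_Sketch{mean, squared_deviations / count, static_cast<int>(count)};
}

// Range overload: forwards to the iterator overload via std::begin/std::end,
// so it accepts std::vector, std::array, and built-in arrays alike.
template <typename FeatureRange>
Stats_Sketch stats_from_features(const FeatureRange& features) {
    return stats_from_features(std::begin(features), std::end(features));
}
```

Because the range overload only relies on std::begin() and std::end(), any container that supports them can be passed directly, while raw buffers go through the iterator overload as a pointer pair.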
