feature!: verify and set default VamanaBuildParameters #96

Merged on Mar 28, 2025 (44 commits).
Commits:
9702506 feature: add alpha default value set and check in build (yuejiaointel, Mar 18, 2025)
97a0128 fix: format (yuejiaointel, Mar 18, 2025)
e5f0cf4 fix: set and check default params in index.h (yuejiaointel, Mar 19, 2025)
4080533 fix: default test work (yuejiaointel, Mar 19, 2025)
5d8cf0f fix: fix all c++ test (yuejiaointel, Mar 20, 2025)
ffa6bd0 fix: fix index serach test failing by removing reverify functin in apply (yuejiaointel, Mar 20, 2025)
eb7be5f fix: revert vamana config.toml (yuejiaointel, Mar 20, 2025)
c3ec8ba fix: all test working (yuejiaointel, Mar 21, 2025)
749bce2 fix: remove unnecessary changes (yuejiaointel, Mar 21, 2025)
8bd4161 fix: remove unnecessary changes (yuejiaointel, Mar 21, 2025)
4e774c4 fix:format (yuejiaointel, Mar 21, 2025)
cc4da27 fix: format (yuejiaointel, Mar 21, 2025)
100cfef fix: fix comments (yuejiaointel, Mar 24, 2025)
ca56187 fix: format (yuejiaointel, Mar 24, 2025)
1afaee3 fix: comments and alpha default values const (yuejiaointel, Mar 24, 2025)
f65ca7a fix: doc update and use constant in tests (yuejiaointel, Mar 24, 2025)
ed83622 fix: format (yuejiaointel, Mar 25, 2025)
585cc2f fix: format (yuejiaointel, Mar 25, 2025)
b9a0ba3 fix: format (yuejiaointel, Mar 25, 2025)
0af8c60 fix: remove const in uncompressed.cpp (yuejiaointel, Mar 25, 2025)
2489f73 fix: remove const in inverted (yuejiaointel, Mar 25, 2025)
44c0cc1 fix: remove const (yuejiaointel, Mar 26, 2025)
4ade6f6 fix: same fix (yuejiaointel, Mar 26, 2025)
e242f1c Merge branch 'main_bak' into feature_set_default_build_params (yuejiaointel, Mar 26, 2025)
f4ca13a fix: rename alpha constant (yuejiaointel, Mar 26, 2025)
adb379b fix: fix doc strings (yuejiaointel, Mar 26, 2025)
d9abbf3 fix: rename parameters constant to vamana specific constant (yuejiaointel, Mar 27, 2025)
9b68be4 fix: combine lgoic in veryfiy function (yuejiaointel, Mar 27, 2025)
ef1d863 fix: combine lgoic in veryfiy function (yuejiaointel, Mar 27, 2025)
76f61d3 fix: update logic again for verify (yuejiaointel, Mar 27, 2025)
1d35ac4 Apply suggestions from code review (yuejiaointel, Mar 28, 2025)
382d252 fix: update comment (yuejiaointel, Mar 28, 2025)
3c8085c fix: format (yuejiaointel, Mar 28, 2025)
0793c85 test: test ci failture with extra para (yuejiaointel, Mar 28, 2025)
47c3297 fix: revert (yuejiaointel, Mar 28, 2025)
63ebadf test: ci fail test (yuejiaointel, Mar 28, 2025)
becd2cd fix: revert (yuejiaointel, Mar 28, 2025)
839d3a3 test: ci fail test (yuejiaointel, Mar 28, 2025)
e146904 test: ci fail test (yuejiaointel, Mar 28, 2025)
a9b6471 fix: revert (yuejiaointel, Mar 28, 2025)
5d7d2e0 Use Ubuntu 24.04 in cibuildwheel.yml (mihaic, Mar 28, 2025)
47d967f Update pybind11 to 2.11.2 (mihaic, Mar 28, 2025)
dde61e8 Require cmake<4 (mihaic, Mar 28, 2025)
d88d970 fix: revert changes in cibuildwheel and cmakelist (yuejiaointel, Mar 28, 2025)
2 changes: 1 addition & 1 deletion bindings/python/pyproject.toml
@@ -16,7 +16,7 @@
requires = [
"setuptools>=42",
"scikit-build",
"cmake>=3.21", # Keep in-sync with `CMakeLists.txt`
"cmake>=3.21, <4", # Keep in-sync with `CMakeLists.txt`
"numpy>=1.10.0, <2", # Keep in-sync with `setup.py`
"archspec>=0.2.0", # Keep in-sync with `setup.py`
"toml>=0.10.2", # Keep in-sync with `setup.py` required for the tests
49 changes: 19 additions & 30 deletions bindings/python/src/vamana.cpp
@@ -30,6 +30,7 @@
#include "svs/lib/dispatcher.h"
#include "svs/lib/float16.h"
#include "svs/lib/meta.h"
#include "svs/lib/preprocessor.h"
#include "svs/orchestrators/vamana.h"

// pybind
@@ -420,40 +421,22 @@ void wrap(py::module& m) {
size_t window_size,
size_t max_candidate_pool_size,
size_t prune_to,
size_t num_threads) {
if (num_threads != std::numeric_limits<size_t>::max()) {
PyErr_WarnEx(
PyExc_DeprecationWarning,
"Constructing VamanaBuildParameters with the \"num_threads\" "
"keyword "
"argument is deprecated, no longer has any effect, and will be "
"removed "
"from future versions of the library. Use the \"num_threads\" "
"keyword "
"argument of \"svs.Vamana.build\" instead!",
1
);
}

// Default the `prune_to` argument appropriately.
if (prune_to == std::numeric_limits<size_t>::max()) {
prune_to = graph_max_degree;
}

bool use_full_search_history) {
return svs::index::vamana::VamanaBuildParameters{
alpha,
graph_max_degree,
window_size,
max_candidate_pool_size,
prune_to,
true};
use_full_search_history};
}),
py::arg("alpha") = 1.2,
py::arg("graph_max_degree") = 32,
py::arg("window_size") = 64,
py::arg("max_candidate_pool_size") = 80,
py::arg("prune_to") = std::numeric_limits<size_t>::max(),
py::arg("num_threads") = std::numeric_limits<size_t>::max(),
py::arg("alpha") = svs::FLOAT_PLACEHOLDER,
py::arg("graph_max_degree") = svs::VAMANA_GRAPH_MAX_DEGREE_DEFAULT,
py::arg("window_size") = svs::VAMANA_WINDOW_SIZE_DEFAULT,
py::arg("max_candidate_pool_size") = svs::UNSIGNED_INTEGER_PLACEHOLDER,
py::arg("prune_to") = svs::UNSIGNED_INTEGER_PLACEHOLDER,
py::arg("use_full_search_history") =
svs::VAMANA_USE_FULL_SEARCH_HISTORY_DEFAULT,
R"(
Construct a new instance from keyword arguments.

@@ -462,6 +445,7 @@ void wrap(py::module& m) {
For distance types favoring minimization, set this to a number
greater than 1.0 (typically, 1.2 is sufficient). For distance types
preferring maximization, set to a value less than 1.0 (such as 0.95).
The default value is 1.2 for L2 distance type and 0.95 for MIP/Cosine.
graph_max_degree: The maximum out-degree in the final graph. Graphs with
a higher degree tend to yield better accuracy and performance at the cost
of a larger memory footprint.
@@ -470,10 +454,15 @@ void wrap(py::module& m) {
longer construction time. Should be larger than `graph_max_degree`.
max_candidate_pool_size: Limit on the number of candidates to consider
for neighbor updates. Should be larger than `window_size`.
The default value is ``graph_max_degree`` * 2.
prune_to: Amount candidate lists will be pruned to when exceeding the
target max degree. In general, setting this to slightly less than
`graph_max_degree` will yield faster index building times. Default:
`graph_max_degree`.
``graph_max_degree`` will yield faster index building times. Default:
``graph_max_degree`` - 4 if
``graph_max_degree`` is at least 16, otherwise ``graph_max_degree``.
use_full_search_history: When true, uses the full search history during
graph construction, which can improve graph quality at the expense of
additional memory and potentially longer build times.
)"
)
.def_readwrite("alpha", &svs::index::vamana::VamanaBuildParameters::alpha)
@@ -557,4 +546,4 @@ overwritten when saving the index to this directory.
)"
);
}
} // namespace svs::python::vamana
} // namespace svs::python::vamana
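The docstring above says that `max_candidate_pool_size` and `prune_to` are derived from `graph_max_degree` when left unset. A minimal standalone sketch of those two rules follows, assuming the sentinel is `std::numeric_limits<size_t>::max()` as in `preprocessor.h`; `kUnset`, `resolve_max_candidate_pool_size`, and `resolve_prune_to` are hypothetical names, not SVS functions.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical sentinel mirroring svs::UNSIGNED_INTEGER_PLACEHOLDER:
// "the caller did not set this field".
inline constexpr std::size_t kUnset = std::numeric_limits<std::size_t>::max();

// Default rule: twice the graph's maximum out-degree; explicit values are kept.
std::size_t resolve_max_candidate_pool_size(
    std::size_t pool_size, std::size_t graph_max_degree
) {
    return pool_size == kUnset ? 2 * graph_max_degree : pool_size;
}

// Default rule: graph_max_degree - 4 for degrees of at least 16,
// otherwise graph_max_degree itself; explicit values are kept.
std::size_t resolve_prune_to(std::size_t prune_to, std::size_t graph_max_degree) {
    // Sketch-only precondition (not part of the PR).
    assert(graph_max_degree > 0 && "graph_max_degree must be positive");
    if (prune_to != kUnset) {
        return prune_to;
    }
    return graph_max_degree >= 16 ? graph_max_degree - 4 : graph_max_degree;
}
```

With the default `graph_max_degree` of 32 this gives a pool size of 64 and a `prune_to` of 28, while a degree of 8 leaves `prune_to` at 8. Note that the derived pool size of 64 equals the default `window_size` rather than exceeding it, as the docstring recommends.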
2 changes: 1 addition & 1 deletion bindings/python/tests/test_dynamic_vamana.py
@@ -98,7 +98,7 @@ def test_loop(self):
# here, we set an expected mid-point for the recall and allow it to wander up and
# down by a little.
expected_recall = 0.845
expected_recall_delta = 0.03
expected_recall_delta = 0.05

reference = ReferenceDataset(num_threads = num_threads)
data, ids = reference.new_ids(5000)
7 changes: 0 additions & 7 deletions bindings/python/tests/test_vamana.py
@@ -281,13 +281,6 @@ def test_basic(self):
self._test_basic(loader, matcher, first_iter = first_iter)
first_iter = False

def test_deprecation(self):
with warnings.catch_warnings(record = True) as w:
p = svs.VamanaBuildParameters(num_threads = 1)
self.assertTrue(len(w) == 1)
self.assertTrue(issubclass(w[0].category, DeprecationWarning))
self.assertTrue("VamanaBuildParameters" in str(w[0].message))

def _groundtruth_map(self):
return {
svs.DistanceType.L2: test_groundtruth_l2,
15 changes: 8 additions & 7 deletions include/svs/index/vamana/build_params.h
@@ -17,6 +17,7 @@
#pragma once

// svs
#include "svs/lib/preprocessor.h"
#include "svs/lib/saveload.h"

// stl
@@ -44,33 +45,33 @@ struct VamanaBuildParameters {
, use_full_search_history{use_full_search_history_} {}

/// The pruning parameter.
float alpha;
float alpha = svs::FLOAT_PLACEHOLDER;

/// The maximum degree in the graph. A higher max degree may yield a higher quality
/// graph in terms of recall for performance, but the memory footprint of the graph is
/// directly proportional to the maximum degree.
size_t graph_max_degree;
size_t graph_max_degree = svs::VAMANA_GRAPH_MAX_DEGREE_DEFAULT;

/// The search window size to use during graph construction. A higher search window
/// size will yield a higher quality graph since more overall vertices are considered,
/// but will increase construction time.
size_t window_size;
size_t window_size = svs::VAMANA_WINDOW_SIZE_DEFAULT;

/// Set a limit on the number of neighbors considered during pruning. In practice, set
/// this to a high number (at least 5 times greater than the window_size) and forget
/// about it.
size_t max_candidate_pool_size;
size_t max_candidate_pool_size = svs::UNSIGNED_INTEGER_PLACEHOLDER;

/// This is the amount that candidates will be pruned to after certain pruning
/// procedures. Setting this to less than ``graph_max_degree`` can result in significant
/// speedups in index building.
size_t prune_to;
size_t prune_to = svs::UNSIGNED_INTEGER_PLACEHOLDER;

/// When building, either the contents of the search buffer can be used or the entire
/// search history can be used.
///
/// The latter case may yield a slightly better graph at the cost of more search time.
bool use_full_search_history = true;
bool use_full_search_history = svs::VAMANA_USE_FULL_SEARCH_HISTORY_DEFAULT;

///// Comparison
friend bool
@@ -129,4 +130,4 @@ struct VamanaBuildParameters {
);
}
};
} // namespace svs::index::vamana
} // namespace svs::index::vamana
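With the default member initializers in `build_params.h`, a parameter object whose fields are not set explicitly holds either a concrete default or a sentinel meaning "not set", and verification later replaces the sentinels. A minimal sketch of that pattern, using a hypothetical aggregate `SketchBuildParams` and sentinel names that mirror, but are not, the constants in `preprocessor.h`. The real `VamanaBuildParameters` also declares a constructor, so this is an illustration rather than a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-ins for the sentinels in svs/lib/preprocessor.h.
inline constexpr std::size_t kUnsetSize = std::numeric_limits<std::size_t>::max();
inline constexpr float kUnsetFloat = std::numeric_limits<float>::max();

// Hypothetical aggregate mirroring the field defaults of VamanaBuildParameters.
struct SketchBuildParams {
    float alpha = kUnsetFloat;                         // resolved per distance type later
    std::size_t graph_max_degree = 32;                 // concrete default
    std::size_t window_size = 64;                      // concrete default
    std::size_t max_candidate_pool_size = kUnsetSize;  // derived from graph_max_degree
    std::size_t prune_to = kUnsetSize;                 // derived from graph_max_degree
    bool use_full_search_history = true;               // concrete default
};

// True when at least one field still holds a sentinel and needs resolving.
constexpr bool needs_defaults(const SketchBuildParams& p) {
    return p.alpha == kUnsetFloat || p.max_candidate_pool_size == kUnsetSize ||
           p.prune_to == kUnsetSize;
}

// A value-initialized object always carries sentinels until verified.
static_assert(needs_defaults(SketchBuildParams{}), "fresh params must need defaults");
```

Sentinels let the verification step distinguish "the user asked for this exact value" from "the user left it alone", which a plain default value cannot do once it has been written into the field.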
29 changes: 21 additions & 8 deletions include/svs/index/vamana/dynamic_index.h
@@ -38,6 +38,7 @@
#include "svs/index/vamana/index.h"
#include "svs/index/vamana/vamana_build.h"
#include "svs/lib/boundscheck.h"
#include "svs/lib/preprocessor.h"
#include "svs/lib/threads.h"

namespace svs::index::vamana {
@@ -157,6 +158,9 @@ class MutableVamanaIndex {
float alpha_ = 1.2;
bool use_full_search_history_ = true;

// Construction parameters
VamanaBuildParameters build_parameters_{};

// SVS logger for per index logging
svs::logging::logger_ptr logger_;

@@ -210,12 +214,19 @@ class MutableVamanaIndex {
, distance_(std::move(distance_function))
, threadpool_(threads::as_threadpool(std::move(threadpool_proto)))
, search_parameters_(vamana::construct_default_search_parameters(data_))
, construction_window_size_(parameters.window_size)
, max_candidates_(parameters.max_candidate_pool_size)
, prune_to_(parameters.prune_to)
, alpha_(parameters.alpha)
, use_full_search_history_{parameters.use_full_search_history}
, build_parameters_(parameters)
, logger_{std::move(logger)} {
// Verify the member copy of the build parameters and fill in defaults.
// `distance_function` was already moved into `distance_`, so use the member.
verify_and_set_default_index_parameters(build_parameters_, distance_);

// Rebuild the graph from the verified parameters in case verification changes graph_max_degree
graph_ = Graph{data_.size(), build_parameters_.graph_max_degree};
construction_window_size_ = build_parameters_.window_size;
max_candidates_ = build_parameters_.max_candidate_pool_size;
prune_to_ = build_parameters_.prune_to;
alpha_ = build_parameters_.alpha;
use_full_search_history_ = build_parameters_.use_full_search_history;

// Setup the initial translation of external to internal ids.
translator_.insert(external_ids, threads::UnitRange<Idx>(0, external_ids.size()));

@@ -227,10 +238,12 @@ class MutableVamanaIndex {
auto prefetch_parameters =
GreedySearchPrefetchParameters{sp.prefetch_lookahead_, sp.prefetch_step_};
auto builder = VamanaBuilder(
graph_, data_, distance_, parameters, threadpool_, prefetch_parameters
graph_, data_, distance_, build_parameters_, threadpool_, prefetch_parameters
);
builder.construct(1.0f, entry_point_[0], logging::Level::Info, logger_);
builder.construct(parameters.alpha, entry_point_[0], logging::Level::Info, logger_);
builder.construct(
build_parameters_.alpha, entry_point_[0], logging::Level::Info, logger_
);
}

/// @brief Post re-load constructor.
@@ -1346,4 +1359,4 @@ auto auto_dynamic_assemble(
std::move(logger)};
}

} // namespace svs::index::vamana
} // namespace svs::index::vamana
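The dynamic-index constructor now follows a copy-then-verify order: the caller's parameters are copied into a member, defaults are filled in on that copy, and everything derived from them (graph degree, builder settings) reads the verified copy. A minimal sketch of that ordering with hypothetical types `SketchParams` and `SketchIndex` and a hypothetical `fill_defaults` standing in for `verify_and_set_default_index_parameters` (only the `prune_to` rule is reproduced):

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sentinel for "not set by the caller".
inline constexpr std::size_t kUnset = std::numeric_limits<std::size_t>::max();

// Hypothetical, trimmed-down parameter struct.
struct SketchParams {
    std::size_t graph_max_degree = 32;
    std::size_t prune_to = kUnset;
};

// Hypothetical stand-in for verify_and_set_default_index_parameters.
void fill_defaults(SketchParams& p) {
    if (p.prune_to == kUnset) {
        p.prune_to = p.graph_max_degree >= 16 ? p.graph_max_degree - 4 : p.graph_max_degree;
    }
}

class SketchIndex {
  public:
    SketchIndex(std::size_t num_nodes, const SketchParams& params)
        : build_parameters_(params) {      // 1. copy the caller's parameters
        fill_defaults(build_parameters_);  // 2. fill defaults on the copy only
        // 3. derive state (per-node neighbor capacity) from the verified copy
        adjacency_.resize(num_nodes);
        for (auto& neighbors : adjacency_) {
            neighbors.reserve(build_parameters_.graph_max_degree);
        }
    }

    const SketchParams& build_parameters() const { return build_parameters_; }
    std::size_t num_nodes() const { return adjacency_.size(); }

  private:
    SketchParams build_parameters_;
    std::vector<std::vector<std::size_t>> adjacency_;
};
```

Because only the copy is modified, the caller's object keeps its sentinels, and any later code that reads the index's stored parameters sees the resolved values.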
69 changes: 64 additions & 5 deletions include/svs/index/vamana/index.h
@@ -404,19 +404,22 @@ class VamanaIndex {
if (graph_.n_nodes() != data_.size()) {
throw ANNEXCEPTION("Wrong sizes!");
}

build_parameters_ = parameters;
// Verify the parameters and fill in defaults before building
verify_and_set_default_index_parameters(build_parameters_, distance_function);
auto builder = VamanaBuilder(
graph_,
data_,
distance_,
parameters,
build_parameters_,
threadpool_,
extensions::estimate_prefetch_parameters(data_)
);

builder.construct(1.0F, entry_point_[0], logging::Level::Info, logger);
builder.construct(parameters.alpha, entry_point_[0], logging::Level::Info, logger);
builder.construct(
build_parameters_.alpha, entry_point_[0], logging::Level::Info, logger
);
}

/// @brief Getter method for logger
@@ -896,10 +899,13 @@ auto auto_build(
auto entry_point = extensions::compute_entry_point(data, threadpool);

// Default graph.
auto graph = default_graph(data.size(), parameters.graph_max_degree, graph_allocator);
auto verified_parameters = parameters;
verify_and_set_default_index_parameters(verified_parameters, distance);
Contributor Author:

I verify the parameters here because the graph is built using parameters.graph_max_degree. If it is fine to skip verification for the graph, I can remove this.

auto graph =
default_graph(data.size(), verified_parameters.graph_max_degree, graph_allocator);
using I = typename decltype(graph)::index_type;
return VamanaIndex{
parameters,
verified_parameters,
std::move(graph),
std::move(data),
lib::narrow<I>(entry_point),
@@ -959,4 +965,57 @@ auto auto_assemble(
index.apply(config);
return index;
}

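/// @brief Replace placeholder build parameters with defaults and validate them.
/// @throws std::invalid_argument if alpha is out of range for the distance type,
/// the distance type is unsupported, or prune_to exceeds graph_max_degree.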
template <typename Dist>
void verify_and_set_default_index_parameters(
VamanaBuildParameters& parameters, Dist distance_function
) {
// Set default values
if (parameters.max_candidate_pool_size == svs::UNSIGNED_INTEGER_PLACEHOLDER) {
parameters.max_candidate_pool_size = 2 * parameters.graph_max_degree;
}

if (parameters.prune_to == svs::UNSIGNED_INTEGER_PLACEHOLDER) {
if (parameters.graph_max_degree >= 16) {
parameters.prune_to = parameters.graph_max_degree - 4;
} else {
parameters.prune_to = parameters.graph_max_degree;
}
}

// Check supported distance type using std::is_same type trait
using dist_type = std::decay_t<decltype(distance_function)>;
// Create type flags for each distance type
constexpr bool is_L2 = std::is_same_v<dist_type, svs::distance::DistanceL2>;
constexpr bool is_IP = std::is_same_v<dist_type, svs::distance::DistanceIP>;
constexpr bool is_Cosine =
std::is_same_v<dist_type, svs::distance::DistanceCosineSimilarity>;

// Handle alpha based on distance type
if constexpr (is_L2) {
if (parameters.alpha == svs::FLOAT_PLACEHOLDER) {
parameters.alpha = svs::VAMANA_ALPHA_MINIMIZE_DEFAULT;
} else if (parameters.alpha < 1.0f) {
// Validate the user-provided value
throw std::invalid_argument("For L2 distance, alpha must be >= 1.0");
}
} else if constexpr (is_IP || is_Cosine) {
if (parameters.alpha == svs::FLOAT_PLACEHOLDER) {
parameters.alpha = svs::VAMANA_ALPHA_MAXIMIZE_DEFAULT;
} else if (parameters.alpha > 1.0f) {
// Validate the user-provided value
throw std::invalid_argument("For MIP/Cosine distance, alpha must be <= 1.0");
} else if (parameters.alpha <= 0.0f) {
throw std::invalid_argument("alpha must be > 0");
}
} else {
throw std::invalid_argument("Unsupported distance type");
}

// Check prune_to <= graph_max_degree
if (parameters.prune_to > parameters.graph_max_degree) {
throw std::invalid_argument("prune_to must be <= graph_max_degree");
}
}
} // namespace svs::index::vamana
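`verify_and_set_default_index_parameters` chooses its alpha rule from the distance functor's type at compile time with `if constexpr`. The sketch below reproduces those rules in isolation, assuming empty tag types `L2`, `InnerProduct`, and `Cosine` as stand-ins for `svs::distance::DistanceL2`, `DistanceIP`, and `DistanceCosineSimilarity`; `resolve_alpha` and the `k*` constants are hypothetical, not part of SVS.

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical tag types standing in for the SVS distance functors.
struct L2 {};
struct InnerProduct {};
struct Cosine {};

inline constexpr float kUnsetAlpha = std::numeric_limits<float>::max();
inline constexpr float kAlphaMinimize = 1.2f;   // default for L2
inline constexpr float kAlphaMaximize = 0.95f;  // default for IP / cosine

// Returns the alpha to use for distance type Dist: fills the default when unset
// and throws std::invalid_argument when a user value is out of range.
template <typename Dist>
float resolve_alpha(float alpha) {
    using D = std::decay_t<Dist>;
    if constexpr (std::is_same_v<D, L2>) {
        if (alpha == kUnsetAlpha) {
            return kAlphaMinimize;
        }
        if (alpha < 1.0f) {
            throw std::invalid_argument("For L2 distance, alpha must be >= 1.0");
        }
        return alpha;
    } else if constexpr (std::is_same_v<D, InnerProduct> || std::is_same_v<D, Cosine>) {
        if (alpha == kUnsetAlpha) {
            return kAlphaMaximize;
        }
        if (alpha > 1.0f) {
            throw std::invalid_argument("For MIP/Cosine distance, alpha must be <= 1.0");
        }
        if (alpha <= 0.0f) {
            throw std::invalid_argument("alpha must be > 0");
        }
        return alpha;
    } else {
        throw std::invalid_argument("Unsupported distance type");
    }
}
```

As in the PR, a type outside the three supported ones reaches the final `else` and fails at run time; a `static_assert` on a dependent false value in that branch would reject such types at compile time instead.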
14 changes: 14 additions & 0 deletions include/svs/lib/preprocessor.h
@@ -16,6 +16,9 @@

#pragma once

#include <cstddef>
#include <limits>

namespace svs::preprocessor::detail {

// consteval functions for working with preprocessor defines.
@@ -159,3 +162,14 @@ inline constexpr bool have_avx512_avx2 = true;
#endif

} // namespace svs::arch

namespace svs {
Contributor:

Thank you for the good work.

Since some of these parameters are Vamana-specific, could we move them into the svs::index::vamana namespace, or give them Vamana-specific names?

For example, GRAPH_MAX_DEGREE_DEFAULT -> VAMANA_GRAPH_MAX_DEGREE_DEFAULT.
ALPHA_*, USE_FULL_SEARCH_HISTORY_DEFAULT, and WINDOW_SIZE_DEFAULT should also get Vamana-specific names.

Contributor Author:

Thanks for the suggestion, fixed!

// Sentinel values meaning "not set by the user"; replaced by defaults during verification
inline constexpr size_t UNSIGNED_INTEGER_PLACEHOLDER = std::numeric_limits<size_t>::max();
inline constexpr float FLOAT_PLACEHOLDER = std::numeric_limits<float>::max();
// Vamana build-parameter defaults (integer defaults typed to match their size_t fields)
inline constexpr size_t VAMANA_GRAPH_MAX_DEGREE_DEFAULT = 32;
inline constexpr size_t VAMANA_WINDOW_SIZE_DEFAULT = 64;
inline constexpr bool VAMANA_USE_FULL_SEARCH_HISTORY_DEFAULT = true;
inline constexpr float VAMANA_ALPHA_MINIMIZE_DEFAULT = 1.2f;
inline constexpr float VAMANA_ALPHA_MAXIMIZE_DEFAULT = 0.95f;
} // namespace svs