Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Support for Cosine distance in IVF-Flat #179

Merged
merged 22 commits into from
Aug 14, 2024

Conversation

lowener
Copy link
Contributor

@lowener lowener commented Jun 6, 2024

No description provided.

@github-actions github-actions bot added the cpp label Jun 6, 2024
@cjnolet cjnolet added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Jun 26, 2024
@lowener lowener marked this pull request as ready for review July 8, 2024 15:53
@lowener lowener requested a review from a team as a code owner July 8, 2024 15:53
@lowener lowener requested a review from a team as a code owner July 9, 2024 13:51
@github-actions github-actions bot added the Python label Jul 9, 2024
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lowener for this work! It looks good overall, I have left some comments. I still need to take a closer look at the modified distance kernels

cpp/src/cluster/detail/kmeans_balanced.cuh Show resolved Hide resolved
cpp/src/cluster/detail/kmeans_balanced.cuh Outdated Show resolved Hide resolved
cpp/test/neighbors/ann_ivf_flat.cuh Outdated Show resolved Hide resolved
@lowener lowener requested a review from tfeher July 15, 2024 23:07
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lowener for the updates. The PR looks good overall. There might be a potential for improving the norm computations, but that need not stop this PR.

Comment on lines +140 to +143
if constexpr (ComputeNorm) {
norm_query += queryRegs[k] * queryRegs[k];
norm_data += encV[k] * encV[k];
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should ask the question whether it is more efficient to compute the norms here or precompute them:

  • we could store norm of database vectors in the index
  • query norm is recalculated every time we process vector in the cluster. I expect that it would be better to use precalculated query norm, but we should confirm that by measurement. Please open an issue to track this, and improve this in a follow-up PR.

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation is looking good. We need to make sure we are testing all cases (for all supported languages) and properly conveying the support metrics to the users (also for all supported languages).

cpp/test/neighbors/ann_ivf_flat.cuh Outdated Show resolved Hide resolved
cpp/src/cluster/detail/kmeans_balanced.cuh Show resolved Hide resolved
@lowener lowener changed the base branch from branch-24.08 to branch-24.10 August 7, 2024 09:51
@lowener lowener requested a review from cjnolet August 8, 2024 16:57
@cjnolet
Copy link
Member

cjnolet commented Aug 14, 2024

/merge

@rapids-bot rapids-bot bot merged commit 829ae16 into rapidsai:branch-24.10 Aug 14, 2024
61 checks passed
@lowener lowener deleted the 24.06-flat-cosine branch August 14, 2024 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp improvement Improves an existing functionality non-breaking Introduces a non-breaking change Python
Projects
Development

Successfully merging this pull request may close these issues.

3 participants