[FEA] UMAP API for building with batched NN Descent #6022

Merged
12 commits merged into rapidsai:branch-24.10 from the umap-batch-nnd branch
Aug 23, 2024

Conversation

jinsolp
Contributor

@jinsolp jinsolp commented Aug 14, 2024

Description

Adds the following parameter to build_kwds:

  • n_clusters: number of clusters to use when batching. A larger number of clusters reduces GPU memory usage. Defaults to 1 (no batching).
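As a rough usage sketch of the parameter described above: the helper `make_umap_kwargs` and the function `fit_batched` are hypothetical names introduced only for illustration, the key name `n_clusters` is taken from this description and may differ in the released cuML API, and `build_algo="nn_descent"` is assumed to be the option that selects the NN Descent graph build.

```python
def make_umap_kwargs(n_clusters=1):
    """Hypothetical helper: keyword arguments for a batched NN Descent build."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be >= 1")
    return {
        # Assumption: this value selects NN Descent for the kNN graph build.
        "build_algo": "nn_descent",
        # Key name as written in this PR description; 1 means no batching.
        "build_kwds": {"n_clusters": n_clusters},
    }


def fit_batched(X, n_clusters=4):
    """Sketch of a UMAP fit with batching enabled.

    Needs a GPU and a cuML build that includes this PR, so cuML is imported
    lazily and this function is not called here.
    """
    from cuml.manifold import UMAP

    reducer = UMAP(**make_umap_kwargs(n_clusters))
    # X is assumed to be a host (NumPy) array; whether an extra flag is
    # required to keep it on the host is not stated in this description.
    return reducer.fit_transform(X)


kwargs = make_umap_kwargs(n_clusters=4)
print(kwargs["build_kwds"])
```

Keeping the argument construction separate from the cuML call makes the settings easy to inspect on a machine without a GPU.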

Results show consistent trustworthiness scores with and without batching.

The results below also show that UMAP can now run on datasets that do not fit in GPU memory. Keeping the dataset on the host and enabling batching allows UMAP to run on a 50M x 768 dataset (153 GB).
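The 153 GB figure is consistent with 4-byte (float32) values, which is an assumption here since the dtype is not stated. The sketch below redoes that arithmetic and prints a naive per-batch estimate that assumes the rows split evenly across clusters; the actual batching may place rows in more than one cluster, so treat the per-batch numbers as a rough lower bound rather than a measurement.

```python
def dataset_gb(n_rows, n_cols, bytes_per_value=4):
    """Size of a dense matrix in decimal gigabytes (float32 assumed by default)."""
    return n_rows * n_cols * bytes_per_value / 1e9


total_gb = dataset_gb(50_000_000, 768)
print(f"full dataset: {total_gb:.1f} GB")  # full dataset: 153.6 GB

# Naive estimate: an even split of rows across clusters (an assumption).
for n_clusters in (1, 4, 16):
    per_batch_gb = total_gb / n_clusters
    print(f"n_clusters={n_clusters}: about {per_batch_gb:.1f} GB per batch")
```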

[Image: Screenshot 2024-08-13 at 5 55 27 PM]

Notes

The related PR in raft needs to be merged before this PR.

@jinsolp jinsolp requested review from a team as code owners August 14, 2024 00:57
@github-actions github-actions bot added the Cython / Python and CUDA/C++ labels Aug 14, 2024
@jinsolp jinsolp requested a review from a team as a code owner August 21, 2024 17:15
@github-actions github-actions bot added the CMake label Aug 21, 2024
@jinsolp jinsolp marked this pull request as draft August 21, 2024 17:16
@jinsolp jinsolp added the improvement and non-breaking labels Aug 21, 2024
@jinsolp
Contributor Author

jinsolp commented Aug 21, 2024

Will change get_raft.cmake after the relevant raft PR is merged!

@@ -82,8 +82,8 @@ endfunction()
 # To use a different RAFT locally, set the CMake variable
 # CPM_raft_SOURCE=/path/to/local/raft
 find_and_configure_raft(VERSION ${CUML_MIN_VERSION_raft}
-  FORK rapidsai
-  PINNED_TAG branch-${CUML_BRANCH_VERSION_raft}
+  FORK jinsolp
Member

Just a reminder this needs to be reverted before this PR is merged

Contributor Author

Yep! I'll revert these after the raft PR is merged and the package is released :)

@github-actions github-actions bot removed the CMake label Aug 23, 2024
@cjnolet
Member

cjnolet commented Aug 23, 2024

/merge

@rapids-bot rapids-bot bot merged commit 4587f6a into rapidsai:branch-24.10 Aug 23, 2024
54 checks passed
@jinsolp jinsolp deleted the umap-batch-nnd branch August 23, 2024 17:46
Labels
CUDA/C++, Cython / Python, improvement, non-breaking
3 participants