[FEA] UMAP API for building with batched NN Descent #6022

jinsolp · 2024-08-14T00:57:26Z

Description

Adds the following parameter to build_kwds:

n_clusters: number of clusters to use when batching. A larger number of clusters reduces GPU memory usage. Defaults to 1 (no batching).
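A minimal sketch of how the parameter might be passed from Python. It assumes the key is spelled n_clusters exactly as written in this description (the key name in the merged API may differ, so check the UMAP docstring) and that build_algo="nn_descent" selects the NN Descent build. The helpers make_umap_kwargs and fit_batched are hypothetical names used only for illustration; cuML is imported lazily so the snippet loads on a machine without a GPU.

```python
def make_umap_kwargs(n_clusters=1, n_neighbors=15):
    """Assemble UMAP constructor arguments for a (batched) NN Descent build.

    ASSUMPTION: the key name `n_clusters` comes from this PR description;
    the released API may spell it differently.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be >= 1")
    return {
        "n_neighbors": n_neighbors,
        "build_algo": "nn_descent",
        "build_kwds": {"n_clusters": n_clusters},  # 1 means no batching
    }


def fit_batched(X, n_clusters=4):
    """Fit UMAP with a batched NN Descent build (needs cuML and a CUDA GPU)."""
    # Imported lazily so this module can be loaded without cuML installed.
    from cuml.manifold import UMAP

    reducer = UMAP(**make_umap_kwargs(n_clusters=n_clusters))
    return reducer.fit_transform(X)


# Building the arguments does not require a GPU.
kwargs = make_umap_kwargs(n_clusters=4)
print(kwargs["build_kwds"])
```

Calling fit_batched(X) with a host-resident array X is where the batched build would actually run; that step is left uncalled here because it depends on the installed cuML version and available hardware.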

Results show consistent trustworthiness scores with and without batching.

Also note that UMAP can now run on datasets that don't fit on the GPU. Keeping the dataset on the host and enabling batching allows UMAP to run on a 50M x 768 dataset (153GB).
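The 153GB figure is consistent with a quick size estimate. The sketch below assumes the values are stored as 4-byte floats (float32), which the description does not state; dataset_bytes is a hypothetical helper.

```python
def dataset_bytes(n_rows, n_cols, bytes_per_value=4):
    """Raw size of a dense matrix; 4 bytes per value assumes float32."""
    return n_rows * n_cols * bytes_per_value


size = dataset_bytes(50_000_000, 768)
size_gb = size / 1e9  # decimal gigabytes
print(f"{size_gb:.1f} GB")  # 153.6 GB
```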

Notes

The corresponding raft PR needs to be merged before this PR.

jinsolp · 2024-08-21T22:10:53Z

Will change get_raft.cmake after the relevant raft PR is merged!

python/cuml/cuml/tests/test_umap.py

cjnolet · 2024-08-22T22:51:39Z

cpp/cmake/thirdparty/get_raft.cmake

@@ -82,8 +82,8 @@ endfunction()
 # To use a different RAFT locally, set the CMake variable
 # CPM_raft_SOURCE=/path/to/local/raft
 find_and_configure_raft(VERSION          ${CUML_MIN_VERSION_raft}
-      FORK             rapidsai
-      PINNED_TAG       branch-${CUML_BRANCH_VERSION_raft}
+      FORK             jinsolp


Just a reminder this needs to be reverted before this PR is merged

Yep! I'll revert these after the raft PR is merged and the package is released :)

cjnolet · 2024-08-23T00:22:37Z

/merge

umap api for batch nnd

36ab624

jinsolp requested review from a team as code owners August 14, 2024 00:57

github-actions bot added the Cython / Python and CUDA/C++ labels Aug 14, 2024

jinsolp added 3 commits August 14, 2024 03:33

add batch nnd umap tests

fa13f0d

add documentation for build kwds

3443d94

building with custom raft PR

2cb4623

jinsolp requested a review from a team as a code owner August 21, 2024 17:15

github-actions bot added the CMake label Aug 21, 2024

jinsolp marked this pull request as draft August 21, 2024 17:16

Merge branch 'rapidsai:branch-24.10' into umap-batch-nnd

b9b9dd6

jinsolp added the improvement and non-breaking labels Aug 21, 2024

running with digits

ecd8420

dantegd requested changes Aug 21, 2024

python/cuml/cuml/tests/test_umap.py (outdated, resolved)

python/cuml/cuml/tests/test_umap.py (outdated, resolved)

jinsolp and others added 3 commits August 21, 2024 23:10

error handling

67d88e1

remove do_batch

5e2be1f

Merge branch 'rapidsai:branch-24.10' into umap-batch-nnd

591c04f

jinsolp requested a review from dantegd August 22, 2024 19:32

jinsolp marked this pull request as ready for review August 22, 2024 21:56

jinsolp mentioned this pull request Aug 22, 2024

[FEA] Enable HDBSCAN to build knn graph using NN Descent #5939

Open

cjnolet assigned jinsolp Aug 22, 2024

cjnolet approved these changes Aug 22, 2024

dantegd approved these changes Aug 23, 2024

Revert fork and pinned tag

4fdd2f2

github-actions bot removed the CMake label Aug 23, 2024

Trigger CI

243a2ff

Trigger CI

57adba7

rapids-bot bot merged commit 4587f6a into rapidsai:branch-24.10 Aug 23, 2024
54 checks passed

jinsolp deleted the umap-batch-nnd branch August 23, 2024 17:46
