Fix UMAP issues with large inputs #6245

Open

viclafargue wants to merge 24 commits into branch-25.04 from fix-umap-large-inputs

Conversation

viclafargue (Contributor)

Answers #6204

@wphicks (Contributor) left a comment:

Looking good! Let's just make sure to IWYU for the new uses of uint64_t. Using C++ types (std::uint64_t) in our non-CUDA code would be a bonus, but it shouldn't block merge. I've also called out some spots where we could use uniform initialization syntax rather than a bare cast.
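
For illustration only (not part of the PR diff; the function and variable names below are hypothetical), the pattern being suggested looks roughly like this:

#include <cstdint>  // IWYU: declares std::uint64_t for non-CUDA translation units

// Hypothetical sketch of the casting style discussed above.
std::uint64_t compute_nnz(int n_rows, int n_neighbors)
{
  // Bare C-style cast (what the review asks to avoid):
  //   auto nnz = (std::uint64_t)n_rows * n_neighbors;

  // Uniform initialization makes the compiler diagnose narrowing, e.g.
  // std::uint64_t{n_rows} is flagged because a negative int cannot be
  // represented; an explicit static_cast / functional cast is searchable
  // and keeps the widening intent obvious:
  return static_cast<std::uint64_t>(n_rows) * n_neighbors;
}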

Review comments were left on:
cpp/include/cuml/common/callback.hpp
cpp/include/cuml/manifold/common.hpp
cpp/src/umap/fuzzy_simpl_set/naive.cuh
cpp/src/tsne/tsne_runner.cuh
cpp/src/umap/knn_graph/algo.cuh
cpp/src/umap/simpl_set_embed/algo.cuh

copy-pr-bot bot commented Jan 23, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@wphicks (Contributor) commented Jan 23, 2025:

@viclafargue That last commit was unsigned. Could you sign it and push that up?

@cjnolet (Member) left a comment:

@viclafargue and I discussed this briefly last week, but given the nature of the 64-bit hardcoded changes here, I would like to see at least a small benchmark before this is merged so that we can feel comfortable that this doesn’t have a huge impact on the runtime.

@divyegala requested a review from a team as a code owner February 3, 2025 18:42
@divyegala requested a review from vyasr February 3, 2025 18:42
@github-actions bot added the CMake label Feb 3, 2025
@divyegala changed the base branch from branch-25.02 to branch-25.04 February 3, 2025 18:43
@divyegala requested a review from a team as a code owner February 3, 2025 18:43
@divyegala (Member): /ok to test

@divyegala (Member): /ok to test

@divyegala (Member): /ok to test

@divyegala (Member): /ok to test

@viclafargue requested review from a team as code owners February 5, 2025 18:30
@github-actions bot added the Cython / Python label Feb 5, 2025
@viclafargue force-pushed the fix-umap-large-inputs branch from 288d129 to f7ce445 on February 5, 2025 18:43
@github-actions bot added the conda label Feb 5, 2025
@wphicks (Contributor) left a comment:

Looks pretty good. The one change I'm concerned about is the blocks calculation. Let me know if I'm thinking about that one wrong.

@@ -73,8 +73,8 @@ endfunction()
# To use a different CUVS locally, set the CMake variable
# CPM_cuvs_SOURCE=/path/to/local/cuvs
find_and_configure_cuvs(VERSION ${CUML_MIN_VERSION_cuvs}
FORK rapidsai
PINNED_TAG branch-${CUML_BRANCH_VERSION_cuvs}
FORK divyegala
Contributor:

Just noting that we'll need to make sure to switch these back once the associated PRs have been merged so we don't forget.

@@ -167,7 +168,7 @@ class TSNE_runner {
{
distance_and_perplexity();

const auto NNZ = COO_Matrix.nnz;
const auto NNZ = (value_idx)COO_Matrix.nnz;
Contributor:

Suggested change
const auto NNZ = (value_idx)COO_Matrix.nnz;
const auto NNZ = value_idx(COO_Matrix.nnz);

Let's not let it prevent merge, but if we need to make further changes, it would be good to avoid raw casts.

@@ -328,7 +331,8 @@ void launcher(int n,
* Compute graph of membership strengths
*/

dim3 grid_elm(raft::ceildiv(n * n_neighbors, TPB_X), 1, 1);
uint64_t to_process = static_cast<uint64_t>(in.n_rows) * n_neighbors;
dim3 grid_elm(raft::ceildiv(to_process, TPB_X), 1, 1);
Contributor:

I believe there's a chance for an overflow here. If to_process is large enough, we'd get something that would overflow a 32-bit integer, right? We also probably want to limit this to the maximum number of blocks (65535) anyway.
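
For illustration only (not code from this PR): one way to address both concerns is to cap the grid size and cover the remainder with a grid-stride loop. The names in.n_rows, n_neighbors, and TPB_X mirror the diff above; MAX_BLOCKS and the loop itself are assumptions.

// Sketch (requires <algorithm> and <cstdint>): cap the grid at a safe
// block count; each thread then covers multiple elements in the kernel.
constexpr std::uint64_t TPB_X      = 256;
constexpr std::uint64_t MAX_BLOCKS = 65535;

std::uint64_t to_process = static_cast<std::uint64_t>(in.n_rows) * n_neighbors;
std::uint64_t n_blocks   = std::min(raft::ceildiv(to_process, TPB_X), MAX_BLOCKS);
dim3 grid_elm(static_cast<unsigned int>(n_blocks), 1, 1);

// Inside the kernel, a grid-stride loop keeps every element covered:
//   for (std::uint64_t i = std::uint64_t{blockIdx.x} * blockDim.x + threadIdx.x;
//        i < to_process;
//        i += std::uint64_t{gridDim.x} * blockDim.x) { ... }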

- &treelite treelite==4.4.1
- &treelite treelite==4.3.0
Contributor:

Why do we have to downgrade treelite?

Member:

This is a bad merge from a missing commit in the forward merger when switching the target branch from 25.02 to 25.04.

Comment on lines +76 to +77
FORK divyegala
PINNED_TAG raft-sparse-updates
Contributor:

Reminder to revert this before merge.

Comment on lines +75 to +76
FORK viclafargue
PINNED_TAG fix-sparse-utilities
Contributor:

Reminder to revert this before merge.

Comment on lines +82 to +83
find_and_configure_treelite(VERSION 4.3.0
PINNED_TAG 575e4208f2b18e40d818c338ecb95d7a26e69aab
Contributor:

What is the motivation for the downgrade?

@@ -348,16 +350,15 @@ CUML_KERNEL void optimize_batch_kernel(T const* head_embedding,
* @param rounding: Floating rounding factor used to truncate the gradient update for
* deterministic result.
*/
template <typename T, int TPB_X>
template <typename T, uint64_t TPB_X>
Contributor:

TPB_X is hard-coded to 256, no? Does it make sense to switch its type to uint64_t?


raft::sparse::convert::sorted_coo_to_csr(&graph_coo, row_ind.data(), stream);
raft::sparse::linalg::coo_degree(&graph_coo, ia.data(), stream);
Contributor:

So we never needed the degree here in the first place?

@@ -252,14 +252,12 @@ void optimize_layout(T* head_embedding,

T rounding = create_gradient_rounding_factor<T>(head, nnz, head_n, alpha, stream_view);

MLCommon::FastIntDiv tail_n_fast(tail_n);
Contributor:

This and similar changes might lead to a performance regression.

Member:

Couldn't agree more. This is there for a reason. If we need to, we should allow a 64-bit int div instead of removing this altogether.
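
For illustration (hypothetical names, not code from this PR): one way to keep the fast-division path is to make the divisor a template parameter, so the existing MLCommon::FastIntDiv path stays in place and a plain 64-bit divisor is only used when the input is too large for it.

// Sketch: accept either MLCommon::FastIntDiv (current fast path) or a
// plain 64-bit integer divisor through a template parameter.
template <typename index_t, typename DivT>
__host__ __device__ inline index_t wrap_index(index_t r, DivT tail_n)
{
  // With FastIntDiv this resolves to the precomputed fast-division path,
  // exactly as the existing `r % tail_n` in the kernel does today; with a
  // uint64_t divisor it falls back to an ordinary 64-bit modulo.
  return static_cast<index_t>(r % tail_n);
}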

Comment on lines +25 to +26
#include <stdint.h>

Contributor:

This isn't needed as far as I can tell.

@@ -425,7 +427,7 @@ void _transform(const raft::handle_t& handle,
* Compute graph of membership strengths
*/

int nnz = inputs.n * params->n_neighbors;
uint64_t nnz = uint64_t{inputs.n} * params->n_neighbors;
Member:

Please don't hardcode these types. Create an nnz_t.
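
A minimal sketch of that suggestion (the alias name and its location are assumptions, not part of this PR):

#include <cstdint>

// Hypothetical: define the alias once in a shared header (for example
// cpp/include/cuml/manifold/common.hpp) instead of spelling out uint64_t
// at every call site.
using nnz_t = std::uint64_t;  // size type for non-zeros in the membership-strength graph

// Call sites then read:
nnz_t nnz = static_cast<nnz_t>(inputs.n) * params->n_neighbors;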

@@ -44,15 +46,16 @@ using namespace ML;
*/
template <typename T>
void launcher(const raft::handle_t& handle,
int n,
uint64_t n,
Member:

Use idx_t

@@ -59,7 +58,8 @@ using namespace ML;
* @param stream cuda stream
*/
template <typename T>
void make_epochs_per_sample(T* weights, int weights_n, int n_epochs, T* result, cudaStream_t stream)
void make_epochs_per_sample(
T* weights, uint64_t weights_n, int n_epochs, T* result, cudaStream_t stream)
Member:

Use an idx_t and switch that for uint64_t

@@ -170,7 +169,7 @@ T create_rounding_factor(T max_abs, int n)

template <typename T>
T create_gradient_rounding_factor(
const int* head, int nnz, int n_samples, T alpha, rmm::cuda_stream_view stream)
const int* head, uint64_t nnz, int n_samples, T alpha, rmm::cuda_stream_view stream)
Member:

Create an nnz_t, please.

@@ -195,14 +194,14 @@ T create_gradient_rounding_factor(
* positive weights (neighbors in the 1-skeleton) and repelling
* negative weights (non-neighbors in the 1-skeleton).
*/
template <int TPB_X, typename T>
template <uint64_t TPB_X, typename T>
Member:

nnz_t

@@ -301,7 +299,7 @@ template <int TPB_X, typename T>
void launcher(
int m, int n, raft::sparse::COO<T>* in, UMAPParams* params, T* embedding, cudaStream_t stream)
{
int nnz = in->nnz;
uint64_t nnz = in->nnz;
Member:

nnz_t

@@ -96,16 +98,15 @@ DI T truncate_gradient(T const rounding_factor, T const x)
return (rounding_factor + x) - rounding_factor;
}

template <typename T, int TPB_X, int n_components>
template <typename T, uint64_t TPB_X, uint64_t n_components>
Member:

Was this really needed?

T const* tail_embedding,
T* tail_buffer,
const MLCommon::FastIntDiv tail_n,
Member:

Please don't remove FastIntDiv. Make this support uint64_t if needed.

const int* head,
const int* tail,
int nnz,
uint64_t nnz,
Member:

Use nnz_t

gen.next(r);
int t = r % tail_n;
uint64_t t = r % tail_n;
@cjnolet (Member), Feb 5, 2025:

Define this in a template, please. This is perf-critical.

T const* other = tail_embedding + (k * params.n_components);
uint64_t n_components = params.n_components;

uint64_t j = head[row];
Member:

Please define a template for this.
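
For illustration only (a simplified stand-alone kernel, not the PR's optimize_batch_kernel): templating the index width keeps the 32-bit instantiation as fast as it is today, while a 64-bit instantiation is selected only for very large inputs.

#include <cstdint>

// Sketch: index arithmetic uses a template parameter instead of a
// hardcoded uint64_t; the stored data stays int either way.
template <typename index_t, int TPB_X>
__global__ void gather_heads(const int* head, index_t nnz, int* out)
{
  index_t row = static_cast<index_t>(blockIdx.x) * TPB_X + threadIdx.x;
  if (row >= nnz) { return; }
  out[row] = head[row];
}

// Host-side dispatch example: pick the instantiation from the runtime size.
//   if (nnz > std::numeric_limits<std::int32_t>::max())
//     gather_heads<std::uint64_t, 256><<<grid, 256, 0, stream>>>(head, nnz, out);
//   else
//     gather_heads<std::int32_t, 256><<<grid, 256, 0, stream>>>(head, std::int32_t(nnz), out);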

int t = r % tail_n;
T const* negative_sample = tail_embedding + (t * params.n_components);
dist_squared = rdist<T>(current, negative_sample, params.n_components);
uint64_t t = r % tail_n;
Member:

Template

@cjnolet (Member) left a comment:

Same issues as raft, but this is hardcoding types in perf-critical code. Please use templates; they allow us to quickly switch.

Labels: bug (Something isn't working), CMake, conda (conda issue), CUDA/C++, Cython / Python (Cython or Python issue), non-breaking (Non-breaking change)
5 participants