Spectral Embedding API #5953

Draft: wants to merge 7 commits into base: branch-24.08
1,321 changes: 1,321 additions & 0 deletions Lanczos.ipynb

Large diffs are not rendered by default.

1,661 changes: 1,661 additions & 0 deletions LanczosTest.ipynb

Large diffs are not rendered by default.

1,259 changes: 1,259 additions & 0 deletions Spectral.ipynb

Large diffs are not rendered by default.

78 changes: 77 additions & 1 deletion cpp/include/cuml/cluster/spectral.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2021, NVIDIA CORPORATION.
* Copyright (c) 2019-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -16,6 +16,8 @@

#pragma once

#include <stdint.h>

namespace raft {
class handle_t;
}
@@ -48,5 +50,79 @@ void fit_embedding(const raft::handle_t& handle,
float* out,
unsigned long long seed = 1234567);

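/**
* Given a COO-formatted (symmetric) knn graph, this function
* computes the spectral embedding (the lowest n_components
* eigenvectors) using the Lanczos min-cut algorithm.
* @param handle cuml handle
* @param rows source vertices of knn graph (size nnz)
* @param cols destination vertices of knn graph (size nnz)
* @param vals edge weights connecting vertices of knn graph (size nnz)
* @param nnz size of rows/cols/vals
* @param n number of samples in X
* @param n_components the number of components to project X into
* @param eigenvectors output array for the computed eigenvectors
* @param eigenvalues output array for the computed eigenvalues
* @param v0 starting vector for the Lanczos iteration
* @param eig_iters output: number of iterations performed by the eigensolver
* @param seed random seed to use in both the Lanczos solver and k-means
* @param maxiter maximum number of Lanczos iterations
* @param tol convergence tolerance of the eigensolver
* @param conv_n_iters iteration count used by the convergence check
* @param conv_eps epsilon used by the convergence check
* @param restartiter iteration count at which the Lanczos solver restarts
*/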
template <typename T>
void lanczos_solver(const raft::handle_t& handle,
int* rows,
int* cols,
T* vals,
int nnz,
int n,
int n_components,
T* eigenvectors,
T* eigenvalues,
T* v0,
int* eig_iters,
unsigned long long seed = 1234567,
int maxiter = 4000,
float tol = 0.01,
int conv_n_iters = 5,
float conv_eps = 0.001,
int restartiter = 15);

struct SpectralParams {};

/**
* @brief Dimensionality reduction via spectral embedding.
*
* @param[in] handle The GPU handle.
* @param[in] X The row-major dataset in device memory.
* @param[out] Y The column-major final embedding in device memory.
* @param[in] n Number of rows in data X.
* @param[in] p Number of columns in data X.
* @param[in] knn_indices Array containing nearest neighbors indices.
* @param[in] knn_dists Array containing nearest neighbors distances.
* @param[in] num_neighbors Number of nearest neighbors.
* @param rows Source vertices of the knn graph (size nnz).
* @param cols Destination vertices of the knn graph (size nnz).
* @param vals Edge weights of the knn graph (size nnz).
* @param[in] nnz Size of rows/cols/vals.
* @param[in] n_components Number of components of the embedding.
*
* The CUDA implementation is derived from the excellent CannyLabs open source
* implementation here: https://github.com/CannyLab/tsne-cuda/. The CannyLabs
* code is licensed according to the conditions in
* cuml/cpp/src/tsne/cannylabs_tsne_license.txt. A full description of their
* approach is available in their article t-SNE-CUDA: GPU-Accelerated t-SNE and
* its Applications to Modern Data (https://arxiv.org/abs/1807.11824).
*/
void spectral_fit(const raft::handle_t& handle,
float* X,
float* Y,
int n,
int p,
int* knn_indices,
int* knn_rows,
float* knn_dists,
int* a_knn_indices,
int* a_knn_rows,
float* a_knn_dists,
int num_neighbors,
int* rows,
int* cols,
float* vals,
int nnz,
int n_components);

} // namespace Spectral
} // namespace ML
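
For readers of the `lanczos_solver` declaration above, here is a minimal host-side sketch of the symmetric COO layout its `rows`/`cols`/`vals`/`nnz` arguments describe. It also shows the kind of matrix-vector product an iterative eigensolver applies repeatedly. It assumes the unnormalized graph Laplacian L = D - A, which is the usual choice for spectral embedding. The diff does not show which operator the PR uses, so treat that as an assumption. `CooGraph`, `add_undirected_edge` and `laplacian_matvec` are hypothetical names for illustration only and are not part of cuML or RAFT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side container mirroring the rows/cols/vals arguments.
// Not a cuML type; for illustration only.
struct CooGraph {
  std::vector<int> rows;    // source vertex of each stored entry
  std::vector<int> cols;    // destination vertex of each stored entry
  std::vector<float> vals;  // edge weight of each stored entry
  int n = 0;                // number of vertices (samples)

  int nnz() const { return static_cast<int>(vals.size()); }
};

// Store an undirected edge as two entries, (i, j) and (j, i),
// so that the COO matrix stays symmetric.
inline void add_undirected_edge(CooGraph& g, int i, int j, float w)
{
  g.rows.push_back(i);
  g.cols.push_back(j);
  g.vals.push_back(w);
  g.rows.push_back(j);
  g.cols.push_back(i);
  g.vals.push_back(w);
}

// y = (D - A) x, where A is the weighted adjacency matrix in COO form and
// D is the diagonal matrix of weighted degrees (assumed operator, see above).
// Row i of the product equals sum_j A_ij * (x_i - x_j).
inline std::vector<float> laplacian_matvec(const CooGraph& g, const std::vector<float>& x)
{
  std::vector<float> y(static_cast<std::size_t>(g.n), 0.0f);
  for (int e = 0; e < g.nnz(); ++e) {
    const int i = g.rows[e];
    const int j = g.cols[e];
    y[i] += g.vals[e] * (x[i] - x[j]);
  }
  return y;
}
```

A constant vector lies in the null space of this operator, so `laplacian_matvec` applied to a vector of ones returns all zeros on any graph built with `add_undirected_edge`. That gives a quick sanity check on the layout before handing the arrays to a GPU solver.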