[WiP] SRHTs #122
It's exciting to see progress here!
I left several comments. Please resolve them, but then take a step back. Part of what you're doing is adding FHT support. This already takes effort without thinking about randomization.
Building up your suite of FHT kernels
Make a folder RandBLAS/trig/ and a file RandBLAS/trig/hadamard.hh. Have your various FHT implementations go here. You already have one for applying to column-major data from the left and storing the result in a column-major sketch. You could easily write an implementation that takes in row-major data and writes to a row-major sketch (although the only way you can parallelize this is with BLAS1, I think).
It would be great to have implementations for both the Hadamard transform itself and for the transpose of the Hadamard transform (equivalently, the inverse of the Hadamard transform). If you do this then w.l.o.g. you can always assume you're applying the transformation from the left.
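To spell out the transpose/inverse relationship: the identities below assume the Sylvester construction, which is what a butterfly-style FHT computes. The symbols $H_n$ (unnormalized) and $\bar{H}_n$ (normalized) are notation introduced here, not names from the PR.

```latex
H_1 = \begin{bmatrix} 1 \end{bmatrix}, \qquad
H_{2n} = \begin{bmatrix} H_n & H_n \\ H_n & -H_n \end{bmatrix}, \qquad
H_n^{\mathsf{T}} = H_n, \qquad
H_n H_n = n I_n .

\text{Hence } H_n^{-1} = \tfrac{1}{n} H_n,
\quad\text{and for } \bar{H}_n = n^{-1/2} H_n : \quad
\bar{H}_n^{\mathsf{T}} = \bar{H}_n^{-1} = \bar{H}_n .
```

Under these assumptions the transpose and the inverse coincide exactly only for the normalized transform; for the unnormalized one they differ by the factor 1/n, so a single kernel plus a scalar may be enough to cover all three cases.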
Once you have those functions working, add in the ability to implicitly scale the rows of the input matrix by a vector of coefficients. In sketching we end up setting this vector to a Rademacher random vector, but from an implementation standpoint these functions don't need to care where the vector comes from.
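A minimal sketch of what the scaled variant could look like for the column-major, apply-from-the-left case. The function name `fht_left_colmajor`, its signature, and the convention that `d == nullptr` means "no scaling" are all assumptions made for illustration, not part of RandBLAS.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical kernel (name and signature are assumptions): overwrite the
// column-major m-by-n matrix A (leading dimension lda) with H * diag(d) * A,
// where H is the unnormalized m-by-m Hadamard matrix and m is a power of two.
// Passing d == nullptr skips the scaling step.
template <typename T>
void fht_left_colmajor(int64_t m, int64_t n, T *A, int64_t lda, const T *d = nullptr) {
    assert(m > 0 && (m & (m - 1)) == 0);  // m must be a power of two
    assert(lda >= m);
    for (int64_t j = 0; j < n; ++j) {
        T *col = A + j * lda;
        // Scale row i of A by d[i]; the kernel does not care where d came from.
        if (d != nullptr) {
            for (int64_t i = 0; i < m; ++i)
                col[i] *= d[i];
        }
        // Butterfly stages on this column, for half = 1, 2, 4, ..., m/2.
        for (int64_t half = 1; half < m; half *= 2) {
            for (int64_t start = 0; start < m; start += 2 * half) {
                for (int64_t k = 0; k < half; ++k) {
                    T u = col[start + k];
                    T v = col[start + k + half];
                    col[start + k] = u + v;
                    col[start + k + half] = u - v;
                }
            }
        }
    }
}
```

The scaling here is a separate pass over each column while it is still hot in cache; folding it into the first butterfly stage would be a possible refinement. Columns are processed independently, so the outer loop is a natural place to parallelize.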
Once you have those cases sorted out, you can allow conflicting layouts for the input matrix and the output matrix (i.e., input is row-major and output is column-major). This is the trick I use for resolving transposition in the sparse-times-dense matrix kernels.
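One way the mixed-layout case might be handled is to address both matrices through (row stride, column stride) pairs, so a single loop nest covers every layout combination. Everything below is a sketch under assumptions: `Layout` is a stand-in for `blas::Layout`, and `Strides`, `strides_of`, and `fht_left_mixed_layout` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

enum class Layout { ColMajor, RowMajor };  // stand-in for blas::Layout

struct Strides { int64_t row; int64_t col; };  // hypothetical helper type

// Map a layout and leading dimension to (row stride, column stride).
inline Strides strides_of(Layout layout, int64_t ld) {
    if (layout == Layout::ColMajor) return {1, ld};
    return {ld, 1};
}

// Hypothetical kernel: B = H * A for m-by-n A and B, where each matrix has
// its own layout. m must be a power of two; A and B must not overlap.
template <typename T>
void fht_left_mixed_layout(int64_t m, int64_t n,
                           const T *A, Layout layout_A, int64_t lda,
                           T *B, Layout layout_B, int64_t ldb) {
    assert(m > 0 && (m & (m - 1)) == 0);
    const Strides sa = strides_of(layout_A, lda);
    const Strides sb = strides_of(layout_B, ldb);
    // Copy A into B, converting layouts on the fly.
    for (int64_t j = 0; j < n; ++j)
        for (int64_t i = 0; i < m; ++i)
            B[i * sb.row + j * sb.col] = A[i * sa.row + j * sa.col];
    // Butterflies pair up rows of B; each pair is updated across all columns.
    for (int64_t half = 1; half < m; half *= 2) {
        for (int64_t start = 0; start < m; start += 2 * half) {
            for (int64_t k = 0; k < half; ++k) {
                T *top = B + (start + k) * sb.row;
                T *bot = B + (start + k + half) * sb.row;
                for (int64_t j = 0; j < n; ++j) {
                    T u = top[j * sb.col];
                    T v = bot[j * sb.col];
                    top[j * sb.col] = u + v;
                    bot[j * sb.col] = u - v;
                }
            }
        }
    }
}
```

When B is row-major the innermost loop runs over contiguous memory, so each butterfly is a vector-level (BLAS1-style) update of two rows. This version trades an extra copy for simplicity; it is not tuned.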
Note: if you feel like you need to allocate a temporary matrix for workspace in order to do anything useful, you can definitely try that.
Writing tests
Make a folder test/test_matmul_cores/test_trig/ and a file test/test_matmul_cores/test_trig/test_hadamard.cc. This will handle tests only for your FHT.
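As a starting point for the FHT-only tests, a correctness check could compare the kernel against an explicitly formed Hadamard matrix. The sketch below assumes the kernel computes the unnormalized Sylvester-ordered transform and can be wrapped as a callable taking `(double *x, int64_t n)`; `hadamard_entry` and `fht_max_abs_error` are hypothetical helper names, and a real test would wrap the returned value in the test framework's own assertion.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Entry (i, j) of the unnormalized Sylvester Hadamard matrix:
// +1 if (i AND j) has an even number of set bits, -1 otherwise.
inline double hadamard_entry(int64_t i, int64_t j) {
    int64_t bits = i & j;
    int parity = 0;
    while (bits != 0) {
        parity ^= static_cast<int>(bits & 1);
        bits >>= 1;
    }
    return (parity == 0) ? 1.0 : -1.0;
}

// Apply the callable `fht` (in place, to a length-n vector) and return the
// largest absolute deviation from the dense product H * x.
template <typename FHT>
double fht_max_abs_error(int64_t n, FHT &&fht) {
    assert(n > 0 && (n & (n - 1)) == 0);  // n must be a power of two
    std::vector<double> x(n), expect(n, 0.0);
    // Small integer-valued test data, so the dense reference is exact.
    for (int64_t i = 0; i < n; ++i)
        x[i] = static_cast<double>(i % 7) - 3.0;
    for (int64_t i = 0; i < n; ++i)
        for (int64_t j = 0; j < n; ++j)
            expect[i] += hadamard_entry(i, j) * x[j];
    fht(x.data(), n);
    double err = 0.0;
    for (int64_t i = 0; i < n; ++i)
        err = std::max(err, std::abs(x[i] - expect[i]));
    return err;
}
```

The dense reference costs O(n^2), so it is only suitable for small test sizes.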
You can take some inspiration from RandBLAS/test/test_matmul_wrappers/test_sketch_sparse.cc (lines 53 to 190 in 36da117):
// Adapted from test::linop_common::test_left_apply_transpose_to_eye.
template <typename T, typename DenseSkOp, SparseMatrix SpMat = COOMatrix<T,int64_t>>
void test_left_transposed_sketch_of_eye(
    // B = S^T * eye, where S is m-by-d, B is d-by-m
    DenseSkOp &S, Layout layout
) {
    auto [m, d] = dimensions(S);
    auto I = eye<SpMat>(m);
    std::vector<T> B(d * m, 0.0);
    bool is_colmajor = (Layout::ColMajor == layout);
    int64_t ldb = (is_colmajor) ? d : m;
    int64_t lds = (is_colmajor) ? m : d;
    lsksp3(
        layout, Op::Trans, Op::NoTrans, d, m, m,
        (T) 1.0, S, 0, 0, I, 0, 0, (T) 0.0, B.data(), ldb
    );
    std::vector<T> S_dense(m * d, 0.0);
    to_explicit_buffer(S, S_dense.data(), layout);
    test::comparison::matrices_approx_equal(
        layout, Op::Trans, d, m,
        B.data(), ldb, S_dense.data(), lds,
        __PRETTY_FUNCTION__, __FILE__, __LINE__
    );
}
// Adapted from test::linop_common::test_left_apply_submatrix_to_eye.
template <typename T, typename DenseSkOp, SparseMatrix SpMat = COOMatrix<T,int64_t>>
void test_left_submat_sketch_of_eye(
    // B = alpha * submat(S0) * eye + beta*B, where S = submat(S) is d1-by-m1 offset by (S_ro, S_co) in S0, and B is random.
    T alpha, DenseSkOp &S0, int64_t d1, int64_t m1, int64_t S_ro, int64_t S_co, Layout layout, T beta = 0.0
) {
    auto [d0, m0] = dimensions(S0);
    randblas_require(d0 >= d1);
    randblas_require(m0 >= m1);
    bool is_colmajor = layout == Layout::ColMajor;
    int64_t ldb = (is_colmajor) ? d1 : m1;
    // define a matrix to be sketched, and create workspace for sketch.
    auto I = eye<SpMat>(m1);
    auto B = std::get<0>(random_matrix<T>(d1, m1, RNGState(42)));
    std::vector<T> B_backup(B);
    // Perform the sketch
    lsksp3(
        layout, Op::NoTrans, Op::NoTrans, d1, m1, m1,
        alpha, S0, S_ro, S_co, I, 0, 0, beta, B.data(), ldb
    );
    // Check the result
    T *expect = new T[d0 * m0];
    to_explicit_buffer(S0, expect, layout);
    int64_t ld_expect = (is_colmajor) ? d0 : m0;
    auto [row_stride_s, col_stride_s] = layout_to_strides(layout, ld_expect);
    auto [row_stride_b, col_stride_b] = layout_to_strides(layout, ldb);
    int64_t offset = row_stride_s * S_ro + col_stride_s * S_co;
    #define MAT_E(_i, _j) expect[offset + (_i)*row_stride_s + (_j)*col_stride_s]
    #define MAT_B(_i, _j) B_backup[ (_i)*row_stride_b + (_j)*col_stride_b]
    for (int i = 0; i < d1; ++i) {
        for (int j = 0; j < m1; ++j) {
            MAT_E(i,j) = alpha * MAT_E(i,j) + beta * MAT_B(i, j);
        }
    }
    test::comparison::matrices_approx_equal(
        layout, Op::NoTrans,
        d1, m1,
        B.data(), ldb,
        &expect[offset], ld_expect,
        __PRETTY_FUNCTION__, __FILE__, __LINE__
    );
    delete [] expect;
}
// Adapted from test::linop_common::test_right_apply_transpose_to_eye.
template <typename T, typename DenseSkOp, SparseMatrix SpMat = COOMatrix<T,int64_t>>
void test_right_transposed_sketch_of_eye(
    // B = eye * S^T, where S is d-by-n, so eye is order n and B is n-by-d
    DenseSkOp &S, Layout layout
) {
    auto [d, n] = dimensions(S);
    auto I = eye<SpMat>(n);
    std::vector<T> B(n * d, 0.0);
    bool is_colmajor = Layout::ColMajor == layout;
    int64_t ldb = (is_colmajor) ? n : d;
    int64_t lds = (is_colmajor) ? d : n;
    rsksp3(layout, Op::NoTrans, Op::Trans, n, d, n, (T) 1.0, I, 0, 0, S, 0, 0, (T) 0.0, B.data(), ldb);
    std::vector<T> S_dense(n * d, 0.0);
    to_explicit_buffer(S, S_dense.data(), layout);
    test::comparison::matrices_approx_equal(
        layout, Op::Trans, n, d,
        B.data(), ldb, S_dense.data(), lds,
        __PRETTY_FUNCTION__, __FILE__, __LINE__
    );
}
// Adapted from test::linop_common::test_right_apply_submatrix_to_eye.
template <typename T, typename DenseSkOp, SparseMatrix SpMat = COOMatrix<T,int64_t>>
void test_right_submat_sketch_of_eye(
    // B = alpha * eye * submat(S) + beta*B : submat(S) is n-by-d, eye is n-by-n, B is n-by-d and random
    T alpha, DenseSkOp &S0, int64_t n, int64_t d, int64_t S_ro, int64_t S_co, Layout layout, T beta = 0.0
) {
    auto [n0, d0] = dimensions(S0);
    randblas_require(n0 >= n);
    randblas_require(d0 >= d);
    bool is_colmajor = layout == Layout::ColMajor;
    int64_t ldb = (is_colmajor) ? n : d;
    auto I = eye<SpMat>(n);
    auto B = std::get<0>(random_matrix<T>(n, d, RNGState(11)));
    std::vector<T> B_backup(B);
    rsksp3(layout, Op::NoTrans, Op::NoTrans, n, d, n, alpha, I, 0, 0, S0, S_ro, S_co, beta, B.data(), ldb);
    T *expect = new T[n0 * d0];
    to_explicit_buffer(S0, expect, layout);
    int64_t ld_expect = (is_colmajor)? n0 : d0;
    auto [row_stride_s, col_stride_s] = layout_to_strides(layout, ld_expect);
    auto [row_stride_b, col_stride_b] = layout_to_strides(layout, ldb);
    int64_t offset = row_stride_s * S_ro + col_stride_s * S_co;
    #define MAT_E(_i, _j) expect[offset + (_i)*row_stride_s + (_j)*col_stride_s]
    #define MAT_B(_i, _j) B_backup[ (_i)*row_stride_b + (_j)*col_stride_b]
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < d; ++j) {
            MAT_E(i,j) = alpha * MAT_E(i,j) + beta * MAT_B(i, j);
        }
    }
    test::comparison::matrices_approx_equal(
        layout, Op::NoTrans, n, d, B.data(), ldb, &expect[offset], ld_expect,
        __PRETTY_FUNCTION__, __FILE__, __LINE__
    );
    delete [] expect;
}
@aryamanjeendgar, for our reference, here's the code you mentioned from the test suite of FFHT:
void fht(double *buf, int log_n) {
    int n = 1 << log_n;
    for (int i = 0; i < log_n; ++i) {
        int s1 = 1 << i;
        int s2 = s1 << 1;
        for (int j = 0; j < n; j += s2) {
            for (int k = 0; k < s1; ++k) {
                double u = buf[j + k];
                double v = buf[j + k + s1];
                buf[j + k] = u + v;
                buf[j + k + s1] = u - v;
            }
        }
    }
}
Step 1 to figuring out how we'll change it: replace the bit manipulations with standard integer arithmetic (or just give explanatory comments). You can also create a GitHub issue to facilitate the discussion, or make an Overleaf project.
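For that first step, here is one possible rewrite of the FFHT snippet with the shifts replaced by plain integer arithmetic and with comments on each loop. The name `fht_arith` and the variable names `half` and `block` are choices made here for illustration; the computation is intended to match `fht` above exactly.

```cpp
#include <cassert>
#include <cstdint>

// Same butterfly as the FFHT test-suite `fht`, written without bit shifts.
// buf holds n = 2^log_n doubles and is overwritten by its unnormalized
// Hadamard transform.
void fht_arith(double *buf, int log_n) {
    assert(log_n >= 0);
    // n = 2^log_n (was: 1 << log_n).
    int64_t n = 1;
    for (int i = 0; i < log_n; ++i)
        n *= 2;
    // Stage i works on blocks of length 2 * half, where half = 2^i (was: s1).
    for (int64_t half = 1; half < n; half *= 2) {
        const int64_t block = 2 * half;  // was: s2 = s1 << 1
        // Visit each block; `start` is the index of its first entry.
        for (int64_t start = 0; start < n; start += block) {
            // Pair entry k of the block's first half with entry k of its
            // second half, replacing (u, v) by (u + v, u - v).
            for (int64_t k = 0; k < half; ++k) {
                double u = buf[start + k];
                double v = buf[start + k + half];
                buf[start + k] = u + v;
                buf[start + k + half] = u - v;
            }
        }
    }
}
```

Written this way, the only places a power of two enters are the stage size `half` and the block length `2 * half`, which may make it easier to see what changes when the data is a matrix rather than a single vector.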
Please do a detailed read of my notes in GitHub Issue #99.
I didn't review the whole PR since my plane is landing and I need to put away my laptop.
Ping @aryamanjeendgar
RandBLAS/trig_skops.hh
template<typename T, SignedInteger sint_t>
void applyDiagonalRademacher(
    bool left,
    blas::Layout layout,
    int64_t rows,
    int64_t cols,
    T* A,
    sint_t* diag
)
{
    if(left && layout == blas::Layout::ColMajor) {
        for(int col=0; col < cols; col++) {
            if(diag[col] > 0)
                continue;
            blas::scal(rows, diag[col], &A[col * rows], 1);
        }
    }
    else if(left && layout == blas::Layout::RowMajor) {
        for(int col=0; col < cols; col++) {
            if(diag[col] > 0)
                continue;
            blas::scal(rows, diag[col], &A[col], cols);
        }
    }
    else if(!left && layout == blas::Layout::ColMajor) {
        for(int row = 0; row < rows; row++) {
            if(diag[row] > 0)
                continue;
            blas::scal(cols, diag[row], &A[row], rows);
        }
    }
    else {
        for(int row = 0; row < rows; row++) {
            if(diag[row] > 0)
                continue;
            blas::scal(cols, diag[row], &A[row * cols], 1);
        }
    }
}
This is fine for now, but it can and should be much more efficient. It's also probably better suited to RandBLAS/util.hh.
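To illustrate one direction an efficiency improvement could take (this is a guess at the intent, not something stated in the review): the `left && RowMajor` branch above scales each column with stride `cols`, so every call walks the whole matrix with poor locality. A row-wise sweep touches memory contiguously and performs the same sign flips. The name `flip_columns_rowmajor` is hypothetical, and the sketch assumes `diag` holds only +1 and -1.

```cpp
#include <cstdint>

// Hypothetical alternative for the row-major case: negate column j of the
// rows-by-cols matrix A wherever diag[j] < 0, sweeping A one contiguous row
// at a time instead of one strided column at a time.
template <typename T, typename sint_t>
void flip_columns_rowmajor(int64_t rows, int64_t cols, T *A, const sint_t *diag) {
    for (int64_t i = 0; i < rows; ++i) {
        T *row = A + i * cols;
        for (int64_t j = 0; j < cols; ++j) {
            if (diag[j] < 0)
                row[j] = -row[j];
        }
    }
}
```

Whether this actually wins would need to be measured; the strided column case in the `!left && ColMajor` branch could be handled symmetrically.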
This is looking pretty good! Please start writing unit tests.
A description of the tests:
All of the tests right now (unfortunately) use Eigen. I use Eigen extensively to be able to consistently produce
I also ended up using
Much of this infrastructure (norm computation, scaling, computing products, etc.) can be ported (with a lot more LoC XD) to BLAS, with the exception of the permutation tests, which test against Eigen's
The correctness tests are fairly straightforward to understand:
The tests are simple, but they cover all of the potential code paths in my code (since the code paths are chosen around two inputs: sketching direction + layout of the input matrix).
Let me know how we should proceed next, @rileyjmurray!
UPDATE: The latest commit also adds in
This is a draft PR for integrating SRHTs into RandBLAS