Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add triangular matrix functions and ExecutionPolicy overloads #87

Merged
merged 20 commits into from
Oct 18, 2021

Conversation

amklinv-nnl
Copy link
Contributor

Closes #45

@amklinv-nnl
Copy link
Contributor Author

I added the overwriting triangular_matrix_[left/right]_product functions, but not the updating ones. Submitting a draft request in case @mhoemmen gets to it first.

@mhoemmen mhoemmen self-requested a review July 22, 2021 23:07
Copy link
Contributor

@mhoemmen mhoemmen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR and for catching the missing functions! : - ) I caught just a couple issues that we should consider fixing before merging.

include/experimental/__p1673_bits/blas3_matrix_product.hpp Outdated Show resolved Hide resolved

if constexpr (std::is_same_v<Triangle, lower_triangle_t>) {
for (size_type j = 0; j < C.extent(1); ++j) {
for (size_type i = j; i < C.extent(0); ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Unlike the symmetric rank-1 or rank-k update functions, these functions (that is, {symmetric,hermitian}_matrix_{left,right}_product) assume that the input matrix A -- not the output matrix -- is symmetric." Thus, they should iterate over all entries of C, but only over the lower or upper triangle of A.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure I made the exact same mistake with the symmetric functions, so I should probably fix that too.

extents<>::size_type numCols_C,
class Layout_C,
class Accessor_C>
void symmetric_matrix_product(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This symmetric_matrix_product function isn't in the current version of P1673. It's OK to leave in there, but it doesn't have to be there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking out the trash!

class Accessor_X>
void triangular_matrix_matrix_left_solve(
std::experimental::mdspan<ElementType_A, std::experimental::extents<numRows_A, numCols_A>, Layout_A, Accessor_A> A,
Triangle t,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you consider commenting out t (as in Triangle /* t */) to prevent build warnings for t being unused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure would! How did you trigger those warnings? I didn't see them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amklinv-nnl Greetings! Sorry I missed your comment! Adding -Wall to the list of warnings should help.

C = A * B (or C = B * A) was not treating A as a symmetric matrix.

This does not have tests yet. Untested code is broken code.
Replaced by symmetric_matrix[right/left]_product
Replaced by left/right equivalents

if constexpr (std::is_same_v<Triangle, lower_triangle_t>) {
for (size_type j = 0; j < C.extent(1); ++j) {
for (size_type i = 0; i < C.extent(0); ++i) {
C(i,j) = ElementType_C{};
for (size_type k = 0; k <= i; ++k) {
const size_type k_upper = explicitDiagonal ? i : i - size_type(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size_type is unsigned, so if i is zero and explicitDiagonal is false, then i - size_type(i) will be a big positive number. I think the fix would be to change the bounds of the for loop over i.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

}
}
}
else { // upper_triangle_t
for (size_type j = 0; j < C.extent(1); ++j) {
const size_type k_upper = explicitDiagonal ? j : j - size_type(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see note above -- thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

The bound k_upper may be negative for certain values, meaning it
cannot be an unsigned type.
Comes with bonus tests for left-multiply. Right
should be assumed to not work (especially since
it's not even implemented).
@@ -572,7 +572,7 @@ void triangular_matrix_left_product(
for (size_type i = 0; i < C.extent(0); ++i) {
C(i,j) = ElementType_C{};
const ptrdiff_t k_upper = explicitDiagonal ? i : i - size_type(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size_type minus size_type is still size_type, so i - size_type(1) will still wrap around to -1 with i=0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with ptrdiff_t(1)

}
else { // lower_triangle_t
for (size_type j=0; j < C.extent(1); ++j) {
for (ptrdiff_t k=C.extent(0)-1; k >= 0; --k) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be correct, though I tend to prefer k = N; k > 0; --k) count-down loops, as they are more robust to changes in the index type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

for (size_type j=0; j < C.extent(1); ++j) {
for (size_type k=0; k < C.extent(0); ++k) {
for (size_type i=0; i < k; ++i) {
C(i,j) += C(k,j)*A(i,k);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Order of multiplication is actually significant, since we support element types like quaternions with noncommutative multiplication. This means that the "left product" matrix A actually needs to go on the left for each element-times-element multiplication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

C(i,j) += C(k,j)*A(i,k);
}
if constexpr (explicitDiagonal) {
C(k,j) *= A(k,k);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that multiplication order is significant (see above), would you consider spelling out the multiplication instead of using *=, as a way to make clear the order of the factors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

const ptrdiff_t k_upper = explicitDiagonal ? i : i - size_type(1);
is incorrect because size_type - size_type is still size_type.
Bad things would happen if i=0. ptrdiff_t(1) is now used instead.
tests/gemm.cpp Outdated
/* C = A * B, where A is triangular mxm */
using extents_t = extents<dynamic_extent, dynamic_extent>;
using matrix_t = mdspan<double, extents_t, layout_left>;
double snan = std::numeric_limits<double>::signaling_NaN();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you consider making this constexpr, given that it's a kind of named constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

tests/gemm.cpp Outdated
for (ptrdiff_t j = 0; j < m; ++j) {
for (ptrdiff_t i = 0; i < n; ++i) {
// FIXME: Choose a more reasonable value for the tolerance
double tol = 1e-9;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you consider making constants like these constexpr if possible? Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

tests/gemm.cpp Outdated Show resolved Hide resolved
@@ -673,7 +673,7 @@ void triangular_matrix_right_product(
const ptrdiff_t k_upper = explicitDiagonal ? j : j - size_type(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable j has type size_type (std::size_t, an unsigned integer), and j - size_type(1) has type size_type (please see this godbolt example), so k_upper will be the biggest possible 64-bit positive integer. The resulting k <= k_upper comparison will always evaluate to true.

The work-around that comes to mind would be to have a special case for explicitDiagonal being true. That way, one could rewrite the loop bound as k < j. Here is a possible reimplementation of lines 676-681:

for (ptrdiff_t k = 0; k < j; ++k) {
  C(i,j) += B(i,k) * A(k,j);
}
if constexpr (explicitDiagonal) {
  C(i,j) += B(i,j) /* times 1 */;
} else {
  C(i,j) += B(i,j) * A(j,j);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is changing size_type(1) to ptrdiff_t(1) acceptable? If not, I can change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amklinv-nnl Please see suggested changes below -- thanks!

Some triangular multiplications were written with
C = B(...)*A(...) when they should have been
C = A(...)*B(...) and vice-versa. These have been
fixed to allow for strange datatypes.
@amklinv-nnl amklinv-nnl marked this pull request as ready for review August 30, 2021 21:19

if constexpr (std::is_same_v<Triangle, lower_triangle_t>) {
for (size_type j = 0; j < C.extent(1); ++j) {
for (size_type i = 0; i < C.extent(0); ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (size_type i = 0; i < C.extent(0); ++i) {
for (ptrdiff_t i = 0; i < static_cast<ptrdiff_t>(C.extent(0)); ++i) {

}
}
else { // upper_triangle_t
for (size_type j = 0; j < C.extent(1); ++j) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (size_type j = 0; j < C.extent(1); ++j) {
for (ptrdiff_t j = 0; j < static_cast<ptrdiff_t>(C.extent(1)); ++j) {

}
else { // lower_triangle_t
for (size_type j=0; j < C.extent(1); ++j) {
for (size_type k=C.extent(0); k > 0; --k) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done; thank you! : - )

for (size_type i = 0; i <= j; ++i) {
C(i,j) = E(i,j);
for (size_type k = 0; k < A.extent(1); ++k) {
C(i,j) += B(i,k) * A(k,j);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the symmetric and Hermitian left and right products should go over the matrix A twice, unlike the triangular left and right products.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should separate that into its own issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amklinv-nnl I might be a bit confused perhaps -- were those functions implemented before? If not and if this PR provides a partial, known-incorrect implementation for later revision, then would you consider putting assert(false); in the functions known to be correct, so that we know to come back to them later? Thanks! : - )

Also replaced untested functions with assert(false). Untested code is
broken code.
Copy link
Contributor

@mhoemmen mhoemmen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting this! This should fix #111, #112, and #113. There is just one small issue regarding conj; please see note.

@@ -40,6 +40,8 @@
//@HEADER
*/

#include <cassert>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should put this inside the include-once guard, but no biggie : - )

C(i,j) = ElementType_C{};
for (size_type k = 0; k < A.extent(1); ++k) {
C(i,j) += A(i,k) * B(k,j);
ElementType_A aik = i <= k ? conj(A(k,i)) : A(i,k);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will compile for generic ElementType_A, because std::conj always returns std::complex (even if its argument is real). There might be something in here like a "conditional conj" that returns the same type as its argument, but it's not too hard to write one:

template<class T>
T cond_conj(const T& t) {
  return t;
}
template<class R>
std::complex<R> cond_conj(const std::complex<R>& z)
{
  using std::conj;
  return conj(z); 
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've spun this off into a separate issue, #115.

C(i,j) = ElementType_C{};
for (size_type k = 0; k < A.extent(1); ++k) {
C(i,j) += B(i,k) * A(k,j);
ElementType_A akj = j <= k ? A(k,j) : conj(A(j,k));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see note above; thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've spun this off into a separate issue, #115.

@mhoemmen
Copy link
Contributor

I've spun off remaining things to fix into separate issues, #115 and #116. Thanks so much for your hard work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ExecutionPolicy overloads of in-place triangular_matrix_* functions (see list)
2 participants