
Sparse algebra support with OOP API #760

Merged: 108 commits into fortran-lang:master on Nov 9, 2024

Conversation

@jalvesz (Contributor) commented Jan 10, 2024

PR related to:
#38
#749
#189

Here I propose the OOP API upfront as a means to centralize the data-containment formats within stdlib, which I feel would be a (the) big added value. I'm migrating the proposal started in FSPARSE and adapting it to stdlib.

Current proposal design (a usage sketch follows the list):

  • Derived Types:
    sparse_type (abstract base type)
    COO_type, CSR_type, CSC_type, ELL_type, SELLC_type (DTs with no data buffer)
    COO_?p_type, CSR_?p_type, CSC_?p_type, ELL_?p_type, SELLC_?p_type (DTs with a data buffer, where ?p denotes the kind/precision, including complex kinds)
  • matrix-vector product:
call spmv( matrix, x, y [,alpha,beta,op] ) !> y = alpha * op(matrix) * x + beta * y

op: sparse_op_none='N' (default), sparse_op_transpose='T' or sparse_op_hermitian='H'

  • Data accessors:
val = matrix%at( i , j ) !> get the value at position (i,j); if (i,j) is not in the sparse pattern, val = 0; if out of bounds, val = NaN
call matrix%add( i , j, val ) !> add val at position (i,j); if (i,j) is not in the current structure, it is skipped
!> OR add a data block:
call matrix%add( dofs_i(:), dofs_j(:), mat(:,:) ) !> add the dense block mat(:,:) at the (dofs_i, dofs_j) positions
  • Conversions :
call coo2ordered(COO [, sort_data]) !> sort and remove duplicates from the COO data structure.
 
call dense2coo( dense , COO )  !> dense is a plain 2d array
call coo2dense( COO  , dense )

call coo2csr( COO , CSR ) !> assumes ordered and unique indices in COO ... a sorting step could be implemented before transferring the data
call csr2coo( CSR , COO ) !> assumes ordered and unique indices in CSR

call coo2csc( COO , CSC ) !> assumes ordered and unique indices in COO
call csc2coo( CSC , COO ) !> assumes ordered and unique indices in CSC

call csr2ell( CSR , ELL [, num_nz_rows]  ) 
call csr2sellc( CSR , SELLC [, chunk ] ) !> supported chunk sizes for spmv: 4, 8 or 16

call diag( <sparse> , diag ) !> extract the diagonal components of the sparse matrix
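
To make the proposed workflow concrete, here is a minimal usage sketch based on the names listed above; the module name stdlib_sparse and the exact kind-suffixed type names are assumptions and may differ in the merged implementation.

    ! Usage sketch only: the module name stdlib_sparse and the exact procedure
    ! names follow the proposal above and may differ in the merged version.
    program sparse_usage_sketch
        use stdlib_kinds, only: dp
        use stdlib_sparse                   ! assumed module name
        implicit none
        type(COO_dp_type) :: COO
        type(CSR_dp_type) :: CSR
        real(dp) :: dense(3,3), x(3), y(3)

        ! small tridiagonal test matrix, converted dense -> COO -> CSR
        dense = reshape([ 2._dp, -1._dp,  0._dp, &
                         -1._dp,  2._dp, -1._dp, &
                          0._dp, -1._dp,  2._dp ], [3,3])
        call dense2coo(dense, COO)
        call coo2ordered(COO, sort_data=.true.)
        call coo2csr(COO, CSR)

        ! y = alpha * op(A) * x + beta * y
        x = 1._dp
        y = 0._dp
        call spmv(CSR, x, y, alpha=1._dp, beta=0._dp, op=sparse_op_none)

        ! data accessors
        print *, CSR%at(1, 2)            ! value at (1,2); 0 if (1,2) is not in the pattern
        call CSR%add(1, 2, -0.5_dp)      ! add a value at position (1,2)
        print *, y
    end program sparse_usage_sketch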

@zerothi commented Apr 4, 2024

Wouldn't it be useful if sparse_t had type-bound interfaces such as mat%to_coo, mat%to_csr, etc., instead of free subroutine calls? Since this is going OOP, let's keep it like that.

@jalvesz (Contributor, Author) commented Apr 4, 2024

This could be made even shorter with a %to(...) interface, since the types are declared before calling the conversion. I've done that before but didn't propose it here upfront, as I wanted to hear some opinions on what the OOP API in stdlib should look like.

@zerothi commented Apr 4, 2024

Agreed, %to would be even better, or %convert for clarity? Hmm...
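
For illustration, a hypothetical %to(...) binding (not part of this PR) could look as follows; the names are only indicative:

    ! hypothetical sketch of the %to(...) idea, not part of this PR: the target
    ! type is declared up front, so a single generic binding could dispatch on it
    type(COO_dp_type) :: COO
    type(CSR_dp_type) :: CSR
    ! ... fill COO ...
    call COO%to(CSR)    ! would be equivalent to: call coo2csr(COO, CSR)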

@jalvesz marked this pull request as draft April 20, 2024
@ftucciarone commented May 17, 2024

Is it a good idea to have a matvec product that does not initialize the result vector? I have to say that I spent a good hour debugging a code, and the source of divergence in the solver was the fact that I was not initializing the result vector before passing it to the subroutine. While I understand that this is a great way to perform an Axpy-style operation, I would advise having separate matvec (with initialization) and Axpy (without) routines.

    subroutine matvec_csr_1d_sp(vec_y,matrix,vec_x)
        type(CSR_sp), intent(in) :: matrix
        real(sp), intent(in)    :: vec_x(:)
        real(sp), intent(inout) :: vec_y(:)
        integer :: i, j
        real(sp) :: aux

        associate( data => matrix%data, col => matrix%col, rowptr => matrix%rowptr, &
            & nnz => matrix%nnz, nrows => matrix%nrows, ncols => matrix%ncols, sym => matrix%sym )
            if( sym == k_NOSYMMETRY) then !> general (non-symmetric) storage: plain CSR product
                do concurrent(i=1:nrows)
                    do j = rowptr(i), rowptr(i+1)-1
                        vec_y(i) = vec_y(i) + data(j) * vec_x(col(j))
                    end do
                end do

            else if( sym == k_SYMTRIINF )then !> symmetric matrix, lower-triangular storage (diagonal stored last in each row)
                do i = 1 , nrows
                    aux  = 0._sp
                    do j = rowptr(i), rowptr(i+1)-2
                        aux = aux + data(j) * vec_x(col(j))
                        vec_y(col(j)) = vec_y(col(j)) + data(j) * vec_x(i)
                    end do
                    aux = aux + data(j) * vec_x(i)
                    vec_y(i) = vec_y(i) + aux
                end do

            else if( sym == k_SYMTRISUP )then !> symmetric matrix, upper-triangular storage (diagonal stored first in each row)
                do i = 1 , nrows
                    aux  = vec_x(i) * data(rowptr(i))
                    do j = rowptr(i)+1, rowptr(i+1)-1
                        aux = aux + data(j) * vec_x(col(j))
                        vec_y(col(j)) = vec_y(col(j)) + data(j) * vec_x(i)
                    end do
                    vec_y(i) = vec_y(i) + aux
                end do

            end if
        end associate
    end subroutine

@jalvesz (Contributor, Author) commented May 18, 2024

Hi @ftucciarone, thanks for checking out the current draft and sorry for the inconvenience.

Yes, I intentionally designed the API not to initialize the vector. In the current version of FSPARSE https://github.com/jalvesz/FSPARSE?tab=readme-ov-file#sparse-matrix-vector-product I updated the README to make this clear, but forgot to update it here.

The reason for that choice is that it allows maximum flexibility when preprocessing linear systems that need operations with the same matrix operator that will be used in the solver. It basically places the "responsibility" on the user to set y = 0 just before the call if a clean vector is indeed needed.

This is of course totally open to debate; I just proposed it based on my experience reworking solvers. My feeling is that this avoids having to manage different implementations or optionals, when the solution can be to explicitly state that the interface updates (y = y + M * x) instead of overwriting (y = M * x).
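
To make the updating semantics concrete, here is a minimal sketch (using the spmv name from the opening comment):

    ! minimal sketch of the updating semantics: spmv accumulates into y,
    ! so zero it explicitly when a plain product is wanted
    y = 0._dp
    call spmv(A, x, y)    ! y = A * x, because y was zeroed just before
    call spmv(A, x, y)    ! now y = y + A * x, i.e. 2 * (A * x) in total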

@ftucciarone commented May 18, 2024

Hi @jalvesz, it was actually a pleasure to look into this draft. I am learning a lot and look forward to discussing it with you if possible.

I have to say that I have particularly strong feelings about the "non-initialization" problem, for two main reasons.
First, if y = y + M*x is the chosen way to go, then it must be stated very prominently, at the very beginning of the documentation, to make sure everyone sees it. The test examples should also cover this, showing that the result is wrong if the vector is not initialized beforehand.
Second, I have the feeling that doing y = 0_dp first and then call matvec(y, A, x) might add overhead for very large problems due to the initialization of y, while initializing the component y(i) within the matvec might be faster; I still have to test this, because I'm not 100% sure about it.

However, I think that separating matvec and Axpy could be feasible at this stage (I can take care of that with the correct guidance) and could potentially be handled by preprocessing as well (I'm thinking of a fypp if).

Cheers, Francesco

Edit: example of splitting matvec and matAxpy with fypp

#:include "../include/common.fypp"
#:set RANKS = range(1, 2+1)
#:set KINDS_TYPES = REAL_KINDS_TYPES
#! names of the two generated operations
#:set OPERATIONS = ['matvec', 'matAxpy']
#! define ranks without parentheses
#:def rksfx2(rank)
#{if rank > 0}#${":," + ":," * (rank - 1)}$#{endif}#
#:enddef

!! matvec_csr
    #:for k1, t1 in (KINDS_TYPES)
    #:for rank in RANKS
    #! iterate over the operation names
    #:for idx, opname in enumerate(OPERATIONS)
    subroutine ${opname}$_csr_${rank}$d_${k1}$(vec_y,matrix,vec_x)
        type(CSR_${k1}$), intent(in) :: matrix
        ${t1}$, intent(in)    :: vec_x${ranksuffix(rank)}$
        ${t1}$, intent(inout) :: vec_y${ranksuffix(rank)}$
        integer :: i, j
        #:if rank == 1
        ${t1}$ :: aux
        #:else
        ${t1}$ :: aux(size(vec_x,dim=1))
        #:endif

        associate( data => matrix%data, col => matrix%col, rowptr => matrix%rowptr, &
            & nnz => matrix%nnz, nrows => matrix%nrows, ncols => matrix%ncols, sym => matrix%sym )
            if( sym == k_NOSYMMETRY) then
                do concurrent(i=1:nrows)
                    #! matvec overwrites the result: start the row from zero, then accumulate
                    #:if idx == 0
                    vec_y(${rksfx2(rank-1)}$i) = 0._${k1}$
                    #:endif
                    do j = rowptr(i), rowptr(i+1)-1
                        vec_y(${rksfx2(rank-1)}$i) = vec_y(${rksfx2(rank-1)}$i) + data(j) * vec_x(${rksfx2(rank-1)}$col(j))
                    end do
                end do

            else if( sym == k_SYMTRIINF )then
                do i = 1 , nrows
                    aux  = 0._${k1}$
                    do j = rowptr(i), rowptr(i+1)-2
                        aux = aux + data(j) * vec_x(${rksfx2(rank-1)}$col(j))
                        vec_y(${rksfx2(rank-1)}$col(j)) = vec_y(${rksfx2(rank-1)}$col(j)) + data(j) * vec_x(${rksfx2(rank-1)}$i)
                    end do
                    aux = aux + data(j) * vec_x(${rksfx2(rank-1)}$i)
                    #! matvec: overwrite; matAxpy: accumulate
                    #:if idx == 0
                    vec_y(${rksfx2(rank-1)}$i) = aux
                    #:else
                    vec_y(${rksfx2(rank-1)}$i) = vec_y(${rksfx2(rank-1)}$i) + aux
                    #:endif
                end do

            else if( sym == k_SYMTRISUP )then
                #! with upper-triangular storage, contributions to vec_y(i) arrive from
                #! earlier rows, so matvec must start from a zeroed vector and accumulate
                #:if idx == 0
                vec_y = 0._${k1}$
                #:endif
                do i = 1 , nrows
                    aux  = vec_x(${rksfx2(rank-1)}$i) * data(rowptr(i))
                    do j = rowptr(i)+1, rowptr(i+1)-1
                        aux = aux + data(j) * vec_x(${rksfx2(rank-1)}$col(j))
                        vec_y(${rksfx2(rank-1)}$col(j)) = vec_y(${rksfx2(rank-1)}$col(j)) + data(j) * vec_x(${rksfx2(rank-1)}$i)
                    end do
                    vec_y(${rksfx2(rank-1)}$i) = vec_y(${rksfx2(rank-1)}$i) + aux
                end do

            end if
        end associate
    end subroutine
    
    #:endfor
    #:endfor
    #:endfor

@jalvesz (Contributor, Author) commented May 18, 2024

@ftucciarone thanks for the idea! I will also bring in parts of a discussion I had with @ivan-pi on exactly this topic, where he brought the following to my attention:

The MKL library offers y := beta*y + alpha*A*x, and you then pick either beta = 0 or beta = 1, depending on what you want.

In Apple's Accelerate Framework on the other hand, they offer two functions:

SparseMultiply(A,x,y)           // y = A x
SparseMultiplyAdd(A,x,y)     // y += A x

In PSBLAS, spmm covers both vectors and matrices (rank-2 dense arrays):

call psb_spmm(alpha, a, x, beta, y, desc_a, info)
call psb_spmm(alpha, a, x, beta, y, desc_a, info, trans, work)

https://psctoolkit.github.io/psblasguide/userhtmlse4.html#x18-550004

Taking all those ideas together, I could imagine using optionals to cover:

call matvec( A , vec_x, vec_y ) !> vec_y = vec_y + A * vec_x
call matvec( A , vec_x, vec_y , overwrite = .true. ) !> vec_y = A * vec_x
call matvec( A , vec_x, vec_y , alpha=value, beta=value ) !> vec_y = beta * vec_y + alpha * A * vec_x

Or the default could be to overwrite, with an optional argument for addition; or simply beta: if(present(beta)) would automatically determine the behavior.
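
For illustration, the optional handling could be resolved inside the kernel along these lines (a sketch only; the local names alpha_ and beta_ are hypothetical):

    ! sketch of resolving the optionals inside the kernel; alpha_ and beta_ are hypothetical names
    real(sp) :: alpha_, beta_
    alpha_ = 1._sp
    if (present(alpha)) alpha_ = alpha
    beta_ = 1._sp                    ! default: update, vec_y = vec_y + A * vec_x
    if (present(beta)) beta_ = beta
    if (beta_ == 0._sp) then
        vec_y = 0._sp                ! overwrite semantics: vec_y = alpha_ * A * vec_x
    else if (beta_ /= 1._sp) then
        vec_y = beta_ * vec_y
    end if
    ! ... then accumulate alpha_ * (A * vec_x) into vec_y ...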

Let me know your thoughts on that

@jalvesz (Contributor, Author) commented Oct 19, 2024

Updates: added in-place transpose (and conjugate transpose) for the spmv kernels
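
With the interface from the opening comment this would be used as follows (a sketch; the variable names are illustrative):

    ! sketch of the op argument, using the constants proposed in the opening comment
    call spmv(CSR, x, y, op=sparse_op_transpose)    ! y = y + transpose(A) * x
    call spmv(CSR, z, w, op=sparse_op_hermitian)    ! w = w + conjg(transpose(A)) * z  (complex kinds)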

@perazz (Contributor) left a comment

Thank you @jalvesz for the update. LGTM: the implementation is very clean and it looks like you've addressed all previous comments. I would like this PR to be merged before new features are added, as it's becoming harder to track the new additions.

So I suggest that anyone else with comments/edits please make them now, and we should think about merging soon if no further comments arise.

@jalvesz (Contributor, Author) commented Nov 1, 2024
jalvesz commented Nov 1, 2024

Thanks @perazz for the feedback. I did a few last cleanups and enhanced the SELLC format support. On my side, I also think the PR is now ready to be merged if there are no further comments. Future features/improvements can be addressed in separate PRs once this base is available.

@jvdp1 (Member) left a comment

Thank you @jalvesz for this PR. LGTM. I agree with @perazz that this first version is ready to be merged.

@perazz (Contributor) commented Nov 5, 2024

Sounds good @jvdp1 @jalvesz - let's wait another couple of days, and then I will merge.

@jalvesz (Contributor, Author) commented Nov 5, 2024

In 22cd23e I rolled back the use of the local associate construct for the SELLC spmv because, after a discussion with @ivan-pi, I realized that it hinders optimizations: I saw performance drops of about 40-60% compared to using the internal subroutine kernels.
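
For context, a rough sketch of the pattern change described here (not the PR code; component and kernel names are placeholders):

    ! illustrative only, not the PR code; component and kernel names are placeholders
    ! before: compute loops nested inside  associate( data => matrix%data, ... )
    ! after:  pass the components to an internal kernel with explicit array dummies
    call sellc_kernel(matrix%data, matrix%col, matrix%rowptr, vec_x, vec_y)

    contains

    subroutine sellc_kernel(data, col, rowptr, x, y)
        real(dp), intent(in)    :: data(:,:)
        integer,  intent(in)    :: col(:,:), rowptr(:)
        real(dp), intent(in)    :: x(:)
        real(dp), intent(inout) :: y(:)
        ! ... chunked SELL-C product over the explicit arrays ...
    end subroutine sellc_kernel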

@perazz merged commit 3369013 into fortran-lang:master on Nov 9, 2024
16 checks passed
@jalvesz deleted the sparse branch on November 10, 2024