
Split out level 3 gemm tests #2610

Merged
merged 6 commits into from
Jan 25, 2025
Conversation

@kshyatt (Contributor) commented on Jan 8, 2025

Testing locally, the level 3 and split-out level 3 GEMM-y tests seem to take the same amount of time. Should help with parallelization. Also removed an extraneous comment.
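For reference, the split-out file can be exercised on its own through CUDA.jl's test harness, which accepts test-name selectors; the exact selector path below is an assumption based on the file layout in this PR:

```shell
# From a CUDA.jl checkout with a CUDA-capable device:
# run only the split-out GEMM tests instead of the whole cublas suite.
julia --project=test test/runtests.jl libraries/cublas/level3_gemm
```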

@kshyatt requested a review from @maleadt on January 8, 2025 16:45
@github-actions (bot) left a comment:


CUDA.jl Benchmarks

| Benchmark suite | Current: 055d1ed | Previous: 4bec614 | Ratio |
|---|---|---|---|
| latency/precompile | 45467516551.5 ns | 45396234276 ns | 1.00 |
| latency/ttfp | 6292609157.5 ns | 6416277525.5 ns | 0.98 |
| latency/import | 2969512039 ns | 3047951471 ns | 0.97 |
| integration/volumerhs | 9556104 ns | 9572210 ns | 1.00 |
| integration/byval/slices=1 | 146714 ns | 146689 ns | 1.00 |
| integration/byval/slices=3 | 424987 ns | 424769 ns | 1.00 |
| integration/byval/reference | 144753 ns | 144911 ns | 1.00 |
| integration/byval/slices=2 | 285749.5 ns | 285674 ns | 1.00 |
| integration/cudadevrt | 103213 ns | 103228 ns | 1.00 |
| kernel/indexing | 14056 ns | 13962 ns | 1.01 |
| kernel/indexing_checked | 14736 ns | 14556 ns | 1.01 |
| kernel/occupancy | 641.5117647058823 ns | 693.384105960265 ns | 0.93 |
| kernel/launch | 2064.8500000000004 ns | 2164.166666666667 ns | 0.95 |
| kernel/rand | 15370 ns | 14418 ns | 1.07 |
| array/reverse/1d | 19264.5 ns | 19581 ns | 0.98 |
| array/reverse/2d | 24785 ns | 24389 ns | 1.02 |
| array/reverse/1d_inplace | 10279 ns | 10606.666666666666 ns | 0.97 |
| array/reverse/2d_inplace | 11737 ns | 11144 ns | 1.05 |
| array/copy | 20689 ns | 20336 ns | 1.02 |
| array/iteration/findall/int | 155191.5 ns | 156856.5 ns | 0.99 |
| array/iteration/findall/bool | 134163.5 ns | 135569 ns | 0.99 |
| array/iteration/findfirst/int | 153411.5 ns | 153474.5 ns | 1.00 |
| array/iteration/findfirst/bool | 152939 ns | 152950 ns | 1.00 |
| array/iteration/scalar | 61964 ns | 60882 ns | 1.02 |
| array/iteration/logical | 195342 ns | 202672 ns | 0.96 |
| array/iteration/findmin/1d | 37671 ns | 37856 ns | 1.00 |
| array/iteration/findmin/2d | 93676 ns | 93737 ns | 1.00 |
| array/reductions/reduce/1d | 39152.5 ns | 38166 ns | 1.03 |
| array/reductions/reduce/2d | 51078 ns | 51122 ns | 1.00 |
| array/reductions/mapreduce/1d | 36169.5 ns | 31151.5 ns | 1.16 |
| array/reductions/mapreduce/2d | 44463 ns | 49629.5 ns | 0.90 |
| array/broadcast | 21661 ns | 21225 ns | 1.02 |
| array/copyto!/gpu_to_gpu | 11483 ns | 13324 ns | 0.86 |
| array/copyto!/cpu_to_gpu | 207967 ns | 208348.5 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 241256 ns | 241560 ns | 1.00 |
| array/accumulate/1d | 108663.5 ns | 108467 ns | 1.00 |
| array/accumulate/2d | 79954 ns | 79962 ns | 1.00 |
| array/construct | 1278.25 ns | 1342.7 ns | 0.95 |
| array/random/randn/Float32 | 42941.5 ns | 43560.5 ns | 0.99 |
| array/random/randn!/Float32 | 26269 ns | 26195 ns | 1.00 |
| array/random/rand!/Int64 | 26906 ns | 27079 ns | 0.99 |
| array/random/rand!/Float32 | 8585.666666666666 ns | 8700 ns | 0.99 |
| array/random/rand/Int64 | 29592 ns | 29827 ns | 0.99 |
| array/random/rand/Float32 | 12842 ns | 12930 ns | 0.99 |
| array/permutedims/4d | 60861 ns | 67316 ns | 0.90 |
| array/permutedims/2d | 54753 ns | 56600 ns | 0.97 |
| array/permutedims/3d | 56085 ns | 59248 ns | 0.95 |
| array/sorting/1d | 2775041 ns | 2764861 ns | 1.00 |
| array/sorting/by | 3366209 ns | 3352588 ns | 1.00 |
| array/sorting/2d | 1084260 ns | 1080760 ns | 1.00 |
| cuda/synchronization/stream/auto | 1044.3 ns | 1111.7 ns | 0.94 |
| cuda/synchronization/stream/nonblocking | 6317.8 ns | 6387.8 ns | 0.99 |
| cuda/synchronization/stream/blocking | 817.2022471910112 ns | 831.395061728395 ns | 0.98 |
| cuda/synchronization/context/auto | 1215.6 ns | 1212.1 ns | 1.00 |
| cuda/synchronization/context/nonblocking | 6522 ns | 6586.8 ns | 0.99 |
| cuda/synchronization/context/blocking | 909.1590909090909 ns | 916.775 ns | 0.99 |

This comment was automatically generated by a workflow using github-action-benchmark.

@maleadt (Member) commented on Jan 17, 2025

Failure seems related:

libraries/cublas/level3: Error During Test at /var/lib/buildkite-agent/builds/gpuci-8/julialang/cuda-dot-jl/test/libraries/cublas/level3.jl:20
2025-01-08 18:25:58 CEST	  Got exception outside of a @test
2025-01-08 18:25:58 CEST	  CUBLASError: an invalid value was used as an argument (code 7, CUBLAS_STATUS_INVALID_VALUE)
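When chasing failures like this one, it can help to catch the wrapper's error type and log the raw status code before rethrowing. A minimal sketch (`checked_gemm!` is a hypothetical helper; it assumes `CUBLAS.CUBLASError` exposes a `code` field, as CUDA.jl's wrappers do):

```julia
using CUDA, CUDA.CUBLAS

function checked_gemm!(tA, tB, alpha, A, B, beta, C)
    try
        CUBLAS.gemm!(tA, tB, alpha, A, B, beta, C)
    catch err
        # CUBLAS_STATUS_INVALID_VALUE (code 7) usually means an argument was
        # inconsistent at the cuBLAS level, e.g. bad transpose flags or
        # mismatched dimensions that slipped past the Julia-side checks.
        err isa CUBLAS.CUBLASError && @error "cuBLAS call failed" code = err.code
        rethrow()
    end
end
```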

@kshyatt (Contributor, Author) commented on Jan 19, 2025

Can't repro this after rebasing onto latest master. Let me push and see if it persists.

@github-actions (bot) commented on Jan 23, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.
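Besides the `git runic master` alias mentioned above, Runic can be invoked directly on the touched files; a hedged sketch (assumes Runic.jl is installed in the active environment and supports the `julia -m` entry point and `--inplace` flag described in its README):

```shell
# Reformat the files this PR touches in place.
julia -m Runic --inplace \
    lib/cublas/linalg.jl lib/cusparse/linalg.jl \
    test/core/initialization.jl \
    test/libraries/cublas/level3_gemm.jl test/libraries/cublas/xt.jl
```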

Suggested changes:
diff --git a/lib/cublas/linalg.jl b/lib/cublas/linalg.jl
index f63d06f0a..f50f07c38 100644
--- a/lib/cublas/linalg.jl
+++ b/lib/cublas/linalg.jl
@@ -205,9 +205,11 @@ LinearAlgebra.generic_trimatdiv!(C::StridedCuVector{T}, uploc, isunitc, tfun::Fu
 # work around upstream breakage from JuliaLang/julia#55547
 @static if VERSION >= v"1.11.2"
     const CuUpperOrUnitUpperTriangular = LinearAlgebra.UpperOrUnitUpperTriangular{
-        <:Any,<:Union{<:CuArray, Adjoint{<:Any, <:CuArray}, Transpose{<:Any, <:CuArray}}}
+        <:Any, <:Union{<:CuArray, Adjoint{<:Any, <:CuArray}, Transpose{<:Any, <:CuArray}},
+    }
     const CuLowerOrUnitLowerTriangular = LinearAlgebra.LowerOrUnitLowerTriangular{
-        <:Any,<:Union{<:CuArray, Adjoint{<:Any, <:CuArray}, Transpose{<:Any, <:CuArray}}}
+        <:Any, <:Union{<:CuArray, Adjoint{<:Any, <:CuArray}, Transpose{<:Any, <:CuArray}},
+    }
     LinearAlgebra.istriu(::CuUpperOrUnitUpperTriangular) = true
     LinearAlgebra.istril(::CuUpperOrUnitUpperTriangular) = false
     LinearAlgebra.istriu(::CuLowerOrUnitLowerTriangular) = false
diff --git a/lib/cusparse/linalg.jl b/lib/cusparse/linalg.jl
index 18ecac4a7..dd1d2489e 100644
--- a/lib/cusparse/linalg.jl
+++ b/lib/cusparse/linalg.jl
@@ -241,9 +241,11 @@ end
 # work around upstream breakage from JuliaLang/julia#55547
 @static if VERSION >= v"1.11.2"
     const CuSparseUpperOrUnitUpperTriangular = LinearAlgebra.UpperOrUnitUpperTriangular{
-        <:Any,<:Union{<:AbstractCuSparseMatrix, Adjoint{<:Any, <:AbstractCuSparseMatrix}, Transpose{<:Any, <:AbstractCuSparseMatrix}}}
+        <:Any, <:Union{<:AbstractCuSparseMatrix, Adjoint{<:Any, <:AbstractCuSparseMatrix}, Transpose{<:Any, <:AbstractCuSparseMatrix}},
+    }
     const CuSparseLowerOrUnitLowerTriangular = LinearAlgebra.LowerOrUnitLowerTriangular{
-        <:Any,<:Union{<:AbstractCuSparseMatrix, Adjoint{<:Any, <:AbstractCuSparseMatrix}, Transpose{<:Any, <:AbstractCuSparseMatrix}}}
+        <:Any, <:Union{<:AbstractCuSparseMatrix, Adjoint{<:Any, <:AbstractCuSparseMatrix}, Transpose{<:Any, <:AbstractCuSparseMatrix}},
+    }
     LinearAlgebra.istriu(::CuSparseUpperOrUnitUpperTriangular) = true
     LinearAlgebra.istril(::CuSparseUpperOrUnitUpperTriangular) = false
     LinearAlgebra.istriu(::CuSparseLowerOrUnitLowerTriangular) = false
diff --git a/test/core/initialization.jl b/test/core/initialization.jl
index c2d03ac83..8b408c564 100644
--- a/test/core/initialization.jl
+++ b/test/core/initialization.jl
@@ -186,8 +186,8 @@ end
 ## allocations
 
 let broken = VERSION == v"1.11.3" && Base.JLOptions().code_coverage != 0
-    @test @allocated(current_context()) == 0 broken=broken
-    @test @allocated(context()) == 0 broken=broken
-    @test @allocated(stream()) == 0 broken=broken
-    @test @allocated(device()) == 0 broken=broken
+    @test @allocated(current_context()) == 0 broken = broken
+    @test @allocated(context()) == 0 broken = broken
+    @test @allocated(stream()) == 0 broken = broken
+    @test @allocated(device()) == 0 broken = broken
 end
diff --git a/test/libraries/cublas/level3_gemm.jl b/test/libraries/cublas/level3_gemm.jl
index bdbe8d1db..2f9770bce 100644
--- a/test/libraries/cublas/level3_gemm.jl
+++ b/test/libraries/cublas/level3_gemm.jl
@@ -37,27 +37,27 @@ k = 13
         @testset "gemm!" begin
             alpha = rand(elty)
             beta = rand(elty)
-            A = rand(elty,m,k)
-            B = rand(elty,k,n)
-            C1 = rand(elty,m,n)
+            A = rand(elty, m, k)
+            B = rand(elty, k, n)
+            C1 = rand(elty, m, n)
             C2 = copy(C1)
             d_A = CuArray(A)
             d_B = CuArray(B)
             d_C1 = CuArray(C1)
             d_C2 = CuArray(C2)
-            hA = rand(elty,m,m)
+            hA = rand(elty, m, m)
             hA = hA + hA'
             dhA = CuArray(hA)
-            sA = rand(elty,m,m)
+            sA = rand(elty, m, m)
             sA = sA + transpose(sA)
             dsA = CuArray(sA)
 
-            CUBLAS.gemm!('N','N',alpha,d_A,d_B,beta,d_C1)
+            CUBLAS.gemm!('N', 'N', alpha, d_A, d_B, beta, d_C1)
             mul!(d_C2, d_A, d_B)
             h_C1 = Array(d_C1)
             h_C2 = Array(d_C2)
-            C1 = (alpha*A)*B + beta*C1
-            C2 = A*B
+            C1 = (alpha * A) * B + beta * C1
+            C2 = A * B
             # compare
             @test C1 ≈ h_C1
             @test C2 ≈ h_C2
@@ -65,9 +65,9 @@ k = 13
             @test_throws DimensionMismatch mul!(d_C1, d_A, dsA)
         end
         @testset "strided gemm!" begin
-            denseA = CUDA.rand(elty, 4,4)
-            denseB = CUDA.rand(elty, 4,4)
-            denseC = CUDA.zeros(elty, 4,4)
+            denseA = CUDA.rand(elty, 4, 4)
+            denseB = CUDA.rand(elty, 4, 4)
+            denseC = CUDA.zeros(elty, 4, 4)
 
             stridedA = view(denseA, 1:2, 1:2)::SubArray
             stridedB = view(denseB, 1:2, 1:2)::SubArray
@@ -82,28 +82,28 @@ k = 13
         end
         if capability(device()) > v"5.0"
             @testset "gemmEx!" begin
-                A = rand(elty,m,k)
-                B = rand(elty,k,n)
-                C1 = rand(elty,m,n)
+                A = rand(elty, m, k)
+                B = rand(elty, k, n)
+                C1 = rand(elty, m, n)
                 d_A = CuArray(A)
                 d_B = CuArray(B)
                 d_C1 = CuArray(C1)
                 α = rand(elty)
                 β = rand(elty)
-                CUBLAS.gemmEx!('N','N',α,d_A,d_B,β,d_C1)
+                CUBLAS.gemmEx!('N', 'N', α, d_A, d_B, β, d_C1)
                 h_C1 = Array(d_C1)
-                C1 = (α*A)*B + β*C1
+                C1 = (α * A) * B + β * C1
                 # compare
                 @test C1 ≈ h_C1
             end
         end
         @testset "gemm" begin
-            A = rand(elty,m,k)
-            B = rand(elty,k,n)
+            A = rand(elty, m, k)
+            B = rand(elty, k, n)
             d_A = CuArray(A)
             d_B = CuArray(B)
-            d_C1 = CUBLAS.gemm('N','N',d_A,d_B)
-            C1 = A*B
+            d_C1 = CUBLAS.gemm('N', 'N', d_A, d_B)
+            C1 = A * B
             C2 = d_A * d_B
             # compare
             h_C1 = Array(d_C1)
@@ -114,50 +114,50 @@ k = 13
         @testset "symm!" begin
             alpha = rand(elty)
             beta = rand(elty)
-            sA = rand(elty,m,m)
+            sA = rand(elty, m, m)
             sA = sA + transpose(sA)
             dsA = CuArray(sA)
-            B = rand(elty,m,n)
-            C = rand(elty,m,n)
-            Bbad = rand(elty,m+1,n+1)
+            B = rand(elty, m, n)
+            C = rand(elty, m, n)
+            Bbad = rand(elty, m + 1, n + 1)
             d_B = CuArray(B)
             d_C = CuArray(C)
             d_Bbad = CuArray(Bbad)
-            CUBLAS.symm!('L','U',alpha,dsA,d_B,beta,d_C)
-            C = (alpha*sA)*B + beta*C
+            CUBLAS.symm!('L', 'U', alpha, dsA, d_B, beta, d_C)
+            C = (alpha * sA) * B + beta * C
             # compare
             h_C = Array(d_C)
             @test C ≈ h_C
-            @test_throws DimensionMismatch CUBLAS.symm!('L','U',alpha,dsA,d_Bbad,beta,d_C)
+            @test_throws DimensionMismatch CUBLAS.symm!('L', 'U', alpha, dsA, d_Bbad, beta, d_C)
         end
 
         @testset "symm" begin
-            sA = rand(elty,m,m)
+            sA = rand(elty, m, m)
             sA = sA + transpose(sA)
             dsA = CuArray(sA)
-            B = rand(elty,m,n)
-            C = rand(elty,m,n)
-            Bbad = rand(elty,m+1,n+1)
+            B = rand(elty, m, n)
+            C = rand(elty, m, n)
+            Bbad = rand(elty, m + 1, n + 1)
             d_B = CuArray(B)
             d_C = CuArray(C)
             d_Bbad = CuArray(Bbad)
-            d_C = CUBLAS.symm('L','U',dsA,d_B)
-            C = sA*B
+            d_C = CUBLAS.symm('L', 'U', dsA, d_B)
+            C = sA * B
             # compare
             h_C = Array(d_C)
             @test C ≈ h_C
-            @test_throws DimensionMismatch CUBLAS.symm('L','U',dsA,d_Bbad)
+            @test_throws DimensionMismatch CUBLAS.symm('L', 'U', dsA, d_Bbad)
         end
         @testset "trmm!" begin
             alpha = rand(elty)
             A = triu(rand(elty, m, m))
-            B = rand(elty,m,n)
-            C = zeros(elty,m,n)
+            B = rand(elty, m, n)
+            C = zeros(elty, m, n)
             dA = CuArray(A)
             dB = CuArray(B)
             dC = CuArray(C)
-            C = alpha*A*B
-            CUBLAS.trmm!('L','U','N','N',alpha,dA,dB,dC)
+            C = alpha * A * B
+            CUBLAS.trmm!('L', 'U', 'N', 'N', alpha, dA, dB, dC)
             # move to host and compare
             h_C = Array(dC)
             @test C ≈ h_C
@@ -165,23 +165,23 @@ k = 13
         @testset "trmm" begin
             alpha = rand(elty)
             A = triu(rand(elty, m, m))
-            B = rand(elty,m,n)
-            C = zeros(elty,m,n)
+            B = rand(elty, m, n)
+            C = zeros(elty, m, n)
             dA = CuArray(A)
             dB = CuArray(B)
             dC = CuArray(C)
-            C = alpha*A*B
-            d_C = CUBLAS.trmm('L','U','N','N',alpha,dA,dB)
+            C = alpha * A * B
+            d_C = CUBLAS.trmm('L', 'U', 'N', 'N', alpha, dA, dB)
             # move to host and compare
             h_C = Array(d_C)
             @test C ≈ h_C
         end
         @testset "triangular-dense mul!" begin
             A = triu(rand(elty, m, m))
-            B = rand(elty,m,n)
-            C = zeros(elty,m,n)
+            B = rand(elty, m, n)
+            C = zeros(elty, m, n)
 
-            sA = rand(elty,m,m)
+            sA = rand(elty, m, m)
             sA = sA + transpose(sA)
 
             for t in (identity, transpose, adjoint), TR in (UpperTriangular, LowerTriangular, UnitUpperTriangular, UnitLowerTriangular)
@@ -210,22 +210,22 @@ k = 13
         end
 
         @testset "triangular-triangular mul!" begin
-            A  = triu(rand(elty, m, m))
-            B  = triu(rand(elty, m, m))
-            C0 = zeros(elty,m,m)
+            A = triu(rand(elty, m, m))
+            B = triu(rand(elty, m, m))
+            C0 = zeros(elty, m, m)
 
-            sA = rand(elty,m,m)
+            sA = rand(elty, m, m)
             sA = sA + transpose(sA)
-            sB = rand(elty,m,m)
+            sB = rand(elty, m, m)
             sB = sB + transpose(sB)
 
             for (TRa, ta, TRb, tb, TRc) in (
-                (UpperTriangular, identity,  LowerTriangular, identity,  Matrix),
-                (LowerTriangular, identity,  UpperTriangular, identity,  Matrix),
-                (UpperTriangular, identity,  UpperTriangular, transpose, Matrix),
-                (UpperTriangular, transpose, UpperTriangular, identity,  Matrix),
-                (LowerTriangular, identity,  LowerTriangular, transpose, Matrix),
-                (LowerTriangular, transpose, LowerTriangular, identity,  Matrix),
+                    (UpperTriangular, identity, LowerTriangular, identity, Matrix),
+                    (LowerTriangular, identity, UpperTriangular, identity, Matrix),
+                    (UpperTriangular, identity, UpperTriangular, transpose, Matrix),
+                    (UpperTriangular, transpose, UpperTriangular, identity, Matrix),
+                    (LowerTriangular, identity, LowerTriangular, transpose, Matrix),
+                    (LowerTriangular, transpose, LowerTriangular, identity, Matrix),
                 )
 
                 A = copy(sA) |> TRa
@@ -251,28 +251,28 @@ k = 13
             @testset "hemm!" begin
                 alpha = rand(elty)
                 beta = rand(elty)
-                hA = rand(elty,m,m)
+                hA = rand(elty, m, m)
                 hA = hA + hA'
                 dhA = CuArray(hA)
-                B = rand(elty,m,n)
-                C = rand(elty,m,n)
+                B = rand(elty, m, n)
+                C = rand(elty, m, n)
                 d_B = CuArray(B)
                 d_C = CuArray(C)
                 # compute
-                C = alpha*(hA*B) + beta*C
-                CUBLAS.hemm!('L','L',alpha,dhA,d_B,beta,d_C)
+                C = alpha * (hA * B) + beta * C
+                CUBLAS.hemm!('L', 'L', alpha, dhA, d_B, beta, d_C)
                 # move to host and compare
                 h_C = Array(d_C)
                 @test C ≈ h_C
             end
             @testset "hemm" begin
-                hA = rand(elty,m,m)
+                hA = rand(elty, m, m)
                 hA = hA + hA'
                 dhA = CuArray(hA)
-                B = rand(elty,m,n)
+                B = rand(elty, m, n)
                 d_B = CuArray(B)
-                C = hA*B
-                d_C = CUBLAS.hemm('L','U',dhA,d_B)
+                C = hA * B
+                d_C = CUBLAS.hemm('L', 'U', dhA, d_B)
                 # move to host and compare
                 h_C = Array(d_C)
                 @test C ≈ h_C
@@ -285,62 +285,62 @@ k = 13
         alpha = rand(elty)
         beta = rand(elty)
         # generate matrices
-        bA = [rand(elty,m,k) for i in 1:10]
-        bB = [rand(elty,k,n) for i in 1:10]
-        bC = [rand(elty,m,n) for i in 1:10]
+        bA = [rand(elty, m, k) for i in 1:10]
+        bB = [rand(elty, k, n) for i in 1:10]
+        bC = [rand(elty, m, n) for i in 1:10]
         # move to device
         bd_A = CuArray{elty, 2}[]
         bd_B = CuArray{elty, 2}[]
         bd_C = CuArray{elty, 2}[]
         bd_bad = CuArray{elty, 2}[]
         for i in 1:length(bA)
-            push!(bd_A,CuArray(bA[i]))
-            push!(bd_B,CuArray(bB[i]))
-            push!(bd_C,CuArray(bC[i]))
+            push!(bd_A, CuArray(bA[i]))
+            push!(bd_B, CuArray(bB[i]))
+            push!(bd_C, CuArray(bC[i]))
             if i < length(bA) - 2
-                push!(bd_bad,CuArray(bC[i]))
+                push!(bd_bad, CuArray(bC[i]))
             end
         end
 
         @testset "gemm_batched!" begin
             # C = (alpha*A)*B + beta*C
-            CUBLAS.gemm_batched!('N','N',alpha,bd_A,bd_B,beta,bd_C)
+            CUBLAS.gemm_batched!('N', 'N', alpha, bd_A, bd_B, beta, bd_C)
             for i in 1:length(bd_C)
-                bC[i] = (alpha*bA[i])*bB[i] + beta*bC[i]
+                bC[i] = (alpha * bA[i]) * bB[i] + beta * bC[i]
                 h_C = Array(bd_C[i])
                 #compare
                 @test bC[i] ≈ h_C
             end
-            @test_throws DimensionMismatch CUBLAS.gemm_batched!('N','N',alpha,bd_A,bd_bad,beta,bd_C)
+            @test_throws DimensionMismatch CUBLAS.gemm_batched!('N', 'N', alpha, bd_A, bd_bad, beta, bd_C)
         end
 
         @testset "gemm_batched" begin
-            bd_C = CUBLAS.gemm_batched('N','N',bd_A,bd_B)
+            bd_C = CUBLAS.gemm_batched('N', 'N', bd_A, bd_B)
             for i in 1:length(bA)
-                bC[i] = bA[i]*bB[i]
+                bC[i] = bA[i] * bB[i]
                 h_C = Array(bd_C[i])
                 @test bC[i] ≈ h_C
             end
-            @test_throws DimensionMismatch CUBLAS.gemm_batched('N','N',alpha,bd_A,bd_bad)
+            @test_throws DimensionMismatch CUBLAS.gemm_batched('N', 'N', alpha, bd_A, bd_bad)
         end
 
         @testset "gemmBatchedEx!" begin
             # C = (alpha*A)*B + beta*C
-            CUBLAS.gemmBatchedEx!('N','N',alpha,bd_A,bd_B,beta,bd_C)
+            CUBLAS.gemmBatchedEx!('N', 'N', alpha, bd_A, bd_B, beta, bd_C)
             for i in 1:length(bd_C)
-                bC[i] = (alpha*bA[i])*bB[i] + beta*bC[i]
+                bC[i] = (alpha * bA[i]) * bB[i] + beta * bC[i]
                 h_C = Array(bd_C[i])
                 #compare
                 @test bC[i] ≈ h_C
             end
-            @test_throws DimensionMismatch CUBLAS.gemmBatchedEx!('N','N',alpha,bd_A,bd_bad,beta,bd_C)
+            @test_throws DimensionMismatch CUBLAS.gemmBatchedEx!('N', 'N', alpha, bd_A, bd_bad, beta, bd_C)
         end
 
         nbatch = 10
         bA = rand(elty, m, k, nbatch)
         bB = rand(elty, k, n, nbatch)
         bC = rand(elty, m, n, nbatch)
-        bbad = rand(elty, m+1, n+1, nbatch)
+        bbad = rand(elty, m + 1, n + 1, nbatch)
         # move to device
         bd_A = CuArray{elty, 3}(bA)
         bd_B = CuArray{elty, 3}(bB)
@@ -402,16 +402,16 @@ k = 13
             alpha = rand(elty, num_groups)
             beta = rand(elty, num_groups)
             # generate matrices
-            bA = [[rand(elty,3*i,2*i) for j in 1:group_sizes[i]] for i in 1:num_groups]
-            bB = [[rand(elty,2*i,5*i) for j in 1:group_sizes[i]] for i in 1:num_groups]
-            bC = [[rand(elty,3*i,5*i) for j in 1:group_sizes[i]] for i in 1:num_groups]
+            bA = [[rand(elty, 3 * i, 2 * i) for j in 1:group_sizes[i]] for i in 1:num_groups]
+            bB = [[rand(elty, 2 * i, 5 * i) for j in 1:group_sizes[i]] for i in 1:num_groups]
+            bC = [[rand(elty, 3 * i, 5 * i) for j in 1:group_sizes[i]] for i in 1:num_groups]
             # move to device
             bd_A = [[CuArray(bA[i][j]) for j in 1:group_sizes[i]] for i in 1:num_groups]
             bd_B = [[CuArray(bB[i][j]) for j in 1:group_sizes[i]] for i in 1:num_groups]
             bd_C = [[CuArray(bC[i][j]) for j in 1:group_sizes[i]] for i in 1:num_groups]
             @testset "gemm_grouped_batched!" begin
                 # C = (alpha*A)*B + beta*C
-                CUBLAS.gemm_grouped_batched!(transA,transB,alpha,bd_A,bd_B,beta,bd_C)
+                CUBLAS.gemm_grouped_batched!(transA, transB, alpha, bd_A, bd_B, beta, bd_C)
                 for i in 1:num_groups, j in 1:group_sizes[i]
                     bC[i][j] = alpha[i] * bA[i][j] * bB[i][j] + beta[i] * bC[i][j]
                     h_C = Array(bd_C[i][j])
@@ -420,7 +420,7 @@ k = 13
             end
 
             @testset "gemm_grouped_batched" begin
-                bd_C = CUBLAS.gemm_grouped_batched(transA,transB,bd_A,bd_B)
+                bd_C = CUBLAS.gemm_grouped_batched(transA, transB, bd_A, bd_B)
                 for i in 1:num_groups, j in 1:group_sizes[i]
                     bC[i][j] = bA[i][j] * bB[i][j]
                     h_C = Array(bd_C[i][j])
@@ -439,22 +439,22 @@ k = 13
             alpha = rand(elty, 10)
             beta = rand(elty, 10)
             # generate matrices
-            bA = [rand(elty,3*i,2*i) for i in 1:10]
-            bB = [rand(elty,2*i,5*i) for i in 1:10]
-            bC = [rand(elty,3*i,5*i) for i in 1:10]
+            bA = [rand(elty, 3 * i, 2 * i) for i in 1:10]
+            bB = [rand(elty, 2 * i, 5 * i) for i in 1:10]
+            bC = [rand(elty, 3 * i, 5 * i) for i in 1:10]
             # move to device
             bd_A = CuArray{elty, 2}[]
             bd_B = CuArray{elty, 2}[]
             bd_C = CuArray{elty, 2}[]
             for i in 1:length(bA)
-                push!(bd_A,CuArray(bA[i]))
-                push!(bd_B,CuArray(bB[i]))
-                push!(bd_C,CuArray(bC[i]))
+                push!(bd_A, CuArray(bA[i]))
+                push!(bd_B, CuArray(bB[i]))
+                push!(bd_C, CuArray(bC[i]))
             end
 
             @testset "gemm_grouped_batched!" begin
                 # C = (alpha*A)*B + beta*C
-                CUBLAS.gemm_grouped_batched!(transA,transB,alpha,bd_A,bd_B,beta,bd_C)
+                CUBLAS.gemm_grouped_batched!(transA, transB, alpha, bd_A, bd_B, beta, bd_C)
                 for i in 1:length(bd_C)
                     bC[i] = alpha[i] * bA[i] * bB[i] + beta[i] * bC[i]
                     h_C = Array(bd_C[i])
@@ -463,7 +463,7 @@ k = 13
             end
 
             @testset "gemm_grouped_batched" begin
-                bd_C = CUBLAS.gemm_grouped_batched(transA,transB,bd_A,bd_B)
+                bd_C = CUBLAS.gemm_grouped_batched(transA, transB, bd_A, bd_B)
                 for i in 1:length(bd_C)
                     bC[i] = bA[i] * bB[i]
                     h_C = Array(bd_C[i])
@@ -474,19 +474,21 @@ k = 13
     end
 
     @testset "mixed-precision matmul" begin
-        m,k,n = 4,4,4
-        cudaTypes = (Float16, Complex{Float16}, BFloat16, Complex{BFloat16}, Float32, Complex{Float32},
-                    Float64, Complex{Float64}, Int8, Complex{Int8}, UInt8, Complex{UInt8},
-                    Int16, Complex{Int16}, UInt16, Complex{UInt16}, Int32, Complex{Int32},
-                    UInt32, Complex{UInt32}, Int64, Complex{Int64}, UInt64, Complex{UInt64})
+        m, k, n = 4, 4, 4
+        cudaTypes = (
+            Float16, Complex{Float16}, BFloat16, Complex{BFloat16}, Float32, Complex{Float32},
+            Float64, Complex{Float64}, Int8, Complex{Int8}, UInt8, Complex{UInt8},
+            Int16, Complex{Int16}, UInt16, Complex{UInt16}, Int32, Complex{Int32},
+            UInt32, Complex{UInt32}, Int64, Complex{Int64}, UInt64, Complex{UInt64},
+        )
 
         for AT in cudaTypes, CT in cudaTypes
             BT = AT # gemmEx requires identical A and B types
 
             # we only test combinations of types that are supported by gemmEx
-            if CUBLAS.gemmExComputeType(AT, BT, CT, m,k,n) !== nothing
-                A = AT <: BFloat16 ? AT.(rand(m,k)) : rand(AT, m,k)
-                B = BT <: BFloat16 ? BT.(rand(k,n)) : rand(BT, k,n)
+            if CUBLAS.gemmExComputeType(AT, BT, CT, m, k, n) !== nothing
+                A = AT <: BFloat16 ? AT.(rand(m, k)) : rand(AT, m, k)
+                B = BT <: BFloat16 ? BT.(rand(k, n)) : rand(BT, k, n)
                 C = similar(B, CT)
                 mul!(C, A, B)
 
@@ -501,18 +503,18 @@ k = 13
                 mul!(dC, dA, dB)
 
                 rtol = Base.rtoldefault(AT, BT, 0)
-                @test C ≈ Array(dC) rtol=rtol
+                @test C ≈ Array(dC) rtol = rtol
             end
         end
 
         # also test an unsupported combination (falling back to GPUArrays)
         if VERSION < v"1.11-"   # JuliaGPU/CUDA.jl#2441
-            AT=BFloat16
-            BT=Int32
-            CT=Float64
+            AT = BFloat16
+            BT = Int32
+            CT = Float64
 
-            A = AT.(rand(m,k))
-            B = rand(BT, k,n)
+            A = AT.(rand(m, k))
+            B = rand(BT, k, n)
             C = similar(B, CT)
             mul!(C, A, B)
 
@@ -522,15 +524,15 @@ k = 13
             mul!(dC, dA, dB)
 
             rtol = Base.rtoldefault(AT, BT, 0)
-            @test C ≈ Array(dC) rtol=rtol
+            @test C ≈ Array(dC) rtol = rtol
         end
     end
 
     @testset "gemm! with strided inputs" begin # JuliaGPU/CUDA.jl#78
         inn = 784; out = 32
-        testf(randn(784*100), rand(Float32, 784, 100)) do p, x
-            p[reshape(1:(out*inn),out,inn)] * x
-            @view(p[reshape(1:(out*inn),out,inn)]) * x
+        testf(randn(784 * 100), rand(Float32, 784, 100)) do p, x
+            p[reshape(1:(out * inn), out, inn)] * x
+            @view(p[reshape(1:(out * inn), out, inn)]) * x
         end
     end
 end
diff --git a/test/libraries/cublas/xt.jl b/test/libraries/cublas/xt.jl
index 0f7f8d098..51e7339df 100644
--- a/test/libraries/cublas/xt.jl
+++ b/test/libraries/cublas/xt.jl
@@ -20,13 +20,13 @@ k = 13
         @testset "xt_trmm! gpu" begin
             alpha = rand(elty)
             A = triu(rand(elty, m, m))
-            B = rand(elty,m,n)
-            C = zeros(elty,m,n)
+            B = rand(elty, m, n)
+            C = zeros(elty, m, n)
             dA = CuArray(A)
             dB = CuArray(B)
             dC = CuArray(C)
-            C = alpha*A*B
-            CUBLAS.xt_trmm!('L','U','N','N',alpha,dA,dB,dC)
+            C = alpha * A * B
+            CUBLAS.xt_trmm!('L', 'U', 'N', 'N', alpha, dA, dB, dC)
             # move to host and compare
             h_C = Array(dC)
             @test C ≈ h_C
@@ -34,22 +34,22 @@ k = 13
         @testset "xt_trmm! cpu" begin
             alpha = rand(elty)
             A = triu(rand(elty, m, m))
-            B = rand(elty,m,n)
-            C = alpha*A*B
+            B = rand(elty, m, n)
+            C = alpha * A * B
             h_C = zeros(elty, m, n)
-            CUBLAS.xt_trmm!('L','U','N','N',alpha,copy(A),copy(B),h_C)
+            CUBLAS.xt_trmm!('L', 'U', 'N', 'N', alpha, copy(A), copy(B), h_C)
             @test C ≈ h_C
         end
         @testset "xt_trmm gpu" begin
             alpha = rand(elty)
             A = triu(rand(elty, m, m))
-            B = rand(elty,m,n)
-            C = zeros(elty,m,n)
+            B = rand(elty, m, n)
+            C = zeros(elty, m, n)
             dA = CuArray(A)
             dB = CuArray(B)
             dC = CuArray(C)
-            C = alpha*A*B
-            d_C = CUBLAS.xt_trmm('L','U','N','N',alpha,dA,dB)
+            C = alpha * A * B
+            d_C = CUBLAS.xt_trmm('L', 'U', 'N', 'N', alpha, dA, dB)
             # move to host and compare
             @test d_C isa CuArray
             h_C = Array(d_C)
@@ -58,9 +58,9 @@ k = 13
         @testset "xt_trmm cpu" begin
             alpha = rand(elty)
             A = triu(rand(elty, m, m))
-            B = rand(elty,m,n)
-            C = alpha*A*B
-            h_C = CUBLAS.xt_trmm('L','U','N','N',alpha,copy(A),copy(B))
+            B = rand(elty, m, n)
+            C = alpha * A * B
+            h_C = CUBLAS.xt_trmm('L', 'U', 'N', 'N', alpha, copy(A), copy(B))
             @test h_C isa Array
             @test C ≈ h_C
         end
@@ -68,13 +68,13 @@ k = 13
         @testset "xt_trsm! gpu" begin
             alpha = rand(elty)
             A = triu(rand(elty, m, m))
-            B = rand(elty,m,n)
+            B = rand(elty, m, n)
             dA = CuArray(A)
             dB = CuArray(B)
-            C = alpha*(A\B)
+            C = alpha * (A \ B)
             dC = copy(dB)
             synchronize()
-            CUBLAS.xt_trsm!('L','U','N','N',alpha,dA,dC)
+            CUBLAS.xt_trsm!('L', 'U', 'N', 'N', alpha, dA, dC)
             # move to host and compare
             h_C = Array(dC)
             @test C ≈ h_C
@@ -82,16 +82,16 @@ k = 13
         @testset "xt_symm! gpu" begin
             alpha = rand(elty)
             beta = rand(elty)
-            sA = rand(elty,m,m)
+            sA = rand(elty, m, m)
             sA = sA + transpose(sA)
             dsA = CuArray(sA)
-            B = rand(elty,m,n)
-            C = rand(elty,m,n)
-            Bbad = rand(elty,m+1,n+1)
+            B = rand(elty, m, n)
+            C = rand(elty, m, n)
+            Bbad = rand(elty, m + 1, n + 1)
             d_B = CuArray(B)
             d_C = CuArray(C)
-            CUBLAS.xt_symm!('L','U',alpha,dsA,d_B,beta,d_C)
-            C = (alpha*sA)*B + beta*C
+            CUBLAS.xt_symm!('L', 'U', alpha, dsA, d_B, beta, d_C)
+            C = (alpha * sA) * B + beta * C
             # compare
             h_C = Array(d_C)
             @test C ≈ h_C
@@ -99,36 +99,36 @@ k = 13
         @testset "xt_symm! cpu" begin
             alpha = rand(elty)
             beta = rand(elty)
-            sA = rand(elty,m,m)
+            sA = rand(elty, m, m)
             sA = sA + transpose(sA)
-            B = rand(elty,m,n)
-            C = rand(elty,m,n)
-            h_C = copy(C) 
-            CUBLAS.xt_symm!('L','U',alpha,copy(sA),copy(B),beta,h_C)
-            C = (alpha*sA)*B + beta*C
+            B = rand(elty, m, n)
+            C = rand(elty, m, n)
+            h_C = copy(C)
+            CUBLAS.xt_symm!('L', 'U', alpha, copy(sA), copy(B), beta, h_C)
+            C = (alpha * sA) * B + beta * C
             # compare
             @test C ≈ h_C
         end
 
         @testset "xt_symm gpu" begin
-            sA = rand(elty,m,m)
+            sA = rand(elty, m, m)
             sA = sA + transpose(sA)
             dsA = CuArray(sA)
-            B = rand(elty,m,n)
+            B = rand(elty, m, n)
             d_B = CuArray(B)
-            d_C = CUBLAS.xt_symm('L','U',dsA,d_B)
-            C = sA*B
+            d_C = CUBLAS.xt_symm('L', 'U', dsA, d_B)
+            C = sA * B
             # compare
             @test d_C isa CuArray
             h_C = Array(d_C)
             @test C ≈ h_C
         end
         @testset "xt_symm cpu" begin
-            sA = rand(elty,m,m)
+            sA = rand(elty, m, m)
             sA = sA + transpose(sA)
-            B = rand(elty,m,n)
-            h_C = CUBLAS.xt_symm('L','U',copy(sA),copy(B))
-            C = sA*B
+            B = rand(elty, m, n)
+            h_C = CUBLAS.xt_symm('L', 'U', copy(sA), copy(B))
+            C = sA * B
             # compare
             @test h_C isa Array
             @test C ≈ h_C
@@ -136,53 +136,53 @@ k = 13
         @testset "xt_gemm! gpu" begin
             alpha = rand(elty)
             beta = rand(elty)
-            A = rand(elty,m,k)
-            B = rand(elty,k,n)
-            C1  = rand(elty,m,n)
-            C2  = copy(C1) 
+            A = rand(elty, m, k)
+            B = rand(elty, k, n)
+            C1 = rand(elty, m, n)
+            C2 = copy(C1)
             d_A = CuArray(A)
             d_B = CuArray(B)
-            Bbad = rand(elty,k+1,n+1)
+            Bbad = rand(elty, k + 1, n + 1)
             d_Bbad = CuArray(Bbad)
             d_C1 = CuArray(C1)
             d_C2 = CuArray(C2)
-            @test_throws DimensionMismatch CUBLAS.xt_gemm!('N','N',alpha,d_A,d_Bbad,beta,d_C1)
+            @test_throws DimensionMismatch CUBLAS.xt_gemm!('N', 'N', alpha, d_A, d_Bbad, beta, d_C1)
             synchronize()
-            CUBLAS.xt_gemm!('N','N',alpha,d_A,d_B,beta,d_C1)
+            CUBLAS.xt_gemm!('N', 'N', alpha, d_A, d_B, beta, d_C1)
             mul!(d_C2, d_A, d_B)
             h_C1 = Array(d_C1)
             h_C2 = Array(d_C2)
-            C1 = (alpha*A)*B + beta*C1
-            C2 = A*B
+            C1 = (alpha * A) * B + beta * C1
+            C2 = A * B
             # compare
             @test C1 ≈ h_C1
             @test C2 ≈ h_C2
         end
         @testset "xt_gemm! cpu" begin
             alpha = rand(elty)
-            beta  = rand(elty)
-            A     = rand(elty,m,k)
-            B     = rand(elty,k,n)
-            C1    = rand(elty,m,n)
-            C2    = copy(C1)
-            C3    = copy(C1)
-            C4    = copy(C2)
-            CUBLAS.xt_gemm!('N','N',alpha,A,B,beta,C1)
+            beta = rand(elty)
+            A = rand(elty, m, k)
+            B = rand(elty, k, n)
+            C1 = rand(elty, m, n)
+            C2 = copy(C1)
+            C3 = copy(C1)
+            C4 = copy(C2)
+            CUBLAS.xt_gemm!('N', 'N', alpha, A, B, beta, C1)
             mul!(C2, A, B)
-            C3 = (alpha*A)*B + beta*C3
-            C4 = A*B
+            C3 = (alpha * A) * B + beta * C3
+            C4 = A * B
             # compare
             @test C1 ≈ C3
             @test C2 ≈ C4
         end
         @testset "xt_gemm gpu" begin
-            A = rand(elty,m,k)
-            B = rand(elty,k,n)
+            A = rand(elty, m, k)
+            B = rand(elty, k, n)
             d_A = CuArray(A)
             d_B = CuArray(B)
             synchronize()
-            d_C = CUBLAS.xt_gemm('N','N',d_A,d_B)
-            C  = A*B
+            d_C = CUBLAS.xt_gemm('N', 'N', d_A, d_B)
+            C = A * B
             C2 = d_A * d_B
             # compare
             @test d_C isa CuArray
@@ -192,34 +192,34 @@ k = 13
             @test C ≈ h_C2
         end
         @testset "xt_gemm cpu" begin
-            A = rand(elty,m,k)
-            B = rand(elty,k,n)
-            C = CUBLAS.xt_gemm('N','N',A,B)
-            C2  = A*B
+            A = rand(elty, m, k)
+            B = rand(elty, k, n)
+            C = CUBLAS.xt_gemm('N', 'N', A, B)
+            C2 = A * B
             # compare
             @test C isa Array
-            @test C ≈ A*B
+            @test C ≈ A * B
             @test C ≈ C2
         end
         @testset "xt_trsm! cpu" begin
             alpha = rand(elty)
             A = triu(rand(elty, m, m))
-            B = rand(elty,m,n)
-            C = alpha*(A\B)
+            B = rand(elty, m, n)
+            C = alpha * (A \ B)
             h_C = copy(B)
             synchronize()
-            CUBLAS.xt_trsm!('L','U','N','N',alpha,copy(A),h_C)
+            CUBLAS.xt_trsm!('L', 'U', 'N', 'N', alpha, copy(A), h_C)
             @test C ≈ h_C
         end
         @testset "xt_trsm gpu" begin
             alpha = rand(elty)
             A = triu(rand(elty, m, m))
-            B = rand(elty,m,n)
+            B = rand(elty, m, n)
             dA = CuArray(A)
             dB = CuArray(B)
-            C  = alpha*(A\B)
+            C = alpha * (A \ B)
             synchronize()
-            dC = CUBLAS.xt_trsm('L','U','N','N',alpha,dA,dB)
+            dC = CUBLAS.xt_trsm('L', 'U', 'N', 'N', alpha, dA, dB)
             # move to host and compare
             @test dC isa CuArray
             h_C = Array(dC)
@@ -228,10 +228,10 @@ k = 13
         @testset "xt_trsm cpu" begin
             alpha = rand(elty)
             A = triu(rand(elty, m, m))
-            B = rand(elty,m,n)
-            C  = alpha*(A\B)
+            B = rand(elty, m, n)
+            C = alpha * (A \ B)
             synchronize()
-            h_C = CUBLAS.xt_trsm('L','U','N','N',alpha,copy(A),copy(B))
+            h_C = CUBLAS.xt_trsm('L', 'U', 'N', 'N', alpha, copy(A), copy(B))
             @test h_C isa Array
             @test C ≈ h_C
         end
@@ -248,17 +248,17 @@ k = 13
             d_syrkx_C = CuArray(syrkx_C)
             # C = (alpha*A)*transpose(B) + beta*C
             synchronize()
-            d_syrkx_C = CUBLAS.xt_syrkx!('U','N',alpha,d_syrkx_A,d_syrkx_B,beta,d_syrkx_C)
-            final_C = (alpha*syrkx_A)*transpose(syrkx_B) + beta*syrkx_C
+            d_syrkx_C = CUBLAS.xt_syrkx!('U', 'N', alpha, d_syrkx_A, d_syrkx_B, beta, d_syrkx_C)
+            final_C = (alpha * syrkx_A) * transpose(syrkx_B) + beta * syrkx_C
             # move to host and compare
             h_C = Array(d_syrkx_C)
             @test triu(final_C) ≈ triu(h_C)
             badC = rand(elty, m, n)
             d_badC = CuArray(badC)
-            @test_throws DimensionMismatch CUBLAS.xt_syrkx!('U','N',alpha,d_syrkx_A,d_syrkx_B,beta,d_badC)
-            badC = rand(elty, n+1, n+1)
+            @test_throws DimensionMismatch CUBLAS.xt_syrkx!('U', 'N', alpha, d_syrkx_A, d_syrkx_B, beta, d_badC)
+            badC = rand(elty, n + 1, n + 1)
             d_badC = CuArray(badC)
-            @test_throws DimensionMismatch CUBLAS.xt_syrkx!('U','N',alpha,d_syrkx_A,d_syrkx_B,beta,d_badC)
+            @test_throws DimensionMismatch CUBLAS.xt_syrkx!('U', 'N', alpha, d_syrkx_A, d_syrkx_B, beta, d_badC)
         end
         @testset "xt_syrkx! cpu" begin
             alpha = rand(elty)
@@ -268,8 +268,8 @@ k = 13
             syrkx_B = rand(elty, n, k)
             syrkx_C = rand(elty, n, n)
             syrkx_C += syrkx_C'
-            final_C = (alpha*syrkx_A)*transpose(syrkx_B) + beta*syrkx_C
-            CUBLAS.xt_syrkx!('U','N',alpha,syrkx_A,syrkx_B,beta,syrkx_C)
+            final_C = (alpha * syrkx_A) * transpose(syrkx_B) + beta * syrkx_C
+            CUBLAS.xt_syrkx!('U', 'N', alpha, syrkx_A, syrkx_B, beta, syrkx_C)
             # move to host and compare
             @test triu(final_C) ≈ triu(syrkx_C)
         end
@@ -280,8 +280,8 @@ k = 13
             d_syrkx_A = CuArray(syrkx_A)
             d_syrkx_B = CuArray(syrkx_B)
             synchronize()
-            d_syrkx_C = CUBLAS.xt_syrkx('U','N',d_syrkx_A,d_syrkx_B)
-            final_C = syrkx_A*transpose(syrkx_B)
+            d_syrkx_C = CUBLAS.xt_syrkx('U', 'N', d_syrkx_A, d_syrkx_B)
+            final_C = syrkx_A * transpose(syrkx_B)
             # move to host and compare
             @test d_syrkx_C isa CuArray
             h_C = Array(d_syrkx_C)
@@ -291,17 +291,17 @@ k = 13
             # generate matrices
             syrkx_A = rand(elty, n, k)
             syrkx_B = rand(elty, n, k)
-            h_C = CUBLAS.xt_syrkx('U','N',syrkx_A,syrkx_B)
-            final_C = syrkx_A*transpose(syrkx_B)
+            h_C = CUBLAS.xt_syrkx('U', 'N', syrkx_A, syrkx_B)
+            final_C = syrkx_A * transpose(syrkx_B)
             @test h_C isa Array
             @test triu(final_C) ≈ triu(h_C)
         end
         @testset "xt_syrk gpu" begin
             # C = A*transpose(A)
-            A = rand(elty,m,k)
+            A = rand(elty, m, k)
             d_A = CuArray(A)
-            d_C = CUBLAS.xt_syrk('U','N',d_A)
-            C = A*transpose(A)
+            d_C = CUBLAS.xt_syrk('U', 'N', d_A)
+            C = A * transpose(A)
             C = triu(C)
             # move to host and compare
             @test d_C isa CuArray
@@ -310,10 +310,10 @@ k = 13
             @test C ≈ h_C
         end
         @testset "xt_syrk cpu" begin
-            A = rand(elty,m,k)
+            A = rand(elty, m, k)
             # C = A*transpose(A)
-            h_C = CUBLAS.xt_syrk('U','N',copy(A))
-            C = A*transpose(A)
+            h_C = CUBLAS.xt_syrk('U', 'N', copy(A))
+            C = A * transpose(A)
             C = triu(C)
             # move to host and compare
             @test h_C isa Array
@@ -324,16 +324,16 @@ k = 13
             @testset "xt_hemm! gpu" begin
                 alpha = rand(elty)
                 beta = rand(elty)
-                hA = rand(elty,m,m)
+                hA = rand(elty, m, m)
                 hA = hA + hA'
                 dhA = CuArray(hA)
-                B = rand(elty,m,n)
-                C = rand(elty,m,n)
+                B = rand(elty, m, n)
+                C = rand(elty, m, n)
                 d_B = CuArray(B)
                 d_C = CuArray(C)
                 # compute
-                C = alpha*(hA*B) + beta*C
-                CUBLAS.xt_hemm!('L','L',alpha,dhA,d_B,beta,d_C)
+                C = alpha * (hA * B) + beta * C
+                CUBLAS.xt_hemm!('L', 'L', alpha, dhA, d_B, beta, d_C)
                 # move to host and compare
                 h_C = Array(d_C)
                 @test C ≈ h_C
@@ -341,35 +341,35 @@ k = 13
             @testset "xt_hemm! cpu" begin
                 alpha = rand(elty)
                 beta = rand(elty)
-                hA = rand(elty,m,m)
+                hA = rand(elty, m, m)
                 hA = hA + hA'
-                B = rand(elty,m,n)
-                C = rand(elty,m,n)
+                B = rand(elty, m, n)
+                C = rand(elty, m, n)
                 # compute
                 h_C = copy(C)
-                C = alpha*(hA*B) + beta*C
-                CUBLAS.xt_hemm!('L','L',alpha,copy(hA),copy(B),beta,h_C)
+                C = alpha * (hA * B) + beta * C
+                CUBLAS.xt_hemm!('L', 'L', alpha, copy(hA), copy(B), beta, h_C)
                 @test C ≈ h_C
             end
             @testset "xt_hemm gpu" begin
-                hA  = rand(elty,m,m)
-                hA  = hA + hA'
+                hA = rand(elty, m, m)
+                hA = hA + hA'
                 dhA = CuArray(hA)
-                B   = rand(elty,m,n)
+                B = rand(elty, m, n)
                 d_B = CuArray(B)
-                C   = hA*B
-                d_C = CUBLAS.xt_hemm('L','U',dhA, d_B)
+                C = hA * B
+                d_C = CUBLAS.xt_hemm('L', 'U', dhA, d_B)
                 # move to host and compare
                 @test d_C isa CuArray
                 h_C = Array(d_C)
                 @test C ≈ h_C
             end
             @testset "xt_hemm cpu" begin
-                hA = rand(elty,m,m)
+                hA = rand(elty, m, m)
                 hA = hA + hA'
-                B = rand(elty,m,n)
-                C   = hA*B
-                h_C = CUBLAS.xt_hemm('L','U',copy(hA), copy(B))
+                B = rand(elty, m, n)
+                C = hA * B
+                h_C = CUBLAS.xt_hemm('L', 'U', copy(hA), copy(B))
                 # move to host and compare
                 @test h_C isa Array
                 @test C ≈ h_C
@@ -377,13 +377,13 @@ k = 13
             @testset "xt_herk! gpu" begin
                 alpha = rand(elty)
                 beta = rand(elty)
-                A = rand(elty,m,m)
+                A = rand(elty, m, m)
                 hA = A + A'
-                C = real(alpha)*(A*A') + real(beta)*copy(hA)
+                C = real(alpha) * (A * A') + real(beta) * copy(hA)
                 d_A = CuArray(A)
                 d_C = CuArray(hA)
                 synchronize()
-                CUBLAS.xt_herk!('U','N',real(alpha),d_A,real(beta),d_C)
+                CUBLAS.xt_herk!('U', 'N', real(alpha), d_A, real(beta), d_C)
                 C = triu(C)
                 # move to host and compare
                 h_C = Array(d_C)
@@ -393,22 +393,22 @@ k = 13
             @testset "xt_herk! cpu" begin
                 alpha = rand(elty)
                 beta = rand(elty)
-                A = rand(elty,m,m)
+                A = rand(elty, m, m)
                 hA = A + A'
                 h_C = copy(hA)
-                CUBLAS.xt_herk!('U','N',real(alpha),copy(A),real(beta),h_C)
-                C = real(alpha)*(A*A') + real(beta)*copy(hA)
+                CUBLAS.xt_herk!('U', 'N', real(alpha), copy(A), real(beta), h_C)
+                C = real(alpha) * (A * A') + real(beta) * copy(hA)
                 C = triu(C)
                 # move to host and compare
                 h_C = triu(h_C)
                 @test C ≈ h_C
             end
             @testset "xt_herk gpu" begin
-                A = rand(elty,m,m)
+                A = rand(elty, m, m)
                 d_A = CuArray(A)
                 synchronize()
-                d_C = CUBLAS.xt_herk('U','N',d_A)
-                C = A*A'
+                d_C = CUBLAS.xt_herk('U', 'N', d_A)
+                C = A * A'
                 C = triu(C)
                 # move to host and compare
                 @test d_C isa CuArray
@@ -417,9 +417,9 @@ k = 13
                 @test C ≈ h_C
             end
             @testset "xt_herk cpu" begin
-                A = rand(elty,m,m)
-                h_C = CUBLAS.xt_herk('U','N',copy(A))
-                C = A*A'
+                A = rand(elty, m, m)
+                h_C = CUBLAS.xt_herk('U', 'N', copy(A))
+                C = A * A'
                 C = triu(C)
                 # move to host and compare
                 @test h_C isa Array
@@ -432,16 +432,16 @@ k = 13
                 # generate parameters
                 α = rand(elty1)
                 β = rand(elty2)
-                A = rand(elty,m,k)
-                B = rand(elty,m,k)
+                A = rand(elty, m, k)
+                B = rand(elty, m, k)
                 d_A = CuArray(A)
                 d_B = CuArray(B)
-                C = rand(elty,m,m)
+                C = rand(elty, m, m)
                 C = C + C'
                 d_C = CuArray(C)
-                C = α*(A*B') + conj(α)*(B*A') + β*C
+                C = α * (A * B') + conj(α) * (B * A') + β * C
                 synchronize()
-                CUBLAS.xt_her2k!('U','N',α,d_A,d_B,β,d_C)
+                CUBLAS.xt_her2k!('U', 'N', α, d_A, d_B, β, d_C)
                 # move back to host and compare
                 C = triu(C)
                 h_C = Array(d_C)
@@ -454,13 +454,13 @@ k = 13
                 # generate parameters
                 α = rand(elty1)
                 β = rand(elty2)
-                A = rand(elty,m,k)
-                B = rand(elty,m,k)
-                C = rand(elty,m,m)
+                A = rand(elty, m, k)
+                B = rand(elty, m, k)
+                C = rand(elty, m, m)
                 C = C + C'
                 h_C = copy(C)
-                C = α*(A*B') + conj(α)*(B*A') + β*C
-                CUBLAS.xt_her2k!('U','N',α,A,B,β,h_C)
+                C = α * (A * B') + conj(α) * (B * A') + β * C
+                CUBLAS.xt_her2k!('U', 'N', α, A, B, β, h_C)
                 # move back to host and compare
                 C = triu(C)
                 h_C = triu(h_C)
@@ -468,15 +468,15 @@ k = 13
             end
             @testset "xt_her2k gpu" begin
                 # generate parameters
-                A = rand(elty,m,k)
-                B = rand(elty,m,k)
+                A = rand(elty, m, k)
+                B = rand(elty, m, k)
                 d_A = CuArray(A)
                 d_B = CuArray(B)
-                C = rand(elty,m,m)
+                C = rand(elty, m, m)
                 C = C + C'
-                C = (A*B') + (B*A')
+                C = (A * B') + (B * A')
                 synchronize()
-                d_C = CUBLAS.xt_her2k('U','N',d_A,d_B)
+                d_C = CUBLAS.xt_her2k('U', 'N', d_A, d_B)
                 # move back to host and compare
                 C = triu(C)
                 @test d_C isa CuArray
@@ -485,13 +485,13 @@ k = 13
                 @test C ≈ h_C
             end
             @testset "xt_her2k cpu" begin
-                A = rand(elty,m,k)
-                B = rand(elty,m,k)
-                C = rand(elty,m,m)
+                A = rand(elty, m, k)
+                B = rand(elty, m, k)
+                C = rand(elty, m, m)
                 # generate parameters
                 C = C + C'
-                C = (A*B') + (B*A')
-                h_C = CUBLAS.xt_her2k('U','N',A,B)
+                C = (A * B') + (B * A')
+                h_C = CUBLAS.xt_her2k('U', 'N', A, B)
                 # move back to host and compare
                 @test h_C isa Array
                 C = triu(C)
@@ -499,12 +499,12 @@ k = 13
                 @test C ≈ h_C
             end
             @testset "her2k" begin
-                A = rand(elty,m,k)
-                B = rand(elty,m,k)
+                A = rand(elty, m, k)
+                B = rand(elty, m, k)
                 d_A = CuArray(A)
                 d_B = CuArray(B)
-                C = A*B' + B*A'
-                d_C = CUBLAS.her2k('U','N',d_A,d_B)
+                C = A * B' + B * A'
+                d_C = CUBLAS.her2k('U', 'N', d_A, d_B)
                 # move back to host and compare
                 C = triu(C)
                 h_C = Array(d_C)

codecov bot commented Jan 24, 2025

Codecov Report

Attention: Patch coverage is 0% with 8 lines in your changes missing coverage. Please review.

Project coverage is 73.61%. Comparing base (4bec614) to head (055d1ed).
Report is 1 commit behind head on master.

Files with missing lines    Patch %    Lines
lib/cublas/linalg.jl        0.00%      4 Missing ⚠️
lib/cusparse/linalg.jl      0.00%      4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2610      +/-   ##
==========================================
- Coverage   73.61%   73.61%   -0.01%     
==========================================
  Files         157      157              
  Lines       15223    15227       +4     
==========================================
+ Hits        11207    11209       +2     
- Misses       4016     4018       +2     


@maleadt maleadt merged commit 159345f into master Jan 25, 2025
3 checks passed
@maleadt maleadt deleted the ksh/gemm branch January 25, 2025 07:40