Split out level 3 gemm tests #2610
Conversation
CUDA.jl Benchmarks
| Benchmark suite | Current: 055d1ed | Previous: 4bec614 | Ratio |
| --- | --- | --- | --- |
| latency/precompile | 45467516551.5 ns | 45396234276 ns | 1.00 |
| latency/ttfp | 6292609157.5 ns | 6416277525.5 ns | 0.98 |
| latency/import | 2969512039 ns | 3047951471 ns | 0.97 |
| integration/volumerhs | 9556104 ns | 9572210 ns | 1.00 |
| integration/byval/slices=1 | 146714 ns | 146689 ns | 1.00 |
| integration/byval/slices=3 | 424987 ns | 424769 ns | 1.00 |
| integration/byval/reference | 144753 ns | 144911 ns | 1.00 |
| integration/byval/slices=2 | 285749.5 ns | 285674 ns | 1.00 |
| integration/cudadevrt | 103213 ns | 103228 ns | 1.00 |
| kernel/indexing | 14056 ns | 13962 ns | 1.01 |
| kernel/indexing_checked | 14736 ns | 14556 ns | 1.01 |
| kernel/occupancy | 641.5117647058823 ns | 693.384105960265 ns | 0.93 |
| kernel/launch | 2064.8500000000004 ns | 2164.166666666667 ns | 0.95 |
| kernel/rand | 15370 ns | 14418 ns | 1.07 |
| array/reverse/1d | 19264.5 ns | 19581 ns | 0.98 |
| array/reverse/2d | 24785 ns | 24389 ns | 1.02 |
| array/reverse/1d_inplace | 10279 ns | 10606.666666666666 ns | 0.97 |
| array/reverse/2d_inplace | 11737 ns | 11144 ns | 1.05 |
| array/copy | 20689 ns | 20336 ns | 1.02 |
| array/iteration/findall/int | 155191.5 ns | 156856.5 ns | 0.99 |
| array/iteration/findall/bool | 134163.5 ns | 135569 ns | 0.99 |
| array/iteration/findfirst/int | 153411.5 ns | 153474.5 ns | 1.00 |
| array/iteration/findfirst/bool | 152939 ns | 152950 ns | 1.00 |
| array/iteration/scalar | 61964 ns | 60882 ns | 1.02 |
| array/iteration/logical | 195342 ns | 202672 ns | 0.96 |
| array/iteration/findmin/1d | 37671 ns | 37856 ns | 1.00 |
| array/iteration/findmin/2d | 93676 ns | 93737 ns | 1.00 |
| array/reductions/reduce/1d | 39152.5 ns | 38166 ns | 1.03 |
| array/reductions/reduce/2d | 51078 ns | 51122 ns | 1.00 |
| array/reductions/mapreduce/1d | 36169.5 ns | 31151.5 ns | 1.16 |
| array/reductions/mapreduce/2d | 44463 ns | 49629.5 ns | 0.90 |
| array/broadcast | 21661 ns | 21225 ns | 1.02 |
| array/copyto!/gpu_to_gpu | 11483 ns | 13324 ns | 0.86 |
| array/copyto!/cpu_to_gpu | 207967 ns | 208348.5 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 241256 ns | 241560 ns | 1.00 |
| array/accumulate/1d | 108663.5 ns | 108467 ns | 1.00 |
| array/accumulate/2d | 79954 ns | 79962 ns | 1.00 |
| array/construct | 1278.25 ns | 1342.7 ns | 0.95 |
| array/random/randn/Float32 | 42941.5 ns | 43560.5 ns | 0.99 |
| array/random/randn!/Float32 | 26269 ns | 26195 ns | 1.00 |
| array/random/rand!/Int64 | 26906 ns | 27079 ns | 0.99 |
| array/random/rand!/Float32 | 8585.666666666666 ns | 8700 ns | 0.99 |
| array/random/rand/Int64 | 29592 ns | 29827 ns | 0.99 |
| array/random/rand/Float32 | 12842 ns | 12930 ns | 0.99 |
| array/permutedims/4d | 60861 ns | 67316 ns | 0.90 |
| array/permutedims/2d | 54753 ns | 56600 ns | 0.97 |
| array/permutedims/3d | 56085 ns | 59248 ns | 0.95 |
| array/sorting/1d | 2775041 ns | 2764861 ns | 1.00 |
| array/sorting/by | 3366209 ns | 3352588 ns | 1.00 |
| array/sorting/2d | 1084260 ns | 1080760 ns | 1.00 |
| cuda/synchronization/stream/auto | 1044.3 ns | 1111.7 ns | 0.94 |
| cuda/synchronization/stream/nonblocking | 6317.8 ns | 6387.8 ns | 0.99 |
| cuda/synchronization/stream/blocking | 817.2022471910112 ns | 831.395061728395 ns | 0.98 |
| cuda/synchronization/context/auto | 1215.6 ns | 1212.1 ns | 1.00 |
| cuda/synchronization/context/nonblocking | 6522 ns | 6586.8 ns | 0.99 |
| cuda/synchronization/context/blocking | 909.1590909090909 ns | 916.775 ns | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
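
For reference, the Ratio column in the table above appears to be the current measurement divided by the previous one, rounded to two decimals (so values above 1.00 mean the new commit measured slower on that nanosecond timing). A minimal sketch of that arithmetic, using rows from the table — this is an illustration, not the actual github-action-benchmark implementation:

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Current-over-previous timing ratio, rounded to two decimals."""
    return round(current_ns / previous_ns, 2)

# A few rows from the table above: (benchmark, current ns, previous ns)
rows = [
    ("kernel/rand", 15370, 14418),                    # regression
    ("array/reductions/mapreduce/1d", 36169.5, 31151.5),
    ("array/copyto!/gpu_to_gpu", 11483, 13324),       # improvement
]
for name, cur, prev in rows:
    print(f"{name}: {ratio(cur, prev):.2f}")
```

Running this reproduces the reported ratios (1.07, 1.16, and 0.86) for those rows.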
Failure seems related:
Can't repro this after rebasing onto latest master. Let me push and see if it persists.
Your PR requires formatting changes to meet the project's style guidelines. The suggested changes:

diff --git a/lib/cublas/linalg.jl b/lib/cublas/linalg.jl
index f63d06f0a..f50f07c38 100644
--- a/lib/cublas/linalg.jl
+++ b/lib/cublas/linalg.jl
@@ -205,9 +205,11 @@ LinearAlgebra.generic_trimatdiv!(C::StridedCuVector{T}, uploc, isunitc, tfun::Fu
# work around upstream breakage from JuliaLang/julia#55547
@static if VERSION >= v"1.11.2"
const CuUpperOrUnitUpperTriangular = LinearAlgebra.UpperOrUnitUpperTriangular{
- <:Any,<:Union{<:CuArray, Adjoint{<:Any, <:CuArray}, Transpose{<:Any, <:CuArray}}}
+ <:Any, <:Union{<:CuArray, Adjoint{<:Any, <:CuArray}, Transpose{<:Any, <:CuArray}},
+ }
const CuLowerOrUnitLowerTriangular = LinearAlgebra.LowerOrUnitLowerTriangular{
- <:Any,<:Union{<:CuArray, Adjoint{<:Any, <:CuArray}, Transpose{<:Any, <:CuArray}}}
+ <:Any, <:Union{<:CuArray, Adjoint{<:Any, <:CuArray}, Transpose{<:Any, <:CuArray}},
+ }
LinearAlgebra.istriu(::CuUpperOrUnitUpperTriangular) = true
LinearAlgebra.istril(::CuUpperOrUnitUpperTriangular) = false
LinearAlgebra.istriu(::CuLowerOrUnitLowerTriangular) = false
diff --git a/lib/cusparse/linalg.jl b/lib/cusparse/linalg.jl
index 18ecac4a7..dd1d2489e 100644
--- a/lib/cusparse/linalg.jl
+++ b/lib/cusparse/linalg.jl
@@ -241,9 +241,11 @@ end
# work around upstream breakage from JuliaLang/julia#55547
@static if VERSION >= v"1.11.2"
const CuSparseUpperOrUnitUpperTriangular = LinearAlgebra.UpperOrUnitUpperTriangular{
- <:Any,<:Union{<:AbstractCuSparseMatrix, Adjoint{<:Any, <:AbstractCuSparseMatrix}, Transpose{<:Any, <:AbstractCuSparseMatrix}}}
+ <:Any, <:Union{<:AbstractCuSparseMatrix, Adjoint{<:Any, <:AbstractCuSparseMatrix}, Transpose{<:Any, <:AbstractCuSparseMatrix}},
+ }
const CuSparseLowerOrUnitLowerTriangular = LinearAlgebra.LowerOrUnitLowerTriangular{
- <:Any,<:Union{<:AbstractCuSparseMatrix, Adjoint{<:Any, <:AbstractCuSparseMatrix}, Transpose{<:Any, <:AbstractCuSparseMatrix}}}
+ <:Any, <:Union{<:AbstractCuSparseMatrix, Adjoint{<:Any, <:AbstractCuSparseMatrix}, Transpose{<:Any, <:AbstractCuSparseMatrix}},
+ }
LinearAlgebra.istriu(::CuSparseUpperOrUnitUpperTriangular) = true
LinearAlgebra.istril(::CuSparseUpperOrUnitUpperTriangular) = false
LinearAlgebra.istriu(::CuSparseLowerOrUnitLowerTriangular) = false
diff --git a/test/core/initialization.jl b/test/core/initialization.jl
index c2d03ac83..8b408c564 100644
--- a/test/core/initialization.jl
+++ b/test/core/initialization.jl
@@ -186,8 +186,8 @@ end
## allocations
let broken = VERSION == v"1.11.3" && Base.JLOptions().code_coverage != 0
- @test @allocated(current_context()) == 0 broken=broken
- @test @allocated(context()) == 0 broken=broken
- @test @allocated(stream()) == 0 broken=broken
- @test @allocated(device()) == 0 broken=broken
+ @test @allocated(current_context()) == 0 broken = broken
+ @test @allocated(context()) == 0 broken = broken
+ @test @allocated(stream()) == 0 broken = broken
+ @test @allocated(device()) == 0 broken = broken
end
diff --git a/test/libraries/cublas/level3_gemm.jl b/test/libraries/cublas/level3_gemm.jl
index bdbe8d1db..2f9770bce 100644
--- a/test/libraries/cublas/level3_gemm.jl
+++ b/test/libraries/cublas/level3_gemm.jl
@@ -37,27 +37,27 @@ k = 13
@testset "gemm!" begin
alpha = rand(elty)
beta = rand(elty)
- A = rand(elty,m,k)
- B = rand(elty,k,n)
- C1 = rand(elty,m,n)
+ A = rand(elty, m, k)
+ B = rand(elty, k, n)
+ C1 = rand(elty, m, n)
C2 = copy(C1)
d_A = CuArray(A)
d_B = CuArray(B)
d_C1 = CuArray(C1)
d_C2 = CuArray(C2)
- hA = rand(elty,m,m)
+ hA = rand(elty, m, m)
hA = hA + hA'
dhA = CuArray(hA)
- sA = rand(elty,m,m)
+ sA = rand(elty, m, m)
sA = sA + transpose(sA)
dsA = CuArray(sA)
- CUBLAS.gemm!('N','N',alpha,d_A,d_B,beta,d_C1)
+ CUBLAS.gemm!('N', 'N', alpha, d_A, d_B, beta, d_C1)
mul!(d_C2, d_A, d_B)
h_C1 = Array(d_C1)
h_C2 = Array(d_C2)
- C1 = (alpha*A)*B + beta*C1
- C2 = A*B
+ C1 = (alpha * A) * B + beta * C1
+ C2 = A * B
# compare
@test C1 ≈ h_C1
@test C2 ≈ h_C2
@@ -65,9 +65,9 @@ k = 13
@test_throws DimensionMismatch mul!(d_C1, d_A, dsA)
end
@testset "strided gemm!" begin
- denseA = CUDA.rand(elty, 4,4)
- denseB = CUDA.rand(elty, 4,4)
- denseC = CUDA.zeros(elty, 4,4)
+ denseA = CUDA.rand(elty, 4, 4)
+ denseB = CUDA.rand(elty, 4, 4)
+ denseC = CUDA.zeros(elty, 4, 4)
stridedA = view(denseA, 1:2, 1:2)::SubArray
stridedB = view(denseB, 1:2, 1:2)::SubArray
@@ -82,28 +82,28 @@ k = 13
end
if capability(device()) > v"5.0"
@testset "gemmEx!" begin
- A = rand(elty,m,k)
- B = rand(elty,k,n)
- C1 = rand(elty,m,n)
+ A = rand(elty, m, k)
+ B = rand(elty, k, n)
+ C1 = rand(elty, m, n)
d_A = CuArray(A)
d_B = CuArray(B)
d_C1 = CuArray(C1)
α = rand(elty)
β = rand(elty)
- CUBLAS.gemmEx!('N','N',α,d_A,d_B,β,d_C1)
+ CUBLAS.gemmEx!('N', 'N', α, d_A, d_B, β, d_C1)
h_C1 = Array(d_C1)
- C1 = (α*A)*B + β*C1
+ C1 = (α * A) * B + β * C1
# compare
@test C1 ≈ h_C1
end
end
@testset "gemm" begin
- A = rand(elty,m,k)
- B = rand(elty,k,n)
+ A = rand(elty, m, k)
+ B = rand(elty, k, n)
d_A = CuArray(A)
d_B = CuArray(B)
- d_C1 = CUBLAS.gemm('N','N',d_A,d_B)
- C1 = A*B
+ d_C1 = CUBLAS.gemm('N', 'N', d_A, d_B)
+ C1 = A * B
C2 = d_A * d_B
# compare
h_C1 = Array(d_C1)
@@ -114,50 +114,50 @@ k = 13
@testset "symm!" begin
alpha = rand(elty)
beta = rand(elty)
- sA = rand(elty,m,m)
+ sA = rand(elty, m, m)
sA = sA + transpose(sA)
dsA = CuArray(sA)
- B = rand(elty,m,n)
- C = rand(elty,m,n)
- Bbad = rand(elty,m+1,n+1)
+ B = rand(elty, m, n)
+ C = rand(elty, m, n)
+ Bbad = rand(elty, m + 1, n + 1)
d_B = CuArray(B)
d_C = CuArray(C)
d_Bbad = CuArray(Bbad)
- CUBLAS.symm!('L','U',alpha,dsA,d_B,beta,d_C)
- C = (alpha*sA)*B + beta*C
+ CUBLAS.symm!('L', 'U', alpha, dsA, d_B, beta, d_C)
+ C = (alpha * sA) * B + beta * C
# compare
h_C = Array(d_C)
@test C ≈ h_C
- @test_throws DimensionMismatch CUBLAS.symm!('L','U',alpha,dsA,d_Bbad,beta,d_C)
+ @test_throws DimensionMismatch CUBLAS.symm!('L', 'U', alpha, dsA, d_Bbad, beta, d_C)
end
@testset "symm" begin
- sA = rand(elty,m,m)
+ sA = rand(elty, m, m)
sA = sA + transpose(sA)
dsA = CuArray(sA)
- B = rand(elty,m,n)
- C = rand(elty,m,n)
- Bbad = rand(elty,m+1,n+1)
+ B = rand(elty, m, n)
+ C = rand(elty, m, n)
+ Bbad = rand(elty, m + 1, n + 1)
d_B = CuArray(B)
d_C = CuArray(C)
d_Bbad = CuArray(Bbad)
- d_C = CUBLAS.symm('L','U',dsA,d_B)
- C = sA*B
+ d_C = CUBLAS.symm('L', 'U', dsA, d_B)
+ C = sA * B
# compare
h_C = Array(d_C)
@test C ≈ h_C
- @test_throws DimensionMismatch CUBLAS.symm('L','U',dsA,d_Bbad)
+ @test_throws DimensionMismatch CUBLAS.symm('L', 'U', dsA, d_Bbad)
end
@testset "trmm!" begin
alpha = rand(elty)
A = triu(rand(elty, m, m))
- B = rand(elty,m,n)
- C = zeros(elty,m,n)
+ B = rand(elty, m, n)
+ C = zeros(elty, m, n)
dA = CuArray(A)
dB = CuArray(B)
dC = CuArray(C)
- C = alpha*A*B
- CUBLAS.trmm!('L','U','N','N',alpha,dA,dB,dC)
+ C = alpha * A * B
+ CUBLAS.trmm!('L', 'U', 'N', 'N', alpha, dA, dB, dC)
# move to host and compare
h_C = Array(dC)
@test C ≈ h_C
@@ -165,23 +165,23 @@ k = 13
@testset "trmm" begin
alpha = rand(elty)
A = triu(rand(elty, m, m))
- B = rand(elty,m,n)
- C = zeros(elty,m,n)
+ B = rand(elty, m, n)
+ C = zeros(elty, m, n)
dA = CuArray(A)
dB = CuArray(B)
dC = CuArray(C)
- C = alpha*A*B
- d_C = CUBLAS.trmm('L','U','N','N',alpha,dA,dB)
+ C = alpha * A * B
+ d_C = CUBLAS.trmm('L', 'U', 'N', 'N', alpha, dA, dB)
# move to host and compare
h_C = Array(d_C)
@test C ≈ h_C
end
@testset "triangular-dense mul!" begin
A = triu(rand(elty, m, m))
- B = rand(elty,m,n)
- C = zeros(elty,m,n)
+ B = rand(elty, m, n)
+ C = zeros(elty, m, n)
- sA = rand(elty,m,m)
+ sA = rand(elty, m, m)
sA = sA + transpose(sA)
for t in (identity, transpose, adjoint), TR in (UpperTriangular, LowerTriangular, UnitUpperTriangular, UnitLowerTriangular)
@@ -210,22 +210,22 @@ k = 13
end
@testset "triangular-triangular mul!" begin
- A = triu(rand(elty, m, m))
- B = triu(rand(elty, m, m))
- C0 = zeros(elty,m,m)
+ A = triu(rand(elty, m, m))
+ B = triu(rand(elty, m, m))
+ C0 = zeros(elty, m, m)
- sA = rand(elty,m,m)
+ sA = rand(elty, m, m)
sA = sA + transpose(sA)
- sB = rand(elty,m,m)
+ sB = rand(elty, m, m)
sB = sB + transpose(sB)
for (TRa, ta, TRb, tb, TRc) in (
- (UpperTriangular, identity, LowerTriangular, identity, Matrix),
- (LowerTriangular, identity, UpperTriangular, identity, Matrix),
- (UpperTriangular, identity, UpperTriangular, transpose, Matrix),
- (UpperTriangular, transpose, UpperTriangular, identity, Matrix),
- (LowerTriangular, identity, LowerTriangular, transpose, Matrix),
- (LowerTriangular, transpose, LowerTriangular, identity, Matrix),
+ (UpperTriangular, identity, LowerTriangular, identity, Matrix),
+ (LowerTriangular, identity, UpperTriangular, identity, Matrix),
+ (UpperTriangular, identity, UpperTriangular, transpose, Matrix),
+ (UpperTriangular, transpose, UpperTriangular, identity, Matrix),
+ (LowerTriangular, identity, LowerTriangular, transpose, Matrix),
+ (LowerTriangular, transpose, LowerTriangular, identity, Matrix),
)
A = copy(sA) |> TRa
@@ -251,28 +251,28 @@ k = 13
@testset "hemm!" begin
alpha = rand(elty)
beta = rand(elty)
- hA = rand(elty,m,m)
+ hA = rand(elty, m, m)
hA = hA + hA'
dhA = CuArray(hA)
- B = rand(elty,m,n)
- C = rand(elty,m,n)
+ B = rand(elty, m, n)
+ C = rand(elty, m, n)
d_B = CuArray(B)
d_C = CuArray(C)
# compute
- C = alpha*(hA*B) + beta*C
- CUBLAS.hemm!('L','L',alpha,dhA,d_B,beta,d_C)
+ C = alpha * (hA * B) + beta * C
+ CUBLAS.hemm!('L', 'L', alpha, dhA, d_B, beta, d_C)
# move to host and compare
h_C = Array(d_C)
@test C ≈ h_C
end
@testset "hemm" begin
- hA = rand(elty,m,m)
+ hA = rand(elty, m, m)
hA = hA + hA'
dhA = CuArray(hA)
- B = rand(elty,m,n)
+ B = rand(elty, m, n)
d_B = CuArray(B)
- C = hA*B
- d_C = CUBLAS.hemm('L','U',dhA,d_B)
+ C = hA * B
+ d_C = CUBLAS.hemm('L', 'U', dhA, d_B)
# move to host and compare
h_C = Array(d_C)
@test C ≈ h_C
@@ -285,62 +285,62 @@ k = 13
alpha = rand(elty)
beta = rand(elty)
# generate matrices
- bA = [rand(elty,m,k) for i in 1:10]
- bB = [rand(elty,k,n) for i in 1:10]
- bC = [rand(elty,m,n) for i in 1:10]
+ bA = [rand(elty, m, k) for i in 1:10]
+ bB = [rand(elty, k, n) for i in 1:10]
+ bC = [rand(elty, m, n) for i in 1:10]
# move to device
bd_A = CuArray{elty, 2}[]
bd_B = CuArray{elty, 2}[]
bd_C = CuArray{elty, 2}[]
bd_bad = CuArray{elty, 2}[]
for i in 1:length(bA)
- push!(bd_A,CuArray(bA[i]))
- push!(bd_B,CuArray(bB[i]))
- push!(bd_C,CuArray(bC[i]))
+ push!(bd_A, CuArray(bA[i]))
+ push!(bd_B, CuArray(bB[i]))
+ push!(bd_C, CuArray(bC[i]))
if i < length(bA) - 2
- push!(bd_bad,CuArray(bC[i]))
+ push!(bd_bad, CuArray(bC[i]))
end
end
@testset "gemm_batched!" begin
# C = (alpha*A)*B + beta*C
- CUBLAS.gemm_batched!('N','N',alpha,bd_A,bd_B,beta,bd_C)
+ CUBLAS.gemm_batched!('N', 'N', alpha, bd_A, bd_B, beta, bd_C)
for i in 1:length(bd_C)
- bC[i] = (alpha*bA[i])*bB[i] + beta*bC[i]
+ bC[i] = (alpha * bA[i]) * bB[i] + beta * bC[i]
h_C = Array(bd_C[i])
#compare
@test bC[i] ≈ h_C
end
- @test_throws DimensionMismatch CUBLAS.gemm_batched!('N','N',alpha,bd_A,bd_bad,beta,bd_C)
+ @test_throws DimensionMismatch CUBLAS.gemm_batched!('N', 'N', alpha, bd_A, bd_bad, beta, bd_C)
end
@testset "gemm_batched" begin
- bd_C = CUBLAS.gemm_batched('N','N',bd_A,bd_B)
+ bd_C = CUBLAS.gemm_batched('N', 'N', bd_A, bd_B)
for i in 1:length(bA)
- bC[i] = bA[i]*bB[i]
+ bC[i] = bA[i] * bB[i]
h_C = Array(bd_C[i])
@test bC[i] ≈ h_C
end
- @test_throws DimensionMismatch CUBLAS.gemm_batched('N','N',alpha,bd_A,bd_bad)
+ @test_throws DimensionMismatch CUBLAS.gemm_batched('N', 'N', alpha, bd_A, bd_bad)
end
@testset "gemmBatchedEx!" begin
# C = (alpha*A)*B + beta*C
- CUBLAS.gemmBatchedEx!('N','N',alpha,bd_A,bd_B,beta,bd_C)
+ CUBLAS.gemmBatchedEx!('N', 'N', alpha, bd_A, bd_B, beta, bd_C)
for i in 1:length(bd_C)
- bC[i] = (alpha*bA[i])*bB[i] + beta*bC[i]
+ bC[i] = (alpha * bA[i]) * bB[i] + beta * bC[i]
h_C = Array(bd_C[i])
#compare
@test bC[i] ≈ h_C
end
- @test_throws DimensionMismatch CUBLAS.gemmBatchedEx!('N','N',alpha,bd_A,bd_bad,beta,bd_C)
+ @test_throws DimensionMismatch CUBLAS.gemmBatchedEx!('N', 'N', alpha, bd_A, bd_bad, beta, bd_C)
end
nbatch = 10
bA = rand(elty, m, k, nbatch)
bB = rand(elty, k, n, nbatch)
bC = rand(elty, m, n, nbatch)
- bbad = rand(elty, m+1, n+1, nbatch)
+ bbad = rand(elty, m + 1, n + 1, nbatch)
# move to device
bd_A = CuArray{elty, 3}(bA)
bd_B = CuArray{elty, 3}(bB)
@@ -402,16 +402,16 @@ k = 13
alpha = rand(elty, num_groups)
beta = rand(elty, num_groups)
# generate matrices
- bA = [[rand(elty,3*i,2*i) for j in 1:group_sizes[i]] for i in 1:num_groups]
- bB = [[rand(elty,2*i,5*i) for j in 1:group_sizes[i]] for i in 1:num_groups]
- bC = [[rand(elty,3*i,5*i) for j in 1:group_sizes[i]] for i in 1:num_groups]
+ bA = [[rand(elty, 3 * i, 2 * i) for j in 1:group_sizes[i]] for i in 1:num_groups]
+ bB = [[rand(elty, 2 * i, 5 * i) for j in 1:group_sizes[i]] for i in 1:num_groups]
+ bC = [[rand(elty, 3 * i, 5 * i) for j in 1:group_sizes[i]] for i in 1:num_groups]
# move to device
bd_A = [[CuArray(bA[i][j]) for j in 1:group_sizes[i]] for i in 1:num_groups]
bd_B = [[CuArray(bB[i][j]) for j in 1:group_sizes[i]] for i in 1:num_groups]
bd_C = [[CuArray(bC[i][j]) for j in 1:group_sizes[i]] for i in 1:num_groups]
@testset "gemm_grouped_batched!" begin
# C = (alpha*A)*B + beta*C
- CUBLAS.gemm_grouped_batched!(transA,transB,alpha,bd_A,bd_B,beta,bd_C)
+ CUBLAS.gemm_grouped_batched!(transA, transB, alpha, bd_A, bd_B, beta, bd_C)
for i in 1:num_groups, j in 1:group_sizes[i]
bC[i][j] = alpha[i] * bA[i][j] * bB[i][j] + beta[i] * bC[i][j]
h_C = Array(bd_C[i][j])
@@ -420,7 +420,7 @@ k = 13
end
@testset "gemm_grouped_batched" begin
- bd_C = CUBLAS.gemm_grouped_batched(transA,transB,bd_A,bd_B)
+ bd_C = CUBLAS.gemm_grouped_batched(transA, transB, bd_A, bd_B)
for i in 1:num_groups, j in 1:group_sizes[i]
bC[i][j] = bA[i][j] * bB[i][j]
h_C = Array(bd_C[i][j])
@@ -439,22 +439,22 @@ k = 13
alpha = rand(elty, 10)
beta = rand(elty, 10)
# generate matrices
- bA = [rand(elty,3*i,2*i) for i in 1:10]
- bB = [rand(elty,2*i,5*i) for i in 1:10]
- bC = [rand(elty,3*i,5*i) for i in 1:10]
+ bA = [rand(elty, 3 * i, 2 * i) for i in 1:10]
+ bB = [rand(elty, 2 * i, 5 * i) for i in 1:10]
+ bC = [rand(elty, 3 * i, 5 * i) for i in 1:10]
# move to device
bd_A = CuArray{elty, 2}[]
bd_B = CuArray{elty, 2}[]
bd_C = CuArray{elty, 2}[]
for i in 1:length(bA)
- push!(bd_A,CuArray(bA[i]))
- push!(bd_B,CuArray(bB[i]))
- push!(bd_C,CuArray(bC[i]))
+ push!(bd_A, CuArray(bA[i]))
+ push!(bd_B, CuArray(bB[i]))
+ push!(bd_C, CuArray(bC[i]))
end
@testset "gemm_grouped_batched!" begin
# C = (alpha*A)*B + beta*C
- CUBLAS.gemm_grouped_batched!(transA,transB,alpha,bd_A,bd_B,beta,bd_C)
+ CUBLAS.gemm_grouped_batched!(transA, transB, alpha, bd_A, bd_B, beta, bd_C)
for i in 1:length(bd_C)
bC[i] = alpha[i] * bA[i] * bB[i] + beta[i] * bC[i]
h_C = Array(bd_C[i])
@@ -463,7 +463,7 @@ k = 13
end
@testset "gemm_grouped_batched" begin
- bd_C = CUBLAS.gemm_grouped_batched(transA,transB,bd_A,bd_B)
+ bd_C = CUBLAS.gemm_grouped_batched(transA, transB, bd_A, bd_B)
for i in 1:length(bd_C)
bC[i] = bA[i] * bB[i]
h_C = Array(bd_C[i])
@@ -474,19 +474,21 @@ k = 13
end
@testset "mixed-precision matmul" begin
- m,k,n = 4,4,4
- cudaTypes = (Float16, Complex{Float16}, BFloat16, Complex{BFloat16}, Float32, Complex{Float32},
- Float64, Complex{Float64}, Int8, Complex{Int8}, UInt8, Complex{UInt8},
- Int16, Complex{Int16}, UInt16, Complex{UInt16}, Int32, Complex{Int32},
- UInt32, Complex{UInt32}, Int64, Complex{Int64}, UInt64, Complex{UInt64})
+ m, k, n = 4, 4, 4
+ cudaTypes = (
+ Float16, Complex{Float16}, BFloat16, Complex{BFloat16}, Float32, Complex{Float32},
+ Float64, Complex{Float64}, Int8, Complex{Int8}, UInt8, Complex{UInt8},
+ Int16, Complex{Int16}, UInt16, Complex{UInt16}, Int32, Complex{Int32},
+ UInt32, Complex{UInt32}, Int64, Complex{Int64}, UInt64, Complex{UInt64},
+ )
for AT in cudaTypes, CT in cudaTypes
BT = AT # gemmEx requires identical A and B types
# we only test combinations of types that are supported by gemmEx
- if CUBLAS.gemmExComputeType(AT, BT, CT, m,k,n) !== nothing
- A = AT <: BFloat16 ? AT.(rand(m,k)) : rand(AT, m,k)
- B = BT <: BFloat16 ? BT.(rand(k,n)) : rand(BT, k,n)
+ if CUBLAS.gemmExComputeType(AT, BT, CT, m, k, n) !== nothing
+ A = AT <: BFloat16 ? AT.(rand(m, k)) : rand(AT, m, k)
+ B = BT <: BFloat16 ? BT.(rand(k, n)) : rand(BT, k, n)
C = similar(B, CT)
mul!(C, A, B)
@@ -501,18 +503,18 @@ k = 13
mul!(dC, dA, dB)
rtol = Base.rtoldefault(AT, BT, 0)
- @test C ≈ Array(dC) rtol=rtol
+ @test C ≈ Array(dC) rtol = rtol
end
end
# also test an unsupported combination (falling back to GPUArrays)
if VERSION < v"1.11-" # JuliaGPU/CUDA.jl#2441
- AT=BFloat16
- BT=Int32
- CT=Float64
+ AT = BFloat16
+ BT = Int32
+ CT = Float64
- A = AT.(rand(m,k))
- B = rand(BT, k,n)
+ A = AT.(rand(m, k))
+ B = rand(BT, k, n)
C = similar(B, CT)
mul!(C, A, B)
@@ -522,15 +524,15 @@ k = 13
mul!(dC, dA, dB)
rtol = Base.rtoldefault(AT, BT, 0)
- @test C ≈ Array(dC) rtol=rtol
+ @test C ≈ Array(dC) rtol = rtol
end
end
@testset "gemm! with strided inputs" begin # JuliaGPU/CUDA.jl#78
inn = 784; out = 32
- testf(randn(784*100), rand(Float32, 784, 100)) do p, x
- p[reshape(1:(out*inn),out,inn)] * x
- @view(p[reshape(1:(out*inn),out,inn)]) * x
+ testf(randn(784 * 100), rand(Float32, 784, 100)) do p, x
+ p[reshape(1:(out * inn), out, inn)] * x
+ @view(p[reshape(1:(out * inn), out, inn)]) * x
end
end
end
diff --git a/test/libraries/cublas/xt.jl b/test/libraries/cublas/xt.jl
index 0f7f8d098..51e7339df 100644
--- a/test/libraries/cublas/xt.jl
+++ b/test/libraries/cublas/xt.jl
@@ -20,13 +20,13 @@ k = 13
@testset "xt_trmm! gpu" begin
alpha = rand(elty)
A = triu(rand(elty, m, m))
- B = rand(elty,m,n)
- C = zeros(elty,m,n)
+ B = rand(elty, m, n)
+ C = zeros(elty, m, n)
dA = CuArray(A)
dB = CuArray(B)
dC = CuArray(C)
- C = alpha*A*B
- CUBLAS.xt_trmm!('L','U','N','N',alpha,dA,dB,dC)
+ C = alpha * A * B
+ CUBLAS.xt_trmm!('L', 'U', 'N', 'N', alpha, dA, dB, dC)
# move to host and compare
h_C = Array(dC)
@test C ≈ h_C
@@ -34,22 +34,22 @@ k = 13
@testset "xt_trmm! cpu" begin
alpha = rand(elty)
A = triu(rand(elty, m, m))
- B = rand(elty,m,n)
- C = alpha*A*B
+ B = rand(elty, m, n)
+ C = alpha * A * B
h_C = zeros(elty, m, n)
- CUBLAS.xt_trmm!('L','U','N','N',alpha,copy(A),copy(B),h_C)
+ CUBLAS.xt_trmm!('L', 'U', 'N', 'N', alpha, copy(A), copy(B), h_C)
@test C ≈ h_C
end
@testset "xt_trmm gpu" begin
alpha = rand(elty)
A = triu(rand(elty, m, m))
- B = rand(elty,m,n)
- C = zeros(elty,m,n)
+ B = rand(elty, m, n)
+ C = zeros(elty, m, n)
dA = CuArray(A)
dB = CuArray(B)
dC = CuArray(C)
- C = alpha*A*B
- d_C = CUBLAS.xt_trmm('L','U','N','N',alpha,dA,dB)
+ C = alpha * A * B
+ d_C = CUBLAS.xt_trmm('L', 'U', 'N', 'N', alpha, dA, dB)
# move to host and compare
@test d_C isa CuArray
h_C = Array(d_C)
@@ -58,9 +58,9 @@ k = 13
@testset "xt_trmm cpu" begin
alpha = rand(elty)
A = triu(rand(elty, m, m))
- B = rand(elty,m,n)
- C = alpha*A*B
- h_C = CUBLAS.xt_trmm('L','U','N','N',alpha,copy(A),copy(B))
+ B = rand(elty, m, n)
+ C = alpha * A * B
+ h_C = CUBLAS.xt_trmm('L', 'U', 'N', 'N', alpha, copy(A), copy(B))
@test h_C isa Array
@test C ≈ h_C
end
@@ -68,13 +68,13 @@ k = 13
@testset "xt_trsm! gpu" begin
alpha = rand(elty)
A = triu(rand(elty, m, m))
- B = rand(elty,m,n)
+ B = rand(elty, m, n)
dA = CuArray(A)
dB = CuArray(B)
- C = alpha*(A\B)
+ C = alpha * (A \ B)
dC = copy(dB)
synchronize()
- CUBLAS.xt_trsm!('L','U','N','N',alpha,dA,dC)
+ CUBLAS.xt_trsm!('L', 'U', 'N', 'N', alpha, dA, dC)
# move to host and compare
h_C = Array(dC)
@test C ≈ h_C
@@ -82,16 +82,16 @@ k = 13
@testset "xt_symm! gpu" begin
alpha = rand(elty)
beta = rand(elty)
- sA = rand(elty,m,m)
+ sA = rand(elty, m, m)
sA = sA + transpose(sA)
dsA = CuArray(sA)
- B = rand(elty,m,n)
- C = rand(elty,m,n)
- Bbad = rand(elty,m+1,n+1)
+ B = rand(elty, m, n)
+ C = rand(elty, m, n)
+ Bbad = rand(elty, m + 1, n + 1)
d_B = CuArray(B)
d_C = CuArray(C)
- CUBLAS.xt_symm!('L','U',alpha,dsA,d_B,beta,d_C)
- C = (alpha*sA)*B + beta*C
+ CUBLAS.xt_symm!('L', 'U', alpha, dsA, d_B, beta, d_C)
+ C = (alpha * sA) * B + beta * C
# compare
h_C = Array(d_C)
@test C ≈ h_C
@@ -99,36 +99,36 @@ k = 13
@testset "xt_symm! cpu" begin
alpha = rand(elty)
beta = rand(elty)
- sA = rand(elty,m,m)
+ sA = rand(elty, m, m)
sA = sA + transpose(sA)
- B = rand(elty,m,n)
- C = rand(elty,m,n)
- h_C = copy(C)
- CUBLAS.xt_symm!('L','U',alpha,copy(sA),copy(B),beta,h_C)
- C = (alpha*sA)*B + beta*C
+ B = rand(elty, m, n)
+ C = rand(elty, m, n)
+ h_C = copy(C)
+ CUBLAS.xt_symm!('L', 'U', alpha, copy(sA), copy(B), beta, h_C)
+ C = (alpha * sA) * B + beta * C
# compare
@test C ≈ h_C
end
@testset "xt_symm gpu" begin
- sA = rand(elty,m,m)
+ sA = rand(elty, m, m)
sA = sA + transpose(sA)
dsA = CuArray(sA)
- B = rand(elty,m,n)
+ B = rand(elty, m, n)
d_B = CuArray(B)
- d_C = CUBLAS.xt_symm('L','U',dsA,d_B)
- C = sA*B
+ d_C = CUBLAS.xt_symm('L', 'U', dsA, d_B)
+ C = sA * B
# compare
@test d_C isa CuArray
h_C = Array(d_C)
@test C ≈ h_C
end
@testset "xt_symm cpu" begin
- sA = rand(elty,m,m)
+ sA = rand(elty, m, m)
sA = sA + transpose(sA)
- B = rand(elty,m,n)
- h_C = CUBLAS.xt_symm('L','U',copy(sA),copy(B))
- C = sA*B
+ B = rand(elty, m, n)
+ h_C = CUBLAS.xt_symm('L', 'U', copy(sA), copy(B))
+ C = sA * B
# compare
@test h_C isa Array
@test C ≈ h_C
@@ -136,53 +136,53 @@ k = 13
@testset "xt_gemm! gpu" begin
alpha = rand(elty)
beta = rand(elty)
- A = rand(elty,m,k)
- B = rand(elty,k,n)
- C1 = rand(elty,m,n)
- C2 = copy(C1)
+ A = rand(elty, m, k)
+ B = rand(elty, k, n)
+ C1 = rand(elty, m, n)
+ C2 = copy(C1)
d_A = CuArray(A)
d_B = CuArray(B)
- Bbad = rand(elty,k+1,n+1)
+ Bbad = rand(elty, k + 1, n + 1)
d_Bbad = CuArray(Bbad)
d_C1 = CuArray(C1)
d_C2 = CuArray(C2)
- @test_throws DimensionMismatch CUBLAS.xt_gemm!('N','N',alpha,d_A,d_Bbad,beta,d_C1)
+ @test_throws DimensionMismatch CUBLAS.xt_gemm!('N', 'N', alpha, d_A, d_Bbad, beta, d_C1)
synchronize()
- CUBLAS.xt_gemm!('N','N',alpha,d_A,d_B,beta,d_C1)
+ CUBLAS.xt_gemm!('N', 'N', alpha, d_A, d_B, beta, d_C1)
mul!(d_C2, d_A, d_B)
h_C1 = Array(d_C1)
h_C2 = Array(d_C2)
- C1 = (alpha*A)*B + beta*C1
- C2 = A*B
+ C1 = (alpha * A) * B + beta * C1
+ C2 = A * B
# compare
@test C1 ≈ h_C1
@test C2 ≈ h_C2
end
@testset "xt_gemm! cpu" begin
alpha = rand(elty)
- beta = rand(elty)
- A = rand(elty,m,k)
- B = rand(elty,k,n)
- C1 = rand(elty,m,n)
- C2 = copy(C1)
- C3 = copy(C1)
- C4 = copy(C2)
- CUBLAS.xt_gemm!('N','N',alpha,A,B,beta,C1)
+ beta = rand(elty)
+ A = rand(elty, m, k)
+ B = rand(elty, k, n)
+ C1 = rand(elty, m, n)
+ C2 = copy(C1)
+ C3 = copy(C1)
+ C4 = copy(C2)
+ CUBLAS.xt_gemm!('N', 'N', alpha, A, B, beta, C1)
mul!(C2, A, B)
- C3 = (alpha*A)*B + beta*C3
- C4 = A*B
+ C3 = (alpha * A) * B + beta * C3
+ C4 = A * B
# compare
@test C1 ≈ C3
@test C2 ≈ C4
end
@testset "xt_gemm gpu" begin
- A = rand(elty,m,k)
- B = rand(elty,k,n)
+ A = rand(elty, m, k)
+ B = rand(elty, k, n)
d_A = CuArray(A)
d_B = CuArray(B)
synchronize()
- d_C = CUBLAS.xt_gemm('N','N',d_A,d_B)
- C = A*B
+ d_C = CUBLAS.xt_gemm('N', 'N', d_A, d_B)
+ C = A * B
C2 = d_A * d_B
# compare
@test d_C isa CuArray
@@ -192,34 +192,34 @@ k = 13
@test C ≈ h_C2
end
@testset "xt_gemm cpu" begin
- A = rand(elty,m,k)
- B = rand(elty,k,n)
- C = CUBLAS.xt_gemm('N','N',A,B)
- C2 = A*B
+ A = rand(elty, m, k)
+ B = rand(elty, k, n)
+ C = CUBLAS.xt_gemm('N', 'N', A, B)
+ C2 = A * B
# compare
@test C isa Array
- @test C ≈ A*B
+ @test C ≈ A * B
@test C ≈ C2
end
@testset "xt_trsm! cpu" begin
alpha = rand(elty)
A = triu(rand(elty, m, m))
- B = rand(elty,m,n)
- C = alpha*(A\B)
+ B = rand(elty, m, n)
+ C = alpha * (A \ B)
h_C = copy(B)
synchronize()
- CUBLAS.xt_trsm!('L','U','N','N',alpha,copy(A),h_C)
+ CUBLAS.xt_trsm!('L', 'U', 'N', 'N', alpha, copy(A), h_C)
@test C ≈ h_C
end
@testset "xt_trsm gpu" begin
alpha = rand(elty)
A = triu(rand(elty, m, m))
- B = rand(elty,m,n)
+ B = rand(elty, m, n)
dA = CuArray(A)
dB = CuArray(B)
- C = alpha*(A\B)
+ C = alpha * (A \ B)
synchronize()
- dC = CUBLAS.xt_trsm('L','U','N','N',alpha,dA,dB)
+ dC = CUBLAS.xt_trsm('L', 'U', 'N', 'N', alpha, dA, dB)
# move to host and compare
@test dC isa CuArray
h_C = Array(dC)
@@ -228,10 +228,10 @@ k = 13
@testset "xt_trsm cpu" begin
alpha = rand(elty)
A = triu(rand(elty, m, m))
- B = rand(elty,m,n)
- C = alpha*(A\B)
+ B = rand(elty, m, n)
+ C = alpha * (A \ B)
synchronize()
- h_C = CUBLAS.xt_trsm('L','U','N','N',alpha,copy(A),copy(B))
+ h_C = CUBLAS.xt_trsm('L', 'U', 'N', 'N', alpha, copy(A), copy(B))
@test h_C isa Array
@test C ≈ h_C
end
@@ -248,17 +248,17 @@ k = 13
d_syrkx_C = CuArray(syrkx_C)
# C = (alpha*A)*transpose(B) + beta*C
synchronize()
- d_syrkx_C = CUBLAS.xt_syrkx!('U','N',alpha,d_syrkx_A,d_syrkx_B,beta,d_syrkx_C)
- final_C = (alpha*syrkx_A)*transpose(syrkx_B) + beta*syrkx_C
+ d_syrkx_C = CUBLAS.xt_syrkx!('U', 'N', alpha, d_syrkx_A, d_syrkx_B, beta, d_syrkx_C)
+ final_C = (alpha * syrkx_A) * transpose(syrkx_B) + beta * syrkx_C
# move to host and compare
h_C = Array(d_syrkx_C)
@test triu(final_C) ≈ triu(h_C)
badC = rand(elty, m, n)
d_badC = CuArray(badC)
- @test_throws DimensionMismatch CUBLAS.xt_syrkx!('U','N',alpha,d_syrkx_A,d_syrkx_B,beta,d_badC)
- badC = rand(elty, n+1, n+1)
+ @test_throws DimensionMismatch CUBLAS.xt_syrkx!('U', 'N', alpha, d_syrkx_A, d_syrkx_B, beta, d_badC)
+ badC = rand(elty, n + 1, n + 1)
d_badC = CuArray(badC)
- @test_throws DimensionMismatch CUBLAS.xt_syrkx!('U','N',alpha,d_syrkx_A,d_syrkx_B,beta,d_badC)
+ @test_throws DimensionMismatch CUBLAS.xt_syrkx!('U', 'N', alpha, d_syrkx_A, d_syrkx_B, beta, d_badC)
end
@testset "xt_syrkx! cpu" begin
alpha = rand(elty)
@@ -268,8 +268,8 @@ k = 13
syrkx_B = rand(elty, n, k)
syrkx_C = rand(elty, n, n)
syrkx_C += syrkx_C'
- final_C = (alpha*syrkx_A)*transpose(syrkx_B) + beta*syrkx_C
- CUBLAS.xt_syrkx!('U','N',alpha,syrkx_A,syrkx_B,beta,syrkx_C)
+ final_C = (alpha * syrkx_A) * transpose(syrkx_B) + beta * syrkx_C
+ CUBLAS.xt_syrkx!('U', 'N', alpha, syrkx_A, syrkx_B, beta, syrkx_C)
# move to host and compare
@test triu(final_C) ≈ triu(syrkx_C)
end
@@ -280,8 +280,8 @@ k = 13
d_syrkx_A = CuArray(syrkx_A)
d_syrkx_B = CuArray(syrkx_B)
synchronize()
- d_syrkx_C = CUBLAS.xt_syrkx('U','N',d_syrkx_A,d_syrkx_B)
- final_C = syrkx_A*transpose(syrkx_B)
+ d_syrkx_C = CUBLAS.xt_syrkx('U', 'N', d_syrkx_A, d_syrkx_B)
+ final_C = syrkx_A * transpose(syrkx_B)
# move to host and compare
@test d_syrkx_C isa CuArray
h_C = Array(d_syrkx_C)
@@ -291,17 +291,17 @@ k = 13
# generate matrices
syrkx_A = rand(elty, n, k)
syrkx_B = rand(elty, n, k)
- h_C = CUBLAS.xt_syrkx('U','N',syrkx_A,syrkx_B)
- final_C = syrkx_A*transpose(syrkx_B)
+ h_C = CUBLAS.xt_syrkx('U', 'N', syrkx_A, syrkx_B)
+ final_C = syrkx_A * transpose(syrkx_B)
@test h_C isa Array
@test triu(final_C) ≈ triu(h_C)
end
@testset "xt_syrk gpu" begin
# C = A*transpose(A)
- A = rand(elty,m,k)
+ A = rand(elty, m, k)
d_A = CuArray(A)
- d_C = CUBLAS.xt_syrk('U','N',d_A)
- C = A*transpose(A)
+ d_C = CUBLAS.xt_syrk('U', 'N', d_A)
+ C = A * transpose(A)
C = triu(C)
# move to host and compare
@test d_C isa CuArray
@@ -310,10 +310,10 @@ k = 13
@test C ≈ h_C
end
@testset "xt_syrk cpu" begin
- A = rand(elty,m,k)
+ A = rand(elty, m, k)
# C = A*transpose(A)
- h_C = CUBLAS.xt_syrk('U','N',copy(A))
- C = A*transpose(A)
+ h_C = CUBLAS.xt_syrk('U', 'N', copy(A))
+ C = A * transpose(A)
C = triu(C)
# move to host and compare
@test h_C isa Array
@@ -324,16 +324,16 @@ k = 13
@testset "xt_hemm! gpu" begin
alpha = rand(elty)
beta = rand(elty)
- hA = rand(elty,m,m)
+ hA = rand(elty, m, m)
hA = hA + hA'
dhA = CuArray(hA)
- B = rand(elty,m,n)
- C = rand(elty,m,n)
+ B = rand(elty, m, n)
+ C = rand(elty, m, n)
d_B = CuArray(B)
d_C = CuArray(C)
# compute
- C = alpha*(hA*B) + beta*C
- CUBLAS.xt_hemm!('L','L',alpha,dhA,d_B,beta,d_C)
+ C = alpha * (hA * B) + beta * C
+ CUBLAS.xt_hemm!('L', 'L', alpha, dhA, d_B, beta, d_C)
# move to host and compare
h_C = Array(d_C)
@test C ≈ h_C
@@ -341,35 +341,35 @@ k = 13
@testset "xt_hemm! cpu" begin
alpha = rand(elty)
beta = rand(elty)
- hA = rand(elty,m,m)
+ hA = rand(elty, m, m)
hA = hA + hA'
- B = rand(elty,m,n)
- C = rand(elty,m,n)
+ B = rand(elty, m, n)
+ C = rand(elty, m, n)
# compute
h_C = copy(C)
- C = alpha*(hA*B) + beta*C
- CUBLAS.xt_hemm!('L','L',alpha,copy(hA),copy(B),beta,h_C)
+ C = alpha * (hA * B) + beta * C
+ CUBLAS.xt_hemm!('L', 'L', alpha, copy(hA), copy(B), beta, h_C)
@test C ≈ h_C
end
@testset "xt_hemm gpu" begin
- hA = rand(elty,m,m)
- hA = hA + hA'
+ hA = rand(elty, m, m)
+ hA = hA + hA'
dhA = CuArray(hA)
- B = rand(elty,m,n)
+ B = rand(elty, m, n)
d_B = CuArray(B)
- C = hA*B
- d_C = CUBLAS.xt_hemm('L','U',dhA, d_B)
+ C = hA * B
+ d_C = CUBLAS.xt_hemm('L', 'U', dhA, d_B)
# move to host and compare
@test d_C isa CuArray
h_C = Array(d_C)
@test C ≈ h_C
end
@testset "xt_hemm cpu" begin
- hA = rand(elty,m,m)
+ hA = rand(elty, m, m)
hA = hA + hA'
- B = rand(elty,m,n)
- C = hA*B
- h_C = CUBLAS.xt_hemm('L','U',copy(hA), copy(B))
+ B = rand(elty, m, n)
+ C = hA * B
+ h_C = CUBLAS.xt_hemm('L', 'U', copy(hA), copy(B))
# move to host and compare
@test h_C isa Array
@test C ≈ h_C
@@ -377,13 +377,13 @@ k = 13
@testset "xt_herk! gpu" begin
alpha = rand(elty)
beta = rand(elty)
- A = rand(elty,m,m)
+ A = rand(elty, m, m)
hA = A + A'
- C = real(alpha)*(A*A') + real(beta)*copy(hA)
+ C = real(alpha) * (A * A') + real(beta) * copy(hA)
d_A = CuArray(A)
d_C = CuArray(hA)
synchronize()
- CUBLAS.xt_herk!('U','N',real(alpha),d_A,real(beta),d_C)
+ CUBLAS.xt_herk!('U', 'N', real(alpha), d_A, real(beta), d_C)
C = triu(C)
# move to host and compare
h_C = Array(d_C)
@@ -393,22 +393,22 @@ k = 13
@testset "xt_herk! cpu" begin
alpha = rand(elty)
beta = rand(elty)
- A = rand(elty,m,m)
+ A = rand(elty, m, m)
hA = A + A'
h_C = copy(hA)
- CUBLAS.xt_herk!('U','N',real(alpha),copy(A),real(beta),h_C)
- C = real(alpha)*(A*A') + real(beta)*copy(hA)
+ CUBLAS.xt_herk!('U', 'N', real(alpha), copy(A), real(beta), h_C)
+ C = real(alpha) * (A * A') + real(beta) * copy(hA)
C = triu(C)
# move to host and compare
h_C = triu(h_C)
@test C ≈ h_C
end
@testset "xt_herk gpu" begin
- A = rand(elty,m,m)
+ A = rand(elty, m, m)
d_A = CuArray(A)
synchronize()
- d_C = CUBLAS.xt_herk('U','N',d_A)
- C = A*A'
+ d_C = CUBLAS.xt_herk('U', 'N', d_A)
+ C = A * A'
C = triu(C)
# move to host and compare
@test d_C isa CuArray
@@ -417,9 +417,9 @@ k = 13
@test C ≈ h_C
end
@testset "xt_herk cpu" begin
- A = rand(elty,m,m)
- h_C = CUBLAS.xt_herk('U','N',copy(A))
- C = A*A'
+ A = rand(elty, m, m)
+ h_C = CUBLAS.xt_herk('U', 'N', copy(A))
+ C = A * A'
C = triu(C)
# move to host and compare
@test h_C isa Array
@@ -432,16 +432,16 @@ k = 13
# generate parameters
α = rand(elty1)
β = rand(elty2)
- A = rand(elty,m,k)
- B = rand(elty,m,k)
+ A = rand(elty, m, k)
+ B = rand(elty, m, k)
d_A = CuArray(A)
d_B = CuArray(B)
- C = rand(elty,m,m)
+ C = rand(elty, m, m)
C = C + C'
d_C = CuArray(C)
- C = α*(A*B') + conj(α)*(B*A') + β*C
+ C = α * (A * B') + conj(α) * (B * A') + β * C
synchronize()
- CUBLAS.xt_her2k!('U','N',α,d_A,d_B,β,d_C)
+ CUBLAS.xt_her2k!('U', 'N', α, d_A, d_B, β, d_C)
# move back to host and compare
C = triu(C)
h_C = Array(d_C)
@@ -454,13 +454,13 @@ k = 13
# generate parameters
α = rand(elty1)
β = rand(elty2)
- A = rand(elty,m,k)
- B = rand(elty,m,k)
- C = rand(elty,m,m)
+ A = rand(elty, m, k)
+ B = rand(elty, m, k)
+ C = rand(elty, m, m)
C = C + C'
h_C = copy(C)
- C = α*(A*B') + conj(α)*(B*A') + β*C
- CUBLAS.xt_her2k!('U','N',α,A,B,β,h_C)
+ C = α * (A * B') + conj(α) * (B * A') + β * C
+ CUBLAS.xt_her2k!('U', 'N', α, A, B, β, h_C)
# move back to host and compare
C = triu(C)
h_C = triu(h_C)
@@ -468,15 +468,15 @@ k = 13
end
@testset "xt_her2k gpu" begin
# generate parameters
- A = rand(elty,m,k)
- B = rand(elty,m,k)
+ A = rand(elty, m, k)
+ B = rand(elty, m, k)
d_A = CuArray(A)
d_B = CuArray(B)
- C = rand(elty,m,m)
+ C = rand(elty, m, m)
C = C + C'
- C = (A*B') + (B*A')
+ C = (A * B') + (B * A')
synchronize()
- d_C = CUBLAS.xt_her2k('U','N',d_A,d_B)
+ d_C = CUBLAS.xt_her2k('U', 'N', d_A, d_B)
# move back to host and compare
C = triu(C)
@test d_C isa CuArray
@@ -485,13 +485,13 @@ k = 13
@test C ≈ h_C
end
@testset "xt_her2k cpu" begin
- A = rand(elty,m,k)
- B = rand(elty,m,k)
- C = rand(elty,m,m)
+ A = rand(elty, m, k)
+ B = rand(elty, m, k)
+ C = rand(elty, m, m)
# generate parameters
C = C + C'
- C = (A*B') + (B*A')
- h_C = CUBLAS.xt_her2k('U','N',A,B)
+ C = (A * B') + (B * A')
+ h_C = CUBLAS.xt_her2k('U', 'N', A, B)
# move back to host and compare
@test h_C isa Array
C = triu(C)
@@ -499,12 +499,12 @@ k = 13
@test C ≈ h_C
end
@testset "her2k" begin
- A = rand(elty,m,k)
- B = rand(elty,m,k)
+ A = rand(elty, m, k)
+ B = rand(elty, m, k)
d_A = CuArray(A)
d_B = CuArray(B)
- C = A*B' + B*A'
- d_C = CUBLAS.her2k('U','N',d_A,d_B)
+ C = A * B' + B * A'
+ d_C = CUBLAS.her2k('U', 'N', d_A, d_B)
# move back to host and compare
C = triu(C)
h_C = Array(d_C)
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #2610 +/- ##
==========================================
- Coverage 73.61% 73.61% -0.01%
==========================================
Files 157 157
Lines 15223 15227 +4
==========================================
+ Hits 11207 11209 +2
- Misses 4016 4018 +2

☔ View full report in Codecov by Sentry.
Testing locally, the level 3 tests and the split-out level 3 GEMM tests take about the same total time as before, so the split should mainly help with parallelization across test workers. Also removed an extraneous comment.
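The parallelization benefit can be sketched generically: once a long test file is split into two self-contained files, a file-based runner (as CUDA.jl's test harness uses) can schedule them on separate workers instead of running one long serial file. A minimal, hypothetical sketch using only the `Distributed` and `Test` standard libraries — the file names and contents are stand-ins, not the actual CUDA.jl test files:

```julia
using Distributed, Test

# Two hypothetical test files standing in for level3.jl and the
# split-out GEMM tests; writing them to a temp dir keeps the sketch
# self-contained and runnable.
dir = mktempdir()
write(joinpath(dir, "level3.jl"),
      """@testset "level3" begin @test 1 + 1 == 2 end""")
write(joinpath(dir, "level3_gemm.jl"),
      """@testset "level3/gemm" begin @test 2 * 2 == 4 end""")

# One worker per file: a file-based runner can now run the two halves
# concurrently rather than one long file serially.
addprocs(2)
@everywhere using Test
results = pmap(readdir(dir; join = true)) do f
    include(f)     # runs the file's @testset blocks; throws on failure
    basename(f)    # return the file name as this worker's result
end
sort(results)
```

Each file must be loadable on its own (its own `using` statements, no shared globals) for this to work, which is exactly what splitting the GEMM-heavy testsets into their own file buys.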