Split linalg tests into multiple files #2609

Merged
merged 6 commits into from
Jan 8, 2025

kshyatt (Contributor) commented Jan 7, 2025

Local testing showed no undefined-variable errors. More could likely be done to decouple the testsets within each level, but this does allow us to run the two most expensive groups (level 2 and level 3) in parallel.

kshyatt added the "cuda libraries" (Stuff about CUDA library wrappers) and "tests" (Adds or changes tests) labels Jan 7, 2025
kshyatt requested a review from maleadt January 7, 2025 22:12
kshyatt (Contributor, Author) commented Jan 7, 2025

Sorry for the mega-diff btw.

github-actions bot (Contributor) left a comment

CUDA.jl Benchmarks

Benchmark suite | Current: eba40e5 | Previous: a0c2f4b | Ratio
latency/precompile | 45274142645.5 ns | 45297385295 ns | 1.00
latency/ttfp | 6426012668 ns | 6375596178 ns | 1.01
latency/import | 3053958391 ns | 3036561495 ns | 1.01
integration/volumerhs | 9572279 ns | 9567419 ns | 1.00
integration/byval/slices=1 | 146444 ns | 146746 ns | 1.00
integration/byval/slices=3 | 425081 ns | 425517.5 ns | 1.00
integration/byval/reference | 144620 ns | 145010 ns | 1.00
integration/byval/slices=2 | 285740 ns | 286216 ns | 1.00
integration/cudadevrt | 103371 ns | 103513 ns | 1.00
kernel/indexing | 14007 ns | 14419 ns | 0.97
kernel/indexing_checked | 15224 ns | 15499 ns | 0.98
kernel/occupancy | 696.9527027027027 ns | 748.2734375 ns | 0.93
kernel/launch | 2103.4 ns | 2194.6666666666665 ns | 0.96
kernel/rand | 14733 ns | 17335 ns | 0.85
array/reverse/1d | 19399 ns | 19412 ns | 1.00
array/reverse/2d | 24822 ns | 24576 ns | 1.01
array/reverse/1d_inplace | 10033.333333333334 ns | 11029 ns | 0.91
array/reverse/2d_inplace | 10790 ns | 13223 ns | 0.82
array/copy | 20061 ns | 20740 ns | 0.97
array/iteration/findall/int | 157486 ns | 158179 ns | 1.00
array/iteration/findall/bool | 138323 ns | 138583 ns | 1.00
array/iteration/findfirst/int | 152810 ns | 153423 ns | 1.00
array/iteration/findfirst/bool | 154073.5 ns | 154821 ns | 1.00
array/iteration/scalar | 76401 ns | 77451 ns | 0.99
array/iteration/logical | 211056 ns | 216735 ns | 0.97
array/iteration/findmin/1d | 40626 ns | 41556.5 ns | 0.98
array/iteration/findmin/2d | 93579.5 ns | 94128 ns | 0.99
array/reductions/reduce/1d | 38254 ns | 42013 ns | 0.91
array/reductions/reduce/2d | 46399.5 ns | 51911 ns | 0.89
array/reductions/mapreduce/1d | 36615 ns | 39275 ns | 0.93
array/reductions/mapreduce/2d | 48354.5 ns | 49505.5 ns | 0.98
array/broadcast | 21058 ns | 21668 ns | 0.97
array/copyto!/gpu_to_gpu | 11424 ns | 11569 ns | 0.99
array/copyto!/cpu_to_gpu | 209189 ns | 211873 ns | 0.99
array/copyto!/gpu_to_cpu | 243384 ns | 245423 ns | 0.99
array/accumulate/1d | 108525 ns | 108388.5 ns | 1.00
array/accumulate/2d | 79667.5 ns | 79823 ns | 1.00
array/construct | 1203.7 ns | 1208.35 ns | 1.00
array/random/randn/Float32 | 42407 ns | 43873.5 ns | 0.97
array/random/randn!/Float32 | 26229 ns | 25937 ns | 1.01
array/random/rand!/Int64 | 26884 ns | 27271 ns | 0.99
array/random/rand!/Float32 | 8450.666666666666 ns | 8766.666666666666 ns | 0.96
array/random/rand/Int64 | 29654 ns | 29637 ns | 1.00
array/random/rand/Float32 | 12550 ns | 12723 ns | 0.99
array/permutedims/4d | 66640 ns | 66923 ns | 1.00
array/permutedims/2d | 56190 ns | 56439 ns | 1.00
array/permutedims/3d | 58685 ns | 58867 ns | 1.00
array/sorting/1d | 2919790 ns | 2933352 ns | 1.00
array/sorting/by | 3482848 ns | 3500830 ns | 0.99
array/sorting/2d | 1079724 ns | 1085059 ns | 1.00
cuda/synchronization/stream/auto | 1029.4 ns | 1038.4 ns | 0.99
cuda/synchronization/stream/nonblocking | 6570.8 ns | 6432 ns | 1.02
cuda/synchronization/stream/blocking | 812.8061224489796 ns | 807.5918367346939 ns | 1.01
cuda/synchronization/context/auto | 1172.6 ns | 1194.1 ns | 0.98
cuda/synchronization/context/nonblocking | 6811.6 ns | 6649.8 ns | 1.02
cuda/synchronization/context/blocking | 888.8913043478261 ns | 886.6415094339623 ns | 1.00

This comment was automatically generated by workflow using github-action-benchmark.

codecov bot commented Jan 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.59%. Comparing base (792aec5) to head (7f8cc3f).

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2609       +/-   ##
===========================================
+ Coverage    9.27%   73.59%   +64.31%     
===========================================
  Files         157      157               
  Lines       15025    15207      +182     
===========================================
+ Hits         1394    11191     +9797     
+ Misses      13631     4016     -9615     


maleadt (Member) commented Jan 8, 2025

Moving them into a subdirectory.


Before:

Test                    (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
libraries/cublas             (2) |   183.98 |   0.02 |  0.0 |      43.14 |   612.00 |   3.16 |  1.7 |   22655.27 |  3237.44 |

After:

Test                    (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
libraries/cublas/level1      (5) |    31.63 |   0.00 |  0.0 |       0.03 |   512.00 |   0.56 |  1.8 |    3639.77 |  1714.91 |
libraries/cublas/level2      (3) |    61.55 |   0.01 |  0.0 |       1.06 |   588.00 |   0.93 |  1.5 |    5854.03 |  1758.51 |
libraries/cublas/level3      (2) |   116.81 |   0.01 |  0.0 |      14.20 |   534.00 |   2.11 |  1.8 |   14023.70 |  2564.25 |
libraries/cublas/extensions  (4) |    66.44 |   0.01 |  0.0 |      29.49 |   586.00 |   1.04 |  1.6 |    6527.81 |  1714.91 |
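
To make the comparison concrete, here is a rough back-of-the-envelope sketch using only the Time (s) column from the two tables above. It assumes the four split files run fully concurrently on separate workers (the tables show them on workers 2 to 5), so the longest file bounds the wall-clock time. Real runs share a GPU and a worker pool with other tests, so treat the result as an estimate. The function name `summarize` is made up for illustration.

```python
# Timings in seconds, copied from the "Before" and "After" tables above.
before = {"libraries/cublas": 183.98}
after = {
    "libraries/cublas/level1": 31.63,
    "libraries/cublas/level2": 61.55,
    "libraries/cublas/level3": 116.81,
    "libraries/cublas/extensions": 66.44,
}


def summarize(before_s, after_s):
    """Compare one monolithic test file against its split replacements.

    Assumption: the split files run at the same time on separate workers,
    so the slowest one determines the wall-clock time for the group.
    """
    total = sum(after_s.values())      # combined worker time across all files
    critical = max(after_s.values())   # slowest file, i.e. the parallel bound
    return {
        "total_after_s": round(total, 2),
        "critical_path_s": critical,
        "speedup_if_parallel": round(before_s / critical, 2),
    }


summary = summarize(before["libraries/cublas"], after)
print(summary)
```

Under that assumption the slowest file, level3, sets the pace for the group. The summed time across the four files is higher than the single-file run, so the gain comes from parallelism rather than from less total work.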

maleadt added the "enhancement" (New feature or request) label Jan 8, 2025
maleadt merged commit 14ae82d into master Jan 8, 2025
0 of 2 checks passed
maleadt deleted the ksh/split branch January 8, 2025 10:05
avik-pal pushed a commit to avik-pal/CUDA.jl that referenced this pull request Jan 11, 2025
Labels: cuda libraries (Stuff about CUDA library wrappers), enhancement (New feature or request), tests (Adds or changes tests)