Work around NVML issue on Jetson Orin. #2620
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##           master    #2620        +/-   ##
===========================================
+ Coverage   11.24%   73.50%   +62.26%
===========================================
  Files         152      157        +5
  Lines       14923    15228      +305
===========================================
+ Hits         1678    11194     +9516
+ Misses      13245     4034     -9211
```
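The coverage percentages in the diff can be recomputed from the Hits and Lines rows (misses are simply lines minus hits). Below is a minimal Python sketch, assuming coverage is hits ÷ lines truncated to two decimals, which is Codecov's default rounding as far as we know. This repository's Codecov settings are not shown here. Truncation explains the head value: rounding to nearest would give 73.51%, not 73.50%. The helpers `coverage_basis_points` and `fmt` are hypothetical names used only for illustration.

```python
def coverage_basis_points(hits: int, lines: int) -> int:
    # Coverage in hundredths of a percent. Floor division implements the
    # assumed "round down to two decimals" rule and avoids float rounding.
    return hits * 10_000 // lines


def fmt(bp: int) -> str:
    # Format non-negative basis points as a percentage string, e.g. 1124 -> "11.24%".
    return f"{bp // 100}.{bp % 100:02d}%"


# Hits and Lines rows copied from the coverage diff (master vs. #2620).
base = {"hits": 1678, "lines": 14923}
head = {"hits": 11194, "lines": 15228}

base_bp = coverage_basis_points(**base)
head_bp = coverage_basis_points(**head)

print("Coverage:", fmt(base_bp), "->", fmt(head_bp), f"(+{fmt(head_bp - base_bp)})")
print("Misses:", base["lines"] - base["hits"], "->", head["lines"] - head["hits"])
```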
CUDA.jl Benchmarks
| Benchmark suite | Current: 6b7319e | Previous: 774abc6 | Ratio |
|---|---|---|---|
| latency/precompile | 45436554103.5 ns | 45532671418 ns | 1.00 |
| latency/ttfp | 6431238243.5 ns | 6382276443.5 ns | 1.01 |
| latency/import | 3058519609.5 ns | 3039078540.5 ns | 1.01 |
| integration/volumerhs | 9568375 ns | 9567627 ns | 1.00 |
| integration/byval/slices=1 | 147044 ns | 146713 ns | 1.00 |
| integration/byval/slices=3 | 426020 ns | 425286 ns | 1.00 |
| integration/byval/reference | 145116 ns | 144622 ns | 1.00 |
| integration/byval/slices=2 | 286382 ns | 286077 ns | 1.00 |
| integration/cudadevrt | 103646 ns | 103283 ns | 1.00 |
| kernel/indexing | 14223 ns | 14073 ns | 1.01 |
| kernel/indexing_checked | 15450.5 ns | 15126 ns | 1.02 |
| kernel/occupancy | 686.1895424836601 ns | 710.5460992907801 ns | 0.97 |
| kernel/launch | 2057.6 ns | 2120.3 ns | 0.97 |
| kernel/rand | 15051 ns | 14743 ns | 1.02 |
| array/reverse/1d | 19755.5 ns | 19325.5 ns | 1.02 |
| array/reverse/2d | 25051 ns | 24669 ns | 1.02 |
| array/reverse/1d_inplace | 11034 ns | 10913.666666666666 ns | 1.01 |
| array/reverse/2d_inplace | 12765 ns | 11253 ns | 1.13 |
| array/copy | 21133 ns | 20229 ns | 1.04 |
| array/iteration/findall/int | 158802 ns | 157863.5 ns | 1.01 |
| array/iteration/findall/bool | 139144.5 ns | 138404.5 ns | 1.01 |
| array/iteration/findfirst/int | 162357.5 ns | 153375 ns | 1.06 |
| array/iteration/findfirst/bool | 165076.5 ns | 154273 ns | 1.07 |
| array/iteration/scalar | 76458.5 ns | 75697 ns | 1.01 |
| array/iteration/logical | 210618 ns | 212853.5 ns | 0.99 |
| array/iteration/findmin/1d | 41806 ns | 41543 ns | 1.01 |
| array/iteration/findmin/2d | 94268 ns | 93933.5 ns | 1.00 |
| array/reductions/reduce/1d | 40676 ns | 35999 ns | 1.13 |
| array/reductions/reduce/2d | 47706 ns | 41907.5 ns | 1.14 |
| array/reductions/mapreduce/1d | 38099.5 ns | 33891.5 ns | 1.12 |
| array/reductions/mapreduce/2d | 52201.5 ns | 41528 ns | 1.26 |
| array/broadcast | 21688.5 ns | 21376 ns | 1.01 |
| array/copyto!/gpu_to_gpu | 11721 ns | 11516 ns | 1.02 |
| array/copyto!/cpu_to_gpu | 211559 ns | 210665 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 244510 ns | 243223.5 ns | 1.01 |
| array/accumulate/1d | 108964 ns | 108164 ns | 1.01 |
| array/accumulate/2d | 80430 ns | 79823.5 ns | 1.01 |
| array/construct | 1223.35 ns | 1284.3 ns | 0.95 |
| array/random/randn/Float32 | 44283.5 ns | 49740 ns | 0.89 |
| array/random/randn!/Float32 | 26478 ns | 26117 ns | 1.01 |
| array/random/rand!/Int64 | 27293 ns | 27030 ns | 1.01 |
| array/random/rand!/Float32 | 8848.666666666666 ns | 8836.333333333334 ns | 1.00 |
| array/random/rand/Int64 | 30030 ns | 37762.5 ns | 0.80 |
| array/random/rand/Float32 | 13023 ns | 13046 ns | 1.00 |
| array/permutedims/4d | 67646 ns | 66810 ns | 1.01 |
| array/permutedims/2d | 57500 ns | 56518 ns | 1.02 |
| array/permutedims/3d | 59592.5 ns | 59273.5 ns | 1.01 |
| array/sorting/1d | 2933746 ns | 2933200.5 ns | 1.00 |
| array/sorting/by | 3500000.5 ns | 3500043 ns | 1.00 |
| array/sorting/2d | 1085592 ns | 1084935 ns | 1.00 |
| cuda/synchronization/stream/auto | 1090.4 ns | 1035.9 ns | 1.05 |
| cuda/synchronization/stream/nonblocking | 6436.6 ns | 6536.8 ns | 0.98 |
| cuda/synchronization/stream/blocking | 804.5 ns | 791.2244897959183 ns | 1.02 |
| cuda/synchronization/context/auto | 1183.5 ns | 1182.9 ns | 1.00 |
| cuda/synchronization/context/nonblocking | 6649.6 ns | 6769.6 ns | 0.98 |
| cuda/synchronization/context/blocking | 920.390243902439 ns | 915.2666666666667 ns | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
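Every Ratio entry in the benchmark results is consistent with Current ÷ Previous rounded to two decimals. Because the values are times, a ratio above 1.00 means commit 6b7319e was slower than 774abc6. The short Python check below recomputes a few rows copied from those results. The `ratio` helper is a hypothetical name. The 1.20 flagging threshold is also an arbitrary choice for illustration, not a github-action-benchmark setting.

```python
# A few rows copied from the benchmark results: (name, current_ns, previous_ns).
rows = [
    ("latency/precompile", 45436554103.5, 45532671418),
    ("array/reverse/2d_inplace", 12765, 11253),
    ("array/reductions/mapreduce/2d", 52201.5, 41528),
    ("array/random/rand/Int64", 30030, 37762.5),
]


def ratio(current: float, previous: float) -> float:
    # Lower time is better, so a ratio > 1.00 means the current commit is slower.
    return round(current / previous, 2)


# Hypothetical threshold for highlighting larger slowdowns.
THRESHOLD = 1.20

for name, cur, prev in rows:
    print(f"{name}: {ratio(cur, prev):.2f}")

flagged = [name for name, cur, prev in rows if ratio(cur, prev) >= THRESHOLD]
print("At or above threshold:", flagged)
```

The per-run timings carry noise, so single ratios in the 0.95 to 1.05 range are generally not meaningful on their own.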
With this, tests pass on a Jetson Orin Nano devkit.
Works around #2580 until we have a proper issue.