Move BFloat16 code out of extension #536
Merged
Your PR no longer requires formatting changes. Thank you for your contribution!
Metal Benchmarks
| Benchmark suite | Current: 91dca03 | Previous: 924a130 | Ratio |
|---|---|---|---|
| private array/construct | 23527.833333333332 ns | 25257 ns | 0.93 |
| private array/broadcast | 458875 ns | 463958 ns | 0.99 |
| private array/random/randn/Float32 | 795041 ns | 760208 ns | 1.05 |
| private array/random/randn!/Float32 | 610500 ns | 624687.5 ns | 0.98 |
| private array/random/rand!/Int64 | 551333 ns | 563395.5 ns | 0.98 |
| private array/random/rand!/Float32 | 571167 ns | 587854 ns | 0.97 |
| private array/random/rand/Int64 | 737833 ns | 755083 ns | 0.98 |
| private array/random/rand/Float32 | 586750 ns | 611333 ns | 0.96 |
| private array/copyto!/gpu_to_gpu | 647146 ns | 650916.5 ns | 0.99 |
| private array/copyto!/cpu_to_gpu | 764624.5 ns | 820417 ns | 0.93 |
| private array/copyto!/gpu_to_cpu | 620375 ns | 686687.5 ns | 0.90 |
| private array/accumulate/1d | 1341334 ns | 1362854.5 ns | 0.98 |
| private array/accumulate/2d | 1381667 ns | 1404375 ns | 0.98 |
| private array/iteration/findall/int | 2081000 ns | 2111541.5 ns | 0.99 |
| private array/iteration/findall/bool | 1830750 ns | 1849958 ns | 0.99 |
| private array/iteration/findfirst/int | 1684896.5 ns | 1715916 ns | 0.98 |
| private array/iteration/findfirst/bool | 1657249.5 ns | 1673542 ns | 0.99 |
| private array/iteration/scalar | 3900291.5 ns | 3935958 ns | 0.99 |
| private array/iteration/logical | 3161333.5 ns | 3227791 ns | 0.98 |
| private array/iteration/findmin/1d | 1755916 ns | 1778916.5 ns | 0.99 |
| private array/iteration/findmin/2d | 1347625 ns | 1352041 ns | 1.00 |
| private array/reductions/reduce/1d | 1034083 ns | 1043458 ns | 0.99 |
| private array/reductions/reduce/2d | 664479.5 ns | 669709 ns | 0.99 |
| private array/reductions/mapreduce/1d | 1038270.5 ns | 1052375 ns | 0.99 |
| private array/reductions/mapreduce/2d | 660792 ns | 675854.5 ns | 0.98 |
| private array/permutedims/4d | 2491125 ns | 2557249.5 ns | 0.97 |
| private array/permutedims/2d | 1002084 ns | 1026042 ns | 0.98 |
| private array/permutedims/3d | 1582042 ns | 1594916 ns | 0.99 |
| private array/copy | 583917 ns | 569146 ns | 1.03 |
| latency/precompile | 8814246708 ns | 8851374292 ns | 1.00 |
| latency/ttfp | 3612261417 ns | 3620472000 ns | 1.00 |
| latency/import | 1237185791 ns | 1235528208 ns | 1.00 |
| integration/metaldevrt | 709250 ns | 717959 ns | 0.99 |
| integration/byval/slices=1 | 1581250 ns | 1539333 ns | 1.03 |
| integration/byval/slices=3 | 8888209 ns | 9117125 ns | 0.97 |
| integration/byval/reference | 1583000 ns | 1560292 ns | 1.01 |
| integration/byval/slices=2 | 2645542 ns | 2731458.5 ns | 0.97 |
| kernel/indexing | 453708 ns | 457750 ns | 0.99 |
| kernel/indexing_checked | 448667 ns | 455833 ns | 0.98 |
| kernel/launch | 8125 ns | 8166 ns | 0.99 |
| metal/synchronization/stream | 14667 ns | 14708 ns | 1.00 |
| metal/synchronization/context | 14542 ns | 14791 ns | 0.98 |
| shared array/construct | 23527.75 ns | 25149.333333333332 ns | 0.94 |
| shared array/broadcast | 456520.5 ns | 464959 ns | 0.98 |
| shared array/random/randn/Float32 | 834208 ns | 835792 ns | 1.00 |
| shared array/random/randn!/Float32 | 616792 ns | 635500 ns | 0.97 |
| shared array/random/rand!/Int64 | 550208 ns | 560542 ns | 0.98 |
| shared array/random/rand!/Float32 | 586833 ns | 598750 ns | 0.98 |
| shared array/random/rand/Int64 | 786708.5 ns | 765583 ns | 1.03 |
| shared array/random/rand/Float32 | 593604.5 ns | 604979 ns | 0.98 |
| shared array/copyto!/gpu_to_gpu | 79291 ns | 79167 ns | 1.00 |
| shared array/copyto!/cpu_to_gpu | 87167 ns | 82250 ns | 1.06 |
| shared array/copyto!/gpu_to_cpu | 82500 ns | 83000 ns | 0.99 |
| shared array/accumulate/1d | 1346750 ns | 1366937.5 ns | 0.99 |
| shared array/accumulate/2d | 1385333 ns | 1400458 ns | 0.99 |
| shared array/iteration/findall/int | 1795209 ns | 1854249.5 ns | 0.97 |
| shared array/iteration/findall/bool | 1596417 ns | 1619625 ns | 0.99 |
| shared array/iteration/findfirst/int | 1400916 ns | 1384771 ns | 1.01 |
| shared array/iteration/findfirst/bool | 1365542 ns | 1375770.5 ns | 0.99 |
| shared array/iteration/scalar | 155000 ns | 155000 ns | 1 |
| shared array/iteration/logical | 2970604.5 ns | 3012021 ns | 0.99 |
| shared array/iteration/findmin/1d | 1449292 ns | 1470417 ns | 0.99 |
| shared array/iteration/findmin/2d | 1351458 ns | 1355875 ns | 1.00 |
| shared array/reductions/reduce/1d | 723333 ns | 729437.5 ns | 0.99 |
| shared array/reductions/reduce/2d | 660000 ns | 674104.5 ns | 0.98 |
| shared array/reductions/mapreduce/1d | 733125 ns | 743833 ns | 0.99 |
| shared array/reductions/mapreduce/2d | 669125 ns | 658375 ns | 1.02 |
| shared array/permutedims/4d | 2540729 ns | 2575708 ns | 0.99 |
| shared array/permutedims/2d | 1013854.5 ns | 1011875 ns | 1.00 |
| shared array/permutedims/3d | 1572250 ns | 1573854 ns | 1.00 |
| shared array/copy | 246417 ns | 246417 ns | 1 |
This comment was automatically generated by workflow using github-action-benchmark.
tgymnich approved these changes on Feb 5, 2025.
This is the first half of #446.
It was split out partly to see the benchmarks. Update: no performance difference!
Impact on import and precompile time should be minimal (latency/import and latency/precompile both show a ratio of 1.00), and the reduced code complexity is probably worth the tradeoff.