
Move BFloat16 code out of extension #536
Merged (2 commits), Feb 6, 2025

@christiangnrd (Contributor) commented Feb 5, 2025

This is the first half of #446.

Separated out partly to see the benchmarks. Update: no performance difference (all benchmark ratios fall between 0.90 and 1.06)!

The impact on import/precompile time should be minimal, and the reduced code complexity is probably worth the tradeoff.

@github-actions bot (Contributor) commented Feb 5, 2025

Your PR no longer requires formatting changes. Thank you for your contribution!

@github-actions bot left a comment

Metal Benchmarks

| Benchmark suite | Current: 91dca03 | Previous: 924a130 | Ratio |
|---|---|---|---|
| private array/construct | 23527.833333333332 ns | 25257 ns | 0.93 |
| private array/broadcast | 458875 ns | 463958 ns | 0.99 |
| private array/random/randn/Float32 | 795041 ns | 760208 ns | 1.05 |
| private array/random/randn!/Float32 | 610500 ns | 624687.5 ns | 0.98 |
| private array/random/rand!/Int64 | 551333 ns | 563395.5 ns | 0.98 |
| private array/random/rand!/Float32 | 571167 ns | 587854 ns | 0.97 |
| private array/random/rand/Int64 | 737833 ns | 755083 ns | 0.98 |
| private array/random/rand/Float32 | 586750 ns | 611333 ns | 0.96 |
| private array/copyto!/gpu_to_gpu | 647146 ns | 650916.5 ns | 0.99 |
| private array/copyto!/cpu_to_gpu | 764624.5 ns | 820417 ns | 0.93 |
| private array/copyto!/gpu_to_cpu | 620375 ns | 686687.5 ns | 0.90 |
| private array/accumulate/1d | 1341334 ns | 1362854.5 ns | 0.98 |
| private array/accumulate/2d | 1381667 ns | 1404375 ns | 0.98 |
| private array/iteration/findall/int | 2081000 ns | 2111541.5 ns | 0.99 |
| private array/iteration/findall/bool | 1830750 ns | 1849958 ns | 0.99 |
| private array/iteration/findfirst/int | 1684896.5 ns | 1715916 ns | 0.98 |
| private array/iteration/findfirst/bool | 1657249.5 ns | 1673542 ns | 0.99 |
| private array/iteration/scalar | 3900291.5 ns | 3935958 ns | 0.99 |
| private array/iteration/logical | 3161333.5 ns | 3227791 ns | 0.98 |
| private array/iteration/findmin/1d | 1755916 ns | 1778916.5 ns | 0.99 |
| private array/iteration/findmin/2d | 1347625 ns | 1352041 ns | 1.00 |
| private array/reductions/reduce/1d | 1034083 ns | 1043458 ns | 0.99 |
| private array/reductions/reduce/2d | 664479.5 ns | 669709 ns | 0.99 |
| private array/reductions/mapreduce/1d | 1038270.5 ns | 1052375 ns | 0.99 |
| private array/reductions/mapreduce/2d | 660792 ns | 675854.5 ns | 0.98 |
| private array/permutedims/4d | 2491125 ns | 2557249.5 ns | 0.97 |
| private array/permutedims/2d | 1002084 ns | 1026042 ns | 0.98 |
| private array/permutedims/3d | 1582042 ns | 1594916 ns | 0.99 |
| private array/copy | 583917 ns | 569146 ns | 1.03 |
| latency/precompile | 8814246708 ns | 8851374292 ns | 1.00 |
| latency/ttfp | 3612261417 ns | 3620472000 ns | 1.00 |
| latency/import | 1237185791 ns | 1235528208 ns | 1.00 |
| integration/metaldevrt | 709250 ns | 717959 ns | 0.99 |
| integration/byval/slices=1 | 1581250 ns | 1539333 ns | 1.03 |
| integration/byval/slices=3 | 8888209 ns | 9117125 ns | 0.97 |
| integration/byval/reference | 1583000 ns | 1560292 ns | 1.01 |
| integration/byval/slices=2 | 2645542 ns | 2731458.5 ns | 0.97 |
| kernel/indexing | 453708 ns | 457750 ns | 0.99 |
| kernel/indexing_checked | 448667 ns | 455833 ns | 0.98 |
| kernel/launch | 8125 ns | 8166 ns | 0.99 |
| metal/synchronization/stream | 14667 ns | 14708 ns | 1.00 |
| metal/synchronization/context | 14542 ns | 14791 ns | 0.98 |
| shared array/construct | 23527.75 ns | 25149.333333333332 ns | 0.94 |
| shared array/broadcast | 456520.5 ns | 464959 ns | 0.98 |
| shared array/random/randn/Float32 | 834208 ns | 835792 ns | 1.00 |
| shared array/random/randn!/Float32 | 616792 ns | 635500 ns | 0.97 |
| shared array/random/rand!/Int64 | 550208 ns | 560542 ns | 0.98 |
| shared array/random/rand!/Float32 | 586833 ns | 598750 ns | 0.98 |
| shared array/random/rand/Int64 | 786708.5 ns | 765583 ns | 1.03 |
| shared array/random/rand/Float32 | 593604.5 ns | 604979 ns | 0.98 |
| shared array/copyto!/gpu_to_gpu | 79291 ns | 79167 ns | 1.00 |
| shared array/copyto!/cpu_to_gpu | 87167 ns | 82250 ns | 1.06 |
| shared array/copyto!/gpu_to_cpu | 82500 ns | 83000 ns | 0.99 |
| shared array/accumulate/1d | 1346750 ns | 1366937.5 ns | 0.99 |
| shared array/accumulate/2d | 1385333 ns | 1400458 ns | 0.99 |
| shared array/iteration/findall/int | 1795209 ns | 1854249.5 ns | 0.97 |
| shared array/iteration/findall/bool | 1596417 ns | 1619625 ns | 0.99 |
| shared array/iteration/findfirst/int | 1400916 ns | 1384771 ns | 1.01 |
| shared array/iteration/findfirst/bool | 1365542 ns | 1375770.5 ns | 0.99 |
| shared array/iteration/scalar | 155000 ns | 155000 ns | 1.00 |
| shared array/iteration/logical | 2970604.5 ns | 3012021 ns | 0.99 |
| shared array/iteration/findmin/1d | 1449292 ns | 1470417 ns | 0.99 |
| shared array/iteration/findmin/2d | 1351458 ns | 1355875 ns | 1.00 |
| shared array/reductions/reduce/1d | 723333 ns | 729437.5 ns | 0.99 |
| shared array/reductions/reduce/2d | 660000 ns | 674104.5 ns | 0.98 |
| shared array/reductions/mapreduce/1d | 733125 ns | 743833 ns | 0.99 |
| shared array/reductions/mapreduce/2d | 669125 ns | 658375 ns | 1.02 |
| shared array/permutedims/4d | 2540729 ns | 2575708 ns | 0.99 |
| shared array/permutedims/2d | 1013854.5 ns | 1011875 ns | 1.00 |
| shared array/permutedims/3d | 1572250 ns | 1573854 ns | 1.00 |
| shared array/copy | 246417 ns | 246417 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
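The Ratio column is consistent with Current / Previous (for example, 23527.83 / 25257 ≈ 0.93 and 87167 / 82250 ≈ 1.06), so values below 1.00 mean the new commit ran faster. As a quick sanity check of the "no performance difference" reading, the sketch below recomputes a few ratios and flags any benchmark whose timing moved by more than a chosen tolerance. It is a hypothetical helper, not part of Metal.jl or github-action-benchmark; the selected rows and the 10% default tolerance are assumptions for illustration.

```python
# Hypothetical helper (not part of Metal.jl or github-action-benchmark):
# recompute Ratio = Current / Previous for timings in ns (lower is better)
# and flag benchmarks that changed by more than an assumed tolerance.

ROWS = [
    # (benchmark, current_ns, previous_ns), copied from the bot's table
    ("private array/construct", 23527.833333333332, 25257),
    ("private array/random/randn/Float32", 795041, 760208),
    ("private array/copyto!/gpu_to_cpu", 620375, 686687.5),
    ("latency/import", 1237185791, 1235528208),
    ("shared array/copyto!/cpu_to_gpu", 87167, 82250),
]


def ratio(current: float, previous: float) -> float:
    """Current / Previous, rounded to two decimals like the bot's table."""
    return round(current / previous, 2)


def flag_changes(rows, tolerance=0.10):
    """Return (name, rounded ratio) for rows whose unrounded ratio
    differs from 1.0 by more than `tolerance` (assumed default: 10%)."""
    flagged = []
    for name, cur, prev in rows:
        raw = cur / prev  # compare unrounded to avoid rounding edge cases
        if abs(raw - 1.0) > tolerance:
            flagged.append((name, round(raw, 2)))
    return flagged


if __name__ == "__main__":
    for name, cur, prev in ROWS:
        print(f"{name}: {ratio(cur, prev):.2f}")
    print("flagged:", flag_changes(ROWS))
```

With the assumed 10% tolerance nothing is flagged; tightening it to 5% surfaces the construct and copy benchmarks, which are the noisiest rows in the table.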

@maleadt merged commit a1c0b8f into main on Feb 6, 2025 (7 checks passed).
@maleadt deleted the byeext branch on February 6, 2025 09:38.
3 participants