Pi and e to Float32 and Float16 #559
Your PR requires formatting changes to meet the project's style guidelines. Suggested changes:

diff --git a/src/device/intrinsics/math.jl b/src/device/intrinsics/math.jl
index 8c797363..bc92b054 100644
--- a/src/device/intrinsics/math.jl
+++ b/src/device/intrinsics/math.jl
@@ -11,8 +11,8 @@ import Core: Float16, Float32
### Constants
# π and ℯ
-for T in (:Float16,:Float32), R in (RoundUp, RoundDown), irr in (π, ℯ)
- @eval @device_override $T(::typeof($irr), ::typeof($R)) = $@eval($T($irr,$R))
+for T in (:Float16, :Float32), R in (RoundUp, RoundDown), irr in (π, ℯ)
+ @eval @device_override $T(::typeof($irr), ::typeof($R)) = $@eval($T($irr, $R))
end
### Common Intrinsics
diff --git a/test/device/intrinsics/math.jl b/test/device/intrinsics/math.jl
index ac1d7e9e..8e8c07ef 100644
--- a/test/device/intrinsics/math.jl
+++ b/test/device/intrinsics/math.jl
@@ -312,20 +312,20 @@ end
@test occursin(Regex("@air\\.sign\\.f$(8*sizeof(T))"), ir)
end
- # Borrowed from the Julia "Irrationals compared with Rationals and Floats" testset
- @testset "Comparisons with $irr" for irr in (π, ℯ)
- @eval function convert_test(res)
- res[1] = $T($irr, RoundDown) < $irr
- res[2] = $T($irr, RoundUp) > $irr
- res[3] = !($T($irr, RoundDown) > $irr)
- res[4] = !($T($irr, RoundUp) < $irr)
- return nothing
+ # Borrowed from the Julia "Irrationals compared with Rationals and Floats" testset
+ @testset "Comparisons with $irr" for irr in (π, ℯ)
+ @eval function convert_test(res)
+ res[1] = $T($irr, RoundDown) < $irr
+ res[2] = $T($irr, RoundUp) > $irr
+ res[3] = !($T($irr, RoundDown) > $irr)
+ res[4] = !($T($irr, RoundUp) < $irr)
+ return nothing
+ end
+
+ res = MtlArray(zeros(Bool, 4))
+ Metal.@sync @metal convert_test(res)
+ @test all(Array(res))
end
-
- res = MtlArray(zeros(Bool, 4))
- Metal.@sync @metal convert_test(res)
- @test all(Array(res))
- end
end
end
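For reference, the rounded conversions that the loop in the first diff generates can be sanity-checked on the host, where Base already defines `Float32(::Irrational, ::RoundingMode)`. This is a hedged host-side sketch; the bit patterns are the ones quoted later in this thread, not new values:

```julia
# Host-side check of the values the generated @device_override methods are
# expected to return on the GPU (Base computes these on the CPU).
@assert Float32(π, RoundDown) == reinterpret(Float32, 0x40490fda)  # 3.1415925f0
@assert Float32(π, RoundUp)   == reinterpret(Float32, 0x40490fdb)  # 3.1415927f0
# The rounded values bracket the irrational, which is what the tests rely on:
@assert Float32(π, RoundDown) < π < Float32(π, RoundUp)
@assert Float32(ℯ, RoundDown) < ℯ < Float32(ℯ, RoundUp)
```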
The Metal constants seem to have the same values as
Metal Benchmarks
| Benchmark suite | Current: 9a13542 | Previous: de0e3f9 | Ratio |
|---|---|---|---|
| private array/construct | 24524.25 ns | 26263.833333333332 ns | 0.93 |
| private array/broadcast | 457542 ns | 460542 ns | 0.99 |
| private array/random/randn/Float32 | 762750 ns | 872604.5 ns | 0.87 |
| private array/random/randn!/Float32 | 633709 ns | 633542 ns | 1.00 |
| private array/random/rand!/Int64 | 571750 ns | 566625 ns | 1.01 |
| private array/random/rand!/Float32 | 594458 ns | 601041 ns | 0.99 |
| private array/random/rand/Int64 | 752167 ns | 798417 ns | 0.94 |
| private array/random/rand/Float32 | 594834 ns | 595458 ns | 1.00 |
| private array/copyto!/gpu_to_gpu | 667833.5 ns | 664459 ns | 1.01 |
| private array/copyto!/cpu_to_gpu | 838229 ns | 703333 ns | 1.19 |
| private array/copyto!/gpu_to_cpu | 837625 ns | 839229 ns | 1.00 |
| private array/accumulate/1d | 1337625 ns | 1333500 ns | 1.00 |
| private array/accumulate/2d | 1387833 ns | 1392333 ns | 1.00 |
| private array/iteration/findall/int | 2077250 ns | 2059104.5 ns | 1.01 |
| private array/iteration/findall/bool | 1830875 ns | 1834875 ns | 1.00 |
| private array/iteration/findfirst/int | 1697083 ns | 1705958 ns | 0.99 |
| private array/iteration/findfirst/bool | 1665875 ns | 1674708.5 ns | 0.99 |
| private array/iteration/scalar | 3980604 ns | 3659250 ns | 1.09 |
| private array/iteration/logical | 3178042 ns | 3209875 ns | 0.99 |
| private array/iteration/findmin/1d | 1721958 ns | 1759750 ns | 0.98 |
| private array/iteration/findmin/2d | 1352375 ns | 1356084 ns | 1.00 |
| private array/reductions/reduce/1d | 1039791 ns | 1053000 ns | 0.99 |
| private array/reductions/reduce/2d | 665334 ns | 661270.5 ns | 1.01 |
| private array/reductions/mapreduce/1d | 1049500 ns | 1047041 ns | 1.00 |
| private array/reductions/mapreduce/2d | 655209 ns | 664792 ns | 0.99 |
| private array/permutedims/4d | 2526833.5 ns | 2507750 ns | 1.01 |
| private array/permutedims/2d | 1019917 ns | 1021042 ns | 1.00 |
| private array/permutedims/3d | 1571625 ns | 1571541.5 ns | 1.00 |
| private array/copy | 577292 ns | 595479 ns | 0.97 |
| latency/precompile | 9103113708 ns | 9065410083 ns | 1.00 |
| latency/ttfp | 5136810374.5 ns | 3628496208 ns | 1.42 |
| latency/import | 1242628667 ns | 1243334625 ns | 1.00 |
| integration/metaldevrt | 714375 ns | 735687.5 ns | 0.97 |
| integration/byval/slices=1 | 1507229 ns | 1580375 ns | 0.95 |
| integration/byval/slices=3 | 8671417 ns | 11156583.5 ns | 0.78 |
| integration/byval/reference | 1506708 ns | 1579979.5 ns | 0.95 |
| integration/byval/slices=2 | 2639375 ns | 2627208 ns | 1.00 |
| kernel/indexing | 476041.5 ns | 475541 ns | 1.00 |
| kernel/indexing_checked | 471542 ns | 473334 ns | 1.00 |
| kernel/launch | 36229.25 ns | 10639 ns | 3.41 |
| metal/synchronization/stream | 14292 ns | 14833 ns | 0.96 |
| metal/synchronization/context | 14209 ns | 15084 ns | 0.94 |
| shared array/construct | 24475.75 ns | 24395.833333333332 ns | 1.00 |
| shared array/broadcast | 461333 ns | 463333 ns | 1.00 |
| shared array/random/randn/Float32 | 813709 ns | 838333 ns | 0.97 |
| shared array/random/randn!/Float32 | 643667 ns | 638084 ns | 1.01 |
| shared array/random/rand!/Int64 | 579500 ns | 583958 ns | 0.99 |
| shared array/random/rand!/Float32 | 605917 ns | 596583 ns | 1.02 |
| shared array/random/rand/Int64 | 793542 ns | 788708 ns | 1.01 |
| shared array/random/rand/Float32 | 576000 ns | 643500 ns | 0.90 |
| shared array/copyto!/gpu_to_gpu | 84000 ns | 81042 ns | 1.04 |
| shared array/copyto!/cpu_to_gpu | 83709 ns | 82834 ns | 1.01 |
| shared array/copyto!/gpu_to_cpu | 77042 ns | 83000 ns | 0.93 |
| shared array/accumulate/1d | 1351250 ns | 1355208 ns | 1.00 |
| shared array/accumulate/2d | 1391792 ns | 1399333 ns | 0.99 |
| shared array/iteration/findall/int | 1825666 ns | 1860312.5 ns | 0.98 |
| shared array/iteration/findall/bool | 1594083.5 ns | 1589563 ns | 1.00 |
| shared array/iteration/findfirst/int | 1390312 ns | 1408437.5 ns | 0.99 |
| shared array/iteration/findfirst/bool | 1364666.5 ns | 1374521 ns | 0.99 |
| shared array/iteration/scalar | 151917 ns | 160500 ns | 0.95 |
| shared array/iteration/logical | 2988000 ns | 2989208 ns | 1.00 |
| shared array/iteration/findmin/1d | 1492396.5 ns | 1472959 ns | 1.01 |
| shared array/iteration/findmin/2d | 1367125 ns | 1371958.5 ns | 1.00 |
| shared array/reductions/reduce/1d | 737625 ns | 742458 ns | 0.99 |
| shared array/reductions/reduce/2d | 670000.5 ns | 676125 ns | 0.99 |
| shared array/reductions/mapreduce/1d | 734166 ns | 741958 ns | 0.99 |
| shared array/reductions/mapreduce/2d | 670874.5 ns | 673646 ns | 1.00 |
| shared array/permutedims/4d | 2537687.5 ns | 2503646 ns | 1.01 |
| shared array/permutedims/2d | 1011166 ns | 1021750 ns | 0.99 |
| shared array/permutedims/3d | 1617625 ns | 1584000 ns | 1.02 |
| shared array/copy | 242417 ns | 233937.5 ns | 1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
Force-pushed 3bddc3d to 198feed
What do you mean with the Metal constants? As coincidentally being discussed in JuliaGPU/CUDA.jl#2644 (comment), I guess it's better to be consistent with other Julia code rather than with Metal C.
The latest push makes the Metal behaviour the same as the CPU behaviour, at least for comparisons.
src/device/intrinsics/math.jl (Outdated)
### Constants

# π
@device_override Core.Float32(::typeof(π), ::RoundingMode) = reinterpret(Float32, 0x40490fdb) # 3.1415927f0 reinterpret(UInt32,Float32(reinterpret(Float64,0x400921FB60000000)))
@device_override Core.Float32(::typeof(π), ::RoundingMode{:Down}) = reinterpret(Float32, 0x40490fda) # 3.1415925f0 prevfloat(reinterpret(UInt32,Float32(reinterpret(Float64,0x400921FB60000000))))
@device_override Core.Float16(::typeof(π), ::RoundingMode{:Up}) = reinterpret(Float16, 0x4249) # Float16(3.143)
@device_override Core.Float16(::typeof(π), ::RoundingMode) = reinterpret(Float16, 0x4248) # Float16(3.14)

# ℯ
@device_override Core.Float32(::typeof(ℯ), ::RoundingMode{:Up}) = reinterpret(Float32, 0x402df855) # 2.718282f0 nextfloat(reinterpret(UInt32,Float32(reinterpret(Float64,0x4005BF0A80000000))))
@device_override Core.Float32(::typeof(ℯ), ::RoundingMode) = reinterpret(Float32, 0x402df854) # 2.7182817f0 reinterpret(UInt32,Float32(reinterpret(Float64,0x4005BF0A80000000)))
@device_override Core.Float16(::typeof(ℯ), ::RoundingMode) = reinterpret(Float16, 0x4170) # Float16(2.719)
@device_override Core.Float16(::typeof(ℯ), ::RoundingMode{:Down}) = reinterpret(Float16, 0x416f) # Float16(2.717)
Is it possible to generate those definitions with some metaprogramming, computing the constants on the fly, instead of hard-coding them?
The best I could come up with (it includes a definition for the CPU that wouldn't make it into the PR):
macro _const_convert(irr, T, r)
    :($T($irr, $r))
end

for T in (:Float32, :Float16), irr in (:π, :ℯ), r in (:RoundUp, :RoundDown)
    @eval begin
        @device_override $T(::typeof($irr), ::typeof($r)) = @_const_convert($irr, $T, $r)
    end
end
And while maybe not the best approach, the @code_llvm for the CPU is:
; Function Signature: newFloat32(Base.Irrational{:π}, Base.Rounding.RoundingMode{:Up})
; @ REPL[9]:4 within `newFloat32`
define float @julia_newFloat32_6871() #0 {
top:
ret float 0x400921FB60000000
}
But when I try to run it, I get a GPUCompiler error:
julia> @device_code_llvm @metal convert_test_32(res_32)
; GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}(MethodInstance for convert_test_32(::MtlDeviceVector{Bool, 1}), CompilerConfig for GPUCompiler.MetalCompilerTarget, 0x0000000000006877)
ERROR: old function still has uses (via a constant expr)
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] add_global_address_spaces!(job::GPUCompiler.CompilerJob, mod::LLVM.Module, entry::LLVM.Function)
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/metal.jl:414
[3] finish_ir!(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget}, mod::LLVM.Module, entry::LLVM.Function)
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/metal.jl:166
[4] finish_ir!(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, mod::LLVM.Module, entry::LLVM.Function)
@ Metal ~/.julia/dev/Metal/src/compiler/compilation.jl:14
[5] macro expansion
@ ~/.julia/dev/GPUCompiler/src/driver.jl:284 [inlined]
[6] emit_llvm(job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/utils.jl:110
[7] emit_llvm(job::GPUCompiler.CompilerJob)
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/utils.jl:108
[8] compile_unhooked(output::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:95
[9] compile_unhooked
@ ~/.julia/dev/GPUCompiler/src/driver.jl:80 [inlined]
[10] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:67
[11] compile
@ ~/.julia/dev/GPUCompiler/src/driver.jl:55 [inlined]
[12] (::GPUCompiler.var"#235#236"{Bool, Symbol, Bool, GPUCompiler.CompilerJob{…}, GPUCompiler.CompilerConfig{…}})(ctx::Context)
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/reflection.jl:191
[13] JuliaContext(f::GPUCompiler.var"#235#236"{Bool, Symbol, Bool, GPUCompiler.CompilerJob{…}, GPUCompiler.CompilerConfig{…}}; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:34
[14] JuliaContext(f::Function)
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:25
[15] code_llvm(io::Base.TTY, job::GPUCompiler.CompilerJob; optimize::Bool, raw::Bool, debuginfo::Symbol, dump_module::Bool, kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/reflection.jl:190
[16] code_llvm
@ ~/.julia/dev/GPUCompiler/src/reflection.jl:186 [inlined]
[17] (::GPUCompiler.var"#hook#246"{GPUCompiler.var"#hook#245#247"})(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}; io::Base.TTY, kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/reflection.jl:337
[18] (::GPUCompiler.var"#hook#246"{GPUCompiler.var"#hook#245#247"})(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/reflection.jl:335
[19] var"#3#outer_hook"(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams})
@ Main ~/.julia/dev/GPUCompiler/src/reflection.jl:246
[20] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:64
[21] compile
@ ~/.julia/dev/GPUCompiler/src/driver.jl:55 [inlined]
[22] (::Metal.var"#155#163"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})(ctx::Context)
@ Metal ~/.julia/dev/Metal/src/compiler/compilation.jl:108
[23] JuliaContext(f::Metal.var"#155#163"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}}; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:34
[24] JuliaContext(f::Function)
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:25
[25] macro expansion
@ ~/.julia/dev/Metal/src/compiler/compilation.jl:107 [inlined]
[26] macro expansion
@ ~/.julia/packages/ObjectiveC/TgrW6/src/os.jl:264 [inlined]
[27] compile(job::GPUCompiler.CompilerJob)
@ Metal ~/.julia/dev/Metal/src/compiler/compilation.jl:105
[28] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/execution.jl:245
[29] cached_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, compiler::Function, linker::Function)
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/execution.jl:159
[30] macro expansion
@ ~/.julia/dev/Metal/src/compiler/execution.jl:189 [inlined]
[31] macro expansion
@ ./lock.jl:273 [inlined]
[32] mtlfunction(f::typeof(convert_test_32), tt::Type{Tuple{MtlDeviceVector{Bool, 1}}}; name::Nothing, kwargs::@Kwargs{})
@ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:184
[33] mtlfunction(f::typeof(convert_test_32), tt::Type{Tuple{MtlDeviceVector{Bool, 1}}})
@ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:182
[34] macro expansion
@ ~/.julia/dev/Metal/src/compiler/execution.jl:85 [inlined]
[35] top-level scope
@ ~/.julia/dev/GPUCompiler/src/reflection.jl:257
[36] top-level scope
@ ~/.julia/dev/Metal/src/initialization.jl:79
Some type information was truncated. Use `show(err)` to see complete types.
While writing this I also tried:
for T in (:Float32, :Float16), irr in (:π, :ℯ), r in (:RoundUp, :RoundDown)
    @eval begin
        @device_override $T(::typeof($irr), ::typeof($r)) = Base.Rounding._convert_rounding($T, $irr, $r)
    end
end
But that also gives the "old function still has uses" error.
This errors in some really weird ways when running tests in CI and locally, so here's a copy-paste code snippet:
using Metal; begin
    irr = π  # `irr` must be bound here for the snippet to run standalone; the tests loop over (π, ℯ)
    function convert_test_32(res)
        res[1] = Float32(irr, RoundDown) < irr
        res[2] = Float32(irr, RoundUp) > irr
        res[3] = !(Float32(irr, RoundDown) > irr)
        res[4] = !(Float32(irr, RoundUp) < irr)
        return nothing
    end
    res_32 = MtlArray(zeros(Bool, 4))
    Metal.@sync @metal convert_test_32(res_32)
end
@device_override $T(::typeof($irr), ::typeof($r)) = @_const_convert($irr, $T, $r)

Why the need for an extra macro? Can't you @eval @device_override Core.Float32($(...), $(...)) = reinterpret(Float32, $(reinterpret(UInt32, Float32(...)))), i.e., generate and embed the bit pattern in the AST?
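A minimal host-only sketch of that suggestion (hypothetical; in the PR the generated definition would sit behind `@device_override`): compute the conversion once on the host at definition time, and splice only the resulting bit pattern into the method body, so device code never sees the BigFloat path.

```julia
for T in (Float32, Float16), irr in (π, ℯ), R in (RoundUp, RoundDown)
    U = T === Float32 ? UInt32 : UInt16
    bits = reinterpret(U, T(irr, R))  # evaluated now, on the host, via Base
    # In the PR this literal would be embedded in the override, e.g.
    # @eval @device_override $T(::typeof($irr), ::typeof($R)) = reinterpret($T, $bits)
    @assert reinterpret(T, bits) == T(irr, R)  # round-trips exactly
end
```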
I originally misunderstood what you were asking. I just pushed a commit that should work once synchronization problems are resolved (#564)
Force-pushed 79f5197 to 1b74420
You're still hard-coding bit values, which you could evaluate at compile time. I wanted to demonstrate that, but it looks like the conversions here just work as intended?

julia> @code_llvm Float32(ℯ)
; Function Signature: (::Type{Float32})(Base.Irrational{:ℯ})
; @ irrationals.jl:252 within `Float32`
define float @julia_Float32_1770() #0 {
top:
ret float 0x4005BF0A80000000
}
julia> @code_llvm Float32(pi)
; Function Signature: (::Type{Float32})(Base.Irrational{:π})
; @ irrationals.jl:252 within `Float32`
define float @julia_Float32_1772() #0 {
top:
ret float 0x400921FB60000000
}

Or, in the context of the original issue:

julia> f(y) = y < pi
f (generic function with 1 method)
julia> f(0.0)
true
julia> @code_llvm f(0.0)
; Function Signature: f(Float64)
; @ REPL[6]:1 within `f`
define i8 @julia_f_1698(double %"y::Float64") #0 {
top:
; ┌ @ irrationals.jl:108 within `<`
; │┌ @ float.jl:619 within `<=`
%0 = fcmp ole double %"y::Float64", 0x400921FB54442D18
%1 = zext i1 %0 to i8
ret i8 %1
; └└
}

So this shouldn't be needed. We should look into whether and why these don't get constant-propagated.
The actual code that's failing is the conversion from the irrational to the float with
However, I do agree that it's weird that it doesn't get const-propagated in the context of the issue. I did finally figure out how to automate it so that it gets calculated at compile time, so maybe merge and open an issue as a reminder to look into the constant propagation?
LGTM (to be squashed, of course).
Force-pushed 051386d to 9a13542
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:

@@ Coverage Diff @@
##             main     #559   +/-   ##
=======================================
  Coverage   74.85%   74.85%
=======================================
  Files          57       57
  Lines        2708     2708
=======================================
  Hits         2027     2027
  Misses        681      681
Bypasses the conversion to BigFloat when converting pi and e to Float32 and Float16 on the GPU. Values are taken from the constants in Tables 6.5 and 6.6 of the Metal Shading Language specification.

Closes #551