`Unchecked.defaultof` prevents other optimizations from working (#18128)
I think it makes sense to verify how the JIT handles this and how much performance is on the table here. Important to note: the optimization would have to be limited to bindings which can be compile-time guaranteed to have no side effects (in your example this is clear), which in general might exclude any .NET IL method call or constructor. A possible important side effect (= a breaking change if not done in the same order) would be file-level or module-level initialization, so even a …
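To illustrate the initialization-order concern, here is a small sketch (my addition, not from the thread): module-level bindings run top to bottom at module initialization, so an unused binding with a side effect cannot simply be dropped or reordered.

```fsharp
// Module-level bindings execute in order during module initialization;
// the unused binding's side effect is still observable, so eliminating
// it would be a breaking change.
let log = System.Collections.Generic.List<string>()

module M =
    let a = (log.Add "a"; 1)
    let _ = log.Add "effect"   // unused, but its effect must still run
    let b = (log.Add "b"; 2)
```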
I think everything which is …
Sure, I'd be happy to help; what is the typical way one would go about doing that? I was peeking at SharpLab, but I don't know how stack frame size manifests itself in the assembly, and besides, I'd be curious to look at ARM/Apple Silicon too.
This is true, and I'm really mainly interested in trivial cases with definitely optimizable literals (nulls at minimum, definitely integer literals, maybe string literals if that's not a can of worms, etc.).
Good point
Interesting... I just assumed the compiler knows that:

```fsharp
open System

let seven = 7

let f (n: float32) =
    Console.WriteLine n
    let _ = seven
    let _ = seven
    let _ = seven
    let n' = n * 2.f
    Console.WriteLine n'
```

IL disassembly:
Huh. That seems to elide just fine. Is there a way that …
Just noting that unused binding elimination indeed works just fine for slightly more complex situations too:

```fsharp
open System

// a simple enough function that the compiler auto-inlines it
let getSeven () = 7

let f (n: float32) =
    Console.WriteLine n
    let _ = getSeven ()
    let _ = getSeven ()
    let _ = getSeven ()
    let n' = n * 2.f
    Console.WriteLine n'
```
Another interesting thing to note -- only unused variables bound with wildcard patterns are eliminated; naming them makes them get preserved:

```fsharp
open System

let f (n: float32) =
    Console.WriteLine n
    let a = Unchecked.defaultof<decimal>
    let b = Unchecked.defaultof<decimal>
    let c = Unchecked.defaultof<decimal>
    let n' = n * 2.f
    Console.WriteLine n'
```

IL:

This is less of an issue, though, of course, because the programmer at least has a remedy (just change the binding to a wildcard).
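For comparison, here is a sketch of the wildcard form of the same function (my addition, relying on the behavior described above):

```fsharp
open System

// Same shape as the example above, but with wildcard patterns instead of
// named bindings; per the discussion, these unused bindings are the ones
// the optimizer does eliminate from the emitted IL.
let f (n: float32) =
    Console.WriteLine n
    let _ = Unchecked.defaultof<decimal>
    let _ = Unchecked.defaultof<decimal>
    let _ = Unchecked.defaultof<decimal>
    let n' = n * 2.f
    Console.WriteLine n'
```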
Not really; it's not a compiler intrinsic, just a normal function, and since there's no flow analysis for emitted IL (specifically via the builtin `(# ... #)`), it can't be proven that there are no side effects.
Technically yes, there's a "pure" attribute which could be used and restricted to the fslib, but it would be quite an ad hoc solution.
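For context, a quick sketch of the observable behavior (my addition): `Unchecked.defaultof` produces the CLR zero value for any type, and in FSharp.Core it is implemented with the inline IL construct `(# ... #)` mentioned above, which is what the optimizer cannot see through.

```fsharp
// Unchecked.defaultof yields the CLR zero value for the given type,
// including null for reference types F# otherwise treats as non-nullable.
let zeroInt = Unchecked.defaultof<int>       // 0
let zeroDec = Unchecked.defaultof<decimal>   // 0M
let zeroStr = Unchecked.defaultof<string>    // null
```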
I would say that, given enough benchmarked evidence, even an ad hoc solution just for `Unchecked.defaultof` could be made to work - assuming this is indeed a common scenario when using FSharpPlus. @jwosty:
OK, so after some investigating, it looks like the Tier 1 JIT is able to optimize away all of the unused local stack items in pretty much every circumstance I could find; I couldn't cook up a scenario where it actually caused the native code to be any different at all. However, I still suspected that it could mess with other optimizations in the F# compiler. I eventually found a circumstance where it prevents tuple elision. In my benchmark this translates to a ~3.5x slowdown, as well as causing heap allocations. Benchmark code:

```fsharp
open System
open System.ComponentModel
open System.Diagnostics
open System.Runtime.CompilerServices
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
open FSharp.Core.LanguagePrimitives

type [<AllowNullLiteral>] T1 = class end
type [<AllowNullLiteral>] T2 = class end

module Tup =
    let max3 (a: 'a, b: 'a, c: 'a) =
        a |> max b |> max c

    let fold3 (f: 'state -> 'a -> 'state) state (a: 'a, b: 'a, c: 'a) =
        f (f (f state a) b) c

    let reduce3 (f: 'a -> 'a -> 'a) (a: 'a, b: 'a, c: 'a) =
        f (f a b) c

    let sum3i (a, b, c) : int =
        reduce3 (+) (a, b, c)

    let map3 (f: 'a -> 'b) (a: 'a, b: 'a, c: 'a) =
        (f a, f b, f c)

    let map3DummyDefaultofs (f: 'a -> 'b) (a: 'a, b: 'a, c: 'a) =
        let v1 = Unchecked.defaultof<T1>
        let v2 = Unchecked.defaultof<T2>
        (f a, f b, f c)

    let map3DummyNulls (f: 'a -> 'b) (a: 'a, b: 'a, c: 'a) =
        let v1: T1 = null
        let v2: T2 = null
        (f a, f b, f c)

    let map3i (f: int -> 'a -> 'b) (a: 'a, b: 'a, c: 'a) = f 0 a, f 1 b, f 2 c

    let init3 (f: int -> 'a) =
        (f 0, f 1, f 2)

    let ofArray3 (xs: 'a[]) = xs[0], xs[1], xs[2]

    let sumSquares xs =
        xs
        |> map3 (fun x -> x * x)
        |> sum3i

    let sumSquaresWithDummyDefaultofVars xs =
        xs
        |> map3DummyDefaultofs (fun x -> x * x)
        |> sum3i

    let sumSquaresWithDummyNullVars xs =
        xs
        |> map3DummyNulls (fun x -> x * x)
        |> sum3i

[<DisassemblyDiagnoser(maxDepth = 6, printSource = true, printInstructionAddresses = true)>]
[<MemoryDiagnoser>]
[<RyuJitX64Job>]
type Benchmarks(value: int) =
    let n = 100

    new() = Benchmarks(Random(42).Next 100)

    [<Benchmark>]
    [<MethodImpl(MethodImplOptions.NoInlining)>]
    member this.SumSquaresManualInline () =
        let mutable x = 0
        for _ in 1..n do
            let a = value
            let b = value + 1
            let c = value + 2
            x <- (a * a) + (b * b) + (c * c)
        x

    [<Benchmark(Baseline = true)>]
    [<MethodImpl(MethodImplOptions.NoInlining)>]
    member this.SumSquaresFunc () =
        let mutable x = 0
        for _ in 1..n do
            x <- Tup.sumSquares (value, value + 1, value + 2)
        x

    [<Benchmark>]
    [<MethodImpl(MethodImplOptions.NoInlining)>]
    member this.SumSquaresFuncDummyDefaultofVars () =
        let mutable x = 0
        for _ in 1..n do
            x <- Tup.sumSquaresWithDummyDefaultofVars (value, value + 1, value + 2)
        x

    [<Benchmark>]
    [<MethodImpl(MethodImplOptions.NoInlining)>]
    member this.SumSquaresFuncDummyNullVars5 () =
        let mutable x = 0
        for _ in 1..n do
            x <- Tup.sumSquaresWithDummyNullVars (value, value + 1, value + 2)
        x

[<EntryPoint>]
let main args =
    BenchmarkRunner.Run<Benchmarks> (args = args) |> ignore
    0
```

Results:
I also included a function with variables assigned directly from a null literal, to show that it really is the `Unchecked.defaultof` call that makes the difference. I suspect there are other optimizations which it impacts as well.
I wonder which other internals suffer from the same type of issue. For example:

```fsharp
open FSharp.Core.LanguagePrimitives

type [<Measure>] m

let inline lenSquaredInt64 (x: int64<'u>, y: int64<'u>) =
    (x * x) + (y * y)

let inline f1 () =
    let x, y = 10<m>, 20<m>
    lenSquaredInt64 (int64 x, int64 y)

let inline f2 () =
    let x, y = 10<m>, 20<m>
    lenSquaredInt64 (Int64WithMeasure (int64 x), Int64WithMeasure (int64 y))
```

C# disassembly (according to JetBrains Rider):
I'm sure this particular example would get optimized by the JIT, but I think the point still stands.
It is definitely confusing to the end user why a change from `null` to `Unchecked.defaultof` should cause a slowdown. Even though this is fairly degenerate for app code (especially when tooling can give diagnostics about unused bindings), it can happen by calling into libraries. The UoM case will be just a retyping primitive - if this is limited to IL snippets with 0/1 instructions (and not trying to optimize longer blocks), covering them in the optimizer should be possible.
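As a concrete illustration of the "retyping primitive" point (my sketch, not from the thread): `Int64WithMeasure` only attaches a unit of measure, which is erased at compile time, so it should amount to the identity at the IL level.

```fsharp
open FSharp.Core.LanguagePrimitives

[<Measure>] type m

// Int64WithMeasure retypes an int64 with a measure; the runtime value
// is unchanged and the measure annotation is erased during compilation.
let distance : int64<m> = Int64WithMeasure 42L
```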
I wonder if it folds with extra optimization loops? I don't have access to a laptop now; can somebody please test it? There should be some flag like `extraoptimizationloops` or something. I also wonder if it's worth it at all - trading compilation time for something which the JIT will optimize anyway.
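For anyone who wants to try this: assuming the hidden `--extraoptimizationloops:<n>` compiler flag still exists (it is undocumented, so treat this as a sketch), it can be passed through MSBuild roughly like so:

```shell
# Assumption: fsc still accepts the undocumented --extraoptimizationloops flag.
# Pass it through via the OtherFlags MSBuild property:
dotnet build -c Release /p:OtherFlags="--extraoptimizationloops:5"
```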
Sure thing; tried that out just now for you. Reporting back (via …
I feel it's less about a compilation-time tradeoff, and more about the fact that the F# compiler shouldn't shoot itself in the foot and stop applying optimizations it otherwise would when the user only adds a single call to a ubiquitous core function -- especially when it's a core primitive you can't write yourself. In other words: given that it had already been deemed worth it to have some constant folding, why should adding a …

One more thing - I'm not specifically worried about the lack of constant folding here per se; I meant it more as a proof by induction: if simply adding …

Not asking for a way to solve this kind of thing in the general case, just specifically for some of these core ubiquitous functions (you know, get the most bang for your buck, diminishing returns, etc.).
https://compiler-explorer.com/z/MPME4hsnc

We can't fold because the output of …
In situations like this:

I would expect the compiler to be able to eliminate these unused variables. The compiler sources imply that there's some kind of unused binding elimination (see `src/Compiler/Optimize/Optimizer.fs`, line 1570, at commit 64bb76c).

Here's the IL generated (Release mode, of course):
This is relevant because you do sometimes see a pattern like this in the wild for dealing with SRTP:
See for example FSharpPlus: https://github.com/fsprojects/FSharpPlus/blob/db45914d2cacea7a749c3a271df876c56f7eeadb/src/FSharpPlus/Control/Tuple.fs#L185
The only way I can figure out to avoid unused bindings in these situations is to use explicit generic parameters on the inner function, which not only requires duplicating the generic constraints (increasing maintenance cost), but also requires manually lifting the helper function to a top-level scope.
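A minimal sketch of the two shapes being contrasted (my addition; the names are invented here, not taken from FSharpPlus):

```fsharp
// A type with a static Zero member for the SRTP constraint to resolve against.
type Score =
    { Points: int }
    static member Zero = { Points = 0 }

// Pattern from the issue: an unused defaultof binding kept only so that
// type inference ties the constraint to ^a; the named binding survives
// into the emitted IL.
let inline zeroLike (_witness: ^a) : ^a when ^a : (static member Zero : ^a) =
    let _dummy = Unchecked.defaultof< ^a>
    (^a : (static member Zero : ^a) ())

// Workaround: declare the type parameter and its constraint explicitly,
// duplicating the constraint but avoiding the dummy binding.
let inline zeroExplicit< ^a when ^a : (static member Zero : ^a)> () : ^a =
    (^a : (static member Zero : ^a) ())
```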
I haven't done a benchmark yet, so I'm not sure whether the CLR JIT is actually able to take care of this kind of thing properly, but I figured it would be useful to at least document it anyway. Additionally, the kinds of functions you'd use this pattern for are also the kinds of functions that are more likely to show up all over the place in tight loops.

This affects F# 8 and 9; I haven't checked any older versions.