Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unchecked.defaultof prevents other optimizations from working #18128

Open
jwosty opened this issue Dec 10, 2024 · 14 comments
Open

Unchecked.defaultof prevents other optimizations from working #18128

jwosty opened this issue Dec 10, 2024 · 14 comments
Labels
Area-Compiler-Optimization The F# optimizer, release code gen etc. Bug Impact-Low (Internal MS Team use only) Describes an issue with limited impact on existing code.
Milestone

Comments

@jwosty
Copy link
Contributor

jwosty commented Dec 10, 2024

In situations like this:

open System

let f (n: float32) =
    Console.WriteLine n
    let _ = Unchecked.defaultof<decimal>
    let _ = Unchecked.defaultof<decimal>
    let _ = Unchecked.defaultof<decimal>
    let n' = n * 2.f
    Console.WriteLine n'

I would expect the compiler to be able to eliminate these unused variables. The compiler sources imply that there's some kind of unused binding elimination (see

// Dead binding elimination
), so I'm not sure if what I'm seeing is a bug or if Optimizer.fs is talking about something slightly different. Either way I would think it would be sane to remove those bindings up there.

Here's the IL generated (Release mode of course):

  .method public static void
    f(
      float32 n
    ) cil managed
  {
    .maxstack 4
    .locals init (
      [0] valuetype [System.Runtime]System.Decimal V_0
    )

    // [4 5 - 4 24]
    IL_0000: ldarg.0      // n
    IL_0001: call         void [System.Console]System.Console::WriteLine(float32)

    // [5 13 - 5 32]
    IL_0006: ldloc.0      // V_0
    IL_0007: pop

    // [6 13 - 6 32]
    IL_0008: ldloca.s     V_0
    IL_000a: initobj      [System.Runtime]System.Decimal
    IL_0010: ldloc.0      // V_0
    IL_0011: pop

    // [7 13 - 7 32]
    IL_0012: ldloca.s     V_0
    IL_0014: initobj      [System.Runtime]System.Decimal
    IL_001a: ldloc.0      // V_0
    IL_001b: pop

    // [9 5 - 9 25]
    IL_001c: ldarg.0      // n
    IL_001d: ldc.r4       2
    IL_0022: mul
    IL_0023: call         void [System.Console]System.Console::WriteLine(float32)
    IL_0028: ret

  } // end of method Program::f

This is relevant because you do sometimes see a pattern like this in the wild for dealing with SRTP:

open System.ComponentModel
open System.Diagnostics
open FSharp.Core.LanguagePrimitives

[<AbstractClass; Sealed; EditorBrowsable(EditorBrowsableState.Never)>]
type PreOps =
    static member inline Double (n: float<'u>) : float<'u> = n * 2.
    static member inline Double (n: float32<'u>) : float32<'u> = n * 2.f


// [FS0064] This construct causes code to be less generic than indicated by the type annotations. The type variable 'T has been constrained to be type 'OverloadedOperators'.
#nowarn "64"
module PreludeOperators =
    let inline private nil<'T> = Unchecked.defaultof<'T>
    
    let inline double (x: ^a) =
        // These unused parameters turn into unused variable bindings which theoretically can be eliminated
        let inline _call (_: ^M, input: ^I, _: ^R) = ((^M or ^I) : (static member Double : ^I -> ^R) input)
        _call (nil<PreOps>, x, nil<^b>)

open System
open PreludeOperators

let doWork (n: float) =
    Console.WriteLine n
    let n' = double n
    Console.WriteLine n'

See for example FSharpPlus: https://github.com/fsprojects/FSharpPlus/blob/db45914d2cacea7a749c3a271df876c56f7eeadb/src/FSharpPlus/Control/Tuple.fs#L185

The only way I can figure out to avoid unused bindings in these situations is to use explicit generic parameters on the inner function, which not only requires duplicating the generic constraints (increasing maintenance cost), but also manual lifting of the helper function to a top-level scope.

    let inline private _callDouble<^M,^I,^R when (^M or ^I) : (static member Double : ^I -> ^R)> (_: ^M, input: ^I, _: ^R) =
        ((^M or ^I) : (static member Double : ^I -> ^R) input)
    let inline double (x: ^a) = _callDouble (nil<PreOps>, x, nil<^b>)

I haven't done a benchmark yet so I'm not sure if the CLR JIT is actually able to properly take care of this kind of stuff, but I figured it would be useful to at least document anyway. Additionally, the kind of functions you'd use this pattern for are also the kind of functions that are more likely to show up all over the place in tight loops.

This affects F# 8 and 9. Haven't checked any older versions.

@T-Gro
Copy link
Member

T-Gro commented Dec 10, 2024

I think it makes sense to verify how JIT handles this and how much performance is on the table here.

Important to note, the optimization would have to be limited to bindings which can be compile-time guaranteed to have no side effect (in your example this is clear), which in general might exclude any .NET IL method call or constructor.

A possible important side effect (= breaking change if not done in the same order) would file-level or module-level initialization, so even a let _ = SomeOtherModule.somevalueInIt will have to be kept.

@vzarytovskii
Copy link
Member

I think everything which is (# #) is unverifiable, thus can't be analyzed for the lack of side effects. defaultof is directly emitting ilzero.

@jwosty
Copy link
Contributor Author

jwosty commented Dec 10, 2024

I think it makes sense to verify how JIT handles this and how much performance is on the table here.

Sure. I'd be happy to help -- what is the typical way one would go about doing that? I was peeking at Sharplab but I don't know how stack frame size manifests itself in the assembly, and besides I'd be curious to look at ARM/Apple Silicon too

Important to note, the optimization would have to be limited to bindings which can be compile-time guaranteed to have no side effect (in your example this is clear), which in general might exclude any .NET IL method call or constructor.

This is true, and I'm really mainly interested in trivial cases with definitely optimizable literals (nulls at minimum, definitely integer literals, maybe string literals if that's not a can of worms, etc).

A possible important side effect (= breaking change if not done in the same order) would file-level or module-level initialization, so even a let _ = SomeOtherModule.somevalueInIt will have to be kept.

Good point

I think everything which is (# #) is unverifiable, thus can't be analyzed for the lack of side effects. defaultof is directly emitting ilzero.

Interesting... I just assumed the compiler knows that defaultof has no side effects; I didn't think of the possibility that it still work for other kinds of expressions. Let's try something.

open System

let seven = 7

let f (n: float32) =
    Console.WriteLine n
    let _ = seven
    let _ = seven
    let _ = seven
    let n' = n * 2.f
    Console.WriteLine n'

IL disassembly:

  .method public static void
    'f\''(
      float32 n
    ) cil managed
  {
    .maxstack 8

    // [18 5 - 18 24]
    IL_0000: ldarg.0      // n
    IL_0001: call         void [System.Console]System.Console::WriteLine(float32)

    // [20 5 - 20 25]
    IL_0006: ldarg.0      // n
    IL_0007: ldc.r4       2
    IL_000c: mul
    IL_000d: call         void [System.Console]System.Console::WriteLine(float32)
    IL_0012: ret

  } // end of method Program::'f\''

Huh. That seems to elide just fine.

Is there a way that defaultof specifically could be made to play well with these kinds of optimizations since it's so pervasive (and I think most people would certainly expect it not to prevent optimizations)? Perhaps via an internal FSharp.Core-only attribute or something? I'm not talking something publicly surfaceable

@jwosty
Copy link
Contributor Author

jwosty commented Dec 10, 2024

Just noting that unused binding elimination indeed works just fine too for slightly more complex situations:

// a simple enough function that the compiler auto-inlines it
let getSeven () = 7

let f (n: float32) =
    Console.WriteLine n
    let _ = getSeven ()
    let _ = getSeven ()
    let _ = getSeven ()
    let n' = n * 2.f
    Console.WriteLine n'

@jwosty
Copy link
Contributor Author

jwosty commented Dec 10, 2024

Another interesting thing to note -- only unused variables with wildcard patterns are eliminated; naming them makes them get preserved:

open System

let f (n: float32) =
    Console.WriteLine n
    let a = Unchecked.defaultof<decimal>
    let b = Unchecked.defaultof<decimal>
    let c = Unchecked.defaultof<decimal>
    let n' = n * 2.f
    Console.WriteLine n'

IL:

  .method public static void
    f(
      float32 n
    ) cil managed
  {
    .maxstack 4
    .locals init (
      [0] valuetype [System.Runtime]System.Decimal a,
      [1] valuetype [System.Runtime]System.Decimal b,
      [2] valuetype [System.Runtime]System.Decimal c,
      [3] valuetype [System.Runtime]System.Decimal V_3
    )

    // [30 5 - 30 24]
    IL_0000: ldarg.0      // n
    IL_0001: call         void [System.Console]System.Console::WriteLine(float32)

    // [31 5 - 31 41]
    IL_0006: ldloc.1      // V_1
    IL_0007: stloc.0      // a

    // [32 5 - 32 41]
    IL_0008: ldloc.2      // V_2
    IL_0009: stloc.1      // b

    // [33 5 - 33 41]
    IL_000a: ldloc.3      // V_3
    IL_000b: stloc.2      // c

    // [35 5 - 35 25]
    IL_000c: ldarg.0      // n
    IL_000d: ldc.r4       2
    IL_0012: mul
    IL_0013: call         void [System.Console]System.Console::WriteLine(float32)
    IL_0018: ret

  } // end of method Program::f

This is less of an issue though of course because the programmer at least has a remedy (just change it to a wildcard)

@vzarytovskii
Copy link
Member

Interesting... I just assumed the compiler knows that ``defaultof``` has no side effects

Not really, it's not a compiler intrinsic, just a normal function, and since there's no flow analysis for emitted IL (specifically via builtin (##)), it can't be proven there are no side effects.

@vzarytovskii
Copy link
Member

vzarytovskii commented Dec 10, 2024

Interesting... I just assumed the compiler knows that ``defaultof``` has no side effects

Not really, it's not a compiler intrinsic, just a normal function, and since there's no flow analysis for emitted IL (specifically via builtin (##)), it can't be proven there are no side effects.

Is there a way that defaultof specifically could be made to play well with these kinds of optimizations since it's so pervasive (and I think most people would certainly expect it not to prevent optimizations)? Perhaps via an internal FSharp.Core-only attribute or something? I'm not talking something publicly surfaceable

Technically yes, there's "pure" attribute which can be used and can be restricted to the fslib, but it would be quite an adhoc solution.

@T-Gro
Copy link
Member

T-Gro commented Dec 11, 2024

I would say that given enough benchmarked evidence, even an adhoc solution just for Unchecked.defaultOf could be made working - assuming this is indeed a common scenario when using FSharpPlus.

@jwosty :
For this case, I would advice to do a BDN benchmark with code snippets - one having the uneliminated bindings, other with them being manually removed.

@jwosty
Copy link
Contributor Author

jwosty commented Dec 12, 2024

OK so after some investigating, it looks like the Tier1 JIT is able to optimize away all of the unused local stack items in pretty much every circumstance I could find. Couldn't cook up a scenario where it actually caused the native code to be any different at all.

However I still suspected that it could mess with other optimizations in the F# compiler. I eventually found a circumstance where it prevents tuple elision. In my benchmark it translates to a ~3.5x slowdown as well as causing heap allocations.

Benchmark code:

open System
open System.ComponentModel
open System.Diagnostics
open System.Runtime.CompilerServices
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
open FSharp.Core.LanguagePrimitives

type [<AllowNullLiteral>] T1 = class end
type [<AllowNullLiteral>] T2 = class end

module Tup =
    let max3 (a: 'a, b: 'a, c: 'a) =
        a |> max b |> max c
    let fold3 (f: 'state -> 'a -> 'state) state (a: 'a, b: 'a, c: 'a) =
        f (f (f state a) b) c
    let reduce3 (f: 'a -> 'a -> 'a) (a: 'a, b: 'a, c: 'a) =
        f (f a b) c
    let sum3i (a, b, c) : int =
        reduce3 (+) (a, b, c)
    let map3 (f: 'a -> 'b) (a: 'a, b: 'a, c: 'a) =
        (f a, f b, f c)
    let map3DummyDefaultofs (f: 'a -> 'b) (a: 'a, b: 'a, c: 'a) =
        let v1 = Unchecked.defaultof<T1>
        let v2 = Unchecked.defaultof<T2>
        (f a, f b, f c)
    let map3DummyNulls (f: 'a -> 'b) (a: 'a, b: 'a, c: 'a) =
        let v1: T1 = null
        let v2: T2 = null
        (f a, f b, f c)
    let map3i (f: int -> 'a -> 'b) (a: 'a, b: 'a, c: 'a) = f 0 a, f 1 b, f 2 c
    let init3 (f: int -> 'a) =
        (f 0, f 1, f 2)
    let ofArray3 (xs: 'a[]) = xs[0], xs[1], xs[2]
    
    let sumSquares xs =
        xs
        |> map3 (fun x -> x * x)
        |> sum3i
    let sumSquaresWithDummyDefaultofVars xs =
        xs
        |> map3DummyDefaultofs (fun x -> x * x)
        |> sum3i
    let sumSquaresWithDummyNullVars xs =
        xs
        |> map3DummyNulls (fun x -> x * x)
        |> sum3i


[<DisassemblyDiagnoser(maxDepth = 6, printSource = true, printInstructionAddresses = true)>]
[<MemoryDiagnoser>]
[<RyuJitX64Job>]
type Benchmarks(value: int) =
    let n = 100
    
    new() = Benchmarks(Random(42).Next 100)
    
    [<Benchmark>]
    [<MethodImpl(MethodImplOptions.NoInlining)>]
    member this.SumSquaresManualInline () =
        let mutable x = 0
        for _ in 1..n do
            let a = value
            let b = value + 1
            let c = value + 2
            
            x <- (a * a) + (b * b) + (c * c)
        x
    
    [<Benchmark(Baseline = true)>]
    [<MethodImpl(MethodImplOptions.NoInlining)>]
    member this.SumSquaresFunc () =
        let mutable x = 0
        for _ in 1..n do
            x <- Tup.sumSquares (value, value + 1, value + 2)
        x
    
    [<Benchmark>]
    [<MethodImpl(MethodImplOptions.NoInlining)>]
    member this.SumSquaresFuncDummyDefaultofVars () =
        let mutable x = 0
        for _ in 1..n do
            x <- Tup.sumSquaresWithDummyDefaultofVars (value, value + 1, value + 2)
        x
        
    [<Benchmark>]
    [<MethodImpl(MethodImplOptions.NoInlining)>]
    member this.SumSquaresFuncDummyNullVars5 () =
        let mutable x = 0
        for _ in 1..n do
            x <- Tup.sumSquaresWithDummyNullVars (value, value + 1, value + 2)
        x

[<EntryPoint>]
let main args =
    BenchmarkRunner.Run<Benchmarks> (args = args) |> ignore
    0

Results:


BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4602/23H2/2023Update/SunValley3)
AMD Ryzen 9 7950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.100
  [Host]    : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI DEBUG
  RyuJitX64 : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Job=RyuJitX64  Jit=RyuJit  Platform=X64  

Method Mean Error StdDev Ratio RatioSD Code Size Gen0 Allocated Alloc Ratio
SumSquaresManualInline 56.73 ns 0.370 ns 0.328 ns 1.01 0.01 64 B - - NA
SumSquaresFunc 56.22 ns 0.133 ns 0.118 ns 1.00 0.00 64 B - - NA
SumSquaresFuncDummyDefaultofVars 191.02 ns 3.125 ns 3.474 ns 3.40 0.06 137 B 0.1912 3200 B NA
SumSquaresFuncDummyNullVars5 56.33 ns 0.342 ns 0.286 ns 1.00 0.01 64 B - - NA

I also included a function with unnamed variables set directly with a null literal to show that it really is the Unchecked.defaultof at fault here; unused binding elimination otherwise indeed works.

I suspect there are other optimizations which it impacts as well

@jwosty jwosty changed the title Compiler should be able to optimize unused variables Unchecked.defaultof prevents other optimizations from working Dec 12, 2024
@jwosty
Copy link
Contributor Author

jwosty commented Dec 13, 2024

I wonder which other internals suffer from the same type of issue. For example LanguagePrimitives.XXXWithMeasure (which I think is fair to assume acts as a runtime no-op) can get in the way of constant folding:

type [<Measure>] m

let inline lenSquaredInt64 (x: int64<'u>, y: int64<'u>) =
    (x * x) + (y * y)

let inline f1 () =
    let x, y = 10<m>, 20<m>
    lenSquaredInt64 (int64 x, int64 y)

let inline f2 () =
    let x, y = 10<m>, 20<m>
    lenSquaredInt64 (Int64WithMeasure (int64 x), Int64WithMeasure (int64 y))

C# disassembly (according to JetBrains Rider):

  public static long lenSquaredInt64(long x, long y)
  {
    return x * x + y * y;
  }

  public static long f1()
  {
    return 500;
  }

  public static long f2()
  {
    long num1 = 10;
    long num2 = 20;
    return num1 * num1 + num2 * num2;
  }

I'm sure this particular example would get optimized by the JIT, but I think the point still stands

@T-Gro
Copy link
Member

T-Gro commented Dec 13, 2024

It is definitely end-user confusing why a change from null to Unchecked.defaultOf should be causing a slow down. Even though this is fairly degenerate for app code (especially when tooling can give diagnostics about unused bindings), it can happen by calling into libraries.

The UoM case will be just a retyping primitive - if this is limited to IL snippets with 0/1 instructions (and not trying to optimize longer blocks), covering them in the optimizer should be possible.

@vzarytovskii
Copy link
Member

vzarytovskii commented Dec 13, 2024

I wonder if it folds with extra optimisation loops? I don't have access to laptop now, can somebody please test it? There should be some flag like extraoptimizationloops or something

I also wonder if it's worth it at all - trade compilation time for something which JIT will optimise.

@jwosty
Copy link
Contributor Author

jwosty commented Dec 13, 2024

I wonder if it folds with extra optimisation loops? I don't have access to laptop now, can somebody please test it? There should be some flag like extraoptimizationloops or something

Sure thing; tried that out just now for you. Reporting back (via <OtherFlags>--extraoptimizationloops:1</OtherFlags in the fsproj) -- unfortunately nothing changed

I also wonder if it's worth it at all - trade compilation time for something which JIT will optimise.

I feel it's less about a compilation time tradeoff, and more about the fact that the F# compiler shouldn't shoot itself in the foot and stop applying optimizations it otherwise would when the user only adds a single call to a ubiquitous core function -- especially when it's a core primitive you can't write yourself. In other words: given that it had already been deemed worth it to have some constant folding, why should adding a Int64WithMeasure in the middle somewhere ruin that feature (from an end-user perspective)? Especially when you can't avoid using the primitive in question.

One more thing - I'm not specifically worried about the lack of constant folding here per se; I meant more of a proof-by-induction that if simply adding XXXWithMeasure can prevent a particular optimization, then other optimizations may be impaired as well (I just haven't taken the time to explore further).

Not asking for a way to solve this kind of thing in the general case, just specifically for some of these core ubiquitous functions (you know, get the most bang for your buck, diminishing returns, etc).

@vzarytovskii
Copy link
Member

*WithMeasure uses retype which is an inline IL, which is, I think, the main issue here:

https://compiler-explorer.com/z/MPME4hsnc

We can't fold because the output of retype is no longer a constant for compiler, and it can't evaluate it at compile time.

@0101 0101 added Bug Impact-Low (Internal MS Team use only) Describes an issue with limited impact on existing code. Area-Compiler-Optimization The F# optimizer, release code gen etc. and removed Needs-Triage labels Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compiler-Optimization The F# optimizer, release code gen etc. Bug Impact-Low (Internal MS Team use only) Describes an issue with limited impact on existing code.
Projects
Status: New
Development

No branches or pull requests

4 participants