NativeAOT: Run-time simd checking #68110

Open
SupinePandora43 opened this issue Apr 16, 2022 · 7 comments
Labels: area-CodeGen-coreclr, help wanted

SupinePandora43 commented Apr 16, 2022

Run-time SIMD checks can be beneficial in long-running code paths (SpanHelpers.SequenceEqual, for example).

However, they can also hurt performance when the same check is repeated redundantly.

Could this be fixed? For example:

void Method(){
    if(Avx2.IsSupported) avx2path();
    else softwarepath();
}
Method();
Method();
Method();

Can be transformed into:

if(Avx2.IsSupported){
    Method_avx2path();
    Method_avx2path();
    Method_avx2path();
} else {
    Method_softwarepath();
    Method_softwarepath();
    Method_softwarepath();
}

This could be achieved by inlining, but inlining will not apply in most cases.
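As an illustration only, the hoisting described above can be written by hand today. The sketch below is not something the compiler generates; the class name HoistedCheckSketch and the helpers SumAll, SumAvx2 and SumScalar are hypothetical names invented for this example. It checks Avx2.IsSupported once per batch rather than once per call:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical example type; none of these names come from the runtime.
public static class HoistedCheckSketch
{
    // The ISA check runs once per SumAll call instead of once per row.
    public static int SumAll(int[][] rows)
    {
        int total = 0;
        if (Avx2.IsSupported)
        {
            foreach (int[] row in rows) total += SumAvx2(row);
        }
        else
        {
            foreach (int[] row in rows) total += SumScalar(row);
        }
        return total;
    }

    // AVX2 path: adds eight ints per iteration, then handles the tail.
    private static int SumAvx2(ReadOnlySpan<int> values)
    {
        ReadOnlySpan<Vector256<int>> blocks = MemoryMarshal.Cast<int, Vector256<int>>(values);
        Vector256<int> acc = Vector256<int>.Zero;
        foreach (Vector256<int> block in blocks) acc = Avx2.Add(acc, block);

        int sum = 0;
        // Horizontal sum of the eight accumulator lanes.
        for (int i = 0; i < Vector256<int>.Count; i++) sum += acc.GetElement(i);
        // Remaining elements that did not fill a whole vector.
        for (int i = blocks.Length * Vector256<int>.Count; i < values.Length; i++) sum += values[i];
        return sum;
    }

    // Software fallback path.
    private static int SumScalar(ReadOnlySpan<int> values)
    {
        int sum = 0;
        foreach (int v in values) sum += v;
        return sum;
    }
}
```

Under the JIT, Avx2.IsSupported is a JIT-time constant, so the untaken branch is removed entirely. How the check is compiled under NativeAOT depends on the instruction sets being targeted.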

The compiler would somehow have to produce a separate version of every method in the call chain for each SIMD instruction set:

void Start(){
    if(Avx2.IsSupported) ...;
    else ...;
    Continue();
}
void Continue(){
    End();
}
void End(){
    if(Avx2.IsSupported) ...;
    else ...;
}

This would produce:

Start_Avx2
Start
Continue_Avx2
Continue
End_Avx2
End

But such indiscriminate code generation would cause a dramatic size increase.
Could PGO be used to determine where the checks are actually useful?

category:cq
theme:ready-to-run
skill-level:expert
cost:medium
impact:small

ghost added the area-CodeGen-coreclr and untriaged labels Apr 16, 2022

ghost commented Apr 16, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

AFAIK NativeAOT just targets some low cpu instruction set.
Could compiler emit run-time simd checks based on profiles?

Like:

if(Avx2.IsSupported){ // not jit constant, an actual getter.
    for(...) ExpensiveCallWithAvx2();
} else {
    for(...) ExpensiveCall();
}

Obviously don't emit them if it will not provide a massive performance boost.

... Yet I don't know how much those instruction sets add performance ... Hold on... Bepuphysics2 may benefit from it! (I think)

Author: SupinePandora43
Assignees: -
Labels:

area-CodeGen-coreclr, untriaged

Milestone: -

EgorBo commented Apr 16, 2022

Related: #68038

teo-tsirpanis added the area-NativeAOT-coreclr label and removed the area-CodeGen-coreclr label Apr 16, 2022

MichalStrehovsky commented

The compiler already generates runtime checks wherever possible (e.g. Ssse3.IsSupported is a runtime check with default settings). Avx.IsSupported cannot be a runtime check because of how ISA support is structured in RyuJIT: not all use of VEX encoding in RyuJIT happens under IsSupported checks.

jkotas added the area-CodeGen-coreclr label Apr 16, 2022

jakobbotsch commented Apr 16, 2022

Would the expectation be that RyuJIT would know only to use VEX encoding on paths that have checked Avx2.IsSupported?
I guess as an alternative NativeAOT could monomorphize these functions containing such checks and link them once on program start (or lazily) to avoid repeated runtime checks.
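A hand-written analogue of "link them once on program start" might look like the sketch below. This is only an illustration of the idea, not how NativeAOT or RyuJIT implements anything; StartupDispatchSketch, s_sum, SumAvx2 and SumScalar are hypothetical names. The implementation is chosen once when the type is initialized, so later calls pay for an indirect call rather than a repeated ISA check:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical example type; none of these names come from the runtime.
public static class StartupDispatchSketch
{
    // Resolved once, when the type is initialized.
    private static readonly Func<int[], int> s_sum =
        Avx2.IsSupported ? (Func<int[], int>)SumAvx2 : SumScalar;

    // Callers never test Avx2.IsSupported themselves.
    public static int Sum(int[] values) => s_sum(values);

    // AVX2 specialization: eight ints per iteration, then the tail.
    private static int SumAvx2(int[] values)
    {
        ReadOnlySpan<Vector256<int>> blocks =
            MemoryMarshal.Cast<int, Vector256<int>>(values.AsSpan());
        Vector256<int> acc = Vector256<int>.Zero;
        foreach (Vector256<int> block in blocks) acc = Avx2.Add(acc, block);

        int sum = 0;
        for (int i = 0; i < Vector256<int>.Count; i++) sum += acc.GetElement(i);
        for (int i = blocks.Length * Vector256<int>.Count; i < values.Length; i++) sum += values[i];
        return sum;
    }

    // Software specialization.
    private static int SumScalar(int[] values)
    {
        int sum = 0;
        foreach (int v in values) sum += v;
        return sum;
    }
}
```

The trade-off in this sketch is that the delegate call cannot be inlined, so it only pays off when each call does enough work to amortize the indirection.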

jkotas commented Apr 16, 2022

When Avx+ instruction set is allowed, the VEX encoding can be used implicitly by RyuJIT pretty much anywhere, e.g. even for zeroing locals in the method prolog.

I think that the simplest variant of this would be for RyuJIT to only use the VEX encoding for explicit Avx and similar hardware intrinsics, and nothing else.

> I guess as an alternative NativeAOT could monomorphize these functions containing such checks and link them once on program start (or lazily) to avoid repeated runtime checks.

Yes, that would be a more advanced version. We would need a profitability function that decides where the code duplication is worth it.

JulieLeeMSFT added the help wanted label and removed the untriaged label Apr 18, 2022
JulieLeeMSFT added this to the Future milestone Apr 18, 2022

SupinePandora43 commented

> We would need a profitability function that decides where the code duplication is worth it.

Could that be done with profiles (PGO)?
