Regressions in System.Collections.Sort<Int32> #71214

Closed
performanceautofiler bot opened this issue Jun 23, 2022 · 11 comments

Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI runtime-coreclr specific to the CoreCLR runtime

@performanceautofiler

Run Information

Architecture arm64
OS Windows 10.0.19041
Baseline 9ab00ef5c8904f28fba9b01b4fefd4ad672567fb
Compare 10438a57b888cbdc6dc15771dc662adde9fd6b14
Diff Diff

Regressions in System.Collections.Sort<Int32>

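| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Array - Duration of single invocation | 5.49 μs | 6.01 μs | 1.10 | 0.24 | False | | | | | |
| List - Duration of single invocation | 5.49 μs | 6.37 μs | 1.16 | 0.17 | False | | | | | |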

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.Sort<Int32>*'

Payloads

Baseline
Compare

Histogram

System.Collections.Sort<Int32>.Array(Size: 512)


Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 6.012585555555557 > 5.747378230263158.
IsChangePoint: Marked as a change because one of 6/16/2022 10:45:33 PM, 6/23/2022 3:07:42 AM falls between 6/14/2022 10:05:19 AM and 6/23/2022 3:07:42 AM.
IsRegressionStdDev: Marked as regression because -15.026510107388873 (T) = (0 -6212.707966783625) / Math.Sqrt((29561.33241260086 / (21)) + (33639.99194715355 / (25))) is less than -2.0153675744421933 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (21) + (25) - 2, .025) and -0.14535986474704712 = (5424.241025029895 - 6212.707966783625) / 5424.241025029895 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.
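
The `IsRegressionStdDev` line above can be checked by hand. The following is a minimal sketch, not the autofiler's actual code: the function names are hypothetical, and the critical value is copied from the log rather than recomputed. Note that the logged value of T is reproduced only when the numerator is the baseline mean minus the compare mean (5424.24... minus 6212.70...). The literal `0` printed in the log message does not reproduce it, which presumably is a quirk of how the message is formatted.

```python
import math

def t_statistic(mean_base, mean_compare, var_a, n_a, var_b, n_b):
    """Two-sample t statistic using the unpooled standard error from the log."""
    std_err = math.sqrt(var_a / n_a + var_b / n_b)
    return (mean_base - mean_compare) / std_err

def relative_change(mean_base, mean_compare):
    """Relative difference of the compare mean against the baseline mean."""
    return (mean_base - mean_compare) / mean_base

# Values copied from the Array(Size: 512) IsRegressionStdDev line.
BASE_MEAN = 5424.241025029895
COMPARE_MEAN = 6212.707966783625
T_CRITICAL = -2.0153675744421933  # StudentT.InvCDF(0, 1, 21 + 25 - 2, .025) per the log

t = t_statistic(BASE_MEAN, COMPARE_MEAN,
                29561.33241260086, 21,
                33639.99194715355, 25)
rel = relative_change(BASE_MEAN, COMPARE_MEAN)

# Both conditions from the log must hold for the check to flag a regression.
is_regression = t < T_CRITICAL and rel < -0.05
print(f"t = {t:.4f}, relative change = {rel:.4f}, regression = {is_regression}")
# prints: t = -15.0265, relative change = -0.1454, regression = True
```

The same function applied to the List(Size: 512) numbers gives the T value reported for that benchmark.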

System.Collections.Sort<Int32>.List(Size: 512)


Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 6.367123333333333 > 5.7733494.
IsChangePoint: Marked as a change because one of 6/6/2022 12:57:23 AM, 6/16/2022 10:45:33 PM, 6/23/2022 3:07:42 AM falls between 6/14/2022 10:05:19 AM and 6/23/2022 3:07:42 AM.
IsRegressionStdDev: Marked as regression because -18.256655481072343 (T) = (0 -6290.907881769642) / Math.Sqrt((15144.937245487698 / (21)) + (38150.12749039728 / (23))) is less than -2.0180817028167235 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (21) + (23) - 2, .025) and -0.1649239776783329 = (5400.273324536833 - 6290.907881769642) / 5400.273324536833 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@performanceautofiler performanceautofiler bot added arm64 untriaged New issue has not been triaged by the area owner labels Jun 23, 2022
@EgorBo EgorBo changed the title from "[Perf] Changes at 6/17/2022 3:07:11 AM" to "Regressions in System.Collections.Sort<Int32>" Jun 23, 2022
@EgorBo EgorBo removed the untriaged New issue has not been triaged by the area owner label Jun 23, 2022
@EgorBo EgorBo transferred this issue from dotnet/perf-autofiling-issues Jun 23, 2022
@ghost

ghost commented Jun 23, 2022

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jun 23, 2022
@EgorBo
Member

EgorBo commented Jun 23, 2022

The only suspect is #70809

@EgorBo
Member

EgorBo commented Jun 23, 2022

arm64: dotnet/perf-autofiling-issues#6319

@danmoseley danmoseley added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 23, 2022
@ghost

ghost commented Jun 23, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: performanceautofiler[bot]
Assignees: EgorBo
Labels:

area-CodeGen-coreclr, untriaged, refs/heads/main, RunKind=micro, Windows 10.0.19041, Regression, CoreClr, arm64

Milestone: -

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Jun 23, 2022
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Jun 23, 2022
@EgorBo EgorBo removed their assignment Jun 28, 2022
@AndyAyersMS
Member

#70809 was supposed to have very minimal diffs per SPMI. Let's see in this case.

@AndyAyersMS
Member

My local box doesn't see a regression; it has main about 7% faster.

diff is 42c321a
base is 1a02f0d (just before #70809).

| Method | Job | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Array | Job-YYXLIM | \base-rel\corerun.exe | 512 | 2.942 us | 0.0938 us | 0.1080 us | 2.891 us | 2.823 us | 3.213 us | 1.00 | 0.00 | - | NA |
| Array | Job-JMMJOI | \diff-rel\corerun.exe | 512 | 2.750 us | 0.0381 us | 0.0338 us | 2.753 us | 2.682 us | 2.814 us | 0.93 | 0.04 | - | NA |
| List | Job-YYXLIM | \base-rel\corerun.exe | 512 | 2.957 us | 0.0862 us | 0.0993 us | 2.908 us | 2.858 us | 3.183 us | 1.00 | 0.00 | - | NA |
| List | Job-JMMJOI | \diff-rel\corerun.exe | 512 | 2.734 us | 0.0516 us | 0.0530 us | 2.714 us | 2.681 us | 2.852 us | 0.92 | 0.02 | - | NA |
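
For reference, the "about 7% faster" figure follows from the Mean column above. This is a rough sketch that takes the simple ratio of means. The tool's own Ratio column may be computed differently, so treat the match with the table as approximate. The dictionary layout and the `ratio` helper are illustrative only.

```python
# Mean times in microseconds, copied from the table above.
means_us = {
    "Array": {"base": 2.942, "diff": 2.750},
    "List": {"base": 2.957, "diff": 2.734},
}

def ratio(base, diff):
    """Ratio of diff time to base time; below 1.0 means diff is faster."""
    return diff / base

for method, m in means_us.items():
    r = ratio(m["base"], m["diff"])
    print(f"{method}: ratio = {r:.2f}, diff is {(1 - r) * 100:.1f}% faster")
# prints:
# Array: ratio = 0.93, diff is 6.5% faster
# List: ratio = 0.92, diff is 7.5% faster
```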

Global history generally doesn't agree and thinks the regression persists (but note quite different HW). Here is win arm64:

newplot

and here is Windows on Ampere (seems oddly way too slow):

newplot (1)

and here is Linux on Ampere (also seems oddly slow):

newplot (3)

and (for completeness's sake) here is Linux arm64 on the perf lab Qualcomm box, which does not see a regression either:

newplot (4)

I can try looking at a narrower range of commits but given that none of the lab runs show any recent changes (other than the big regression/fix) I don't think it matters.

So some mysteries to sort through:

  • why is my local box so much faster than anything else?
  • is there a codegen diff in any key method?

@AndyAyersMS
Member

AndyAyersMS commented Jul 13, 2022

As a sanity check here's the local net6 vs net7(p5) data.

| Method | Job | Runtime | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Array | Job-OAKTBI | .NET 6.0 | net6.0 | 512 | 3.339 us | 0.0643 us | 0.0632 us | 3.321 us | 3.254 us | 3.471 us | 1.00 | 0.00 | - | NA |
| Array | Job-RYTKRW | .NET 7.0 | net7.0 | 512 | 3.027 us | 0.0681 us | 0.0784 us | 2.986 us | 2.941 us | 3.154 us | 0.91 | 0.04 | - | NA |
| List | Job-OAKTBI | .NET 6.0 | net6.0 | 512 | 3.316 us | 0.0296 us | 0.0262 us | 3.314 us | 3.281 us | 3.373 us | 1.00 | 0.00 | - | NA |
| List | Job-RYTKRW | .NET 7.0 | net7.0 | 512 | 3.056 us | 0.0590 us | 0.0655 us | 3.024 us | 2.977 us | 3.190 us | 0.93 | 0.02 | - | NA |

and the same two releases on the Ampere (Linux):

| Method | Job | Runtime | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Array | Job-FZSDXF | .NET 6.0 | net6.0 | 512 | 6.659 us | 0.0584 us | 0.0488 us | 6.656 us | 6.590 us | 6.777 us | 1.00 | 0.00 | - | NA |
| Array | Job-VMSZCE | .NET 7.0 | net7.0 | 512 | 5.670 us | 0.1128 us | 0.1159 us | 5.679 us | 5.481 us | 5.907 us | 0.85 | 0.02 | - | NA |
| List | Job-FZSDXF | .NET 6.0 | net6.0 | 512 | 6.492 us | 0.0741 us | 0.0657 us | 6.501 us | 6.394 us | 6.616 us | 1.00 | 0.00 | - | NA |
| List | Job-VMSZCE | .NET 7.0 | net7.0 | 512 | 5.782 us | 0.1062 us | 0.0941 us | 5.786 us | 5.594 us | 5.912 us | 0.89 | 0.02 | - | NA |

@AndyAyersMS
Member

AndyAyersMS commented Jul 13, 2022

I wonder if this benchmark is just broken. -p ETW profiling shows really odd time distributions. Guessing that the R2R samples are not resolving properly and that the top method should be a prejitted sort.

image

@AndyAyersMS
Member

For comparison here's the same profile on windows x64. Also seems a bit messed up (R2R not resolving) but otherwise more or less reasonable.

image

@AndyAyersMS
Member

AndyAyersMS commented Aug 3, 2022

Looks like this regression is device-specific and only seems to happen on the Surface Pro X machines in the lab.

Still puzzled why the results on other Arm64 boxes are so bad, but that's a separate issue (#73315).

@AndyAyersMS
Member

Given that this only seems to affect the Surface Pro X, going to close.

@ghost ghost locked as resolved and limited conversation to collaborators Sep 2, 2022
@jeffhandley jeffhandley added arch-arm64 runtime-coreclr specific to the CoreCLR runtime and removed arm64 labels Dec 28, 2022