Skip to content

[Perf] Linux/arm64: 15 Regressions on 5/4/2024 5:29:12 AM #102047

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
performanceautofiler bot opened this issue May 9, 2024 · 5 comments
Closed

[Perf] Linux/arm64: 15 Regressions on 5/4/2024 5:29:12 AM #102047

performanceautofiler bot opened this issue May 9, 2024 · 5 comments
Assignees
Labels
arch-arm64 area-System.Numerics needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Milestone

Comments

@performanceautofiler
Copy link

performanceautofiler bot commented May 9, 2024

Run Information

Name Value
Architecture arm64
OS ubuntu 22.04
Queue AmpereUbuntu
Baseline e965312582a33c0acf2020648b54a152a80c139a
Compare 5962fd511e3eacf7fe91520392c041e94e5d31cc
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
53.07 ns 150.70 ns 2.84 0.04 True
1.62 μs 3.21 μs 1.99 0.03 True
658.28 ns 2.85 μs 4.33 0.02 True
2.08 μs 2.56 μs 1.23 0.01 True
48.40 μs 60.44 μs 1.25 0.01 True
48.94 μs 61.04 μs 1.25 0.01 True
872.01 ns 3.18 μs 3.65 0.04 True
68.76 ns 140.01 ns 2.04 0.02 True
2.05 μs 2.51 μs 1.22 0.01 True
817.70 ns 3.20 μs 3.91 0.14 True
2.07 μs 2.52 μs 1.22 0.01 True
48.04 μs 60.23 μs 1.25 0.01 True
54.65 ns 151.26 ns 2.77 0.03 True

graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives&lt;Double&gt;*'

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_ScalarAddend(BufferLength: 128)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_Vectors(BufferLength: 3079)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Truncate(BufferLength: 3079)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarBase(BufferLength: 128)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_Vectors(BufferLength: 3079)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarBase(BufferLength: 3079)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_ScalarMultiplier(BufferLength: 3079)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_Vectors(BufferLength: 128)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarExponent(BufferLength: 128)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_ScalarAddend(BufferLength: 3079)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_Vectors(BufferLength: 128)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarExponent(BufferLength: 3079)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_ScalarMultiplier(BufferLength: 128)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository


Run Information

Name Value
Architecture arm64
OS ubuntu 22.04
Queue AmpereUbuntu
Baseline e965312582a33c0acf2020648b54a152a80c139a
Compare 5962fd511e3eacf7fe91520392c041e94e5d31cc
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Memory.Span<Int32>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
3.46 ns 5.32 ns 1.54 0.24 False

graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Memory.Span&lt;Int32&gt;*'

System.Memory.Span<Int32>.SequenceEqual(Size: 4)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository


Run Information

Name Value
Architecture arm64
OS ubuntu 22.04
Queue AmpereUbuntu
Baseline e965312582a33c0acf2020648b54a152a80c139a
Compare 5962fd511e3eacf7fe91520392c041e94e5d31cc
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.TryGetValueFalse<Int32, Int32>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
4.13 μs 5.96 μs 1.44 0.08 False

graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.TryGetValueFalse&lt;Int32, Int32&gt;*'

System.Collections.TryGetValueFalse<Int32, Int32>.Dictionary(Size: 512)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@performanceautofiler performanceautofiler bot added arch-arm64 os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime untriaged New issue has not been triaged by the area owner labels May 9, 2024
@DrewScoggins DrewScoggins removed the untriaged New issue has not been triaged by the area owner label May 9, 2024
@DrewScoggins DrewScoggins transferred this issue from dotnet/perf-autofiling-issues May 9, 2024
@ghost ghost added the area-System.Numerics label May 9, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 9, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

@DrewScoggins DrewScoggins added tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark labels May 9, 2024
@DrewScoggins
Copy link
Member

Looks related to #101800.

We also saw many improvements, and those are linked in the above PR.

@tannergooding tannergooding added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Jun 24, 2024
@stephentoub stephentoub added this to the 9.0.0 milestone Jul 22, 2024
@stephentoub
Copy link
Member

Many of the benchmarks fully recovered, but it looks like those related to Pow did not.

@tannergooding
Copy link
Member

tannergooding commented Jul 23, 2024

@stephentoub, pow is because of aab8803: Disable TensorPrimitives vectorization of Log, Cbrt, Pow, and RootN

The diff range displayed for some of these is very off and is different if you manually go into https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu%2022.04/System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Double).Pow_Vectors(BufferLength:%203079).html and set the compare range in the graph (which is f88ab88...f9207e6)

@stephentoub
Copy link
Member

Ah, thanks, in that case it's expected, and I believe we can close this.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 area-System.Numerics needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Projects
None yet
Development

No branches or pull requests

3 participants