[Perf] Linux/arm64: 15 Regressions on 5/4/2024 5:29:12 AM #102047

performanceautofiler · 2024-05-09T07:52:35Z

Run Information

Name	Value
Architecture	arm64
OS	ubuntu 22.04
Queue	AmpereUbuntu
Baseline	e965312582a33c0acf2020648b54a152a80c139a
Compare	5962fd511e3eacf7fe91520392c041e94e5d31cc
Diff	Diff
Configs	CompilationMode:tiered, RunKind:micro

Regressions in System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>

Benchmark (duration of single invocation)	Baseline	Test	Test/Base	Test Quality	Edge Detector
FusedMultiplyAdd_ScalarAddend(BufferLength: 128)	53.07 ns	150.70 ns	2.84	0.04	True
FusedMultiplyAdd_Vectors(BufferLength: 3079)	1.62 μs	3.21 μs	1.99	0.03	True
Truncate(BufferLength: 3079)	658.28 ns	2.85 μs	4.33	0.02	True
Pow_ScalarBase(BufferLength: 128)	2.08 μs	2.56 μs	1.23	0.01	True
Pow_Vectors(BufferLength: 3079)	48.40 μs	60.44 μs	1.25	0.01	True
Pow_ScalarBase(BufferLength: 3079)	48.94 μs	61.04 μs	1.25	0.01	True
FusedMultiplyAdd_ScalarMultiplier(BufferLength: 3079)	872.01 ns	3.18 μs	3.65	0.04	True
FusedMultiplyAdd_Vectors(BufferLength: 128)	68.76 ns	140.01 ns	2.04	0.02	True
Pow_ScalarExponent(BufferLength: 128)	2.05 μs	2.51 μs	1.22	0.01	True
FusedMultiplyAdd_ScalarAddend(BufferLength: 3079)	817.70 ns	3.20 μs	3.91	0.14	True
Pow_Vectors(BufferLength: 128)	2.07 μs	2.52 μs	1.22	0.01	True
Pow_ScalarExponent(BufferLength: 3079)	48.04 μs	60.23 μs	1.25	0.01	True
FusedMultiplyAdd_ScalarMultiplier(BufferLength: 128)	54.65 ns	151.26 ns	2.77	0.03	True
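
The Test/Base column is the test duration divided by the baseline duration once both are expressed in the same unit. The following is a minimal Python sketch of that arithmetic, using three rows copied from the table above. The helper names `to_ns` and `ratio` are illustrative only and are not part of the dotnet/performance tooling.

```python
# Multipliers that convert each unit used in the report to nanoseconds.
UNIT_TO_NS = {"ns": 1.0, "μs": 1_000.0, "us": 1_000.0, "ms": 1_000_000.0}


def to_ns(text: str) -> float:
    """Parse a duration such as '2.85 μs' into nanoseconds."""
    value, unit = text.split()
    return float(value) * UNIT_TO_NS[unit]


def ratio(baseline: str, test: str) -> float:
    """Return Test/Base for two duration strings (hypothetical helper)."""
    return to_ns(test) / to_ns(baseline)


# (benchmark, baseline, test) copied from the table above.
rows = [
    ("FusedMultiplyAdd_ScalarAddend", "53.07 ns", "150.70 ns"),
    ("Truncate", "658.28 ns", "2.85 μs"),
    ("Pow_Vectors", "48.40 μs", "60.44 μs"),
]

for name, base, test in rows:
    print(f"{name}: {ratio(base, test):.2f}")
```

Because the report rounds the durations before display, a ratio recomputed this way can differ from the reported one in the last digit.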

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>*'
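
The filter contains `<`, `>` and `*`, which a POSIX shell would otherwise treat as redirections and a glob, so the single quotes in the command matter. The sketch below assembles the same argument list in Python and prints a safely quoted command line. It assumes the script path is written with forward slashes relative to the directory that holds the clone, and `build_repro_command` is a hypothetical helper, not part of the performance repository. It only prints the command; running it still requires the clone from the previous step.

```python
from __future__ import annotations

import shlex


def build_repro_command(benchmark_filter: str, framework: str = "net8.0") -> list[str]:
    """Build the argument list for benchmarks_ci.py (hypothetical helper)."""
    return [
        "python3",
        "./performance/scripts/benchmarks_ci.py",  # assumed path, relative to the clone's parent
        "-f", framework,
        "--filter", benchmark_filter,
    ]


cmd = build_repro_command(
    "System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>*"
)

# shlex.join quotes any argument that contains shell metacharacters.
print(shlex.join(cmd))
```

Passing the list form to `subprocess.run` avoids the quoting question entirely, because no shell parses the arguments.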

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_ScalarAddend(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_Vectors(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Truncate(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarBase(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_Vectors(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarBase(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_ScalarMultiplier(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_Vectors(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarExponent(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_ScalarAddend(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_Vectors(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarExponent(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_ScalarMultiplier(BufferLength: 128)

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Regressions in System.Memory.Span<Int32>

Benchmark (duration of single invocation)	Baseline	Test	Test/Base	Test Quality	Edge Detector
SequenceEqual(Size: 4)	3.46 ns	5.32 ns	1.54	0.24	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Memory.Span<Int32>*'

System.Memory.Span<Int32>.SequenceEqual(Size: 4)

Regressions in System.Collections.TryGetValueFalse<Int32, Int32>

Benchmark (duration of single invocation)	Baseline	Test	Test/Base	Test Quality	Edge Detector
Dictionary(Size: 512)	4.13 μs	5.96 μs	1.44	0.08	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Collections.TryGetValueFalse<Int32, Int32>*'

System.Collections.TryGetValueFalse<Int32, Int32>.Dictionary(Size: 512)

dotnet-policy-service · 2024-05-09T16:37:12Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

DrewScoggins · 2024-05-09T16:38:00Z

Looks related to #101800.

We also saw many improvements, and those are linked in the above PR.

stephentoub · 2024-07-22T17:33:38Z

Many of the benchmarks fully recovered, but it looks like those related to Pow did not.

tannergooding · 2024-07-23T14:04:52Z

@stephentoub, the Pow regressions are caused by aab8803: Disable TensorPrimitives vectorization of Log, Cbrt, Pow, and RootN

The diff range displayed for some of these benchmarks is inaccurate. It differs from what you get if you manually go into https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu%2022.04/System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Double).Pow_Vectors(BufferLength:%203079).html and set the compare range in the graph (which is f88ab88...f9207e6)
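
The Pow rows also stand apart in the numbers the issue was filed with. As a rough illustration, the Python sketch below groups the Test/Base ratios from the Perf_FloatingPointTensorPrimitives<Double> table by operation. Splitting the benchmark name at the first underscore is an assumption made for this example, and `group_by_operation` is a hypothetical helper.

```python
from collections import defaultdict

# (benchmark, Test/Base) pairs copied from the original report table.
ratios = [
    ("FusedMultiplyAdd_ScalarAddend", 2.84),
    ("FusedMultiplyAdd_Vectors", 1.99),
    ("Truncate", 4.33),
    ("Pow_ScalarBase", 1.23),
    ("Pow_Vectors", 1.25),
    ("Pow_ScalarBase", 1.25),
    ("FusedMultiplyAdd_ScalarMultiplier", 3.65),
    ("FusedMultiplyAdd_Vectors", 2.04),
    ("Pow_ScalarExponent", 1.22),
    ("FusedMultiplyAdd_ScalarAddend", 3.91),
    ("Pow_Vectors", 1.22),
    ("Pow_ScalarExponent", 1.25),
    ("FusedMultiplyAdd_ScalarMultiplier", 2.77),
]


def group_by_operation(pairs):
    """Group ratios by the text before the first underscore (assumed naming scheme)."""
    groups = defaultdict(list)
    for name, value in pairs:
        groups[name.split("_")[0]].append(value)
    return dict(groups)


groups = group_by_operation(ratios)
for operation, values in sorted(groups.items()):
    print(f"{operation}: n={len(values)}, min={min(values):.2f}, max={max(values):.2f}")
```

With the table's values, the six Pow rows fall between 1.22 and 1.25, while the FusedMultiplyAdd rows range from 1.99 to 3.91 and Truncate sits at 4.33.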

stephentoub · 2024-07-23T14:11:54Z

Ah, thanks, in that case it's expected, and I believe we can close this.

performanceautofiler bot added arch-arm64 os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime untriaged New issue has not been triaged by the area owner labels May 9, 2024

performanceautofiler bot mentioned this issue May 9, 2024

[SENTINEL] Autofile run complete at 5/9/2024 7:54:50 AM. 12 issues filed. dotnet/perf-autofiling-issues#34056

Closed

DrewScoggins removed the untriaged New issue has not been triaged by the area owner label May 9, 2024

DrewScoggins transferred this issue from dotnet/perf-autofiling-issues May 9, 2024

ghost added the area-System.Numerics label May 9, 2024

dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 9, 2024

DrewScoggins added tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark labels May 9, 2024

DrewScoggins assigned tannergooding May 9, 2024

tannergooding added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Jun 24, 2024

stephentoub added this to the 9.0.0 milestone Jul 22, 2024

tannergooding closed this as completed Jul 23, 2024

github-actions bot locked and limited conversation to collaborators Aug 23, 2024
