Skip to content

Profiling x64 processes can’t resolve R2R symbols #72578

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
hoyosjs opened this issue Jul 21, 2022 · 27 comments · Fixed by #73747
Closed

Profiling x64 processes can’t resolve R2R symbols #72578

hoyosjs opened this issue Jul 21, 2022 · 27 comments · Fixed by #73747
Assignees
Milestone

Comments

@hoyosjs
Copy link
Member

hoyosjs commented Jul 21, 2022

r2r

@ghost ghost added area-ReadyToRun-coreclr untriaged New issue has not been triaged by the area owner labels Jul 21, 2022
@hoyosjs hoyosjs added area-Diagnostics-coreclr and removed area-ReadyToRun-coreclr untriaged New issue has not been triaged by the area owner labels Jul 21, 2022
@ghost
Copy link

ghost commented Jul 21, 2022

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Issue Details

r2r

Author: hoyosjs
Assignees: -
Labels:

area-Diagnostics-coreclr

Milestone: -

@AndyAyersMS
Copy link
Member

Note this is not just x64. I see similar issues on arm64.

@davmason
Copy link
Contributor

davmason commented Aug 6, 2022

@AndyAyersMS I am having trouble reproducing this, can you share what you were doing when you saw the failures?

@AndyAyersMS
Copy link
Member

Looking at the profile from System.Collections.Sort<Int32>.Array - one of the runtime micro benchmarks from dotnet/performance -- collected with --corerun and -p ETW

See #71214 (comment)

@hoyosjs
Copy link
Member Author

hoyosjs commented Aug 8, 2022

I ran that same benchmark and can see events from r2r and non-r2r bodies. Oddly, I don't see the [Optimized...] and such annotations.

@AndyAyersMS
Copy link
Member

Let me see if I can still repro.

I ran that same benchmark and can see events from r2r and non-r2r bodies. Oddly, I don't see the [Optimized...] and such annotations.

There is a tiering-annotated profile under the advanced group in perfview.

@AndyAyersMS
Copy link
Member

This still repros for me, eg using the latest nightly build (on windows arm64) and the sort benchmark from #71214:

image

There is no way this benchmark is spending 86% of its time in PoolingAsyncValueTaskMethodBuilder.

Similar thing shows up in the normal profile view.

@hoyosjs
Copy link
Member Author

hoyosjs commented Aug 8, 2022

Hmm I still see the r2r frames in that view for that benchmark:

image

Probably if you could share the log we could see if it found SPC's ni.pdb?

@hoyosjs
Copy link
Member Author

hoyosjs commented Aug 8, 2022

And yes, in my profile I see:

image

@AndyAyersMS
Copy link
Member

Sent it to you via email.

@AndyAyersMS
Copy link
Member

If I'm parsing this right the PDB it finds seems awfully small

FindSymbolFilePath: Waiting for initial connection to \\redmond/symfiles\VSOSYMPRXY16\cache\8F\1D\FA0B71E98A2A2DFEBD2441816A4658952F3B12930736721726CAC37693EB00\system.private.corelib.ni.pdb.
CopyStreamToFile: Copying \\redmond\symfiles\VSOSYMPRXY16\cache\8F\1D\FA0B71E98A2A2DFEBD2441816A4658952F3B12930736721726CAC37693EB00\system.private.corelib.ni.pdb to C:\Users\andya\AppData\Local\Temp\SymbolCache\System.Private.CoreLib.ni.pdb\D571163A5E469B27E5E464F1C8D65FA31\System.Private.CoreLib.ni.pdb.new
.................................................................................................... 1.0 Meg
.................................................................................................... 2.0 Meg
.................................................................................................... 3.0 Meg
................................
CopyStreamToFile: Copy Done, moving to C:\Users\andya\AppData\Local\Temp\SymbolCache\System.Private.CoreLib.ni.pdb\D571163A5E469B27E5E464F1C8D65FA31\System.Private.CoreLib.ni.pdb

@hoyosjs
Copy link
Member Author

hoyosjs commented Aug 9, 2022

A local build on a native x64 machine and the ni pdb's for 6.0.7 are all ~3mb. They don't contain much info - just symbol to native offset, but no line info. EDIT: meant to say this is expected - cvdump shows the functions with offsets and are roughtly the same size.

@davmason
Copy link
Contributor

davmason commented Aug 9, 2022

@mangod9 could you have someone look in to what Andy is reporting above? He is investigating a trace where the symbols for System.Private.CoreLib (which is R2R) are loaded, but methods are resolving incorrectly.

@mangod9
Copy link
Member

mangod9 commented Aug 9, 2022

Adding @AntonLapounov since he had investigated a similar issue earlier in 7.

@mangod9
Copy link
Member

mangod9 commented Aug 10, 2022

Do we know if this is a regression or we had similar behavior in 6 ?

@AndyAyersMS
Copy link
Member

I'll try the same thing on 6.0

@AntonLapounov
Copy link
Member

@AndyAyersMS Could you please share the exact repro steps?

@AndyAyersMS
Copy link
Member

6.0 profile looks similar. It could be a broken benchmark perhaps.

image

@AndyAyersMS Could you please share the exact repro steps?

See #71214 (comment)

@AntonLapounov
Copy link
Member

If it affects only few benchmarks and not a regression, we should remove the 'blocking-release' label.

@tommcdon
Copy link
Member

If it affects only few benchmarks and not a regression, we should remove the 'blocking-release' label.

Agreed and removed

@AndyAyersMS
Copy link
Member

Not sure how many benchmarks are affected. This one is perhaps unusual in that R2R code figures prominently during the "actual" intervals (which is not typical, we usually try and measure Tier1 code).

@AntonLapounov
Copy link
Member

AntonLapounov commented Aug 10, 2022

@AndyAyersMS Could you please share the exact repro steps?

See #71214 (comment)

That was not helpful. I can run benchmarks_ci.py -f net6.0 --filter "System.Collections.Sort<Int32>*", but it doesn't accept -p ETW.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Aug 10, 2022

Sorry, here's something more specific:

  • windows x64 or arm64
  • install 6.0, 7.0 product
  • clone dotnet/performance repo
  • start admin shell
  • cd to src/benchmarks/micro in the repo
  • dotnet run -c Release -f net7.0 -- --filter "System.Collections.Sort<Int32>.Array" --runtimes net6.0 net7.0 -p ETW

The above will create two ETL files. Look at those with perfview.

  • select cpu stacks (or cpu stacks with optmization tiers) view
  • dotnet process.
  • Ungroup.
  • Search for "engineactual" and include BenchmarkDotNet.Engines.EngineActualStage.RunAuto.
  • Sort by inclusive or exclusive.

@AntonLapounov
Copy link
Member

Thank you, I have reproduced the issue with SDK 6.0.303. I see the same behavior in Windows Performance Analyzer, so the issue is not specific to PerfView. Both apps show [ReadyToRun]System.Array::Fill(System.Int32[], System.Int32) calling [ReadyToRun]System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1+SyncSuccessSentinelStateMachineBox[System.Int32]::.ctor(), which is displayed calling itself recursively (and that is obviously incorrect). In reality, Array::Fill is a very short ReadyToRun function that delegates the work to System.SpanHelpers::Fill, which has no ReadyToRun version. Is there a way to dump raw stack offsets from the .etl file?

image

@AndyAyersMS
Copy link
Member

Possibly? The data is there but you might need a custom parser and I think there is possibly some cleverness in how it's encoded. @brianrob would likely know but I think he's OOF (or soon to be)

You might try peeking at the perfview log, sometimes they spew interesting stuff there.

@AntonLapounov
Copy link
Member

AntonLapounov commented Aug 11, 2022

I think I identified the issue. Below is a snippet from cvdump output for System.Private.CoreLib.ni.pdb (SDK 6.0.303) sorted by the function offset. Note the large gap in the middle (42D060..42EF00):

S_PUB32: [0001:0042D040], Flags: 00000002, System.Int16 System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1+StateMachineBox[System.Int32]::get_Version()$##600508A
S_PUB32: [0001:0042D050], Flags: 00000002, System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1+StateMachineBox[System.Int32]::.ctor()$##600508D
S_PUB32: [0001:0042D060], Flags: 00000002, System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1+SyncSuccessSentinelStateMachineBox[System.Int32]::.ctor()$##600508E
S_PUB32: [0001:0042EF00], Flags: 00000002, System.Int32 System.Collections.Generic.NullableEqualityComparer`1[System.Int32]::IndexOf(System.Nullable`1[System.Int32][], System.Nullable`1[System.Int32], System.Int32, System.Int32)$##6006C84
S_PUB32: [0001:0042F030], Flags: 00000002, System.Int32 System.Collections.Generic.NullableEqualityComparer`1[System.Int32]::LastIndexOf(System.Nullable`1[System.Int32][], System.Nullable`1[System.Int32], System.Int32, System.Int32)$##6006C85
S_PUB32: [0001:0042F0C0], Flags: 00000002, System.Boolean System.Collections.Generic.NullableEqualityComparer`1[System.Int32]::Equals(System.Nullable`1[System.Int32], System.Nullable`1[System.Int32])$##6006C86

According to R2RDump, that gap actually contains GenericArraySortHelper`1<int> ReadyToRun methods, which are missing from the System.Private.CoreLib.ni.pdb file. That is why those methods are incorrectly displayed with the name of the last method before the gap in PerfView and WPA. Similary, the gap after Array::Fill

S_PUB32: [0001:004EF6E0], Flags: 00000002, System.Array::Fill(System.Int32[], System.Int32)$##60001B6
S_PUB32: [0001:004EF990], Flags: 00000002, System.Array::Sort(System.Int32[], System.Comparison`1[System.Int32])$##60001E2

contains one of Array::Sort overloads, which is incorrectly displayed as Array::Fill.

There are 28,822 ReadyToRun methods in System.Private.CoreLib.dll, but only 21,978 S_PUB32 entries in the corresponding .ni.pdb file (~76%). We could have a sanity check verifying that these numbers are either equal or close enough.

@AndyAyersMS
Copy link
Member

Thanks @AntonLapounov -- sounds like real progress. Hopefully not too hard to fix.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Aug 11, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Aug 11, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Sep 11, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants