-
Notifications
You must be signed in to change notification settings - Fork 5k
corerun with DOTNET_EnableEventLog: "Heap contamination detected! "HeapFree was called on a heap other than the one that memory was allocated from." #56888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
cc @josalem, @janvorli |
I'm not sure what would be different for LTTng between corehost and corerun. That seems odd. Is there any more to the stack when the assert is thrown? clrhost_nodependencies.cpp:296 is just ( |
This is only happening in debug build, I tested with release build and the test passed. Here is the root@968b4a55cce2:/runtime/artifacts/tests/coreclr/Linux.x64.Debug/Tests/Core_Root# lldb -f ./corerun /tmp/helloworld/bin/Release/net6.0/helloworld.dll
(lldb) target create "./corerun"
Current executable set to './corerun' (x86_64).
(lldb) settings set -- target.run-args "/tmp/helloworld/bin/Release/net6.0/helloworld.dll"
(lldb) settings set target.env-vars DOTNET_EnableEventLog=1
(lldb) r
Process 1756 launched: './corerun' (x86_64)
Assert failure(PID 1756 [0x000006dc], Thread: 1756 [0x06dc]): !"Heap contamination detected! HeapFree was called on a heap other than the one that memory was allocated from.\n" "Possible cause: you used new (executable) to allocate the memory, but didn't use DeleteExecutable() to free it."
File: /runtime/src/coreclr/utilcode/clrhost_nodependencies.cpp Line: 296
Image: /runtime/artifacts/tests/coreclr/Linux.x64.Debug/Tests/Core_Root/corerun
Process 1756 stopped
* thread #1, name = 'corerun', stop reason = signal SIGTRAP
frame #0: 0x00007ffff6b52559 libcoreclr.so`DBG_DebugBreak at debugbreak.S:9
6
7 LEAF_ENTRY DBG_DebugBreak, _TEXT
8 int3
-> 9 ret
10 LEAF_END_MARKED DBG_DebugBreak, _TEXT
11
(lldb) bt all
* thread #1, name = 'corerun', stop reason = signal SIGTRAP
* frame #0: 0x00007ffff6b52559 libcoreclr.so`DBG_DebugBreak at debugbreak.S:9
frame #1: 0x00007ffff6aa43cb libcoreclr.so`::DebugBreak() at debug.cpp:406
frame #2: 0x00007ffff680e828 libcoreclr.so`::DbgAssertDialog(szFile="/runtime/src/coreclr/utilcode/clrhost_nodependencies.cpp", iLine=296, szExpr="!\"Heap contamination detected! HeapFree was called on a heap other than the one that memory was allocated from.\\n\" \"Possible cause: you used new (executable) to allocate the memory, but didn't use DeleteExecutable() to free it.\"") at debug.cpp:697
frame #3: 0x00007ffff67c13c2 libcoreclr.so`ClrFree(p=0x0000555555661c30) at clrhost_nodependencies.cpp:295
frame #4: 0x00007ffff67c1175 libcoreclr.so`operator delete[](p=0x0000555555661c40) at clrhost_nodependencies.cpp:410
frame #5: 0x00007ffff6bab80d libcoreclr.so`::FireEtXplatRuntimeInformationStart(ClrInstanceID=0, Sku=2, BclMajorVersion=6, BclMinorVersion=0, BclBuildNumber=0, BclQfeNumber=0, VMMajorVersion=42, VMMinorVersion=42, VMBuildNumber=42, VMQfeNumber=42424, StartupFlags=8388611, StartupMode='\0', CommandLine=u"", ComObjectGuid=0x00007ffff5fa9600, RuntimeDllPath=u"/runtime/artifacts/tests/coreclr/Linux.x64.Debug/Tests/Core_Root/libcoreclr.so") at eventprovdotnetruntime.cpp:4905
frame #6: 0x00007ffff665ee35 libcoreclr.so`FireEtwRuntimeInformationStart(ClrInstanceID=0, Sku=2, BclMajorVersion=6, BclMinorVersion=0, BclBuildNumber=0, BclQfeNumber=0, VMMajorVersion=42, VMMinorVersion=42, VMBuildNumber=42, VMQfeNumber=42424, StartupFlags=8388611, StartupMode='\0', CommandLine=u"", ComObjectGuid=0x00007ffff5fa9600, RuntimeDllPath=u"/runtime/artifacts/tests/coreclr/Linux.x64.Debug/Tests/Core_Root/libcoreclr.so", ActivityId=0x0000000000000000, RelatedActivityId=0x0000000000000000) at clretwallmain.h:2811
frame #7: 0x00007ffff664b58e libcoreclr.so`ETW::InfoLog::RuntimeInformation(type=0) at eventtrace.cpp:5111
frame #8: 0x00007ffff6a73c1f libcoreclr.so`EEStartupHelper() at ceemain.cpp:767
frame #9: 0x00007ffff6a75684 libcoreclr.so`EEStartup(this=0x00007fffffffda88, p=0x0000000000000000)::$_1::operator()(void*) const at ceemain.cpp:1153
frame #10: 0x00007ffff6a72a5b libcoreclr.so`EEStartup() at ceemain.cpp:1155
frame #11: 0x00007ffff6a72862 libcoreclr.so`EnsureEEStarted() at ceemain.cpp:321
frame #12: 0x00007ffff6219bed libcoreclr.so`CorHost2::Start(this=0x0000555555634690) at corhost.cpp:101
frame #13: 0x00007ffff6153006 libcoreclr.so`::coreclr_initialize(exePath="/runtime/artifacts/tests/coreclr/Linux.x64.Debug/Tests/Core_Root/corerun", appDomainFriendlyName="corerun", propertyCount=4, propertyKeys=0x000055555558a0c0, propertyValues=0x000055555558a140, hostHandle=0x0000555555566068, domainId=0x0000555555566070) at unixinterface.cpp:251
frame #14: 0x000055555555f127 corerun`run(config=0x00007fffffffe460) at corerun.cpp:352
frame #15: 0x000055555555c616 corerun`main(argc=2, argv=0x00007fffffffe5d8) at corerun.cpp:554
frame #16: 0x00007ffff6ca1bf7 libc.so.6`__libc_start_main(main=(corerun`main at corerun.cpp:546), argc=2, argv=0x00007fffffffe5d8, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffe5c8) at libc-start.c:310
frame #17: 0x000055555555b02a corerun`_start + 42
thread #2, name = 'corerun-ust'
frame #0: 0x00007ffff79c2657 libpthread.so.0`__libc_recvmsg + 71
frame #1: 0x00007ffff5985c23 liblttng-ust.so.0`ustcomm_recv_unix_sock + 179
frame #2: 0x00007ffff598b241 liblttng-ust.so.0`___lldb_unnamed_symbol62$$liblttng-ust.so.0 + 2241
frame #3: 0x00007ffff79b76db libpthread.so.0`start_thread + 219
frame #4: 0x00007ffff6da171f libc.so.6`__GI___clone at clone.S:95
thread #3, name = 'corerun-ust'
frame #0: 0x00007ffff6d9b639 libc.so.6`syscall at syscall.S:38
frame #1: 0x00007ffff598abd3 liblttng-ust.so.0`___lldb_unnamed_symbol62$$liblttng-ust.so.0 + 595
frame #2: 0x00007ffff79b76db libpthread.so.0`start_thread + 219
frame #3: 0x00007ffff6da171f libc.so.6`__GI___clone at clone.S:95
thread #4, name = 'corerun'
frame #0: 0x00007ffff6d94cb9 libc.so.6`__GI___poll(fds=0x00007fffed55ca20, nfds=1, timeout=-1) at poll.c:29
frame #1: 0x00007ffff6b30459 libcoreclr.so`CorUnix::CPalSynchronizationManager::ReadBytesFromProcessPipe(this=0x0000555555638900, iTimeout=-1, pRecvBuf="", iBytes=1) at synchmanager.cpp:2233
frame #2: 0x00007ffff6b2f618 libcoreclr.so`CorUnix::CPalSynchronizationManager::ReadCmdFromProcessPipe(this=0x0000555555638900, iPollTimeout=-1, pswcWorkerCmd=0x00007fffed55cde4, pshridMarshaledData=0x00007fffed55cdd8, pdwData=0x00007fffed55cdd4) at synchmanager.cpp:2011
frame #3: 0x00007ffff6b2e1c6 libcoreclr.so`CorUnix::CPalSynchronizationManager::WorkerThread(pArg=0x0000555555638900) at synchmanager.cpp:1714
frame #4: 0x00007ffff6b4d315 libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0000555555639620) at thread.cpp:1862
frame #5: 0x00007ffff79b76db libpthread.so.0`start_thread + 219
frame #6: 0x00007ffff6da171f libc.so.6`__GI___clone at clone.S:95
thread #5, name = 'corerun'
frame #0: 0x00007ffff6d94cb9 libc.so.6`__GI___poll(fds=0x00007fff70008ee0, nfds=1, timeout=-1) at poll.c:29
frame #1: 0x00007ffff689db13 libcoreclr.so`ipc_poll_fds(fds=0x00007fff70008ee0, nfds=1, timeout=4294967295) at ds-ipc-pal-socket.c:485
frame #2: 0x00007ffff689d904 libcoreclr.so`ds_ipc_poll(poll_handles_data=0x00007fffecd5bb60, poll_handles_data_len=1, timeout_ms=4294967295, callback=(libcoreclr.so`server_warning_callback(char const*, unsigned int) at ds-server.c:110))(char const*, unsigned int)) at ds-ipc-pal-socket.c:1082
frame #3: 0x00007ffff6a5177b libcoreclr.so`ds_ipc_stream_factory_get_next_available_stream(callback=(libcoreclr.so`server_warning_callback(char const*, unsigned int) at ds-server.c:110))(char const*, unsigned int)) at ds-ipc.c:395
frame #4: 0x00007ffff6a4eba7 libcoreclr.so`server_thread(data=0x0000000000000000) at ds-server.c:127
frame #5: 0x00007ffff6b4d315 libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0000555555657640) at thread.cpp:1862
frame #6: 0x00007ffff79b76db libpthread.so.0`start_thread + 219
frame #7: 0x00007ffff6da171f libc.so.6`__GI___clone at clone.S:95 and without (lldb) r
Process 1576 launched: './corerun' (x86_64)
Hello World!
Process 1576 exited with status = 0 (0x00000000)
$ /runtime/.dotnet/dotnet new console -o /tmp/helloworld
$ cd /tmp/helloworld
$ /runtime/.dotnet/dotnet build -c Release |
I wonder if it is yet another case of a mismatch between standard new operator and our overriden delete[], like we've seen in mscordaccore. Isn't this code by any chance using the STL? That would be bad. |
Ok, seems no STL is there. But I can see the problem. The ResizeBuffer is in PAL and as such uses standard new operator. The code that asserts is built on top of PAL though (like runtime) and so it cannot use the standard delete operator and uses our override instead. |
It seems the right fix would be to implement a function to delete the buffer in the eventproviderhelpers.cpp like the ResizeBuffer and call it instead of the delete. That way both new and delete would be guaranteed to be compatible. |
Tagging subscribers to this area: @tommcdon Issue DetailsBased on current main (5bef921), build product and tests in Debug configuration on Ubuntu 18.04 x64: ./build.sh
./src/tests/build.sh -p:LibrariesConfiguration=Debug Running a simple hello world app works: cd artifacts/bin/testhost/net6.0-Linux-Debug-x64/shared/Microsoft.NETCore.App/6.0.0
corerun /tmp/helloworld.dll running the same app with event log enabled fails an assertion: DOTNET_EnableEventLog=1 corerun /tmp/helloworld.dll
Assert failure(PID 543 [0x0000021f], Thread: 543 [0x021f]): !"Heap contamination detected! HeapFree was called on a heap other than the one that memory was allocated from.\n" "Possible cause: you used new (executable) to allocate the memory, but didn't use DeleteExecutable() to free it."
File: /runtime/src/coreclr/utilcode/clrhost_nodependencies.cpp Line: 296
Image: /runtime/artifacts/bin/testhost/net6.0-Linux-Debug-x64/shared/Microsoft.NETCore.App/6.0.0/corerun corehost, OTOH, has no issue with
|
To make sure we're on the same page, this is the runtime/src/coreclr/pal/src/eventprovider/lttngprovider/eventproviderhelpers.cpp Lines 14 to 39 in 1fb151f
Looks like EventPipe got around this by having runtime/src/coreclr/scripts/genEventPipe.py Lines 419 to 453 in 1fb151f
|
Yes, I was talking about the one in the eventproviderhelpers.cpp. If you added a DeleteBuffer function there and called it instead of using the delete in the generated eventprovdotnetruntime.cpp, it would fix the issue. The way eventpipe did that would work as well, but it seems that the way I've suggested is kind of cleaner. |
I am not sure how would the prospective On unix, based on
runtime/src/coreclr/gc/unix/events.cpp Line 282 in fa21e0b
|
@am11, I am not sure I understand your question. The new function code would be in src/coreclr/pal/src/eventprovider/lttngprovider/eventproviderhelpers.cpp, so it would use the same delete [] operator as the ResizeBuffer uses. The generated eventprovdotnetruntime.cpp would then call the DeleteBuffer instead of calling delete [] operator. extern "C" ULONG FireEtXplatGCHeapStats(
const unsigned __int64 GenerationSize0,
const unsigned __int64 TotalPromotedSize0,
const unsigned __int64 GenerationSize1,
const unsigned __int64 TotalPromotedSize1,
const unsigned __int64 GenerationSize2,
const unsigned __int64 TotalPromotedSize2,
const unsigned __int64 GenerationSize3,
const unsigned __int64 TotalPromotedSize3,
const unsigned __int64 FinalizationPromotedSize,
const unsigned __int64 FinalizationPromotedCount,
const unsigned int PinnedObjectCount,
const unsigned int SinkBlockCount,
const unsigned int GCHandleCount)
{
if (!EventXplatEnabledGCHeapStats())
return ERROR_SUCCESS;
char stackBuffer[92];
char *buffer = stackBuffer;
size_t offset = 0;
size_t size = 92;
bool fixedBuffer = true;
bool success = true;
success &= WriteToBuffer(GenerationSize0, buffer, offset, size, fixedBuffer);
success &= WriteToBuffer(TotalPromotedSize0, buffer, offset, size, fixedBuffer);
success &= WriteToBuffer(GenerationSize1, buffer, offset, size, fixedBuffer);
success &= WriteToBuffer(TotalPromotedSize1, buffer, offset, size, fixedBuffer);
success &= WriteToBuffer(GenerationSize2, buffer, offset, size, fixedBuffer);
success &= WriteToBuffer(TotalPromotedSize2, buffer, offset, size, fixedBuffer);
success &= WriteToBuffer(GenerationSize3, buffer, offset, size, fixedBuffer);
success &= WriteToBuffer(TotalPromotedSize3, buffer, offset, size, fixedBuffer);
success &= WriteToBuffer(FinalizationPromotedSize, buffer, offset, size, fixedBuffer);
success &= WriteToBuffer(FinalizationPromotedCount, buffer, offset, size, fixedBuffer);
success &= WriteToBuffer(PinnedObjectCount, buffer, offset, size, fixedBuffer);
success &= WriteToBuffer(SinkBlockCount, buffer, offset, size, fixedBuffer);
success &= WriteToBuffer(GCHandleCount, buffer, offset, size, fixedBuffer);
if (!success)
{
if (!fixedBuffer)
delete[] buffer;
return ERROR_WRITE_FAULT;
}
do_tracepoint(DotNETRuntime, GCHeapStats, offset, buffer);
if (!fixedBuffer)
delete[] buffer;
return ERROR_SUCCESS;
} So instead of if (!fixedBuffer)
delete[] buffer; it would use if (!fixedBuffer)
DeleteBuffer(buffer); |
Something like this: am11@f07b947? we get an extra frame for DeleteBuffer, but it still ends up calling ClrFree with corerun: $ DOTNET_EnableEventLog=1 lldb /runtime/artifacts/tests/coreclr/Linux.x64.Debug/Tests/Core_Root/corerun /tmp/helloworld/bin/release/net6.0/helloworld.dll
(lldb) target create "/runtime/artifacts/tests/coreclr/Linux.x64.Debug/Tests/Core_Root/corerun"
Current executable set to '/runtime/artifacts/tests/coreclr/Linux.x64.Debug/Tests/Core_Root/corerun' (x86_64).
(lldb) settings set -- target.run-args "/tmp/helloworld/bin/release/net6.0/helloworld.dll"
(lldb) r
Process 630 launched: '/runtime/artifacts/tests/coreclr/Linux.x64.Debug/Tests/Core_Root/corerun' (x86_64)
Assert failure(PID 630 [0x00000276], Thread: 630 [0x0276]): !"Heap contamination detected! HeapFree was called on a heap other than the one that memory was allocated from.\n" "Possible cause: you used new (executable) to allocate the memory, but didn't use DeleteExecutable() to free it."
File: /runtime/src/coreclr/utilcode/clrhost_nodependencies.cpp Line: 296
Image: /runtime/artifacts/tests/coreclr/Linux.x64.Debug/Tests/Core_Root/corerun
Process 630 stopped
* thread #1, name = 'corerun', stop reason = signal SIGTRAP
frame #0: 0x00007ffff6b52ac9 libcoreclr.so`DBG_DebugBreak at debugbreak.S:9
6
7 LEAF_ENTRY DBG_DebugBreak, _TEXT
8 int3
-> 9 ret
10 LEAF_END_MARKED DBG_DebugBreak, _TEXT
11
(lldb) bt all
* thread #1, name = 'corerun', stop reason = signal SIGTRAP
* frame #0: 0x00007ffff6b52ac9 libcoreclr.so`DBG_DebugBreak at debugbreak.S:9
frame #1: 0x00007ffff6aa493b libcoreclr.so`::DebugBreak() at debug.cpp:406
frame #2: 0x00007ffff6814cd8 libcoreclr.so`::DbgAssertDialog(szFile="/runtime/src/coreclr/utilcode/clrhost_nodependencies.cpp", iLine=296, szExpr="!\"Heap contamination detected! HeapFree was called on a heap other than the one that memory was allocated from.\\n\" \"Possible cause: you used new (executable) to allocate the memory, but didn't use DeleteExecutable() to free it.\"") at debug.cpp:697
frame #3: 0x00007ffff67c7872 libcoreclr.so`ClrFree(p=0x0000555555628930) at clrhost_nodependencies.cpp:295
frame #4: 0x00007ffff67c7625 libcoreclr.so`operator delete[](p=0x0000555555628940) at clrhost_nodependencies.cpp:410
frame #5: 0x00007ffff6bddc67 libcoreclr.so`::DeleteBuffer(buffer=0x0000555555628940) at eventproviderhelpers.cpp:18
frame #6: 0x00007ffff6bab6b8 libcoreclr.so`::FireEtXplatRuntimeInformationStart(ClrInstanceID=0, Sku=2, BclMajorVersion=6, BclMinorVersion=0, BclBuildNumber=0, BclQfeNumber=0, VMMajorVersion=42, VMMinorVersion=42, VMBuildNumber=42, VMQfeNumber=42424, StartupFlags=8388611, StartupMode='\0', CommandLine=u"", ComObjectGuid=0x00007ffff5faf640, RuntimeDllPath=u"/runtime/artifacts/tests/coreclr/Linux.x64.Debug/Tests/Core_Root/libcoreclr.so") at eventprovdotnetruntime.cpp:4905
frame #7: 0x00007ffff6664e65 libcoreclr.so`FireEtwRuntimeInformationStart(ClrInstanceID=0, Sku=2, BclMajorVersion=6, BclMinorVersion=0, BclBuildNumber=0, BclQfeNumber=0, VMMajorVersion=42, VMMinorVersion=42, VMBuildNumber=42, VMQfeNumber=42424, StartupFlags=8388611, StartupMode='\0', CommandLine=u"", ComObjectGuid=0x00007ffff5faf640, RuntimeDllPath=u"/runtime/artifacts/tests/coreclr/Linux.x64.Debug/Tests/Core_Root/libcoreclr.so", ActivityId=0x0000000000000000, RelatedActivityId=0x0000000000000000) at clretwallmain.h:2811
frame #8: 0x00007ffff66515be libcoreclr.so`ETW::InfoLog::RuntimeInformation(type=0) at eventtrace.cpp:5111
frame #9: 0x00007ffff6a7418f libcoreclr.so`EEStartupHelper() at ceemain.cpp:767
frame #10: 0x00007ffff6a75bf4 libcoreclr.so`EEStartup(this=0x00007fffffffda18, p=0x0000000000000000)::$_1::operator()(void*) const at ceemain.cpp:1153
frame #11: 0x00007ffff6a72fcb libcoreclr.so`EEStartup() at ceemain.cpp:1155
frame #12: 0x00007ffff6a72dd2 libcoreclr.so`EnsureEEStarted() at ceemain.cpp:321
frame #13: 0x00007ffff621fc1d libcoreclr.so`CorHost2::Start(this=0x0000555555600e10) at corhost.cpp:101
frame #14: 0x00007ffff6159006 libcoreclr.so`::coreclr_initialize(exePath="/runtime/artifacts/tests/coreclr/Linux.x64.Debug/Tests/Core_Root/corerun", appDomainFriendlyName="corerun", propertyCount=4, propertyKeys=0x000055555558a070, propertyValues=0x000055555558a0f0, hostHandle=0x0000555555566068, domainId=0x0000555555566070) at unixinterface.cpp:251
frame #15: 0x000055555555f127 corerun`run(config=0x00007fffffffe3f0) at corerun.cpp:352
frame #16: 0x000055555555c616 corerun`main(argc=2, argv=0x00007fffffffe568) at corerun.cpp:554
frame #17: 0x00007ffff6ca1bf7 libc.so.6`__libc_start_main(main=(corerun`main at corerun.cpp:546), argc=2, argv=0x00007fffffffe568, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffe558) at libc-start.c:310
frame #18: 0x000055555555b02a corerun`_start + 42
thread #2, name = 'corerun-ust'
frame #0: 0x00007ffff79c2657 libpthread.so.0`__libc_recvmsg + 71
frame #1: 0x00007ffff598bc23 liblttng-ust.so.0`ustcomm_recv_unix_sock + 179
frame #2: 0x00007ffff5991241 liblttng-ust.so.0`___lldb_unnamed_symbol62$$liblttng-ust.so.0 + 2241
frame #3: 0x00007ffff79b76db libpthread.so.0`start_thread + 219
frame #4: 0x00007ffff6da171f libc.so.6`__GI___clone at clone.S:95
thread #3, name = 'corerun-ust'
frame #0: 0x00007ffff6d9b639 libc.so.6`syscall at syscall.S:38
frame #1: 0x00007ffff5990bd3 liblttng-ust.so.0`___lldb_unnamed_symbol62$$liblttng-ust.so.0 + 595
frame #2: 0x00007ffff79b76db libpthread.so.0`start_thread + 219
frame #3: 0x00007ffff6da171f libc.so.6`__GI___clone at clone.S:95
thread #4, name = 'corerun'
frame #0: 0x00007ffff6d94cb9 libc.so.6`__GI___poll(fds=0x00007fffef374a20, nfds=1, timeout=-1) at poll.c:29
frame #1: 0x00007ffff6b309c9 libcoreclr.so`CorUnix::CPalSynchronizationManager::ReadBytesFromProcessPipe(this=0x00005555555ff630, iTimeout=-1, pRecvBuf="", iBytes=1) at synchmanager.cpp:2233
frame #2: 0x00007ffff6b2fb88 libcoreclr.so`CorUnix::CPalSynchronizationManager::ReadCmdFromProcessPipe(this=0x00005555555ff630, iPollTimeout=-1, pswcWorkerCmd=0x00007fffef374de4, pshridMarshaledData=0x00007fffef374dd8, pdwData=0x00007fffef374dd4) at synchmanager.cpp:2011
frame #3: 0x00007ffff6b2e736 libcoreclr.so`CorUnix::CPalSynchronizationManager::WorkerThread(pArg=0x00005555555ff630) at synchmanager.cpp:1714
frame #4: 0x00007ffff6b4d885 libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0000555555600350) at thread.cpp:1862
frame #5: 0x00007ffff79b76db libpthread.so.0`start_thread + 219
frame #6: 0x00007ffff6da171f libc.so.6`__GI___clone at clone.S:95
thread #5, name = 'corerun'
frame #0: 0x00007ffff6d94cb9 libc.so.6`__GI___poll(fds=0x00007fff70008ee0, nfds=1, timeout=-1) at poll.c:29
frame #1: 0x00007ffff68a4053 libcoreclr.so`ipc_poll_fds(fds=0x00007fff70008ee0, nfds=1, timeout=4294967295) at ds-ipc-pal-socket.c:485
frame #2: 0x00007ffff68a3e44 libcoreclr.so`ds_ipc_poll(poll_handles_data=0x00007fffeeb73b60, poll_handles_data_len=1, timeout_ms=4294967295, callback=(libcoreclr.so`server_warning_callback(char const*, unsigned int) at ds-server.c:110))(char const*, unsigned int)) at ds-ipc-pal-socket.c:1082
frame #3: 0x00007ffff6a51ceb libcoreclr.so`ds_ipc_stream_factory_get_next_available_stream(callback=(libcoreclr.so`server_warning_callback(char const*, unsigned int) at ds-server.c:110))(char const*, unsigned int)) at ds-ipc.c:395
frame #4: 0x00007ffff6a4f117 libcoreclr.so`server_thread(data=0x0000000000000000) at ds-server.c:127
frame #5: 0x00007ffff6b4d885 libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x000055555561e340) at thread.cpp:1862
frame #6: 0x00007ffff79b76db libpthread.so.0`start_thread + 219
frame #7: 0x00007ffff6da171f libc.so.6`__GI___clone at clone.S:95 |
It looks like you changed the runtime/src/coreclr/scripts/genEventPipe.py Lines 430 to 453 in e0472a1
Does this still happen if you forward declare |
CMIIW, but from the backtrace, it is hitting our corerun-specific placement delete[] operator:
which we don't want, right? It is hitting with main branch, with DeleteBuffer in pal.h as well as forward declaration of We can try defining the |
That didn't work either because geneventpipe.py and genlttngprovider.py seem to have different ways of using PAL. Former seem to generate source which builds as a runtime component (using typedefs from PAL
Didn't mean to block; I gave it a fair try, but couldn't make it work. So please go ahead. 😅 |
Thanks for giving it a try! You did all the things I would have done as well. Seems there may be some extra complexity here. Since this is only happening in debug, I'll try to get it fixed in 6, but this is definitely a candidate for being bumped to 7. @janvorli does @am11's experience trigger any top-of-mind thoughts for what might be going on? |
pre-emptively moving this to 7. I'll hopefully be able to get this, but the |
related, just noticed that compiling runtime with gcc-11 gives a lots of these errors on current main branch:
gcc 10 and 9 or clang don't have this analyzer. @janvorli, this seems like a good error to keep un-suppressed. and a slightly related warning (from same family of new warnings I think):
|
related gcc 11 warnings have been disabled: runtime/eng/native/configurecompiler.cmake Lines 402 to 404 in 790cd89
|
@AaronRobinsonMSFT, yup, these seem to be similar reports of incorrect pairing of {de}allocators in different places. |
@josalem, while going through these compiler warnings, I noticed that in extern "C" ULONG EventPipeWriteEventExecExeEnd(
LPCGUID ActivityId,
LPCGUID RelatedActivityId)
{
if (!EventPipeEventEnabledExecExeEnd())
return ERROR_SUCCESS;
BYTE stackBuffer[32];
BYTE *buffer = stackBuffer;
size_t offset = 0;
size_t size = 32;
bool fixedBuffer = true;
bool success = true;
if (!success)
{
if (!fixedBuffer)
delete[] buffer;
return ERROR_WRITE_FAULT;
}
EventPipeAdapter::WriteEvent(EventPipeEventExecExeEnd, (BYTE *)buffer, (unsigned int)offset, ActivityId, RelatedActivityId);
if (!fixedBuffer)
delete[] buffer;
return ERROR_SUCCESS;
} In this case
|
I think I'm missing what is leaking. The buffer is stack allocated here, no? I would really hope the compiler would remove the dead code blocks and reduce this to just the call to You're right though, those appear to be dead code. |
Ah right, then why
Yes compiler has enough mileage that it will produce fairly highly optimized code, but during the compilation different level of analysis are warning about various issues especially in |
100% agree. Based on the structure of the generated code, those if statements should be something we know the answer to at generation time.
Yeah... I saw that when I was reading the snippet. Luckily, it shouldn't hit the I think this is all pointing to the need to go through the generation scripts and update them. |
I have looked into fixing the delete[] / new discrepancy. I believe the problem is caused by the fact that the new (std::nothrow) is an operator that we never define ourselves, thus the code always calls the one from the standard libraries. However, we do define the global delete[] operator in the VM. And since VM is linked statically with PAL, the linker links the code with our global delete[]. |
See the PR where I've removed usages of new / delete here: dotnet/coreclr#1822 |
I would still add the DeleteBuffer function though. I have also noticed that for some reason, the genEventPipe.py uses BYTE* and the other char* for the buffer. It seems it would make sense to unify them. And then there will be no need for the extra |
Is this a conscious decision or has it just never been needed before? When I search for it in the runtime, I see lots of uses of |
That is not std::nothrow, but rather our own "nothrow" constant defined here:
and declared here: runtime/src/coreclr/inc/new.hpp Line 17 in c496700
As such, it is not available in the PAL. |
@davmason, thanks for testing it, great news! I forgot about this issue. :) On linux-x64 machine, I cannot reproduce it with latest It is very possible that it has been fixed during the SDL compliance work by @AaronRobinsonMSFT and others (#66154). Closing. |
Based on current main (5bef921), build product and tests in Debug configuration on Ubuntu 18.04 x64:
Running a simple hello world app works:
cd artifacts/bin/testhost/net6.0-Linux-Debug-x64/shared/Microsoft.NETCore.App/6.0.0 corerun /tmp/helloworld.dll
running the same app with event log enabled fails an assertion:
corehost, OTOH, has no issue with
DOTNET_EnableEventLog=1
(orCOMPlus_EnableEventLog=1
).The text was updated successfully, but these errors were encountered: