Unreal Crash Handler Investigation #674
Comments
We need the list of info we get from crashpad, so we can understand what we're missing.

Notes on in-proc vs out-of-proc

Out-of-process and in-process crash handlers each have distinct advantages depending on the needs of the application and the level of resilience desired. Here’s a comparison of their main benefits:

Out-of-Process Crash Handler

Detailed Crash Reporting: Since it’s unaffected by the crashing process’s state, it can often gather more detailed diagnostics, including the memory dump, stack trace, and error codes, without being impacted by the corrupted state.

Isolation from Application Failures: Because it operates separately, an out-of-process handler isn’t at risk of crashing alongside the application. This isolation helps ensure a higher success rate in recording crash data, particularly in scenarios with critical memory issues or extensive runtime faults.

Lower Performance Overhead: Out-of-process handlers tend to have lower performance impact on the main application since they don’t continuously operate in the same memory space or on the same thread, which can help maintain performance under normal operations.

Security and Stability: Running out of process limits the scope of what can be accessed or affected by the crashing process. This design is more secure and less likely to interfere with the crash handler’s stability.

In-Process Crash Handler

More Contextual Information: The in-process handler can access specific runtime details that may be lost in an out-of-process handler. For example, it can capture local variables and more immediate context about the function calls leading up to the crash.

Simpler Implementation: In-process crash handlers can be easier to implement because they don’t require inter-process communication or separate handling mechanisms. This can be beneficial for lightweight applications or scenarios where crash handling is not mission-critical.

Direct Memory Access: With direct access to the same memory space, in-process handlers can gather more granular details from the application state, especially if the crash isn’t catastrophic (e.g., less severe memory violations).

Choosing Between the Two
In the list of files you linked, I didn't see what actually creates the dumps. They just look for the dumps in the filesystem, which tells me the engine is creating the dumps (in-proc). For example, Mac and iOS use PLCrashReporter: On Windows I see MiniDumpWriteDump, at least from Are these the actual mechanisms used by the crash reporter?
What does this actually look like in code (the files reading this data and adding to the crash dump)?
Shouldn't we be able to get this already from
Would we though? I'm not convinced yet, because it's not clear to me what code we'd need to use from the Unreal Engine and how much effort it would take to adopt it. It sounds like we'd just be moving over to other standard crash reporting libraries and functions, so not much gain there. We can already call
We're definitely not doing that.
We already have a simple integration with the UE Crash Handler that allows us to enrich the captured crashes with some of the abovementioned properties. Basically, we're setting the One more place to look for ways to hook into Unreal's crash-handling flow is the FOutputDeviceError and the platform-specific implementations of its
Overview
Out of the box, Unreal provides its own crash reporting tools to handle incidents on supported platforms. After an initial investigation, it has been deemed worth a deeper investigation and an experimental implementation.
Locations of Interest
Unreal distributes parts of the crash reporter across multiple areas of the engine, but there are a few key locations of note.
Crash Report Client: https://github.com/EpicGames/UnrealEngine/tree/release/Engine/Source/Programs/CrashReportClient
Root for core-level abstractions (this includes the stack walker, crash context and more): https://github.com/EpicGames/UnrealEngine/tree/release/Engine/Source/Runtime/Core/Public/GenericPlatform
Crash Report Core: https://github.com/EpicGames/UnrealEngine/tree/release/Engine/Source/Runtime/CrashReportCore
GPU Crash:
Engine/Source/Runtime/RenderCore/Private/GPUDebugCrashUtils.cpp (release branch of EpicGames/UnrealEngine)
Engine/Source/Runtime/RenderCore/Private/DumpGPU.cpp (release branch of EpicGames/UnrealEngine)
Information Tracked
E.g. 4.3.0.0-2215663+UE-Releases+4.3
BuildVersion-BuiltFromCL-BranchName
If true, the following properties are retrieved from the system: UserName (for non-launcher builds) and EpicAccountID.
Can be empty and the CRC's default text will be shown.
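As a rough illustration of the version string shown above, the following sketch splits "4.3.0.0-2215663+UE-Releases+4.3" into its three named parts. The struct, the function name and the splitting rule (first '-' ends the build version, the next '+' ends the changelist) are assumptions made for this example based only on that one sample string; this is not code taken from the engine.

```cpp
#include <cstddef>
#include <string>

// Hypothetical holder for the three parts named in the format line above.
struct EngineVersionParts {
    std::string BuildVersion;  // e.g. "4.3.0.0"
    std::string BuiltFromCL;   // e.g. "2215663"
    std::string BranchName;    // e.g. "UE-Releases+4.3"
};

// Assumption: the first '-' ends the build version and the first '+' after it
// ends the changelist; everything that remains is treated as the branch name.
// Returns false (leaving Out untouched) when the text does not fit that shape.
bool ParseEngineVersion(const std::string& Text, EngineVersionParts& Out) {
    const std::size_t Dash = Text.find('-');
    if (Dash == std::string::npos || Dash == 0) {
        return false;
    }
    const std::size_t Plus = Text.find('+', Dash + 1);
    if (Plus == std::string::npos || Plus == Dash + 1 || Plus + 1 >= Text.size()) {
        return false;
    }
    Out.BuildVersion = Text.substr(0, Dash);
    Out.BuiltFromCL = Text.substr(Dash + 1, Plus - Dash - 1);
    Out.BranchName = Text.substr(Plus + 1);
    return true;
}
```

Splitting on the first '+' only keeps branch names that themselves contain '+' intact, which the sample string appears to need.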
Implementation Challenges
After reaching out to get more information on how native handles backends, it came to my attention that we will run into issues supporting this crash handler. With the current setup, the selected backend must exist at compile time. That is not possible for Unreal today, because you would have to make Unreal a third-party dependency of native.
Reasons to Support Unreal Crash
There are a handful of reasons we may want to move forward with the Unreal Crash Handler. The biggest is that we gain access to GPU-related crash data such as hangs and device-loss errors. In game development this is a very important aspect of error coverage. It was also recently requested in "GPU crashes not captured by Sentry plugin" (Issue #673).
In terms of the wider dump information, it doesn't look like there is too much here that we don't already track, but there is the potential for more detailed engine information that we do not normally track, such as the execution command line and the crash report message from the engine, which may point to the problem.
I believe Unreal's crash handler supports all target platforms, at least in some form (I am currently not able to validate closed platforms). This could expedite any porting projects and help ensure compatibility with all engine export targets.
Additionally, we gain much better legacy support for the engine, as the crash handler very rarely receives API-breaking changes. This would mean that we can support much older engine versions with, at worst, some minor version defines.
Finally, this would solve ongoing issues with the current crash handler, which has been known to break compatibility through its updates. Fixing these recurring update issues has required significant resources that could be better applied elsewhere.
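To make the "minor version defines" point concrete, here is a minimal sketch of a version-guarded code path. ENGINE_MAJOR_VERSION and ENGINE_MINOR_VERSION are the macros the engine's version header provides; the fallback values, the SKETCH_ENGINE_AT_LEAST macro and the DescribeCrashHookPath function are hypothetical and exist only so the example compiles on its own.

```cpp
#include <string>

// Inside an Unreal module these macros come from the engine's version header.
// The fallback values are placeholders so this sketch compiles standalone.
#ifndef ENGINE_MAJOR_VERSION
#define ENGINE_MAJOR_VERSION 5
#endif
#ifndef ENGINE_MINOR_VERSION
#define ENGINE_MINOR_VERSION 0
#endif

// Hypothetical helper: true when building against at least Major.Minor.
#define SKETCH_ENGINE_AT_LEAST(Major, Minor) \
    (ENGINE_MAJOR_VERSION > (Major) ||       \
     (ENGINE_MAJOR_VERSION == (Major) && ENGINE_MINOR_VERSION >= (Minor)))

// Hypothetical function: selects a code path per engine version at compile time.
std::string DescribeCrashHookPath() {
#if SKETCH_ENGINE_AT_LEAST(5, 0)
    return "UE5 code path";
#else
    return "UE4 code path";
#endif
}
```

The idea is that only the few call sites that differ between engine versions sit behind such a guard, while the rest of the integration stays shared.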
What do we lose
One thing to note is that we would likely have to do everything in-process. This comes from a mix of limitations. First, I believe we are limited on the marketplace side, where we are not able to distribute executable programs. Additionally, I believe plugins are not capable of building executables, so an out-of-process handler would require us to ship an engine fork instead of a plugin, similar to how NVIDIA operates their forks.
From what I can see, the current Crashpad solution captures more raw system and crash information that is not directly related to Unreal itself. This level of information could still be useful for developers.
Solutions
Option A
One potential option, if we intend to move forward, is to open up native to allow for runtime backends. By doing this we would be able to add Unreal Crash Core as a backend in native with minimal changes to the current Unreal plugin. I also believe this could benefit native by allowing other developers to inject their own custom crash handlers as required.
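A minimal sketch of what a runtime backend hook could look like, assuming a registry that accepts backends by name after the library has been built. Every type and function name here (CrashBackend, BackendRegistry, Register, Select) is hypothetical; none of it is the existing native API, which currently fixes the backend at compile time.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical backend interface supplied by the host at runtime.
struct CrashBackend {
    std::function<bool()> Startup;   // install crash hooks, return success
    std::function<void()> Shutdown;  // remove crash hooks
};

// Hypothetical registry that lets a host (e.g. the Unreal plugin) add a
// backend at runtime instead of it being chosen when native is compiled.
class BackendRegistry {
public:
    void Register(const std::string& Name, CrashBackend Backend) {
        Backends[Name] = std::move(Backend);
    }

    // Starts the named backend. Returns false if the name is unknown,
    // the backend has no Startup callback, or Startup reports failure.
    bool Select(const std::string& Name) {
        const auto It = Backends.find(Name);
        if (It == Backends.end() || !It->second.Startup || !It->second.Startup()) {
            return false;
        }
        ActiveName = Name;
        return true;
    }

    // Name of the backend that was last started successfully, or empty.
    const std::string& Active() const { return ActiveName; }

private:
    std::map<std::string, CrashBackend> Backends;
    std::string ActiveName;
};
```

With something along these lines, the plugin would register an Unreal-backed CrashBackend during module startup and select it by name, leaving the built-in backends untouched.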
Option B
Another potential solution is compiling a custom sentry native as part of the plugin. This version would have Unreal as a dependency and implement the crash handler in question. Currently this is my least desired approach, as it would likely cause maintenance issues in ensuring that this native version stays in step with the main version. It would also introduce build complexity by adding a custom third-party library to compile as part of the build process.
Option C
An option that was suggested whilst gathering info on native is dropping native completely and re-implementing its behavior via Unreal. This means we would re-implement much of the higher-level Sentry API whilst using Unreal's packet and messaging system to send the data to Sentry. On paper this provides a few gains, such as a simplified build system, since we would no longer need to pull native from CI. It also comes with tradeoffs, chiefly the scale of work required to implement and maintain this new API.