Linux: Implements a fault safe memcpy routine #3375

Sonicadvance1 · 2024-01-17T12:14:58Z

We are required in our syscall emulation to handle cases where pointers
are invalid. This means we need to pessimistically assume a memcpy will
fault when reading application memory.

This implements a signal handler based approach to catching the SIGSEGV
on memcpy and returning an EFAULT if it faults.

neobrain · 2024-01-22T21:02:27Z

unittests/FEXLinuxTests/tests/syscalls/syscalls_efault.cpp

+  CHECK(errno == EFAULT);
+}
+
+TEST_CASE("ppoll", "[!mayfail]") {


Why is this one marked as mayfail but the others aren't? Shouldn't all of them fail since we don't emulate the kernel-internal recovery in the latest version?

Because only this one may fail and only on 32-bit.

Why do the tests pass otherwise, even though we don't emulate the tested property? What do the tests test?

The 64-bit tests pass because we pass the syscalls straight to the kernel. The 32-bit tests need the emulation, and thus the EFAULT handling. If EFAULT capturing is disabled, Catch2 catches the SIGSEGV and the test fails; if EFAULT capturing is enabled, we capture the fault and raise a SIGABRT that Catch2 doesn't catch.

Not sure what "EFAULT capturing" means, but I'm assuming you meant to write "EFAULT handling" again, i.e. returning EFAULT to reproduce the kernel's behavior.

Why is it faulting with SIGABRT if EFAULT handling is enabled?

It will exit with SIGABRT because of the log message SignalDelegator emits when it catches a fault: LogMan::Msg::AFmt("Received invalid data to syscall. Crashing now!");

You're now referring to the case I was not asking about. I probably misinterpreted what you meant by EFAULT capturing, and it's hard to interpret your explanation based on guesswork.

What I do understand now is why the tests pass on 64-bit regardless of whether we enable the new emulation code or not. You fully lost me on what's going on for 32-bit though. Could you try explaining again why some of the tests are marked as mayfail for 32-bit and why others are not, while clearly mentioning which configurations you're referring to?

I'll make this easier for everyone. I removed the tag. The test is supposed to pass. It'll correctly pass when we return EFAULT instead of crashing.

I don't understand what disconnect we're having here. The main feature of the revised PR is to detect if system calls are called with invalid arguments. With the unit tests intentionally triggering this faulty behavior, how is it possible that our detection logic isn't making all of them fail on 32-bit? Is the detection logic really working properly?

We're likely talking past each other because of test bifurcation: 32-bit vs. 64-bit, and EFAULT handling returning EFAULT vs. just aborting.

Fundamentally there are three tests:

- poll
- ppoll
- ppoll_time64 (this one is duplicated on 64-bit since it maps to the ppoll syscall)

Each test broken down.

poll test

This one doesn't really matter, we pass the arguments untouched to the kernel where it correctly returns EFAULT for us. It's just coverage of the poll family of syscalls.
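The kernel-side behaviour these tests rely on can be observed natively with a sketch like the following. The helper names are hypothetical and this is not the contents of syscalls_efault.cpp. The ppoll variant issues the raw syscall because a libc wrapper may read the timeout in user space before entering the kernel, which would crash rather than return EFAULT.

```cpp
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: maps one page that can be neither read nor written.
inline void* MapInaccessiblePage() {
  return mmap(nullptr, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

// Returns the errno poll() reports when its pollfd array is unreadable.
inline int PollErrnoForUnreadableFds() {
  void* Bad = MapInaccessiblePage();
  if (Bad == MAP_FAILED) {
    return -1;
  }
  errno = 0;
  const int Result = poll(static_cast<pollfd*>(Bad), 1, 0);
  const int Saved = errno;
  munmap(Bad, 4096);
  return Result == -1 ? Saved : 0;
}

// Returns the errno the raw ppoll syscall reports when its timeout is unreadable.
inline int PpollErrnoForUnreadableTimeout() {
  void* Bad = MapInaccessiblePage();
  if (Bad == MAP_FAILED) {
    return -1;
  }
  pollfd Fd {};
  Fd.fd = -1; // Negative descriptors are ignored by the poll family.
  errno = 0;
  const long Result = syscall(SYS_ppoll, &Fd, 1L, Bad, nullptr, static_cast<size_t>(8));
  const int Saved = errno;
  munmap(Bad, 4096);
  return Result == -1 ? Saved : 0;
}
```

On a native Linux host both helpers are expected to yield `EFAULT`, which is the value the `CHECK(errno == EFAULT)` lines in the test file compare against.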

ppoll (32-bit)

- The pollfds and sigset_t arguments are passed to the kernel, which handles EFAULT for us.
- The timespec argument goes through CopyFromUser.
  - On a fault, EFAULT handling either returns EFAULT or aborts, depending on the constexpr.
  - With the current code it raises SIGABRT, which kills the entire Catch2 test process.
  - Once EFAULT handling inevitably gets switched over to returning EFAULT, this test can be removed from Known_Failures.

ppoll (64-bit)

Everything is passed to the kernel directly. We don't need to handle shit.

ppoll_time64 (32-bit only because of the previously mentioned syscall aliasing)

Same as ppoll (64-bit): every argument is passed directly to the kernel without userspace touching it.

As you can see, our detection logic can't make them all fail, because we are using the kernel's mechanisms to handle most of the arguments.

And yes the detection logic is working properly.

Thanks, that helps. The term "efault handling" was my main source of confusion (considering EFAULT isn't involved in FEX's new behavior at all), but with the breakdown I could understand what you're referring to.

With this explained, I would've been happy to keep the original mayfail tag, but it doesn't make a big difference where we declare the potential test failure.

neobrain · 2024-01-23T09:52:39Z

Source/Tools/LinuxEmulation/LinuxSyscalls/x32/FD.cpp

+        struct timespec32 timeout{};
+        timeout = tp64;
+
+        // If the copy fails to update the timeout, then this is safe to ignore the result.


Why is it "safe" to ignore the result here but not above? The comment doesn't actually explain this.

Updated comment.

neobrain · 2024-01-23T09:54:04Z

Source/Tools/LinuxEmulation/LinuxSyscalls/Syscalls.h

+// These are little helper functions for cases when FEX needs to copy data to or from the application in a robust fashion.
+// CopyFromUser and CopyToUser are memcpy routines that expect to safely SIGSEGV when reading or writing application memory respectively.
+// Returns zero if the memcpy completed, or EFAULT if the memcpy faulted.


These comments need updating now, since the functions don't return EFAULT anymore but instead are merely safeguards that reliably crash if invalid arguments are given.

Sonicadvance1 force-pushed the implement_efault_support branch 3 times, most recently from e9af4c1 to cd9a1a3 Compare January 18, 2024 21:17

neobrain reviewed Jan 22, 2024

alyssarosenzweig reviewed Jan 22, 2024

Source/Tools/LinuxEmulation/LinuxSyscalls/SignalDelegator.cpp (resolved)

neobrain reviewed Jan 23, 2024

Sonicadvance1 force-pushed the implement_efault_support branch from cd9a1a3 to 7e52689 Compare January 23, 2024 16:34

neobrain reviewed Jan 25, 2024

Sonicadvance1 force-pushed the implement_efault_support branch 2 times, most recently from ece29b3 to dc42949 Compare January 25, 2024 22:02

neobrain approved these changes Jan 26, 2024

unittests/FEXLinuxTests/tests/syscalls/syscalls_efault.cpp (outdated, resolved)

unittests/FEXLinuxTests/Known_Failures (resolved)

Sonicadvance1 force-pushed the implement_efault_support branch from dc42949 to 82f6b7a Compare January 26, 2024 09:52

Sonicadvance1 added 3 commits January 26, 2024 01:54

Implements a FEXLinuxTest for testing ppoll EFAULT behaviour

69ff984

Linux: Implements support for EFAULT with ppoll's timeout

929193c

Only need to handle the timeout structure, the rest is handled in the kernel itself.

Linux: Disable EFAULT handler until we find something that uses it.

0913741

Sonicadvance1 force-pushed the implement_efault_support branch from 82f6b7a to 0913741 Compare January 26, 2024 09:54

Sonicadvance1 merged commit 8e3d4a3 into FEX-Emu:main Jan 26, 2024
10 checks passed

Sonicadvance1 deleted the implement_efault_support branch January 26, 2024 10:15

Sonicadvance1 mentioned this pull request Aug 11, 2024

Document, Detect, Implement EFAULT handling in syscall handler #3942

Open
