diff --git a/_posts/2024-09-05-FEX-2409.md b/_posts/2024-09-05-FEX-2409.md new file mode 100644 index 0000000..4e2e543 --- /dev/null +++ b/_posts/2024-09-05-FEX-2409.md @@ -0,0 +1,146 @@ +--- +layout: post +title: FEX 2409 Tagged +author: FEX-Emu Maintainers +--- + +FEX-2409 is now released... with a big performance boost. + +
+
+ +## I'm tired, carry me. + +Little differences between x86 and Arm can cause big performance penalties for +emulators. None more so than flags. + +What are those? + +Flags are bits representing the processor state. For example, if an operation +results in zero, the "zero flag" is set. Both x86 and Arm have flags, so for +emulating x86 on Arm, we map x86 flags to Arm flags to reduce emulation +overhead. That is possible because x86 and Arm have similar flags. By contrast, +architectures like RISC-V lack flags, slowing down x86-on-RISC-V emulators. + +Many arithmetic operations set flags. Programs can then conditionally jump +("branch") according to the flags. On x86, the flags are thus the building blocks +of if statements and loops. To check if two variables are equal, x86 code +subtracts them and checks the zero flag. To check if one variable is less than +another, x86 code subtracts and checks the sign flag (together with the overflow flag, for signed values). This pattern -- +subtracting, setting flags, and discarding the actual result -- is so common +that it has a special instruction: **CMP** ("**C**o**MP**are"). + +If the story ended here, emulation would be easy. Unfortunately, we need to +talk about the *carry* flag. + +After an addition, the *carry* flag indicates if the unsigned result overflowed. +Programs can then check the carry flag to detect overflows. The flag can also +be fed into a second addition, which is how 128-bit additions are built from 64-bit ones. + +Subtractions are similar. In hardware, subtractions are additions with an +operand negated. Because they are additions in hardware, subtractions set the +carry flag. Precisely how is the carry flag defined for subtraction? There are +two competing conventions. + +The first sets the flag when there is a *borrow*, by mathematical analogy with +addition. x86 uses this "borrow flag" convention, as it seems more natural. + +The second option sets the flag when there is *not* a borrow. Isn't that +backwards? 
It turns out that adding a (two's complement) negated operand +overflows exactly when the subtraction does *not* borrow. This "true carry" +convention matches actual hardware behaviour, while the "borrow" x86 convention +requires extra gates to invert carry. Arm uses the "true carry" convention to +save a few gates. + +Which convention should FEX use? + +We could store the x86 carry flag in the Arm carry flag. Unfortunately, that +requires an extra instruction after each subtraction to invert carry to get the +borrow flag. + +The counter-intuitive alternative is storing the *opposite* of the x86 flag. +That requires an extra invert after every *addition*, but it eliminates the +invert after subtraction. + +Either we pay after additions or after subtractions. Which should we pick? + +While addition is common, using the *flags* from an addition is not. Flags are +typically used with comparisons, which are subtractions. Therefore, the +inverted convention usually wins. This month, Alyssa adjusted FEX to invert +carries, speeding up typical workloads by a few percent. + +After tackling the carry flag, Alyssa optimized FEX's translations of addressing +modes, push/pop, AVX loads/stores, and more. Overall, benchmarks are upwards of +10% faster since the last release. + +## A Qt change + +What about more user-visible changes? If you use the FEXConfig tool to +configure the emulator, you're in for a treat. While it works, this ImGui-based +tool isn't exactly known for its convenience. In +between his work optimizing the [redacted] out of FEX's [redacted], Tony +rewrote FEXConfig as a simple Qt application, improving aesthetics, usability, +and accessibility all in one go. Here's a preview: + + + +Besides look and feel, we've polished first-time setup for logging, +library forwarding, and RootFS images. We've also made tweaking various +emulation settings a bit nicer. Users of our Ubuntu PPA can simply +update to unlock these improvements without any further action. 
+ +But with so much optimization, who needs speedhacks anymore? + + + + + + diff --git a/images/posts/2409-09-05/FEXQonfig.png b/images/posts/2409-09-05/FEXQonfig.png new file mode 100644 index 0000000..eda35ec Binary files /dev/null and b/images/posts/2409-09-05/FEXQonfig.png differ