Skip to content

Commit

Permalink
FEX 2409
Browse files Browse the repository at this point in the history
Signed-off-by: Alyssa Rosenzweig <[email protected]>
  • Loading branch information
alyssarosenzweig authored and Sonicadvance1 committed Sep 6, 2024
1 parent cc9fdcb commit e98670b
Show file tree
Hide file tree
Showing 2 changed files with 146 additions and 0 deletions.
146 changes: 146 additions & 0 deletions _posts/2024-09-05-FEX-2409.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,146 @@
---
layout: post
title: FEX 2409 Tagged
author: FEX-Emu Maintainers
---

FEX-2409 is now released... with a big performance boost.

<div id="geekbench" style="min-width: 250px; height: 400px; margin: 0 auto">
</div>

## I'm tired, carry me.

Little differences between x86 and Arm can cause big performance penalties for
emulators. None more so than flags.

What are those?

Flags are bits representing the processor state. For example, if an operation
results in zero, the "zero flag" is set. Both x86 and Arm have flags, so for
emulating x86 on Arm, we map x86 flags to Arm flags to reduce emulation
overhead. That is possible because x86 and Arm have similar flags. By contrast,
architectures like RISC-V lack flags, slowing down x86-on-RISC-V emulators.

Many arithmetic operations set flags. Programs can then conditionally jump
("branch") according to the flags. On x86, the flags are thus the building blocks
of if statements and loops. To check if two variables are equal, x86 code
subtracts them and checks the zero flag. To check if one variable is less than
another, x86 code subtracts and checks the negative flag. This pattern --
subtracting, setting flags, and discarding the actual result -- is so common
that it has a special instruction: **CMP** ("**C**o**MP**are").

If the story ended here, emulation would be easy. Unfortunately, we need to
talk about the *carry* flag.

After an addition, the *carry* flag indicates if the result overflowed.
Programs can then check the carry flag to detect overflows. The flag can also
be input to another addition to implement 128-bit additions.

Subtractions are similar. In hardware, subtractions are additions with an
operand negated. Because they are additions in hardware, subtractions set the
carry flag. Precisely how is the carry flag defined for subtraction? There are
two competing conventions.

The first sets the flag when there is a *borrow*, by mathematical analogy with
addition. x86 uses this "borrow flag" convention, as it seems more natural.

The second option sets the flag when there is *not* a borrow. Isn't that
backwards? It turns out that adding a (two's complement) negated operand
overflows exactly when the subtraction does *not* borrow. This "true carry"
convention matches actual hardware behaviour, while the "borrow" x86 convention
requires extra gates to invert carry. Arm uses the "true carry" convention to
save a few gates.

Which convention should FEX use?

We could store the x86 carry flag in the Arm carry flag. Unfortunately, that
requires an extra instruction after each subtraction to invert carry to get the
borrow flag.

The counter-intuitive alternative is storing the *opposite* of the x86 flag.
That requires an extra invert after every *addition*, but it eliminates the
invert after subtraction.

Either we pay after additions or after subtractions. Which should we pick?

While addition is common, using the *flags* from an addition is not. Flags are
typically used with comparisons, which are subtractions. Therefore, the
inverted convention usually wins. This month, Alyssa adjusted FEX to invert
carries, speeding up typical workloads by a few percent.

After tackling the carry flag, Alyssa optimized FEX's translations of address
modes, push/pop, AVX load/stores, and more. Overall, benchmarks are upwards of
10% faster since the last release.

## A Qt change

What about more user-visible changes? If you use the FEXConfig tool to
configure the emulator, you're in for a treat. While it works, this ImGui-based
tool isn't exactly known for its convenience. In
between his work optimizing the [redacted] out of FEX's [redacted], Tony
rewrote FEXConfig as a simple Qt application, improving aesthetics, usability,
and accessibility all in one go. Here's a preview:

<a href="{{site.baseurl}}/images/posts/2409-09-05/FEXQonfig.png"><img title="Qt-based FEX configuration tool" width="100%" src="{{site.baseurl}}/images/posts/2409-09-05/FEXQonfig.png"/></a>

Besides look and feel, we've polished first-time setup for logging,
library forwarding, and RootFS images. We've also made tweaking various
emulation settings a bit nicer. Users of our Ubuntu PPA can simply
update to unlock these improvements without any further action.

But with so much optimization, who needs speedhacks anymore?

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">
</script>
<script src="https://code.highcharts.com/highcharts.js">
</script>
<script src="https://code.highcharts.com/modules/exporting.js">
</script>

<script type="text/javascript">
Highcharts.chart('geekbench', {
chart: {
type: 'column'
},
title: {
text: 'GeekBench 5.4.0 improvement from FEX-2408 to FEX-2409',
align: 'left'
},
subtitle: {
text:
'<a target="_blank" ' +
'href="https://browser.geekbench.com/v5/cpu/compare/22841382?baseline=22841389">Raw Results</a>',
align: 'left'
},
xAxis: {
categories: ['AES-XTS', 'Text Compression', 'Image Compression', 'Navigation', 'HTML5', 'SQLite', 'PDF Rendering', 'Text Rendering', 'Clang', 'Camera', 'N-Body Physics', 'Rigid Body Physics', 'Gaussian Blur', 'Face Detection', 'Horizon Detection', 'Image Inpainting', 'HDR', 'Ray Tracing', 'Structure from Motion', 'Speech Recognition', 'Machine Learning'],
crosshair: true,
accessibility: {
description: 'Sub-benchmark'
}
},
yAxis: {
min: -5,
title: {
text: '%'
}
},
tooltip: {
valueSuffix: ' %'
},
plotOptions: {
column: {
pointPadding: 0.2,
borderWidth: 0
}
},
series: [
{
name: 'Oryon-1 Improvement',
data: [-2.3, 6.6, 7.3, 5.2, 5.4, 6.7, 6.4, 7.2, 5.1, 9.7, 0.8, 3.9, 11.9, 4.4, 5.5, 8.1, 13.2, 5.0, 8.6, 5.3, 24.9]
}
]
});

</script>
Binary file added images/posts/2409-09-05/FEXQonfig.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit e98670b

Please sign in to comment.