-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
- Loading branch information
1 parent
cc9fdcb
commit e98670b
Showing
2 changed files
with
146 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,146 @@ | ||
--- | ||
layout: post | ||
title: FEX 2409 Tagged | ||
author: FEX-Emu Maintainers | ||
--- | ||
|
||
FEX-2409 is now released... with a big performance boost. | ||
|
||
<div id="geekbench" style="min-width: 250px; height: 400px; margin: 0 auto"> | ||
</div> | ||
|
||
## I'm tired, carry me. | ||
|
||
Little differences between x86 and Arm can cause big performance penalties for | ||
emulators. None more so than flags. | ||
|
||
What are those? | ||
|
||
Flags are bits representing the processor state. For example, if an operation | ||
results in zero, the "zero flag" is set. Both x86 and Arm have flags, so for | ||
emulating x86 on Arm, we map x86 flags to Arm flags to reduce emulation | ||
overhead. That is possible because x86 and Arm have similar flags. By contrast, | ||
architectures like RISC-V lack flags, slowing down x86-on-RISC-V emulators. | ||
|
||
Many arithmetic operations set flags. Programs can then conditionally jump | ||
("branch") according to the flags. On x86, the flags are thus the building blocks | ||
of if statements and loops. To check if two variables are equal, x86 code | ||
subtracts them and checks the zero flag. To check if one variable is less than | ||
another, x86 code subtracts and checks the negative flag. This pattern -- | ||
subtracting, setting flags, and discarding the actual result -- is so common | ||
that it has a special instruction: **CMP** ("**C**o**MP**are"). | ||
|
||
If the story ended here, emulation would be easy. Unfortunately, we need to | ||
talk about the *carry* flag. | ||
|
||
After an addition, the *carry* flag indicates if the result overflowed. | ||
Programs can then check the carry flag to detect overflows. The flag can also | ||
be input to another addition to implement 128-bit additions. | ||
|
||
Subtractions are similar. In hardware, subtractions are additions with an | ||
operand negated. Because they are additions in hardware, subtractions set the | ||
carry flag. Precisely how is the carry flag defined for subtraction? There are | ||
two competing conventions. | ||
|
||
The first sets the flag when there is a *borrow*, by mathematical analogy with | ||
addition. x86 uses this "borrow flag" convention, as it seems more natural. | ||
|
||
The second option sets the flag when there is *not* a borrow. Isn't that | ||
backwards? It turns out that adding a (two's complement) negated operand | ||
overflows exactly when the subtraction does *not* borrow. This "true carry" | ||
convention matches actual hardware behaviour, while the "borrow" x86 convention | ||
requires extra gates to invert carry. Arm uses the "true carry" convention to | ||
save a few gates. | ||
|
||
Which convention should FEX use? | ||
|
||
We could store the x86 carry flag in the Arm carry flag. Unfortunately, that | ||
requires an extra instruction after each subtraction to invert carry to get the | ||
borrow flag. | ||
|
||
The counter-intuitive alternative is storing the *opposite* of the x86 flag. | ||
That requires an extra invert after every *addition*, but it eliminates the | ||
invert after subtraction. | ||
|
||
Either we pay after additions or after subtractions. Which should we pick? | ||
|
||
While addition is common, using the *flags* from an addition is not. Flags are | ||
typically used with comparisons, which are subtractions. Therefore, the | ||
inverted convention usually wins. This month, Alyssa adjusted FEX to invert | ||
carries, speeding up typical workloads by a few percent. | ||
|
||
After tackling the carry flag, Alyssa optimized FEX's translations of address | ||
modes, push/pop, AVX load/stores, and more. Overall, benchmarks are upwards of | ||
10% faster since the last release. | ||
|
||
## A Qt change | ||
|
||
What about more user-visible changes? If you use the FEXConfig tool to | ||
configure the emulator, you're in for a treat. While it works, this ImGui-based | ||
tool isn't exactly known for its convenience. In | ||
between his work optimizing the [redacted] out of FEX's [redacted], Tony | ||
rewrote FEXConfig as a simple Qt application, improving aesthetics, usability, | ||
and accessibility all in one go. Here's a preview: | ||
|
||
<a href="{{site.baseurl}}/images/posts/2409-09-05/FEXQonfig.png"><img title="Qt-based FEX configuration tool" width="100%" src="{{site.baseurl}}/images/posts/2409-09-05/FEXQonfig.png"/></a> | ||
|
||
Besides look and feel, we've polished first-time setup for logging, | ||
library forwarding, and RootFS images. We've also made tweaking various | ||
emulation settings a bit nicer. Users of our Ubuntu PPA can simply | ||
update to unlock these improvements without any further action. | ||
|
||
But with so much optimization, who needs speedhacks anymore? | ||
|
||
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js"> | ||
</script> | ||
<script src="https://code.highcharts.com/highcharts.js"> | ||
</script> | ||
<script src="https://code.highcharts.com/modules/exporting.js"> | ||
</script> | ||
|
||
<script type="text/javascript"> | ||
Highcharts.chart('geekbench', { | ||
chart: { | ||
type: 'column' | ||
}, | ||
title: { | ||
text: 'GeekBench 5.4.0 improvement from FEX-2408 to FEX-2409', | ||
align: 'left' | ||
}, | ||
subtitle: { | ||
text: | ||
'<a target="_blank" ' + | ||
'href="https://browser.geekbench.com/v5/cpu/compare/22841382?baseline=22841389">Raw Results</a>', | ||
align: 'left' | ||
}, | ||
xAxis: { | ||
categories: ['AES-XTS', 'Text Compression', 'Image Compression', 'Navigation', 'HTML5', 'SQLite', 'PDF Rendering', 'Text Rendering', 'Clang', 'Camera', 'N-Body Physics', 'Rigid Body Physics', 'Gaussian Blur', 'Face Detection', 'Horizon Detection', 'Image Inpainting', 'HDR', 'Ray Tracing', 'Structure from Motion', 'Speech Recognition', 'Machine Learning'], | ||
crosshair: true, | ||
accessibility: { | ||
description: 'Sub-benchmark' | ||
} | ||
}, | ||
yAxis: { | ||
min: -5, | ||
title: { | ||
text: '%' | ||
} | ||
}, | ||
tooltip: { | ||
valueSuffix: ' %' | ||
}, | ||
plotOptions: { | ||
column: { | ||
pointPadding: 0.2, | ||
borderWidth: 0 | ||
} | ||
}, | ||
series: [ | ||
{ | ||
name: 'Oryon-1 Improvement', | ||
data: [-2.3, 6.6, 7.3, 5.2, 5.4, 6.7, 6.4, 7.2, 5.1, 9.7, 0.8, 3.9, 11.9, 4.4, 5.5, 8.1, 13.2, 5.0, 8.6, 5.3, 24.9] | ||
} | ||
] | ||
}); | ||
|
||
</script> |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.