2023-10-05-FEX-2310.md
Sonicadvance1 committed Oct 6, 2023
1 parent 297f6ac commit 7454c23
Showing 1 changed file with 207 additions and 0 deletions.
207 changes: 207 additions & 0 deletions _posts/2023-10-05-FEX-2310.md
@@ -0,0 +1,207 @@
---
layout: post
title: FEX 2310 Tagged!
author: FEX-Emu Maintainers
---

Welcome back to another monthly release for FEX-Emu. You might be thinking that after last month's optimizations we wouldn't have much to show
for this month. Well, you would be wrong! We optimized even more! Let's get into it!

## More instruction optimizations!
As stated last month, we introduced Instruction Count CI, which has allowed us to do targeted optimizations of our code. Once again we have optimized so
many instructions that it would be impossible to go through each individual change. Check our detailed change log if you want to see all the
instructions optimized. Let's just look at the final benchmark numbers compared to last month.

<div id="geekbench_5" style="min-width: 250px; height: 300px; margin: 0 auto">
</div>

<div id="bytemark_64bit" style="min-width: 250px; height: 300px; margin: 0 auto">
</div>

Let's talk about the [Geekbench 5.4](https://browser.geekbench.com/v5/cpu/compare/21805673?baseline=21674122) results first since they don't look very
impressive at first glance. While we are only showing a ~13% performance improvement, this number is an
aggregate of multiple smaller benchmarks. Looking at the breakdown of all the subtests, there are some that have improved by up to 66%! This is of
course because some benchmarks lean on instructions that we optimized more heavily than others. Luckily this improvement also carries over to
video games.

The Bytemark improvements are a bit harder to make out: some numbers have hardly changed at all, while a couple stand out as huge improvements. This
mostly comes down to some very specific instruction optimizations that significantly improved performance in a couple of tests, while the rest don't
show up as much.
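
To put numbers on that spread, here is a small C++ sketch that summarises the ten Bytemark uplift values from the chart in this post. The helper names (`Mean`, `Median`, `IndexOfMax`) are made up for illustration, and taking a plain arithmetic mean is our assumption: it is not how Bytemark computes its own indexes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Per-test uplift in percent, copied from the Bytemark chart in this post
// (FEX-2309 -> FEX-2310, in the same order as the chart categories).
const std::vector<double> kBytemarkUplift = {
    7.6, 1.22, -1.03, 61.21, 7.13, 7.30, 45.24, 12.98, 1.76, 2.66};

// Arithmetic mean. Illustrative assumption only, not Bytemark's own method.
double Mean(const std::vector<double>& values) {
  assert(!values.empty());
  return std::accumulate(values.begin(), values.end(), 0.0) /
         static_cast<double>(values.size());
}

// Median: the middle value, or the average of the two middle values.
double Median(std::vector<double> values) {  // by value: we sort a copy
  assert(!values.empty());
  std::sort(values.begin(), values.end());
  const std::size_t mid = values.size() / 2;
  if (values.size() % 2 == 1) {
    return values[mid];
  }
  return (values[mid - 1] + values[mid]) / 2.0;
}

// Position of the largest uplift in the input order.
std::size_t IndexOfMax(const std::vector<double>& values) {
  assert(!values.empty());
  return static_cast<std::size_t>(
      std::max_element(values.begin(), values.end()) - values.begin());
}
```

With these inputs the mean works out to roughly 14.6% while the median is only about 7.2%, and the largest value (index 3, FPEmu) is 61.21%. Two big outliers pull the average well above what a typical test saw.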

With this month's optimizations and last month's combined, the results end up being significantly more interesting. Some
[Geekbench](https://browser.geekbench.com/v5/cpu/compare/21805673?baseline=21671503) results are showing an average of 50% to 65% higher performance,
sometimes even higher. Some benchmark results show nearly 2x the performance compared to before! These numbers translate very well to gaming
performance, where some games have more than doubled their FPS over the past couple of months.

We're not slowing down either. We still have a ton of optimizations to go on our march to get our emulation close to native performance.

## Support preserve_all for interpreter fallbacks
We're calling out this particular optimization for three reasons.

1. It improves performance of x87 heavy code
2. It only works with the super recently released Clang 17
3. Wine packages in FEX's rootfs use x87 heavily in some instances

Let's talk about what this optimization is and how it improves performance. Clang 17 added support for a function calling ABI called
preserve_all. x86 has supported this ABI for a very long time, but it is a new addition for Arm64. This ABI breaks convention with the regular AAPCS64
ABI: a function using preserve_all must first save pretty much any register it wants to use, unlike under AAPCS64, where a function has a bunch of
registers free to use without saving them. This is beneficial for FEX's JIT since we can save significant time by not saving any state when we need
to jump out of the JIT and execute x87 softfloat code.
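
As a rough illustration of what opting in to this ABI looks like, here is a minimal C++ sketch. It is not FEX's actual code: the macro and function names are hypothetical, and the toy arithmetic stands in for a real softfloat routine. The only real piece is Clang's `preserve_all` function attribute, which we guard so that the sketch still builds with compilers that don't know it.

```cpp
#include <cassert>
#include <cstdint>

// Use preserve_all only when the compiler reports that it knows the attribute.
// Otherwise the macro expands to nothing and the default ABI is used.
// HYPOTHETICAL_FALLBACK_ABI is a made-up name for this sketch.
#if defined(__clang__) && defined(__has_attribute)
#if __has_attribute(preserve_all)
#define HYPOTHETICAL_FALLBACK_ABI __attribute__((preserve_all))
#endif
#endif
#ifndef HYPOTHETICAL_FALLBACK_ABI
#define HYPOTHETICAL_FALLBACK_ABI
#endif

// Stand-in for a softfloat helper (hypothetical). With preserve_all, the
// callee is responsible for saving any register it wants to use.
HYPOTHETICAL_FALLBACK_ABI __attribute__((noinline))
std::uint64_t HypotheticalSoftFloatHelper(std::uint64_t a, std::uint64_t b) {
  return a * 3 + b;
}

// Stand-in for JIT-side code that has live state around the call. The caller
// does not need to spill `live` just because it is about to call the helper.
std::uint64_t CallFromHotPath(std::uint64_t x) {
  const std::uint64_t live = x + 1;
  const std::uint64_t result = HypotheticalSoftFloatHelper(x, 2);
  return live + result;
}

// The attribute changes who saves registers, not what the code computes.
inline void SelfCheck() { assert(CallFromHotPath(5) == 23); }
```

When the attribute is in effect, the caller can keep values in registers across the call rather than spilling them first, and the callee only pays for the registers it actually touches. On Arm64 with a Clang older than 17, the attribute should be ignored with a warning and the regular calling convention used instead.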

In particular this shows up as a performance improvement of upwards of 200% in some microbenchmarks around x87 code! While this advantage is quite
significant, the only way to take advantage of it is to compile FEX with Clang 17. Since this compiler release came out only last month, pretty much
no distros have adopted it, so it is unlikely to be used soon. In a few months' time, or years depending on the distro, they should naturally upgrade
their compiler stack and free performance improvements will happen.

As a fairly major side note to this excursion, we have found that the 32-bit Wine packages from Canonical's repository use x87
heavily in some instances. This causes some really bad performance issues with some 32-bit games and installers. We recommend using Proton where
you can, since it compiles its 32-bit libraries with SSE optimizations instead, which work significantly better.

FEX-Emu may look to provide its own Wine packages in the future with this same optimization in place to help alleviate some of this burden. Until then
it is recommended to use FEX's x87 reduced precision mode to reduce some of the overhead.

## Fixes a bug when chrooting in to rootfs
A few months ago FEX-Emu changed some behaviour around chrooting in to the FEX rootfs.
While chrooting isn't generally advised, if a user wants to modify the rootfs then it's the only option. We provide some scripts inside of our
rootfs images to facilitate this, but they have been broken for a few months.

We have now fixed this bug in both FEX-Emu and the scripts inside of our rootfs images. So if you want to modify packages inside of the image you will
now be able to do so again. Make sure to update your image to get the new scripts!

## Remove x86-64 JIT and Interpreter
This has been a long time coming in the FEX-Emu project. We have had support for an IR interpreter and x86-64 host JIT for compatibility testing since
the project's inception. It has always been the case that if these CPU backends got in the way of the ARM64 JIT, they would be removed.

That time has finally come. Between some upcoming changes to how flags are represented in FEX's JIT and the general burden of implementing
FEX's IR operations three times (often undoing an x86->Arm64 translation to go back to x86), maintaining these backends has been deemed too costly
and they have been removed. This is a necessary step for our ARM64 JIT to gain more performance as we continue working to make it better.

We are looking forward to future ARM platforms that can take Radeon GPUs through PCIe slots to regain a platform which can test RADV directly, but
until that point we will have to make do with our current devices.

## Instruction Count CI on x86-64 hosts
While we removed our x86-64 JIT, we do have a fun addition to our instruction count CI. Developers that don't have an Arm64 device handy can now
run the Instruction Count CI and attempt to optimize implementations on an x86-64 machine. This is as simple as building
FEX on an x86-64 device with the Vixl disassembler and simulator enabled, and you will be able to optimize to your heart's content!

We've got a need for JIT speed! Let's go fast!

## Implement first optimizations using 128-bit SVE
This is a fairly minor change, but previously FEX was not using any 128-bit SVE instructions. This is primarily because there aren't really any
SVE-supporting devices in the consumer market, even though Snapdragon hardware theoretically supports it. 128-bit SVE adds a couple of optimizations
that we can use.

- Wide-element shifts
- Index instruction for generating simple index masks

While these are fairly simple initially, they take some translations from six instructions down to one or two, depending on the instruction.
Small as it is, it is good to note that FEX is now taking advantage of SVE if it is available!
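
For readers who haven't met these two SVE features, here is a scalar C++ model of what they compute on a 128-bit vector of 16-bit lanes. This is only a sketch of the instruction semantics as we understand them from Arm's documentation, not FEX's implementation, and all names (`IndexModel`, `WideShiftLeftModel`, `Vec16`, `Vec64`) are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 8;         // 128 bits / 16 bits per lane
constexpr std::size_t kLanesPerWide = 4;  // 16-bit lanes per 64-bit element
using Vec16 = std::array<std::uint16_t, kLanes>;
using Vec64 = std::array<std::uint64_t, kLanes / kLanesPerWide>;

// Model of INDEX: lane i receives base + i * step, wrapped to the lane width.
// Unsigned arithmetic is used so that wrap-around is well defined.
Vec16 IndexModel(std::int32_t base, std::int32_t step) {
  Vec16 out{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    const std::uint32_t value =
        static_cast<std::uint32_t>(base) +
        static_cast<std::uint32_t>(i) * static_cast<std::uint32_t>(step);
    out[i] = static_cast<std::uint16_t>(value);
  }
  return out;
}

// Model of a wide-element left shift: every 16-bit lane is shifted by the
// 64-bit element that overlaps it. The full 64-bit amount is significant, so
// any amount of 16 or more clears the lane instead of wrapping around.
Vec16 WideShiftLeftModel(const Vec16& value, const Vec64& amount) {
  Vec16 out{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    const std::size_t wide = i / kLanesPerWide;
    assert(wide < amount.size());
    const std::uint64_t shift = amount[wide];
    out[i] = shift >= 16 ? std::uint16_t{0}
                         : static_cast<std::uint16_t>(
                               std::uint32_t{value[i]} << shift);
  }
  return out;
}
```

In this model, `IndexModel(3, 2)` fills the lanes with 3, 5, 7 and so on, and shifting a vector of ones by the amounts `{1, 16}` leaves 2 in the low four lanes and 0 in the high four. Doing the same work without these instructions typically takes several separate steps to build the constants or clamp the shift amount, which is where the instruction savings come from.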

## Adds WOW64 frontend
This has been a long time coming, with us adding initial mingw support back in FEX-2305. FEXCore now supports being built with a brand new WOW64 WINE
frontend. While currently not being utilized, this will allow WINE to integrate FEX directly in to its WOW64 layer for running both x86 and x86-64
applications on Arm64 host devices.

This is a very substantial change to how WINE integrates with FEX, since today FEX-Emu just runs the full x86-64 WINE process and eats the overhead of
emulating everything WINE needs to do. With the WOW64 layer now implemented, a bunch of the WINE code can now be Arm64 native code and when it needs
to execute application code it just jumps back to the emulator. This is similar to how Windows natively handles its emulation through its "XTA" layer.
Sadly, today this is only wired up to work through the 32-bit x86 part of the layer. We still need to get set up to support Wine when it inevitably
supports Wow64 for x86_64->Arm64.

Big shout out to [ByLaws](https://github.com/bylaws) for implementing support for this! We look forward to future Wine integration work landing!

## Implement thunking support for wayland-client and zink
We have some improvements to thunking this month! As we are working towards supporting thunking more code, we implemented some features to get
wayland-client thunking wired up. While this support is early, it is enough to get Super Meat Boy up and running using wayland and zink overrides
within a Wayland environment. We look forward to additional thunking improvements going forward so that performance can be improved everywhere.

# Video game showcase
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/WpIBU-67utc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

See the [2310 Release Notes](https://github.com/FEX-Emu/FEX/releases/tag/FEX-2310) or the [detailed change log](https://github.com/FEX-Emu/FEX/compare/FEX-2309...FEX-2310) on GitHub.

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">
</script>
<script src="https://code.highcharts.com/highcharts.js">
</script>
<script src="https://code.highcharts.com/modules/exporting.js">
</script>

<script type="text/javascript">
Highcharts.chart('bytemark_64bit', {
chart: {
type: 'column'
},
title: {
text: 'Cortex-X1C bytemark 64-bit uplift between FEX-2309 and FEX-2310',
align: 'left'
},
xAxis: {
categories: ['Numeric Sort', 'String Sort', 'Bitfield', 'FPEmu', 'Fourier', 'Assign', 'Idea', 'Huffman', 'NN', 'LU Decomp'],
crosshair: true,
accessibility: {
description: 'sub-benchmarks'
}
},
yAxis: {
min: -5,
title: {
text: 'Performance improvement %'
}
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
column: {
pointPadding: 0.2,
borderWidth: 0
}
},
series: [
{
name: 'FEX-2310',
data: [7.6, 1.22, -1.03, 61.21, 7.13, 7.30, 45.24, 12.98, 1.76, 2.66]
}
]
});

Highcharts.chart('geekbench_5', {
chart: {
type: 'column'
},
title: {
text: 'Lenovo X13s Geekbench 5.4.0 uplift between FEX-2309 and FEX-2310',
align: 'left'
},
xAxis: {
categories: ['Single-core score', 'Multi-core score'],
crosshair: true,
accessibility: {
description: 'sub-benchmarks'
}
},
yAxis: {
min: 0,
title: {
text: 'Performance improvement %'
}
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
column: {
pointPadding: 0.2,
borderWidth: 0
}
},
series: [
{
name: 'FEX-2310',
data: [13.0, 12.2]
}
]
});

</script>
