diff --git a/_posts/2023-10-05-FEX-2310.md b/_posts/2023-10-05-FEX-2310.md new file mode 100644 index 0000000..2628377 --- /dev/null +++ b/_posts/2023-10-05-FEX-2310.md @@ -0,0 +1,207 @@ +--- +layout: post +title: FEX 2310 Tagged! +author: FEX-Emu Maintainers +--- + +Welcome back to another monthly release for FEX-Emu. You might be thinking that after last month's optimizations we wouldn't have much to show +for this month. Well you would be wrong! We optimized even more! Let's get into it! + +## More instruction optimizations! +As stated last month, we introduced Instruction Count CI which has allowed us to do targeted optimizations of our code. Once again we have optimized so +many instructions that it would be impossible to go through each individual change. Check our detailed change log if you want to see all the +instructions optimized. Let's just look at the final benchmark numbers compared to last month. + +
+
+ +
+
+ +Let's talk about the [Geekbench 5.4](https://browser.geekbench.com/v5/cpu/compare/21805673?baseline=21674122) results first since they don't look very +impressive at first glance. While we are only showing a ~13% performance improvement, this number is an +aggregate of multiple smaller benchmarks. Looking at the breakdown of all the subtests, there are some that have improved by up to 66%! This is of +course because some benchmarks take advantage of some instructions that we optimized more heavily than others. Luckily this improvement also carries over to +video games. + +The Bytemark improvements are a bit hard to make out; some numbers are hardly changed at all while a couple stand out as huge improvements. This +mostly comes down to some very specific instruction optimizations that significantly improved performance in a couple of tests, while the rest don't show +up as much. + +With this month's optimizations and last month's combined, the results end up being significantly more interesting. Some +[Geekbench](https://browser.geekbench.com/v5/cpu/compare/21805673?baseline=21671503) results are showing an average of 50% to 65% higher performance, +sometimes even higher. Some benchmark results show nearly 2x the performance compared to before! These numbers translate very well to gaming +performance where some games have more than doubled their FPS over the past couple months. + +We're not slowing down either; we still have a ton of optimizations to go on our march to get our emulation close to native performance. + +## Support preserve_all for interpreter fallbacks +We're calling out this particular optimization for three reasons. + +1. It improves performance of x87 heavy code +2. It only works with the super recently released Clang 17 +3. Wine packages in FEX's rootfs use x87 heavily in some instances + +Let's talk about what this optimization is and how it improves performance. 
Clang 17 added support for a new function calling ABI called +preserve_all. x86 has supported this ABI for a very long time but it is a new addition for Arm64. This ABI breaks convention from the regular AAPCS64 +ABI in that if a small function needs to use more registers then it needs to first save pretty much any of them, unlike AAPCS64 where a bunch of +registers are free to use. This is beneficial for FEX's JIT since we can save significant time by not saving any state when we need to jump out of the +JIT and execute x87 softfloat code. + +In particular this manifests as upwards of a 200% performance improvement in some microbenchmarks around x87 code! While this advantage is quite +significant, the only way to take advantage of it is to compile FEX with Clang 17. Since this compiler release came out only last month, pretty much +no distros have adopted it so it is unlikely to be used soon. In a few months' time, or years depending on the distro, they should naturally upgrade their +compiler stack and free performance improvements will follow. + +As a fairly major side note to this excursion, FEX has found that the 32-bit wine packages from Canonical's repository use x87 +heavily in some instances. This causes some really bad performance issues with some 32-bit games and installers. It is recommended to use Proton where +you can, since it compiles its 32-bit libraries with SSE optimizations instead, which work significantly better. + +FEX-Emu may look to provide its own wine packages in the future with this same optimization in place to help alleviate some of this burden. Until then +it is recommended to use FEX's x87 reduced precision mode to try and alleviate some of the overhead. + +## Fixes a bug when chrooting in to rootfs +For quite a few months now FEX-Emu has changed some behaviour around chrooting in to the FEX rootfs. +While chrooting isn't generally advised, if a user wants to modify the rootfs then it's the only option. 
While we provide some scripts inside of our +rootfs images to facilitate this, it has been broken for a few months. + +We have now fixed this bug in both FEX-Emu and the scripts inside of our rootfs images. So if you want to modify packages inside of the image you will +now be able to do so again. Make sure to update your image to get the new scripts! + +## Remove x86-64 JIT and Interpreter +This has been a long time coming in the FEX-Emu project. We have had support for an IR interpreter and x86-64 host JIT for compatibility testing since +the project's inception. It has always been the case that if these CPU backends got in the way of the ARM64 JIT they would get removed. + +That time has finally come. Due to some upcoming changes around how flags are getting represented in FEX's JIT and the general burden of implementing +FEX's IR operations three times, often undoing an x86->Arm64 translation to go back to x86, maintaining them has been deemed too much of a burden and these have +been removed. This is a necessary step for our ARM64 JIT to gain more performance as we continue working to make it better. + +We are looking forward to future ARM platforms that can take Radeon GPUs through PCIe slots to regain a platform which can test RADV directly, but +until that point we will have to make do with our current devices. + +## Instruction Count CI on x86-64 hosts +While we removed our x86-64 JIT, we do have a fun addition to our instruction count CI. Now developers that don't have an Arm64 device handy can still +run the Instruction Count CI and attempt to optimize implementations. This is as simple as building +FEX on an x86-64 device with the Vixl disassembler and simulator enabled and you will be able to optimize to your heart's content! + +We've got a need for JIT speed! Let's go fast! 
+ +## Implement first optimizations using 128-bit SVE +This is a fairly minor change but previously FEX was not using any 128-bit SVE instructions. This is primarily because there aren't really any SVE +supporting devices in the consumer market, even though Snapdragon hardware theoretically supports it. 128-bit SVE adds a couple of optimizations that +we can use. + +- Wide-element shifts +- Index instruction for generating simple index masks + +While these are fairly simple initially, they take some translations from six instructions down to one or two, depending on the case. This is a fairly +minor change, but it is good to note that FEX is now taking advantage of SVE if it is available! + +## Adds WOW64 frontend +This has been a long time coming, with us adding initial mingw support back in FEX-2305. FEXCore now supports being built with a brand new WOW64 WINE +frontend. While currently not being utilized, this will allow WINE to integrate FEX directly in to its WOW64 layer for running both x86 and x86-64 +applications on Arm64 host devices. + +This is a very substantial change to how WINE integrates with FEX, since today FEX-Emu just runs the full x86-64 WINE process and eats the overhead of +emulating everything WINE needs to do. With the WOW64 layer now implemented, a bunch of the WINE code can now be Arm64 native code and when it needs +to execute application code it just jumps back to the emulator. This is similar to how Windows natively handles its emulation through its "XTA" layer. +Sadly today this is only wired up to work through a 32-bit x86 part of the layer; we need to get set up to support Wine when it inevitably supports +Wow64 for x86_64->Arm64. + +Big shout out to [ByLaws](https://github.com/bylaws) for implementing support for this! We look forward to future Wine integration work landing! + +## Implement thunking support for wayland-client and zink +We have some improvements to thunking this month! 
As we are working towards supporting thunking of more code, we implemented some features to get +wayland-client thunking wired up. While this support is early, it is enough to get Super Meat Boy up and running using wayland and zink overrides +within a Wayland environment. We look forward to additional thunking improvements going forward so that performance can be improved everywhere. + +# Video game showcase + + +See the [2310 Release Notes](https://github.com/FEX-Emu/FEX/releases/tag/FEX-2310) or the [detailed change log](https://github.com/FEX-Emu/FEX/compare/FEX-2309...FEX-2310) on GitHub. + + + + + +