diff --git a/_posts/2023-11-07-FEX-2311.md b/_posts/2023-11-07-FEX-2311.md new file mode 100644 index 0000000..85a6c2e --- /dev/null +++ b/_posts/2023-11-07-FEX-2311.md @@ -0,0 +1,80 @@ +--- +layout: post +title: FEX 2311 Tagged! +author: FEX-Emu Maintainers +--- + +Another month gone by and another FEX release out the door! This last month was a bit of a less busy month as most of our team spent a week in Spain +to take part in [XDC 2023!](https://indico.freedesktop.org/event/4/) We did still have the rest of the month to do some work although, so let's get +to the changes! + +## Small bug fixes +This month we fixed a couple of bugs with could have caused spurious crashes! In fact while testing some upcoming performance optimizations, we fixed +a few unrelated bugs that was crashing Steam periodically! Always nice to see a bunch of little work that just improves the software, even if they +aren't a single big fix. + +- Fix register corruption when jumping out of JIT +- Fixes double munmap which would cause spurious pointer unmaps + - Fixes crashes when a program would shut down a thread +- Implements RPRES support and Fix implementation issue with ARM's new RPRES feature + - RPRES gives us the ability to do reciprocals in one instruction instead of using ARM's divide instruction. + - The bug would have caused invalid data to be returned + - No CPU supports this yet luckily +- Fixed issue with *at syscalls not working with absolute paths + - Broke Proton's pressure-vessel in weird and unique ways +- Fixes bug with named enum argument parser + - This is used to override CPU features with the FEX_HOSTFEATURES option so typically not hit + +## 32-bit thunking infrastructure +While 32-bit thunking is not yet in place, and this month it still isn't fully integrated, some of the code has been landing to work towards this +goal. In order to do 32-bit thunking the right way we are spending bunch of time ensuring that we have a proper daya layout analysis system in place +that is based on clang to do a couple of things. This analysis will let us to automatic translations of data structure from 32-bit in to 64-bit and +also alert us if something needs to be manually translated. This needs to be in place because otherwise we can end up in a situation where we +unknowingly corrupt data and it would be a nightmare to find. So this month we now have the ability to annotate our thunk definitions and start having +clang work for us. While not complete, some of the work has shown to have thunking working for 32-bit Super Meat Boy to work! It's getting there! + +## NZCV usage preparation +A big performance improvement that FEX is working on is to use the CPU's flags to directly emulate the x86 flags when possible. This is a long and +arduous task but the performance improvements will be huge once the code lands! A bunch of prep work this month has landed to start down this path but +we're going to need to let this sit in the oven for a bit longer. Check back next month to see if we get there! + +## Minor optimizations +With XDC being in the middle of the month, it caused most of the bigger work to be delayed so we have a bunch of smaller things this month! + +- Minor optimization to bfi/bfxil + - Removes one or two instructions for some instruction translations +- Optimize atomic fetch operations in to atomic if the result isn't used + - Removes a couple of instructions if the resulting fetch data isn't used. +- Implements support for ARM's new AFP extension + - Currently disabled until we can audit the codebase to ensure we aren't corrupting anything + - Lets us remove an insert after every scalar operation to match SSE behaviour +- Optimize palignr that behaves like a move + - Compilers shouldn't use this, but now we optimize it to a move +- Optimize pblendw + - A fairly uncommon instruction but now its implementation is basically as fast as it can be +- Optimize blendps + - We had already optimized blendpd last month, so this time was to optimize the 32-bit version + - Fairly commonly used so should improve perf in some games +- Optimize dpps and dppd + - These instructions do a dot product and a broadcast of their result but we couldn't find a game using it heavily + - So while this is now optimal, this is unlikely to affect any real game +- Optimize some 3DNow! instructions + - 3DNow! is a really old instruction extension that is basically only used in some really old games + - All of these instruction implementations are basically as fast as we can make them now, which is good! +- Optimize direction flag pointer offset calculation + - This converts a three instruction calculation down to one and stops using a ternary selection + - This happens with x86's repeat instructions, which typically happens for memcpy and memset + - Used a lot but is a minimal improvement. +- A few other random bits and bobs! + +## AVX optimizations! +While nothing supports our AVX implementation today, we have optimized a handful of implementations once hardware supports what we need. We have +optimized a smattering of instruction translations. +- 256-bit VExtr, VFCADD, VURAvg, VFDiv, VSMax, VSMin, VUMax, VUMin +- Removes a bunch of truncating moves + - If we know an AVX instruction is operating at 128-bit width, we can remove a redundant move which speeds things up! + +# Video game showcase + + +See the [2311 Release Notes](https://github.com/FEX-Emu/FEX/releases/tag/FEX-2311) or the [detailed change log](https://github.com/FEX-Emu/FEX/compare/FEX-2310...FEX-2311) in Github.