Miss us? Summer only lasts so long and some of us need our holidays.
2023 continued to deliver swathes of new Switch titles with Pikmin 4 and a remast-... what? It’s a port? Well if you say so… Ahem, a re-release of Rockstar’s western classic: Red Dead Redemption. A game which has been killing emulators since before some of our readers could read. Luckily we’re, as the kids say, built different.
We’ve got a good one cooked up so let’s get to it.
We’ll start with some good old OpenGL games: Wreckfest and 20XX; the latter not to be confused with 30XX or a niche Super Smash Bros. Melee meme of the same name. Both of these titles rendered perfectly, but upside down. Requiring users to bring a horizontal mirror with them before sessions seemed a tough ask, so fixing an incorrect fragment origin seemed a better solution.
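For anyone wondering what an "incorrect fragment origin" amounts to, here is a rough Python sketch. It is not the emulator's code; the function names are made up for illustration. It only shows that a fragment coordinate measured from the bottom edge and one measured from the top edge differ by a vertical flip, so picking the wrong convention turns the whole frame upside down.

```python
def convert_frag_y(y, height):
    """Map a fragment-centre y coordinate from one origin convention
    (lower-left) to the other (upper-left), or back again."""
    return height - y


def flip_rows(image):
    """Return a copy of the image (a list of rows) with the row order reversed."""
    return image[::-1]


# A tiny 3-row "framebuffer": applying the flip twice restores the original.
frame = [["sky"], ["horizon"], ["ground"]]
upside_down = flip_rows(frame)
restored = flip_rows(upside_down)
```

The conversion is its own inverse, which is why a single mismatched setting is enough to flip everything, and a single corrected setting is enough to fix it.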
Let’s stick to some guest OpenGL game bugs in the form of Dragon Quest Builders, a game which a Discord user, who will go unnamed, has mentioned in at least 30% of their server messages. The dedication to the cause is truly inspiring.
DQB, and potentially other OpenGL Switch games, had a very strange issue where item icons would simply be muddled up with other item icons. Even with direct comparisons it’s hard to spot what exactly is wrong, unless you’ve played before.
This was fixed by tracking buffer copies that modify texture memory, which resolves an issue where buffer data was being copied directly into memory. A direct copy only works correctly if the texture data does not already exist; if it does… weird stuff.
Alright after this we’ll shut up about OpenGL games, they’re just so annoying?
Some of them were using some interesting texture formats that we’ve never seen before, like Z16RUnormGUintBUintAUint, which appears to just be an extremely long alias for Z16Unorm. This is used for some shadowmaps in titles such as Go Rally, Pyramid Quest and Monster Blast.
Next up is Jurassic World Evolution Complete Edition, surely in contention for the ‘most Bethesda game name’ award, which had been in a state of relative limbo since its release. The game has booted since release, but upon entering a campaign it would deliver nothing but a view of your desktop, the program having crashed. However, after eventually tracking the issue down to a mishandled case in shader instructions, we can finally get a look at some trees.
Red. Dead. Redemption?
For those that didn’t experience this game all the way back in 2010, Rockstar graciously blessed us with a new Switch port of their original PS3/X360 title. As it has never yet seen a PC release, it has stood the test of time as the game to stress PS3 and Xbox 360 emulators such as RPCS3 and Xenia. You can’t find a video on these without stumbling across RDR1 somewhere.
So the question on everyone's mind, admittedly including our own, was whether the Switch version was going to be another wasted effort, or a genuine alternative route to getting it onto PC. Thankfully the answer is fairly positive. After fixing a Vulkan-specific bug with masked stencil clears, which resolved a very interesting psychedelic effect where foliage failed to render, the rest of the experience was as close to flawless as we dare call anything.
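As a rough illustration of what a masked stencil clear is expected to do, here is a small Python sketch. It is an assumption-laden model, not the actual fix: the function name is hypothetical and it treats a single 8-bit stencil value. The idea is that only the bits selected by the write mask take the clear value, while the remaining bits keep whatever was there before.

```python
def masked_stencil_clear(old_value, clear_value, write_mask):
    """Clear an 8-bit stencil value, touching only the bits set in write_mask."""
    mask = write_mask & 0xFF
    kept = old_value & ~mask & 0xFF   # bits outside the mask are preserved
    written = clear_value & mask      # bits inside the mask take the clear value
    return kept | written


# Clearing to 0 with a mask of 0x0F wipes the low nibble and keeps the high one.
result = masked_stencil_clear(0b10101010, 0x00, 0x0F)
```

If the mask is ignored and the whole value is overwritten, any effect that relies on the preserved bits will misbehave, which is the sort of thing that can make foliage vanish.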
Hardware requirements for the native 30FPS are fairly modest and performance can reach 60FPS (and beyond) on the top-end CPUs. Switch emulation being relatively GPU-light means that resolution scaling to 4K or higher is effectively free on any competent GPU. We dislike outwardly making comparisons to other emulators, especially when they’re completely different consoles, but we’re confident that we’d place favorably in an RDR1 emulation tier list!
Moving away from video game westerns, how about we talk about AMD? It's been a while since we’ve had a good therapeutic rant. Well maybe this one is a little more justified.
The Switch is powered by an Nvidia-designed Tegra X1 which means that sometimes, where Nvidia and AMD diverge in how they design their hardware, workarounds are going to be required.
One such issue showed itself in a lot of games, especially Unreal Engine titles. GPUs have a property which is usually called `Invocations per subgroup`, and crucially AMD and Nvidia diverge here in their GPU designs: Nvidia uses 32 invocations per subgroup, while AMD uses 64. RDNA onwards supports a Vulkan extension which allows a GPU to change its subgroup size, but only for compute shaders. So while this fixed some games like Shin Megami Tensei V on modern AMD GPUs, if a game used subgroup operations in the rest of the graphics pipeline, no dice.
The solution is to simply sub-divide the 64 into two groups of 32, instead of just ignoring any extra invocations beyond the 32nd, which is what resulted in the mess seen above. Fixing this resolves a staggering number of AMD-exclusive graphical bugs.
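To make the difference concrete, here is a minimal Python model of the idea, using a ballot (one bit per invocation) as the example operation. Which operations the real fix covers and how it is written in shader code are not stated here, so treat every name below as hypothetical. The sketch only contrasts "throw away lanes 32 to 63" with "treat the 64 lanes as two independent groups of 32".

```python
GUEST_SUBGROUP_SIZE = 32  # what the guest (Nvidia-style) code expects
HOST_SUBGROUP_SIZE = 64   # what the host hardware provides in this model


def host_ballot(predicates):
    """Pack one boolean per host lane into an integer bitmask."""
    bits = 0
    for lane, predicate in enumerate(predicates):
        if predicate:
            bits |= 1 << lane
    return bits


def truncated_ballot(predicates):
    """The broken behaviour: every lane sees only the first 32 lanes' votes."""
    return host_ballot(predicates) & 0xFFFFFFFF


def split_ballot(predicates, lane):
    """The sub-divided behaviour: each lane sees the votes of its own group of 32."""
    group = lane // GUEST_SUBGROUP_SIZE
    return (host_ballot(predicates) >> (group * GUEST_SUBGROUP_SIZE)) & 0xFFFFFFFF


# Lanes 0-31 vote true on even lanes only; lanes 32-63 all vote true.
votes = [(lane % 2 == 0) if lane < 32 else True for lane in range(HOST_SUBGROUP_SIZE)]
low_group_view = split_ballot(votes, 0)
high_group_view = split_ballot(votes, 40)
```

In the truncated version, lane 40 would be handed the first group's result, which has nothing to do with its own neighbours. In the split version each half behaves like a self-contained 32-wide subgroup.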
It was a good couple of months for AMD users in general, as new contributor gleng isolated and fixed a prevalent issue AMD owners were experiencing on macOS devices: namely, that some games like Pokémon Scarlet/Violet only rendered in ¼ of the screen space.
The AMD Metal driver, at least when accessed through MoltenVK, seems to have serious issues using the `VK_EXT_shader_viewport_index_layer` extension, which results in disaster. Luckily, there don’t seem to be any negatives to just disabling the use of this extension specifically for AMD devices when using MoltenVK.
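A vendor-specific workaround like this usually boils down to a small capability check. The Python sketch below is only an illustration: the function and parameter names are invented, and how the emulator actually detects MoltenVK is not described here. The one hard fact it leans on is that 0x1002 is AMD's PCI vendor ID.

```python
AMD_VENDOR_ID = 0x1002  # PCI vendor ID reported by AMD GPUs


def use_viewport_index_layer(vendor_id, is_moltenvk, extension_supported):
    """Decide whether to rely on VK_EXT_shader_viewport_index_layer.

    Hypothetical helper: the extension is skipped for AMD devices behind
    MoltenVK even when the driver advertises it.
    """
    if vendor_id == AMD_VENDOR_ID and is_moltenvk:
        return False
    return extension_supported
```

Keeping the condition this narrow means every other vendor and platform carries on using the extension exactly as before.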
All this macOS talk brings us nicely into our next little section. Because this is gonna be the last one.
These last couple of months have been really huge for our macos1 upstream progress, coming off the back of transform feedback emulation in June. The final large refactor of the shader backend was pushed through, which simplifies one of the final puzzle pieces of geometry shader emulation.
Before we get to that though, a long awaited change on the performance front was the introduction of Buffer Mirrors. The Switch GPU has certain methods that allow it to load arbitrary data into buffers at any time with no need for additional barriers. Vulkan on the other hand does not have any functionality that allows these arbitrary updates; you must be outside of a render pass and manually perform a buffer copy or compute write to achieve a similar outcome. On desktop GPUs, interrupting a render pass is not particularly expensive and can even be considered free. This is unfortunately not the case on mobile GPUs (such as those found in M1/M2 chips), where ending a render pass is much more of a commitment.
The solution to this intriguing issue is aptly ingenious. Whenever a game requests an inline buffer update, we can take a ‘mirror’ of the current binding, perform the update and then rebind to the new mirrored buffer. This can greatly improve performance on M1/M2 Macs, depending on how often a game uses these inline buffer updates. Most games perform them to some extent, so improvements are global.
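The mirror idea can be modelled in a few lines of Python. This is a toy under stated assumptions, not the real implementation: `BufferBinding` and `record_draw` are hypothetical names, buffers are plain byte arrays, and nothing here covers how mirrors are sized, reused or eventually written back. It only shows why earlier draws are unaffected when an update lands on a fresh copy instead of the buffer they already reference.

```python
class BufferBinding:
    """Hypothetical stand-in for a bound GPU buffer."""

    def __init__(self, data: bytes):
        self.current = bytearray(data)  # buffer the next draw will read from
        self.mirrors_created = 0

    def inline_update(self, offset: int, payload: bytes) -> None:
        """Apply an update to a mirror of the buffer, then rebind to the mirror."""
        if offset < 0 or offset + len(payload) > len(self.current):
            raise ValueError("update falls outside the buffer")
        mirror = bytearray(self.current)                  # take the mirror
        mirror[offset:offset + len(payload)] = payload    # perform the update
        self.current = mirror                             # rebind
        self.mirrors_created += 1


def record_draw(binding: BufferBinding) -> bytearray:
    """Hypothetical draw: captures whichever buffer is bound at record time."""
    return binding.current


binding = BufferBinding(bytes(4))
first_draw = record_draw(binding)       # references the original buffer
binding.inline_update(1, b"\xff\xff")   # no need to stop and copy in place
second_draw = record_draw(binding)      # references the mirror
```

Because the first draw still points at untouched data, there is no need to interrupt the render pass to order the copy against it, which is the expensive part on tile-based mobile GPUs.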
And the coup de grâce, geometry shader emulation, was also finalized and merged right at the end of August. This puts a cap on all of the major milestones we wanted to hit and brings full parity with the initial macos1 release, though faster, better and, more importantly, with cleaner code that doesn’t impact other OSs or hardware configurations.
The implementation of geometry shader emulation present today is a little different to how it was implemented last year. Instead of additional vertex draws, this is an implementation fully in compute shaders, which were chosen to support subgroup operations, something that was impossible when emulated via vertex shaders.
Well, there you have it. While the eagle-eyed among you may have noticed that we’re still missing a single item from our upstream list, we do not consider it to be of great importance in most cases. As such, if you check our website, macos1 has finally been retired and we highly recommend all macOS users to go and grab our latest, fully-featured and auto-updating release!
This does however mean that you’re gonna need to now share this progress report with everyone else. The days of entire sections being devoted to Steve Apple are now behind us; without further delay, let’s shift onward!
To start our descent toward the end of this report let’s blast through a few miscellaneous smaller changes in July and August:
- SDL2-CS updated to 2.28.1. Fixes issues with Xbox controllers randomly disconnecting and being unable to reconnect.
- Allow access to code memory for exefs mods. Allows mods to make use of JIT without crashing. Mainly impacts exlaunch mods.
- Fix invalid audio renderer buffer size when end offset < start offset. Fixes a crash in Disgaea 5 at the end of the “Dreaming Mushroom” mission in episode 4.
- Add Fmaxp & Fminp Scalar Inst.s, Fast & Slow Paths to CPU JIT. This change was also required to allow Jurassic World Evolution to boot.
- Implement GetWorkBufferSizeExEx and GetWorkBufferSizeForMultiStreamExEx for the opus decoder. Fixes a crash on boot in Sea of Stars.
The last update we gave on any GUI advancement was that we were patiently waiting on Avalonia, the framework we’ve built our new frontend with, to update to their next major milestone, 11.0. Well, that happened and brings with it a few excellent improvements.
- Performance no longer differs depending on window size. Previously a small window would perform better than a fullscreen window!
- General performance and responsiveness of the framework improved dramatically.
- The title bar will finally match your system color theme on Windows.
- Lots of misaligned elements on macOS were resolved.
- Flatpak compatible! This was the main issue prior to 11.0. If we’d have jumped early, Linux users would have been left behind.
So the question is: if that was merged in July, what's the hold-up now?
Well, there were still bugs in this endless game of whack-a-mole you play with software. macOS and Linux had an issue where, if the window was not focused, dialogs like the software keyboard would not spawn, resulting in a highly annoying softlock.
As expected, a seemingly pointless `isActive` check was being performed on the window before the program would provide any content to populate the dialog. Removing this was all that was needed.
To further improve performance, lots of the settings configuration states, mainly the function that queries the Vulkan device list, were made asynchronous to significantly reduce the time the settings window takes to open. Previously, the main thread was spending almost 60% of its time just to populate the GPU device drop-down, a task that does not need to block!
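The general pattern looks something like the Python sketch below, where the slow query is pushed onto a worker thread so the window can appear straight away. This is not the frontend's code: `query_vulkan_devices`, `open_settings_window` and the device names are placeholders, and the sleep merely stands in for a slow driver call.

```python
import asyncio
import time


def query_vulkan_devices():
    """Placeholder for a slow, blocking device enumeration."""
    time.sleep(0.05)
    return ["Example GPU A", "Example GPU B"]


async def open_settings_window():
    # Start the slow query on a worker thread instead of waiting for it here.
    device_task = asyncio.create_task(asyncio.to_thread(query_vulkan_devices))

    # The window can be shown immediately with an empty drop-down.
    window = {"visible": True, "devices": None}
    shown_before_devices = not device_task.done()

    # Fill in the drop-down once the query finishes.
    window["devices"] = await device_task
    return window, shown_before_devices


window, shown_before_devices = asyncio.run(open_settings_window())
```

The trade-off is that the drop-down is briefly empty, which is far less noticeable than a settings window that takes its time to open.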
There is currently one more improvement to startup times in the pipeline that we want to get in before shipping this as the main frontend, as we really do not want there to be any downsides. If this seems like perfectionism, that’s because it really is. Stay tuned.
And finally, for those who are lucky enough to own a P3 compatible display, there is a new option for Vulkan to pass through the color space selection to match your display, instead of forcing all content to sRGB. While you will technically be sacrificing color accuracy, if you like a wider gamut and a little more saturation, then give it a whirl. If any of you own an OLED Switch, it’s similar in principle to the “Vivid” mode that those models offer.
For those of you who do not follow us on Twitter (firstly, go and do that), we recently previewed a lot more of the final Patreon goal: texture replacement. Check out that tweet here if you haven’t already.
To summarize, it’s available to test, and we’d really appreciate it if folks who have experience working in this area, whether as an artist, modder or interested party, would give us feedback on its usability and on whether everything works as expected. This is one of those features that ultimately other people will be using, and it’s best to get in early while major changes can still be made!