Had your fill of candy, sweets, or whatever your localized version of ‘edible items that are actually quite bad for you but taste great’ is? We hope so, because the end of a month signals not only the start of the next, but also another progress report from yours truly.
In what was quite possibly the spookiest game launch of all time, Super Mario Bros. Wonder graced both our Switches and our PCs on exactly the same date… Curious how that keeps happening.
Either way, let’s fill our October goody bags with just a few more sweet treats!
The first item on the agenda comes with an appropriately thematic title in the form of Luigi’s Mansion 3. As one of the visual masterclasses of the Switch, LM3 loves to be a little jank in a lot of places: rendering the lobby interlaced and using an extremely aggressive dynamic resolution mode are just a couple of the annoyances we’ve had to contend with over the years. We spoke last month about a couple of fixes for AMD GPUs in regard to certain objects and shadows, but there was still a single case where everyone could very easily notice that Luigi was not having a good time.
We can save him some dignity here and confirm that he is in excellent control of his bladder. The issue lies in the texture formats. LM3 renders this shadow as an R16Unorm (color format), but then proceeds to sample it as a D16Unorm (depth format). Adding support for a copy dependency between these formats restores the shadow to its appropriate shape and longevity.
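As a rough illustration of the idea (and not Ryujinx's actual code), a texture cache could decide whether two views of the same memory may share data by consulting a table of copy-compatible format pairs. The names `COPY_COMPATIBLE` and `can_copy_between` are hypothetical, and formats are represented as plain strings purely for the sketch.

```python
# Hypothetical sketch: format names are plain strings, not real emulator types.
# Each entry is a pair of formats whose texels share the same bit layout
# (a single 16-bit normalized channel), so data written as one can be
# copied into a texture that is sampled as the other.
COPY_COMPATIBLE = {
    frozenset({"R16Unorm", "D16Unorm"}),
}


def can_copy_between(src_format: str, dst_format: str) -> bool:
    """Return True if data may be copied from src_format to dst_format."""
    if src_format == dst_format:
        return True
    # frozenset makes the lookup symmetric: color -> depth and depth -> color.
    return frozenset({src_format, dst_format}) in COPY_COMPATIBLE
```

With a rule like this in place, a shadow rendered as `R16Unorm` can be carried over to the `D16Unorm` view the game samples from, rather than the two being treated as unrelated textures.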
While checking Cocoon, we isolated further GPU shader instructions that we did not yet support. While this usually isn’t too big of a deal (just a warning in the console log), this particular instruction was actively generating invalid code rather than failing gracefully. Implementing proper support for querying the number of samples on a multisampled texture kills both birds with one stone. Shader instruction implemented, garbage output gone!
Coming off the back of such a simple name like Cocoon, our next target is the even more concise: Neptunia GameMaker R:Evolution.
Much like flushing a toilet, GPUs need a method to remove data from their memory (VRAM) after the game is no longer using it. One of the core rules around flushing textures, however, is that you should not attempt to do it once the texture itself has become unmapped. This is because once a texture is unmapped, the GPU has no idea what is now actually stored at its memory location; the CPU could have put something else there already. If anyone was guessing at what could happen if the GPU attempted to flush data that wasn’t actually what it thought it was… well, that’s how you get data corruption!
In the case of our GameMaker friend above, this is precisely the scenario occurring. Unfortunately for this game, the resulting data corruption simply resulted in a hard crash rather than any pretty digital art. Skipping these invalid flushes allows the title to proceed in fine fashion.
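The rule described above can be sketched in a few lines. This is a minimal model under stated assumptions, not the emulator's implementation: the `Texture` class, the `mapped` flag, and the dictionary standing in for guest memory are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Texture:
    address: int          # where the texture lives in (simulated) guest memory
    data: bytes           # the GPU-side contents to write back
    mapped: bool = True   # becomes False once the guest unmaps this range


def flush(texture: Texture, guest_memory: dict) -> bool:
    """Write the texture back to guest memory; return True if data was written."""
    if not texture.mapped:
        # The range may already hold unrelated data written by the CPU,
        # so writing the texture here would corrupt it. Skip the flush.
        return False
    guest_memory[texture.address] = texture.data
    return True
```

The point of the guard is that an unmapped texture's contents are simply dropped: losing stale texture data is harmless, while overwriting whatever now occupies that address is not.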
AMD served us a fresh, and steaming, dish of frustration with the release of their 23.2.x line of Radeon drivers. Imagine the horror, waking up, updating your GPU drivers, then…
In our SPIR-V backend, we previously attempted to re-use function parameters across multiple calls to the same function. For whatever reason, this now completely baffles the AMD SPIR-V compiler, resulting in the abstract line art you can view above.
Giving each function its own set of temporary values seems to resolve the issues.
NativeAOT is a rather new-ish feature of .NET which we’ve mentioned a couple of times before in these reports. It effectively tries to bridge the gap between a fully JITed runtime like usual C# or Java, and a fully compiled language such as C/C++. Compiling C# directly to machine code Ahead of Time (AOT) has some excellent benefits in terms of boot times and portability to platforms where you may not have access to the full .NET runtime and JIT.
You do lose some features of .NET when doing this though, namely the ability to generate code during program runtime. Ryujinx currently uses a JIT for some Switch GPU macros which tries to directly emit .NET IL (Intermediate Language) to avoid going via a slower interpreter route. Simply enabling NativeAOT and thus forcing the interpreter reduces performance in Super Mario Odyssey from 90FPS down to 75FPS, a huge 17% dip.
The solution to this is to implement HLE macros that attempt to match the lower level NVN macros directly, instead of leaving it up to the emitted IL. Under NativeAOT this brings the performance of SMO back up to 85FPS. Still short of the 90 mentioned prior, but that 5FPS gap is made up by additional factors unrelated to these NVN macros.
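To give a feel for the approach, here is a small sketch of dispatching to a hand-written (HLE) handler when a known macro is recognized, and falling back to a generic interpreter otherwise. How macros are actually recognized is not described here, so matching on a hash of the macro's code is an assumption made for the sketch, and every name (`HLE_TABLE`, `run_macro`, `KNOWN_DRAW_MACRO`) is hypothetical.

```python
import hashlib
from typing import Callable, Dict, List


def macro_key(code: bytes) -> str:
    """Identify a macro by a hash of its code (an assumption for this sketch)."""
    return hashlib.sha256(code).hexdigest()


def interpret(code: bytes, args: List[int]) -> str:
    # Stand-in for a slow, generic macro interpreter.
    return "interpreted"


def hle_draw(args: List[int]) -> str:
    # Stand-in for a hand-written native implementation of a known macro.
    return "hle"


# Made-up bytes representing a macro we know how to replace.
KNOWN_DRAW_MACRO = b"\x01\x02\x03\x04"

HLE_TABLE: Dict[str, Callable[[List[int]], str]] = {
    macro_key(KNOWN_DRAW_MACRO): hle_draw,
}


def run_macro(code: bytes, args: List[int]) -> str:
    """Use the HLE handler when the macro is recognized, else interpret it."""
    handler = HLE_TABLE.get(macro_key(code))
    if handler is not None:
        return handler(args)
    return interpret(code, args)
```

Because the fast path is ordinary precompiled code, it needs no runtime code generation, which is what makes this style of replacement compatible with an ahead-of-time build.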
Onto other news, a fallback has been added for GPU drivers that do not support the OpenGL equivalent of `textureGatherOffsets`. MoltenVK technically reports that it does support such an extension, but sometime between the initial macos1 release and today, an update to SPIRV-Cross (a library MVK uses to convert Vulkan shaders to Metal shaders) has made it attempt to use said feature… which crashes the Metal compiler because Metal doesn’t support it!
This fallback once again allows Xenoblade Chronicles: Definitive Edition to render (although there is still an array of other issues, especially on newer versions of MoltenVK).
Sifu, a roguelike with a very short name, has been a frustration to play on Ryujinx since its release many months ago. While visually and mechanically sound, we’ve heard that time-dependent random crashes are not many users' favorite issue to deal with.
The cause was the game problematically calling a function which replaces buffers in the surface flinger (this is basically the service that makes one big buffer of data from lots of smaller buffers). While the eventual result of a large chain reaction of issues was a memory unmap crash, the root was a small issue: a single basic counter not being decremented properly. By making sure this counter is decremented when these problematic cases are hit, Sifu players no longer need to hold their breath.
Moving away from all that graphical stuff, let’s talk multiplayer!
For those unaware, while one upstreaming effort has been completed in the macOS changes, another has also started. Getting all the LDN functionality we’ve been working on over the last 3 years into our main releases has finally become a focus now that more time has opened up to work on the cleanup and reverse-engineering aspect of the service.
Last month we added support for the actual service implementation with the caveat that while a lot of the ‘Switch’-side stuff is now in place, we still need some way to make it useful. This is currently done in two ways in our LDN builds:
- “RyuLDN”, a custom over-the-internet implementation that runs through our own servers. This allows Ryujinx users to connect to each other from around the world, but has the downside of being limited to Ryujinx users only.
- ldn_mitm is an alternative that transforms any game that has LDN functionality into one which has LAN functionality. This means that any real Switch with the ldn_mitm sysmodule can connect to any other equivalent Switch, and additionally to Ryujinx. The downside is that for this method to work, all systems must be on the same network, whether real or virtual.
As the second approach is ultimately simpler for the time being, it is the first to become available in our main releases.
A change which many users on integrated and lower power systems (or anyone who stares at their power usage instead of playing their games!) may enjoy is an adjustment to how we signal for a session to be added to a given ServerBase. In the past, we were simply polling every 1ms, which resulted in a fair amount of ‘fake’ CPU usage. While it was fake in that the thread would yield if any other task required it, what wasn’t fake was that it forced the thread into constant real use, inflating its presence when profiling and having a large influence on power consumption.
By adjusting this logic to signal an event on session addition instead of polling, general CPU usage (especially when emulation is paused) sees considerable reductions. However, the real gains can be seen on battery-powered devices such as the Steam Deck. Mario Kart 8 Deluxe, while simply sitting on the character select screen, has its power consumption cut from 15.8W, all the way to 9.1W.
We’d like to reiterate that we do not expect this to have much influence on raw performance (as mentioned above the usage hog wasn’t ‘real’), but equal performance at a 42% reduction in wattage is certainly a win in our book. This won’t be the case in every game, but there are still further improvements to how we perform submillisecond waits that can be made in future.
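The difference between the two strategies can be sketched with Python's standard `threading` module. This is only an illustration of polling versus event signalling in general, not a port of ServerBase; the function names and the use of a `queue.Queue` for pending sessions are assumptions made for the sketch.

```python
import queue
import threading
import time


def polling_server(sessions, stop, handled):
    """Old style: wake roughly every 1 ms to check for work, even when idle."""
    while not stop.is_set():
        try:
            handled.append(sessions.get_nowait())
        except queue.Empty:
            time.sleep(0.001)  # periodic wakeup keeps the CPU out of deep idle


def event_server(sessions, session_added, stop, handled):
    """New style: sleep until someone signals that a session was added."""
    while True:
        session_added.wait()   # blocks with no periodic wakeups
        session_added.clear()
        # Drain everything that was queued before (or while) we woke up.
        while True:
            try:
                handled.append(sessions.get_nowait())
            except queue.Empty:
                break
        if stop.is_set():
            return


def add_session(sessions, session_added, session):
    """Producer side: queue the session first, then signal the server."""
    sessions.put(session)
    session_added.set()
```

Both servers handle the same sessions; the event-driven one simply does nothing at all between them, which is where the power savings on an otherwise idle screen come from.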
As for some other quality of life features that were added this month, the long requested ability to add game shortcuts to your desktop via a simple click finally materialized. Right-click any title and look to the very bottom of the context menu. From there, a shortcut to the desired game (complete with game icon) will be created on the current active desktop.
Secondly, aspect ratio can now be changed from the bottom bar if you use our Avalonia frontend. This allows hot swapping between them if, for whatever reason, 16:9 just isn’t cutting it mid-session, and you just need the game to be 4:3.
Last but certainly not least, LibHac, the library we use to emulate most of the file system services, was updated to version 0.19.0 and continues the trend of open-source developers hating naming anything version 1.x.x!
New version, new stuff. `IFileSystem.GetFileSystemAttribute`, a new filesystem service added in firmware 16.0.0, is now fully supported, allowing newer titles such as Tiny Thor, Cassette Beats and DeepOne to head in-game.
That last one clearly still needs a bit of work!
If you make it this far in these reports then you deserve a gold star, or a cookie, or something that more generally triggers dopamine.
We’d like to once again thank everyone who supports us every month on Patreon, contributes code to us on GitHub, and those who help other users out with troubleshooting and bug reporting in our Discord! We couldn’t do it without you, and with that, we hope to see you again next month.