Progress Report April 2023
Hello folks, it’s your favorite time of the month once again. No need to say anything, we know it’s true!
Plenty happened in April which you’ll soon discover below, but before that let’s take a look through the current Patreon goals. Or that's what we usually say, except we're changing it up a little from this month. Patreon has informed us that they're sunsetting their "goals" feature from May 16th. While we could keep track of these separately, part of why we chose this route in the first place was the ease with which our supporters could track the progress toward each goal in real time.
We are currently exploring new ways to drive interest and reward our supporters for the contributions they have made, and continue to make, so stay tuned for any updates regarding our Patreon benefits. If you aren't a current patron, let us know if there is anything in particular that would make the package more enticing. All the goal features that have previously been met will be finished and delivered; this includes "Texture Replacement", which we should be able to preview very soon!
All aboard...
April's journey begins with The Legend of Zelda: Breath of the Wild. It really couldn’t be anything else, could it? Throughout April, people were suddenly very interested in the stability, fidelity and performance of the game, and not without cause.
For Nvidia GPU owners, fidelity and graphical accuracy haven't seriously been an issue for years now, but this was not the case for AMD and (a growing number of) Intel users. Grass shadows would have major artifacting around them, and this effect could even be seen on character models and other objects, provided they were in shade.
The issue here is how different drivers break the tie when a sample lands exactly halfway between two texels. Nvidia, Apple and Mesa break the tie correctly, while AMD and Intel go in the opposite direction. By applying the smallest positive bias possible, we can force these drivers to choose correctly.
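To make the tie-break concrete, here is a minimal sketch in Python. It is a toy model, not the actual shader change: the fixed-point precision (`FRAC_BITS`), the function names, and the idea that the two driver families differ only on exact boundaries are all assumptions made for illustration.

```python
# Toy model: texture coordinates in fixed point, FRAC_BITS fractional bits.
# FRAC_BITS is an assumed sub-texel precision; real hardware may differ.
FRAC_BITS = 8
ONE = 1 << FRAC_BITS  # one whole texel in fixed-point units


def texel_tie_up(coord_fixed: int) -> int:
    """Nearest texel; a coordinate exactly on a boundary picks the higher texel."""
    return coord_fixed >> FRAC_BITS


def texel_tie_down(coord_fixed: int) -> int:
    """Nearest texel; a coordinate exactly on a boundary picks the lower texel."""
    return (coord_fixed - 1) >> FRAC_BITS


def with_min_bias(coord_fixed: int) -> int:
    """Add the smallest positive step representable in this model."""
    return coord_fixed + 1


boundary = 3 * ONE  # exactly halfway between the centers of texels 2 and 3
print(texel_tie_up(boundary))                   # higher texel
print(texel_tie_down(boundary))                 # lower texel
print(texel_tie_down(with_min_bias(boundary)))  # bias flips the tie
```

In this model the biased "tie-down" path agrees with the unbiased "tie-up" path for every positive coordinate, which is the property the fix relies on.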
Performance-wise, the quirks here are numerous and will span into May. One of the main performance bottlenecks for Breath of the Wild was its incredibly long render passes, each containing a large number of draws. This meant that the backend could end up spending 4-5ms building a single command buffer (a very large number when a frame can be as short as 16ms). This is worsened by BotW's extremely aggressive GPU synchronization requirements, which force the game to wait for the completion of each large command buffer. Reducing the size of these command buffers would therefore reduce the cost of both problems.
By implementing a so-called “fast-flush” mode in the Vulkan backend, we now force command submission periodically if the game is syncing aggressively enough. We saw improvements of up to 11% in BotW and some other GPU-limited situations, such as when resolution scaling is used in Pokémon Scarlet/Violet.
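The idea can be sketched roughly as follows. This is an illustrative Python model only: `CommandBatcher`, its thresholds, and the per-frame heuristic for deciding that a game is "syncing aggressively enough" are hypothetical and not taken from the Vulkan backend.

```python
from typing import Callable, List


class CommandBatcher:
    """Collects draw commands and decides when to submit them to the GPU."""

    def __init__(self, submit: Callable[[List[str]], None],
                 max_draws_per_submit: int = 4,
                 syncs_to_enable: int = 2) -> None:
        self._submit = submit
        self._max_draws = max_draws_per_submit    # assumed fast-flush batch size
        self._syncs_to_enable = syncs_to_enable   # assumed "aggressive" threshold
        self._pending: List[str] = []
        self._syncs_this_frame = 0
        self.fast_flush = False

    def draw(self, command: str) -> None:
        self._pending.append(command)
        # In fast-flush mode, submit early so a later sync waits on less work.
        if self.fast_flush and len(self._pending) >= self._max_draws:
            self.flush()

    def sync(self) -> None:
        """The game waits on the GPU: everything recorded so far must go out."""
        self._syncs_this_frame += 1
        self.flush()

    def end_frame(self) -> None:
        self.flush()
        # Enable fast-flush for the next frame if this one synced often enough.
        self.fast_flush = self._syncs_this_frame >= self._syncs_to_enable
        self._syncs_this_frame = 0

    def flush(self) -> None:
        if self._pending:
            self._submit(self._pending)
            self._pending = []
```

With fast-flush active, each sync only has to wait on the tail of the work rather than on one very large submission, at the cost of more, smaller submissions.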
Leaving Zelda alone for now, we’ve mentioned Fate/EXTELLA a fair amount in the past and this month is again no exception. It seems to have an odd knack for highlighting some rather niche gaps in the GPU emulator, so we hope you aren’t bored with its continued cameos! It turns out we’ve been missing a single case of multisample <-> non-multisample depth conversion to complete the set, ultimately causing certain textures to simply not render. By resolving this final conversion case we hope (!) to finally put this game to bed.
Now comes a new recurring segment of these blog posts: our coveted ‘GPU-vendor-specific bug of the month’ award. Snatching the prize from last month's winner, Nvidia, it’s…….. AMD! Now the keen readers out there may be asking “why isn’t it a tie with Intel over that whole grass thing?”. It was tough, let us tell you. The panel debated long and hard on this verdict, but it was ultimately decided by a complete and catastrophic breakdown of Pokémon Legends: Arceus that just edged AMD into the lead.
The breakage began with the 23.x.x drivers, and we had hoped that it would be quickly resolved in a couple of driver patches. Word on the grapevine told us other programs were exhibiting driver bugs with these versions, and thus we waited. One, two versions passed us by and still no change. Fine, we’ll do it ourselves.
They broke transform feedback… AGAIN! We’ve already had to change the implementation twice but three times is, hopefully, the charm.
Red vs Blue, a tale as old as time. Some guest OpenGL games on Switch make use of a particular function of the GPU DMA engine that was causing some interesting color swaps. The function itself is more or less a simple shuffle, which is used to re-order things like pixel components in a texture. The Switch OpenGL driver uses this to perform BGRA (Blue, Green, Red, Alpha) to RGBA (Red, Green, Blue, Alpha) data conversions. As expected, not implementing this results in the swap never occurring. In some cases it can seem like nothing is wrong, but if you’re familiar with how a game is meant to look it becomes more obvious.
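As a rough illustration of what such a shuffle does, here is a small Python sketch. The function name and the `order` table are made up for this example; the real DMA engine is configured through GPU registers, not a function call like this.

```python
from typing import Sequence


def shuffle_components(data: bytes, order: Sequence[int]) -> bytes:
    """Re-order the components of each pixel.

    order[dst] gives the source component index written to position dst.
    """
    size = len(order)
    if len(data) % size != 0:
        raise ValueError("data length must be a multiple of the pixel size")
    out = bytearray(len(data))
    for base in range(0, len(data), size):
        for dst, src in enumerate(order):
            out[base + dst] = data[base + src]
    return bytes(out)


# Swapping components 0 and 2 converts BGRA to RGBA (and back again).
BGRA_TO_RGBA = (2, 1, 0, 3)

blue_pixel_bgra = bytes([255, 0, 0, 255])  # B=255, G=0, R=0, A=255
print(list(shuffle_components(blue_pixel_bgra, BGRA_TO_RGBA)))
```

Skipping the shuffle leaves the bytes in BGRA order while the rest of the pipeline reads them as RGBA, so red and blue trade places.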
You would be forgiven at first glance for thinking this is simply a time of day difference. It isn’t.
To put a bow on the GPU section, let’s first talk about mistakes and how they happen. Everyone is human and everyone is prone to making small mistakes with fairly enormous consequences. With that said, how about we discuss frame-pacing in Ryujinx.
Frames are meant to be rendered and then passed to the backend queue as ‘ready to go’. From here, any number of presentation methods can be used to display them in motion, and a lot of the details can be handled by your GPU driver and the backend itself. Ideally at any given framerate, all of the frames would be ready to present at an equidistant time interval to produce a smooth experience. We’ve known that this hasn’t been the case for a few years now and have been bombarded with spiky graphs throughout that period.
Users of VRR-capable displays making use of G-SYNC/FreeSync were obviously less affected by this and we always assumed it must just be a limitation on the backend. Vulkan, for all its strengths, does not have any universally adopted way to query the display timing from your monitor without platform-specific workarounds, like a DirectX interop layer on Windows, which wouldn’t help us much on Linux/macOS.
While all of the above is true, it didn’t account for a single missing line in the GPU engine code. We originally designed the system to wait for up to 8ms on commands, as a failsafe, but with a separate interrupt event that would cause the frame to be released as soon as it was ready. Someone, who for their dignity shall not be named, forgot to signal this interrupt event, and as such was effectively adding up to 8ms of error in every single wait event. This is very easy to see in the above graph, as the frame-time deviation was never more than +/- 8ms, but the crippling point was its fluctuating nature. What happens if the code written actually works how it was designed to work…
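The pattern at fault can be sketched like this. It is a simplified Python stand-in using `threading.Event`; the names `wait_for_frame` and `finish_command` are hypothetical, and only the 8ms failsafe value comes from the description above.

```python
import threading

FAILSAFE_TIMEOUT_S = 0.008  # the 8ms failsafe described above


def wait_for_frame(frame_ready: threading.Event,
                   timeout: float = FAILSAFE_TIMEOUT_S) -> bool:
    """Block until the frame is signalled as ready, or the failsafe expires.

    Returns True if the event was signalled, False if the wait timed out.
    """
    signalled = frame_ready.wait(timeout)
    frame_ready.clear()
    return signalled


def finish_command(frame_ready: threading.Event, signal_event: bool = True) -> None:
    """Called when the GPU work for a frame completes."""
    # The bug amounts to skipping this set() call: every waiter then sits
    # out the failsafe timeout instead of waking as soon as the frame is ready.
    if signal_event:
        frame_ready.set()
```

When the event is signalled the waiter wakes immediately; when it is not, the waiter only wakes once the failsafe expires, which is where the extra delay comes from.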
There are still a few moments where host:guest vsync deviates slightly, but these are much rarer. Whenever Vulkan standardizes a way to query display timing, as mentioned above, this should improve even further.
macOS upstreaming:
A few people asked us where this section was last month, and it ultimately comes down to whether anything was actually finished in a given month. Everything we detail in these progress reports is available right now, so if a larger change takes, say, two months, it creates a gap. That is exactly what happened in March!
In April on the other hand, gdkchan finished a complete refactor of attribute handling on the shader generator, which came in at just under 2000 lines and should resolve a significant number of shader compilation failures under MoltenVK. Tessellation is almost non-functional in the macos1 build, and in addition to simple upstream work, we’re also trying to clean up a lot of the more raw implementations of certain processes before they’re made available.
As a result of this work, tessellation now works correctly in games that make use of it, such as The Legend of Heroes: Trails from Zero, which renders entirely through tessellation shaders.
Other affected titles include The Witcher 3: Wild Hunt and Luigi's Mansion 3 (specifically the sand textures in later levels).
A smaller fix to dual source blending was also made, which should resolve a crash in certain games such as Metroid Prime Remastered under MoltenVK.
As a result of both changes, lots more games should end up being playable at the next release! Unfortunately, we don’t have any timeline on when that will be possible due to a number of changes made since November breaking a lot of the macOS specific workarounds, like mirrors and geometry shader emulation. Given the time of year, the upcoming release schedule and a priority list as long as all our arms combined, it’s impossible to give an ETA. We're trying to schedule time to rebase all of those changes, but at the moment we can only apologize on this front and hope that when the inevitable `macos2` releases, it will be a sizable upgrade.
While strictly a May change, for those who wish to jump the gun, universal macOS builds are now part of our CI and are available to download (with an updater) from our GitHub Releases. As mentioned, a lot of the performance and GPU workarounds are not yet upstreamed, so try these out on a game-by-game basis and at your own risk!
Moving on to some CPU-related changes and back to everyone's favorite topic, Breath of the Wild: the final “random” crash cause was resolved in April, which was a great milestone for us on the stability front. The only prior information we had on this specific crash was that it happened sometimes near Lynels, maybe in the rain, or maybe on hills, or something. Not a great start on debugging.
Thankfully, a Discord user discovered that there was a specific shrine puzzle that always crashed on certain physics interactions.
With this information, it didn’t take long to track the bug down to the CPU recompiler and how it was handling the FZ/RM flags for floating point operations. While looking for this bug, we also made a small extra optimization to accesses of the TPIDR_EL0 and TPIDRRO_EL0 registers, as games like BotW and Scarlet/Violet access them thousands of times per second. These accesses did appear in the CPU profile, but the change is unlikely to show any significant performance improvement.
Some homebrew applications such as Borealis also required us to implement the remaining ARM64 HINT instructions. These encodings are reserved for features of newer CPUs and simply execute as nothing on older ARM processors like the one found in the Tegra X1. They are normally used for fairly mundane tasks, like pointer authentication, and as such aren’t useful outside of homebrew.
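For the curious, recognizing the HINT space and treating unknown hints as no-ops can be sketched as below. This is illustrative Python, not Ryujinx's decoder; the mask and base values reflect our reading of the A64 encoding (HINT #imm is the NOP encoding with a 7-bit immediate in bits 11:5), and the helper names are invented for the example.

```python
HINT_BASE = 0xD503201F  # encoding of NOP, i.e. HINT #0
HINT_MASK = 0xFFFFF01F  # every bit except the 7-bit immediate (bits 11:5)

# A few hints with well-known meanings; anything else is treated as a no-op.
KNOWN_HINTS = {0: "NOP", 1: "YIELD"}


def is_hint(insn: int) -> bool:
    """True if the 32-bit instruction word lies in the HINT space."""
    return (insn & HINT_MASK) == HINT_BASE


def hint_immediate(insn: int) -> int:
    """Extract the 7-bit hint number from a HINT instruction word."""
    return (insn >> 5) & 0x7F


def describe(insn: int) -> str:
    if not is_hint(insn):
        return "not a hint"
    imm = hint_immediate(insn)
    # CPUs without the relevant feature execute unallocated hints as nothing,
    # so an emulator of such a CPU can do the same.
    return KNOWN_HINTS.get(imm, f"HINT #{imm} (executes as NOP)")


print(describe(0xD503201F))
print(describe(0xD503233F))  # a pointer-authentication hint on newer cores
```

The point of the HINT space is exactly this backwards compatibility: code built with newer features still runs on older cores, just without the extra behaviour.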
To start off the usual "misc" change section, we’d like to give a huge shout-out to contributor jhorv, who is currently on a warpath of memory usage reduction across Ryujinx. In April alone there were not one, but two different changes made that together can reduce the size of the small/large object heaps by up to 20%, reducing total garbage collection time by nearly 10%. For anyone who wants to see some large numbers, check the handy table below.
You should see more of this work in the coming months, and while it isn’t as flashy as a game fix or a huge performance boost, it’s appreciated all the same.
For those who make use of gyro motion controls on Sony or third-party Nintendo controllers, you may have noticed that when held stationary for a period, Ryujinx used to forcibly re-center the axes constantly. This was causing lots of problems in games like Splatoon, where accurate aim is vital for success.
Removing this reset functionality entirely seemed like the best solution here as on closer inspection, it was simply setting the motion filter to 1 periodically. The filter would then return to exactly where it was before this reset, and then simply reset again.
To finish out this report, we’ll do a quick-fire round of the smaller quality of life changes made in April:
- Game types (XCI, NSP, NCA, NRO) can now be hidden or shown in the games list via a dropdown toggle. This is useful if you keep everything in a single folder.
- The network interface selector has been backported from the LDN builds. This means you can now actively choose which network to use for guest internet and LAN connections, rather than Ryujinx choosing the default.
- The main window will now remember its size and position on shutdown, and restart there on start-up. This includes if the window was maximized.
- Another cause of deadlocking on “stop emulation” was resolved. We will get to the bottom of this eventually…
Closing words
April? Completed it mate.
One third of 2023 is already over and Ryujinx has never been in a better state, if you ask for our totally unbiased opinion. It’s all made possible by the incredible support that our community shows through donations on our Patreon, contributing code to our GitHub repo, or simply helping other users out on our Discord. All of it means that our development team can spend more time fixing games and making Ryujinx a better and more versatile program.
As always, if you’re proficient in C# (or really any C-based language), interested in emulation/modern 3D-graphics, want to improve any aspect of the program down to fixing typos, or simply need a large project to stat-pad your GitHub for that upcoming job interview, we’re always on the lookout for folk who can bring something new to the table. While our core team can work some miracles, the lifeblood of open-source software has always been people finding something annoying, and fixing it.
We look forward to May, and whatever it may bring.