Progress Report March 2023
We’re cruising our way into the second quarter of 2023 and finally got news on that new Zelda game everyone is so angsty about; did the presentation meet everyone’s expectations? All of us are incredibly excited to see what Nintendo has in store for this oddly familiar-looking adventure, and of course, to see what challenges it’ll bring Ryujinx. We hope you all had a great month, and that you appreciate this slightly shorter-than-usual progress report! But first, let’s take a look at our remaining Patreon goals.
We’d like to reiterate once again that any features listed below will eventually be worked on, regardless of the goal being met. A feature simply becomes a priority as soon as its incentive amount is sustained. This, of course, isn’t true for the full-time development goals which, by nature, are dependent on consistent backing. We view a goal as sustained if the amount remains above the threshold at the start of the following month, after we have stated in these progress reports that it has been met.
$2000/month - Texture Packs / Replacement Capabilities - Unfortunately dipped below this amount at the end of March, but it's extremely close!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
What better way to start a GPU section than to present our coveted ‘GPU-vendor-specific bug of the month’ award. Taking first prize this month (for only the second time ever!), itttt’ssssss….. *drum roll*.... NVIDIA! Storming to a clean victory with a Ryujinx bug that was specific to RTX 3000 and 4000 series GPUs.
Paris isn’t usually known for its ominous, floating 2D-shapes and Mario Kart 8 Deluxe wasn’t the only title affected. Since driver 522.25, games like Xenoblade Chronicles 1/2/3 and Hyper Light Drifter had been exhibiting random artifacting that took a very long time to track down. We’ve mentioned in the past that it isn’t so much a problem if a driver bug is consistent, but when it’s restricted to certain hardware the difficulty of solving it increases tenfold.
The cause was eventually narrowed down to newer Nvidia GPUs being able to start clearing render targets before the final image rasterization task has been completed. This can allow a texture to clear while it’s being sampled, producing the artifacts witnessed above. The solution is luckily extremely simple: inserting a barrier before the clear event aligns RTX 3000 and onward cards with their older siblings.
The resolution scaler giveth, the resolution scaler taketh away. If you’ve been with us for a few months, you may remember a fix aimed at the Splatoon games which stopped the scaler from multiplying point totals and causing players to be unable to ink-swim at low enough resolutions (check out the September report for more info). Unfortunately, in fixing these rather game-breaking bugs, a few titles such as WarioWare: Get It Together and Wreckfest began exhibiting graphical bugs when rendering beyond native, usually in the form of heavy flickering on character models or in overworlds.
! Flicker Warning !
By scaling values when they’re being added to the ReportCounter, instead of scaling the total count after the fact, weird overflows and large counter values are avoided. This eliminates the seemingly random flickering some games may have been exhibiting when scaled since September, and of course in the two mentioned above.
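To make the difference concrete, here is a minimal sketch of the two orderings. It is not Ryujinx's code: the function names, the 32-bit wrap-around and the idea that host sample counts grow with the square of the resolution scale are all assumptions made purely for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: the running total wraps at 32 bits


def scale_total_afterwards(host_values, scale):
    """Accumulate raw host values, then scale the (possibly wrapped) total."""
    total = 0
    for value in host_values:
        total = (total + value) & MASK32
    return total // (scale * scale)


def scale_on_add(host_values, scale):
    """Scale each host value back to native size before adding it."""
    total = 0
    for value in host_values:
        total = (total + value // (scale * scale)) & MASK32
    return total


# Hypothetical workload: 300 draws that would each cover 1,000,000 samples
# at native resolution, rendered at 4x scale (16x the samples per draw).
scale = 4
host_values = [1_000_000 * scale * scale] * 300

after = scale_total_afterwards(host_values, scale)
on_add = scale_on_add(host_values, scale)
print("scaled on add:", on_add)
print("scaled afterwards:", after)
```

In this toy setup the raw total wraps before it is scaled, so the "scale afterwards" result no longer matches the native count, while scaling on add keeps the running total at native magnitude throughout.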
Sonic Frontiers was a bit of a problem child at launch, as while it did boot and technically worked, the experience was a little like tying your shoelaces via chopsticks. Or in one word, painful. The main cause of the extraordinarily long loading screens and the mediocre performance was the game's tendency to create enormous cubemap arrays with over 7000 faces (175 cubemaps * 6 faces * 7 levels for those interested).
Iterating over both the handles and existing views when adding a new one added up very fast, to potentially 50 million iterations to add the final views. Since we only need to add individual views one at a time, we can instead add each new view to the existing overlaps, rather than recalculating them all. This becomes a new generic “fast path” for adding a single texture view to a group and could improve other titles that exhibit this behavior.
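For a feel of the numbers and the shape of the fast path, here is a small sketch. It models views as simple half-open ranges, which is an assumption for illustration only; the real texture group logic tracks far more than this, and every name below is hypothetical.

```python
FACES = 175 * 6 * 7   # cubemaps * faces * levels, from the figures above
PAIRS = FACES * FACES  # comparing every face against every other face


def overlaps(a, b):
    """Half-open ranges (start, end) overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def recompute_all(views):
    """Slow path: rebuild every view's overlap list from scratch."""
    table = {i: [] for i in range(len(views))}
    checks = 0
    for i, a in enumerate(views):
        for j, b in enumerate(views):
            checks += 1
            if i != j and overlaps(a, b):
                table[i].append(j)
    return table, checks


def add_single_view(views, table, new_view):
    """Fast path: compare only the new view against the existing ones."""
    new_index = len(views)
    table[new_index] = []
    checks = 0
    for j, existing in enumerate(views):
        checks += 1
        if overlaps(new_view, existing):
            table[new_index].append(j)
            table[j].append(new_index)
    views.append(new_view)
    return checks


views = [(0, 4), (2, 6), (8, 10)]
table, _ = recompute_all(views)
fast_checks = add_single_view(views, table, (5, 9))
full_table, full_checks = recompute_all(views)
print(FACES, PAIRS, fast_checks, full_checks)
```

The fast path ends up with the same overlap table as a full rebuild, but the work per added view grows with the number of existing views rather than with its square.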
Let’s talk performance, everyone loves a bit of that.
A focus in March was isolating cases where OpenGL was still vastly outperforming Vulkan. This usually indicates code-paths that the OpenGL driver of your GPU is optimizing automatically, whereas in Vulkan we’d need to do those manually in Ryujinx itself. We’ll start with some titles that really don’t look like they should be struggling, alas, they did.
Some games like LA-MULANA, a visually simple 2D-platformer, were sweating under Vulkan but running like a breeze under OpenGL. Previously, index inline buffer updates were being performed one index at a time, step by step, meaning that, for example, in the event of two 16-bit indices being uploaded, the actual work would be performed in multiple 8-byte chunks. An extremely inefficient process that the Nvidia OpenGL driver was working some magic around. Vulkan on the other hand, no such luck!
With this upload mechanism changed to allow batched uploads (up to 256 indices at a time), titles such as this no longer struggle. While this mainly helps Vulkan rendering performance, other vendors whose OpenGL drivers may not be as competent could see improvement when using OpenGL also.
There was a final elephant in Vulkan’s room which has taken a very long time to resolve. It was noted early into the public testing of the new backend that some games performed much worse and used a lot more GPU resources when compared to OpenGL. This wasn’t helped by the fact it seemed weirdly hardware specific. An Nvidia GPU paired with an Intel CPU wouldn’t exhibit these symptoms, but simply swap in an AMD CPU and it would. Is the issue with AMD CPUs then? Well no, because if you swap the Nvidia GPU for an AMD GPU then the problem goes away again! The result being that there was clearly a weird situation with Nvidia GPUs being paired with AMD CPUs… Do they know? Do they repel like magnets?! No avenue was left unexplored.
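The batching idea can be sketched roughly as follows. This is an illustration under assumptions rather than the actual implementation: the class, the `upload` callback standing in for a GPU buffer write, and the choice of 16-bit indices are all hypothetical. Only the 256-index limit comes from the text.

```python
import struct

BATCH_LIMIT = 256  # from the text: up to 256 indices per upload


class InlineIndexUploader:
    """Collects inline indices and hands them to `upload` in batches."""

    def __init__(self, upload):
        self._upload = upload  # hypothetical callback: receives packed bytes
        self._pending = []

    def push(self, index):
        self._pending.append(index)
        if len(self._pending) == BATCH_LIMIT:
            self.flush()

    def flush(self):
        """Send whatever is pending as a single upload (no-op when empty)."""
        if self._pending:
            packed = struct.pack(f"<{len(self._pending)}H", *self._pending)
            self._upload(packed)
            self._pending.clear()


uploads = []
uploader = InlineIndexUploader(uploads.append)
for index in range(600):
    uploader.push(index)
uploader.flush()  # e.g. called right before the draw that uses the indices
print(len(uploads), [len(chunk) for chunk in uploads])
```

Pushing 600 indices one by one now results in three buffer writes instead of 600, which is the kind of work the driver was presumably coalescing on its own under OpenGL.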
The problem in this situation stemmed from how Ryujinx handled GPU buffer data. All data was owned by “Host Mapped” memory which belongs to your system RAM, not your graphics card's VRAM. This allows us to quickly access, upload and pull data to and from this memory without needing to go through the GPU. Unfortunately, we learned that this is very dependent on a number of factors like CPU, GPU, GPU driver and even down to PCI-E bandwidth. As such, this method of storing all buffer data in shared memory is inconsistent to say the least.
Certain games bind very large ranges as storage buffers which, depending on the factors above, could cause huge bandwidth constraints, skyrocket your GPU usage, and subsequently cause your desktop manager to become laggy and unstable.
The solution proposed is very much a balancing act. We can’t just store everything in VRAM as we’d lose all the aforementioned advantages such as quick access, but we clearly can’t store everything in shared memory either. A set of rules was therefore established to migrate buffers between different memory types in order to improve GPU performance and eliminate the bulk of cases where OpenGL still performed slightly better.
While the majority of these cases affect the fabled AMD CPU/Nvidia GPU combo mentioned, all vendors experienced the issue to some degree and all should see improvement. The numbers below were taken with a variety of AMD/Nvidia hardware combos, not just a 5600X/RTX3070, so your numbers may not match exactly; the percentage improvement is the star of the show here.
This is not an extensive list and many more titles saw major to moderate gains across hardware lineups, not just AMD CPU/Nvidia GPU setups. It's also hard to convey the stability improvements these changes bring. As we stated before, while 18fps isn't ideal, what's even less ideal is Ryujinx becoming so GPU intensive it starts to lag your desktop manager.
Subnautica is a fun one that’s omitted here (the bar would dwarf everything) but its title screen saw a minor 2000% performance increase. With these changes in place though, we aren’t expecting many more titles to perform wildly better/different in OpenGL, so if you previously tried a very slow game in Vulkan and had to switch backends, give it another go!
Last month there was mention of Metroid Prime Remastered and its notoriously stuttery doors. The largest cause of these frametime spikes is a very large (40MB) texture being created when each new zone loads. So surely, the solution would be to try and update the current texture rather than recreate it? Spot on. If you guessed that at home then you too could one day be an emulator developer, or someone who writes about them…
To close out the GPU section, we fixed a small omission that was causing a device query to break in Vulkan, resulting in Ryujinx not knowing it could force some AMD GPUs (RDNA and later) to use a subgroup size of 32 rather than the default 64. This showed up as flickering corruption in titles such as Shin Megami Tensei V and Crisis Core.
Our bad! The issue above is still present on Radeon GPUs older than RDNA1 (RX 5000), but it should be resolved on anything newer that supports variable subgroup sizes.
Shortly after Intel quietly dropped support for AVX-512, a new instruction set that operates on 512 bits rather than the 256 bits of AVX2, AMD announced that its newest Zen4 CPUs would offer support instead. As such, there has been a fair bit of buzz around offering optimizations of the 512-bit variety to CPUs that support it currently and in the future. While we aren’t confident that Switch emulation will ever be able to take as much advantage of these instructions as something like RPCS3, some preliminary work was carried out by external contributor Wunkolo to accelerate the `mvn`, `orn` and `not` opcodes. While there are currently no tangible performance gains we can show for this, the implementations of the opcodes are technically faster on AVX-512 compatible chips, and as further instructions are used, these small optimizations may add up.
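For the curious, `mvn` and `not` are a plain bitwise NOT, and `orn` is "OR with the complement of the second operand". The text above doesn't say which AVX-512 instruction the change relies on, so the sketch below simply assumes a three-input ternary-logic operation in the style of `vpternlogd`, where an 8-bit immediate acts as a truth table. The helper name and the 128-bit width are illustrative choices, not part of the actual change.

```python
WIDTH = 128                # assumption: operating on a 128-bit vector register
MASK = (1 << WIDTH) - 1


def ternlog(a, b, c, imm8, width=WIDTH):
    """Per bit, a/b/c form a 3-bit index (a is the high bit) into imm8."""
    result = 0
    for bit in range(width):
        index = (
            (((a >> bit) & 1) << 2)
            | (((b >> bit) & 1) << 1)
            | ((c >> bit) & 1)
        )
        result |= ((imm8 >> index) & 1) << bit
    return result


NOT_A = 0x0F       # truth table for ~a       (covers mvn / not)
A_OR_NOT_B = 0xF3  # truth table for a | ~b   (covers orn)

a = 0x0123456789ABCDEF_FEDCBA9876543210
b = 0x00FF00FF00FF00FF_F0F0F0F0F0F0F0F0

not_a = ternlog(a, 0, 0, NOT_A)
a_orn_b = ternlog(a, b, 0, A_OR_NOT_B)
print(hex(not_a), hex(a_orn_b))
```

The appeal of this style of instruction is that a NOT followed by an OR collapses into a single operation; only the immediate changes between the opcodes.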
A funny side effect of merging these changes was that we discovered that some CPUs being used with Ryujinx are so old, they didn’t support the hardware flag being used to check if certain instructions were supported or not! While we really do not recommend running a Switch emulator on 2008 server CPUs, this issue is now also fixed.
Onto some service shenanigans: `CreateServerInterface` in the Shop access services would pass some transfer memory and then never close its handle. If the service was called a second time, it would fail and cause a crash in any title that did this. Preventing this fixes crashes in SD Shin Kamen Rider Ranbu and, as usual, any other titles that could have exhibited this behavior.
As for this month's miscellaneous round-up of changes:
- Program memory allocations were reduced. This results in a 44% reduction in allocation events, a 25% reduction in garbage collection time and a 32% reduction in program pauses due to memory allocation. Very minor performance gains could be seen in allocation-intensive scenarios.
- Syscall capabilities have been updated to include syscalls added in firmware version 15.0.0.
- LibHac (library used for the filesystem services) was updated to 0.18.0. Adds support for personalized ticket title keys given the correct console key dump.
- A hang when shutting down the application, causing a ghost Ryujinx process to stick around, was resolved on Linux.
- The DLC manager in the WIP Avalonia GUI was given a refactor to more closely align it with ‘Fluent’ design principles and to add a highly requested “Enable all” button, for those who were scammed into buying all the Smash Bros DLC + Costumes.
News on our shift to Avalonia for the frontend has been slower recently, as we are waiting on their team to finalize the next 11.0 release. This is required for us to continue to package Ryujinx on FlatHub for our Linux and Steam Deck users, so jumping the gun wouldn't be ideal.
That’s all from us this month folks. We’re fast approaching the business end of the gaming year and we’re doing all we can think of to make this boat water-tight before the storm. As usual, we’d like to give a huge thanks to everyone who supports the work we do, whether financially on Patreon, with technical expertise on GitHub, or simply by helping fellow users and being active in our Discord. You keep this ship on-course!
Tears of the Kingdom. There you go, for anyone that Ctrl-F'd.
See you all next month ;)