Progress Report September 2022
September's report? You’re trying to tell me we just hit the 75% mark on the year? Madness.
This month marked not only the turn from summer to autumn and some major world events but, most importantly of course, the launch of Splatoon 3. With the holiday season fast approaching, that means game releases, game releases and…you guessed it, game releases. The characteristic eye twitches that upcoming Pokémon games always bring to our development team are just taking root and Bayonetta hopefully won’t step on us at launch!
Before we dive into Ryujinx’s journey through September 2022, let’s take a moment to review our Patreon goals and incentives. As a reminder, these features are not locked behind a paywall; all features mentioned below will be implemented eventually regardless. However, if a goal is reached, then priority is shifted to focus on implementing that feature straight away.
ARB Shaders - Goal reached in April 2021.
Work is ongoing, please wait a little while longer until we are able to deliver this update into a state we are happy with.
ARB shaders will further reduce stuttering on the first run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.
$2000/month - Texture Packs / Replacement Capabilities - getting close!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
ETA once the goal is sustained: ~3-4 weeks
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
It will come as no surprise that the largest GPU change this month concerns three items that, combined together, are relatively new to the Ryujinx spotlight: Vulkan, AMD GPUs and Splatoon 3. On release, many were pleased with how well the game rendered and played, but this was a glory that only NVIDIA GPUs could attain. Vulkan expects certain vertex attributes to be ordered in a specific way, and if games pass misaligned elements then this can cause a bit of a chaotic chain reaction to future vertices. By adding a method to change the stride of vertex buffers before they are bound, we can avoid this issue and keep the Vulkan spec happy. Good news for the Team Red users.
Not all issues need to be game-breaking in order to be notable, though, and Splatoon 3 brought us a rather interesting if not amusing bug this month. Some LDN users quickly noticed how the scoring system seemed to be awarding certain teams with some frankly outrageous scores.
It turned out that due to the unique way points are scored in Splatoon (how much map coverage you have), resolution scaling values above native were actually causing the game to believe there were more pixels covered in paint than there were. This quickly turned into a GPU arms-race on the LDN server to see who could get the all-time world-record before a “fix” dropped. Funnier still was scaling in the opposite direction, to resolutions lower than native, which made it impossible to actually swim in any ink, thus completely breaking a large portion of gameplay for many users. By scaling the SamplesPassed counter in accordance with the resolution, both of these quirks were resolved!
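As a rough illustration of the idea (not the actual Ryujinx code), the counter can be normalised back to native resolution by dividing by the area factor. The function name and the assumption that the scale is a single per-axis multiplier are ours:

```python
def scale_samples_passed(host_samples, res_scale):
    """Convert a host-side sample count back to a native-resolution count.

    Hypothetical helper: assumes res_scale is a per-axis multiplier,
    so the rendered area grows by res_scale squared.
    """
    if res_scale <= 0:
        raise ValueError("res_scale must be positive")
    return round(host_samples / (res_scale * res_scale))


native_pixels = 1280 * 720
# At 2x scale the host draws four times as many samples for the same paint.
print(scale_samples_passed(native_pixels * 4, 2.0))
# At 0.5x scale the host draws a quarter of the samples.
print(scale_samples_passed(native_pixels // 4, 0.5))
```

With this normalisation the game sees the same coverage figure whatever the host resolution, which is why both the inflated scores and the broken swimming go away.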
A couple of regressions were isolated in September. The first was a bug in Fate/EXTELLA where all backgrounds had stopped rendering and simply presented a black void. The problem was narrowed down to a broken blit between texture types and has since been resolved in OpenGL, with a separate fix for Vulkan to follow somewhat soon.
Mario Party Superstars also suffered a blow recently, with rendering of the Spotlight Swim mini-game taking a hit. The red spotlight specifically seemed to be getting its surface illumination cut in half. This was a real head-scratcher as it didn’t affect the other two spotlights in the game at all. After some digging it was traced back to a previous optimization to shader specialization, and a quick fix to rebind textures if their format changed was added.
Some titles seem to ‘prefer’ one host graphics API or another for rendering accuracy, performance or just legacy hardware compatibility. On the performance front there were a few games that ran a fair bit worse on Vulkan than on OpenGL. While we will continue to reiterate that Vulkan is not a silver bullet that fixes every problem under the sun and isn’t necessarily intrinsically faster than OpenGL, the dip in performance for these games was large enough to be considered abnormal. As with a lot of the most frequently reported issues, it all starts with Pokémon: specifically Sword and Shield, which incurred a 20% performance hit by using Vulkan prior to recent changes. This was heavy for AMD and Intel users who may have been hovering 20% away from full speed, but without the luxury of a performant OpenGL driver.
So what’s causing this huge delta in performance and how do we fix it? It actually all starts at the Nvidia OpenGL driver because, as has been proved many times over emulation history, it’s pretty smart. The Nvidia OpenGL driver has a built-in mechanism to flush commands directly to the GPU when the queue becomes large which, while inconsistent, works really well in games where these flushes happen often. Vulkan, as usual, gets no such special treatment from the driver so the solution chosen here was to periodically flush the commands manually to reduce GPU<->CPU latency and make the time we spend waiting on the flush smaller and more consistent.
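To make the idea concrete, here is a minimal sketch of a queue that submits its pending work once a threshold is reached. The class, the `submit` callback and the threshold of 64 are illustrative assumptions; the real backend decides when to flush using its own heuristics.

```python
class CommandQueue:
    """Toy command queue that flushes itself once enough work is pending."""

    def __init__(self, submit, flush_threshold=64):
        self._submit = submit              # hypothetical "send to GPU" callback
        self._threshold = flush_threshold  # assumed value, purely for illustration
        self._pending = []

    def record(self, command):
        self._pending.append(command)
        # Flush early instead of letting one huge batch build up.
        if len(self._pending) >= self._threshold:
            self.flush()

    def flush(self):
        if self._pending:
            self._submit(list(self._pending))
            self._pending.clear()


batches = []
queue = CommandQueue(batches.append, flush_threshold=3)
for draw in range(7):
    queue.record(draw)
queue.flush()  # submit whatever is left at the end of the frame
print(batches)
```

Smaller, more regular submissions mean the GPU starts work sooner and any later wait on it is shorter and more predictable.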
Above is a quick table of a couple of the games that benefit from the flush changes. Sword/Shield benefit the most, but even Breath of the Wild is happy to breathe a little easier. The bottleneck in Sword/Shield was the time spent waiting on the GPU from the main GPU thread; Breath of the Wild, meanwhile, sees a reduction in time spent waiting on the GPU from other guest threads.
Remaining on the topic of Vulkan improvements: the R4G4B4A4 format had some components out of order and was causing all sorts of mischief with backgrounds and text boxes. Correcting this ordering manages the mischief in titles such as Ys VIII: Lacrimosa of Dana and Vroom in the night sky.
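A small sketch shows why component order matters for a packed 16-bit format. The helper below is hypothetical, and which order the guest and host formats actually use is not something this report states; the point is only that the same bits decode to different colours.

```python
def unpack_4444(texel, order):
    """Split a 16-bit texel into four 4-bit channels.

    `order` names the channels starting from the least significant nibble.
    """
    if len(order) != 4 or not 0 <= texel <= 0xFFFF:
        raise ValueError("expected a 16-bit texel and a 4-letter channel order")
    return {channel: (texel >> (4 * i)) & 0xF for i, channel in enumerate(order)}


texel = 0xF00A
print(unpack_4444(texel, "rgba"))  # red in the lowest nibble
print(unpack_4444(texel, "abgr"))  # alpha in the lowest nibble
```

Read one way the texel is an opaque, mostly red pixel; read the other way its red and alpha values are swapped, which is the kind of mix-up that garbles backgrounds and text boxes.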
Let’s keep the Vulkan train going with some quickfire changes:
- The blend state is now zeroed if blend is disabled. This reduces pipeline recreation stuttering on AMD and Intel GPUs. The Nvidia driver was already very forgiving on pipeline misses in this scenario.
- Quads are now converted to triangles on Vulkan. As Vulkan has no native host quad support, our previous method of queuing one draw per quad was much less efficient than allowing Vulkan to render what it’s good at. Triangles!
- ViewportIndex is no longer output on SPIR-V if the host GPU does not support it. This allows older GPUs that may not conform to the latest Vulkan specification to play some titles that would previously crash on boot including Super Smash Bros. Ultimate.
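For the quad conversion mentioned in the list above, one common approach is to generate an index buffer that splits each quad into two triangles. This is a sketch of that general technique, not necessarily the exact pattern Ryujinx emits:

```python
def quads_to_triangle_indices(vertex_count):
    """Build triangle-list indices for vertices laid out as consecutive quads.

    Each quad (v0, v1, v2, v3) becomes triangles (v0, v1, v2) and (v0, v2, v3).
    Leftover vertices that do not form a full quad are ignored.
    """
    indices = []
    full_quads = vertex_count // 4
    for quad in range(full_quads):
        base = quad * 4
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return indices


print(quads_to_triangle_indices(8))
```

All quads can then be drawn with a single indexed triangle draw rather than one draw call per quad.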
Onto a more visible change, tessellation had a few notable problems even after the year-long testing phase of Vulkan. However, due to the recent release of The Legend of Heroes: Trails from Zero, the topic was brought back fresh into everyone’s minds and more specifically back into our Discord channels. As you can see, this probably wasn’t the immersive gameplay experience developer Nihon Falcom intended.
However it was indeed exactly what we suspected: tessellation struck again. Fortunately, by fixing a whole bunch of wrong assumptions and other SPIR-V related mis-steps, tessellation issues in a few games have been ironed out and they should be rendering accurately now.
As the Ys games seem to be popular in this report, why don’t we throw in another one? Ys VIII: Lacrimosa of DANA was a bit of a disappointment as users were presented with a wide assortment of rendering quirks. Sometimes it worked, sometimes it didn’t and sometimes it just rendered textboxes. Very annoying. Thankfully the game was so consistently broken, even in small ways, that reproducing the bugs and thereby finding the cause wasn’t as painful as other ‘random’ problems. By transforming shader LDC into constant buffer access in certain scenarios, we can allow bindless elimination to activate in this case.
September also brought a fix for a crash in the early intro cutscenes of Sniper Elite 3 by allowing the use of bindless textures with handles from unbound buffers. If that's a lot of words then allow me to simplify: game does weird thing = game crash, game still does weird thing = now game no crash. Somewhere in between those extremes we’re confident everyone is covered. The internal vsync signal (no, not the screen tearing one you’re thinking of) was also changed in September to signal precisely every 16.667ms instead of just using Ryujinx’s swap interval. This fixes an issue in Tokyo Mirage Sessions #FE Encore where audio would slowly desync in cutscenes as the vsync timing drifted away from the audio channel.
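The 16.667ms figure is simply one sixtieth of a second. As a hedged sketch of why a fixed schedule avoids drift (the function and constant names are ours, and this is not how Ryujinx necessarily implements it), each deadline can be derived from the start time rather than from the previous signal:

```python
NANOSECONDS_PER_SECOND = 1_000_000_000
REFRESH_RATE_HZ = 60  # one signal every ~16.667 ms


def vsync_deadlines(start_ns, count):
    """Return absolute deadlines for the next `count` vsync signals.

    Every deadline is computed from start_ns, so rounding errors and late
    wake-ups on one frame never carry over into the following frames.
    """
    return [
        start_ns + (i * NANOSECONDS_PER_SECOND) // REFRESH_RATE_HZ
        for i in range(1, count + 1)
    ]


deadlines = vsync_deadlines(0, 60)
print(deadlines[0], deadlines[-1])
```

After sixty signals the schedule lands exactly on the one-second mark, whereas repeatedly adding a rounded interval would fall slightly short each second.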
Capping out September's GPU section will lead us neatly into the CPU section, as this final change works in tandem with other things we’ll discuss later to fix rendering in a couple of 32-bit titles. 1D and buffer textures use the exact same texture instructions in the shader, so we need to get the actual texture directly from the GPU state. This was getting messy for Prinny: Can I Really Be the Hero and Prinny 2: Dawn of Operation Panties Dood (please never make me type this again). By resolving the scenario where 1D textures were assumed to be buffers, these games can start to render correctly.
Not quite right still though is it? Let’s solve that.
Not all graphical bugs are related to the GPU emulation, and this month saw huge progress for Ryujinx’s CPU emulation. As mentioned above, we’re going to start with a change that, in combination with the last GPU fix, resolved many rendering issues in 32-bit titles.
Due to an oversight in the original CPU tests for VLDn and VSTn, these instructions were not actually being accurately tested in all their modes. Fixing this revealed several failure points caused by an incorrect register increment value, which in turn caused other values to be pulled from or sent to incorrect register locations. Addressing this incorrect increment fixes such a variety of 32-bit bugs it would require a whole list unto itself.
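To illustrate what a register increment means here, the sketch below lists the D registers touched by a multi-register structure load or store. It is a simplified model with a hypothetical function name; the real instructions select the increment through their encoding, and this is not ARMeilleure's code.

```python
def structure_registers(first_d, count, increment):
    """List the D registers used by an n-register structure load/store.

    `increment` is 1 for consecutive registers or 2 for every other register.
    """
    if increment not in (1, 2):
        raise ValueError("increment must be 1 or 2")
    registers = [first_d + i * increment for i in range(count)]
    if registers[-1] > 31:
        raise ValueError("register list runs past D31")
    return registers


print(structure_registers(0, 3, 2))  # every other register
print(structure_registers(4, 4, 1))  # consecutive registers
```

If the emulator picks the wrong increment, every register after the first is off, so data ends up in or comes from the wrong place exactly as described above.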
The two Prinny games, being the anchor here, were of course fixed by this change:
But as with all of the best changes, it affects a whole lot more. The following titles now render or have some major graphical bugs resolved:
No More Heroes:
No More Heroes 2 Desperate Struggle:
Charlie’s antenna and a range of other graphical effects now render correctly in Pikmin 3: Deluxe.
This change also resolves abysmally poor audio quality in: Ni no Kuni, Double Dragon Neon and Sky Gamblers: Storm Raiders. EARPHONE ALERT.
Would it be a progress report CPU section if we didn’t list every new instruction ARMeilleure can now process? Executive decision: no, it wouldn’t.
If you haven’t guessed yet, the past few months have seen a 32-bit focus as it was by far the weakest area of our recompiler, due to the majority of Switch titles being natively 64-bit. However, as with all things Nintendo Switch, if you give developers the option to do weird stuff, they will do weird stuff. Quite a few Switch titles (usually ports of some kind) therefore opt for the 32-bit option, and can cause us headaches if the instructions they need are not accommodated in the recompiler.
Alright, so what’s new and what does it do?
- VRSRA, VRSHRN, VQSHRUN, VQMOVN, VQMOVUN, VQADD, VQSUB, VRHADD, VPADDL, VSUBL, VQDMULH and VMLAL Arm32 NEON were implemented and allow Dies irae -Amantes amentes- for Nintendo Switch, Baldur’s Gate, Icewind Dale and Star Wars: Republic Commando to all head in-game.
- ADD (zx imm12), NOP, MOV (rs), LDA, TBB, TBH, MOV (zx imm16) and CLZ thumb instructions were implemented and allow the Vita2HOS homebrew to function again on its newest versions.
- Thumb (32-bit) memory (ordered), multiply, extension and bitfield instructions were implemented and allowed a few Vita applications to progress a bit further under Vita2HOS.
- T32 Vfp instructions were implemented and allowed some Vita homebrew to begin rendering under Vita2HOS.
- VRINT (vector) Arm32 NEON instructions were implemented which allow Ni No Kuni Wrath of the White Witch to head in-game if you provide a save file (Web applet is required otherwise).
- T32 Asimd instructions were implemented which allow Vita homebrew such as a CHIP-8 emulator to boot and render. This one is actually insanely cool as the resulting scenario is essentially a PC emulating a Nintendo Switch, which is running a homebrew translation layer, which is running a PS Vita CHIP-8 emulator, emulating Breakout!
- PLD and SUB (imm16) on T32, plus UADD8, SADD8, USUB8 and SSUB8 on both A32 and T32, were implemented. These once again enable more of Vita2HOS’s general functionality, although there is a chance some games may use them too.
- A32/T32/A64 Hint instructions (CSDB, SEV, SEVL, WFE, WFI, YIELD) were implemented as Nops (do nothings) to avoid unintended behavior and crashes in games such as Meiji Katsugeki Haikara Ryuuseigumi - Seibai Shimaseu, Yonaoshi Kagyou (bit of a mouthful).
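For a flavour of what one of these instructions does, here is a small model of UADD8 from the list above: four independent unsigned byte additions packed into one 32-bit register, with a GE flag per byte recording the carry. The Python function is an illustration of the architectural behaviour, not recompiler code.

```python
def uadd8(rn, rm):
    """Model UADD8: add four unsigned bytes lane by lane.

    Returns (result, ge), where bit i of ge is set when lane i carried out.
    """
    result = 0
    ge = 0
    for lane in range(4):
        shift = lane * 8
        total = ((rn >> shift) & 0xFF) + ((rm >> shift) & 0xFF)
        result |= (total & 0xFF) << shift  # each lane wraps independently
        if total >= 0x100:
            ge |= 1 << lane
    return result, ge


value, flags = uadd8(0x01FF7F80, 0x01018080)
print(hex(value), bin(flags))
```

SADD8, USUB8 and SSUB8 follow the same lane-by-lane pattern with signed arithmetic or subtraction in place of the unsigned add.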
Wow. Lots of work being put in by a few different people to knock out so many new instructions in such a short period of time, not even taking away from progress in other areas as we’ve already covered the extensive changes GPU emulation received. We’re now in a much better spot in regard to 32-bit titles, homebrew and Switch -> PS Vita translation layers! That final one may seem niche, but projects like Vita2HOS really do capture the imagination.
After nearly a year in purgatory, a cleanup of the rejit queue was merged, making that section of the codebase easier to maintain. It also heralded the prodigal return of one of Ryujinx’s original CPU developers, who brought some excellent progress if you like your in-game videos to play at full speed. LDj3SNuD’s first port of call in September was implementing some managed methods of both the Saturating and ShlReg regions of the SoftFallback class. You don’t need to know what any of that means but the effect is quite transformative in video playback.
Astral Chain collaterally has its performance massively improved in certain areas of the game as pre-recorded videos coexist with normal gameplay rendering in places like the HQ lobby. We saw the AC intro improve from 23FPS -> 100FPS on an i7-10700K and the lobby take a jump from 30FPS -> 43FPS.
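If you are curious anyway, "saturating" arithmetic clamps a result to the representable range instead of letting it wrap around. The helper below is a generic sketch of that behaviour with a name of our own choosing; it is not taken from the SoftFallback class.

```python
def signed_saturating_add(a, b, bits=8):
    """Add two signed integers, clamping to the range of a `bits`-wide value.

    Returns (result, saturated), where saturated is True if clamping occurred.
    """
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    total = a + b
    clamped = max(lowest, min(highest, total))
    return clamped, clamped != total


print(signed_saturating_add(100, 100))    # clamps at the top of the range
print(signed_saturating_add(-100, -100))  # clamps at the bottom of the range
print(signed_saturating_add(1, 2))        # fits, so nothing is clamped
```

Video decoders lean heavily on this kind of operation, which is why speeding it up shows so clearly in FMV playback.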
Not content there, some additional changes to the handling and isolation of Fpsr/Fpcr instructions further improved playback of full-motion video. The improvements are most apparent in titles like TONY HAWK'S™ PRO SKATER™ 1 + 2 whose intro is an extremely demanding FMV on lower end hardware.
There are relatively few things to say about GUI development this month but one of them is major enough to deserve this section. If you’ve followed these progress reports for a while then you’ll be aware that we are trying to switch from GTK3 (via C# bindings) to a native C# UI framework by the name of Avalonia. It’s been a while since this journey began and at times it has felt like two steps forward, one step in a random direction. One of these was in the so-called ‘render window’; this is the section of the GUI that contains the OpenGL or Vulkan renderer and actually presents to you the game, app or homebrew being run.
The area highlighted in red has caused more than one headache and has already seen several revisions over this year. GTK currently handles this area as an embedded window which means that there is actually a second ‘child’ window simply being embedded directly into another separate window which houses the rest of the GUI. This allows full granular control over the rendering and means that the render window isn’t directly tied to the same update cycle as your GUI, a good thing for tasks like resizing and dragging the window around.
The first iteration of our Avalonia implementation did this also. But we soon noticed some oddities on Windows, including the ‘child’ render window having a separate focus to the GUI ‘parent’ window. For example, if you clicked on the game it would deselect the main window and break stuff like hotkeys and focus-specific actions. Not ideal. So, other options were explored over the course of 2021 and 2022, concluding with an implementation where the render window was instead a render layer being displayed as part of the main window. This seemed like the solution, as it resolved the focus issues and allowed the GUI and game to have full sync with each other in key areas like hotkeys and keyboard navigation. This, however, came at some significant costs.
- Because the rendering was now part of the parent window, that meant the entire GUI had to be hardware rendered. This made it impossible to switch GPU or graphics backend without restarting the entire application.
- Due to the above limitation, there was now no distinction between game and GUI. This wreaked havoc for overlays, recording software and other benchmarking tools that hook into graphics APIs. Most would provide a recording or performance statistics of the GUI itself rather than the game as there was no way to tell the difference.
- As the UI was now being rendered by the Avalonia layer we had effectively lost some of the control over the core rendering and presentation process. Frame pacing took a significant hit in a lot of titles and many users have been understandably concerned about the new Avalonia UI before this was resolved.
There has been much back & forth on how to best tackle this. Suggested solutions ranged from simply using a pop-out window (similar to Dolphin) all the way to potentially implementing something into the Avalonia project framework itself for our specific use case. None of these seemed practical, nor were they solutions that our users would feel comfortable with for a supposed “upgrade”.
So. Are we back to square one? Well, yes! There was a second crack this month at returning to our roots with an embedded window and we’re pleased to say it’s been a resounding success. The interactions between the parent and child windows are not causing focus issues this time around and with the return of full render control comes better overlay support and a presentation experience on-par with GTK. Did I mention that it also removed over 3500 lines of code while only needing to add 800? Simplified and better.
General tidying of some bugs and UX improvements including some font changes, border additions and alignment fixes were merged this month, which should hopefully make things less off-centre or floaty.
While technically an October change, we’re happy to note that the updater now functions correctly on the Avalonia builds, meaning that anyone who wishes to ‘beta test’ the new UI can do so with a fully self-updating build. Head over to our GitHub releases page and select the “test-ava” build for your operating system if you wish to give it a whirl.
To wrap up a most productive September, let’s take a walk down the road that our kernel and services emulation took to reach us here.
A primary goal has been continued work on our network and BSD services as they affect a great number of games, even if we don’t directly connect to any real Nintendo servers. A null reference exception when launching Victor Vran Overkill Edition with guest internet enabled was fixed and a small oversight causing sockets to return incorrect result codes was resolved. Following on from this, the methods the sockets use to poll were improved and SendMMsg/RecvMMsg were both implemented in the bsd service for completeness.
Games that pack multiple titles into a single executable (think Super Mario 3D All-Stars or some other game collections) do some rather strange things when moving between their bundled applications. One such title is Prinny Presents NIS Classics Volume 3: La Pucelle: Ragnarok / Rhapsody: A Musical Adventure which requires the list of current users when it transitions into one of the actual games. Previously, the services required to do this were stubbed and so returned empty lists, causing a crash. By properly implementing the ListOpenContextStoredUsers service and stubbing LoadOpenContext, this title, and potentially others with a similar issue, head in-game.
An optimization to the placeholder manager tree lookup arrived this month with the primary aim of allowing games that perform a large number of memory mappings to shut down a lot faster. Previously games like Shin Megami Tensei V would take a considerable amount of time to close:
One more quick fire round until we finish:
- A work buffer is now allocated for the audio renderer instead of using guest supplied memory. This fixes a crash in Urban Trial Tricky and Mutant Year Zero: Road to Eden where the memory required would be unmapped before the audio renderer was disposed.
- Endianness was fixed in the sfdnsres socket service to prevent an issue during port serialization.
- A regression to partial unmap protection, which was causing text in Super Smash Bros. Ultimate to render incorrectly, was resolved.
- A bug in the SSL GetCertificates call when certificate ID was set to All was fixed. This allows Life is Strange: Remastered to boot.
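On the port serialization point in the list above: socket address structures conventionally carry the port in network (big-endian) byte order, so writing it in the host's little-endian order produces a different port number on the other side. The report does not say exactly which direction was wrong, so treat this as a general illustration with a hypothetical helper name.

```python
import struct


def serialize_port(port):
    """Pack a 16-bit port number in network (big-endian) byte order."""
    return struct.pack("!H", port)


port = 443
print(serialize_port(port))                      # network order
print(struct.pack("<H", port))                   # little-endian, bytes swapped
print(struct.unpack("!H", struct.pack("<H", port))[0])  # what a reader would see
```

Port 443 written with the bytes swapped reads back as 47873, which is enough to break any lookup that depends on it.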
And that’s all she wrote! We’re barrelling toward the end of 2022 and that means we draw ever closer to another Pokémon launch; nightmare fuel for us all. However, on a related topic, we are planning to release a new LDN build sooner than expected thanks to a number of the Splatoon 3 bugs listed in this very report being rather crucial for good LDN play; especially for those on AMD GPUs. We can’t provide an ETA yet but rest assured it will not be many months like the delay seen in the last release.
If you're an eager beaver and follow all the news around Ryujinx, then you may have already seen our lead developer, gdkchan, give a brief interview about the project over at linuxgamingcentral (check it out if you haven't already done so). As gdk mentions, there is indeed a small surprise brewing that we hope to be able to tell you more about in the coming months. It isn't quite ready for open discussion in these blog posts but we promise that you won't be disappointed!
Finally, here is the usual sales pitch to anyone with a software development background, an interest in emulation/3D graphics or literally anyone who thinks they could contribute anything at all to the software package we currently provide. Core emulation, web development, GUI & UX improvements all the way down to simple code cleanups are all areas that make an emulator tick. We wouldn’t be where we are without generous people taking an interest in this field and dedicating some of their time to our funny corner of the internet. We’re always available on our GitHub and Discord!
Until next month!