Patreon Goals/Vulkan Update & Progress Report April 2021
LDN2.2, Miria (input re-implementation), and Custom User Profiles, oh my!
LDN2.2 was released this month with performance enhancements that make Monster Hunter Rise a much more playable experience, internet multiplayer for those wanting to hunt together, and all of the latest improvements from master for anyone else who wants to play their games on LDN. Miria, the input code overhaul that removes all usage of OpenTK 3, was also released this month. And last but not least, Custom User Profiles, the second Patreon feature goal, was merged into the main build in April.
Before we jump into the April 2021 updates, let’s take a look at the current state of Ryujinx’s Patreon goals and deliverables:
Amiibo Emulation - merged into the main build in March 2021.
While compatibility is now almost perfect, there are still some improvements to come for Amiibo, which can be tracked on the associated GitHub issue here: https://github.com/Ryujinx/Ryujinx/issues/2122
Custom User Profiles - merged into the main build in April 2021.
Vulkan GPU Backend - still in progress, ETA delayed from April to May 2021 for public testing. See below for a more in-depth report on the progress of this feature.
ARB Shaders - Goal reached this month! Work on ARB shaders will begin as soon as Vulkan is finished.
ARB shaders will further reduce stuttering on first run by improving shader compilation speed on NVIDIA GPUs using the OpenGL API.
$2000/month - Texture Packs / Replacement Capabilities - Not yet met
This will allow in-game graphics textures to be replaced, enabling custom texture enhancements, alternate controller button graphics, and more.
$2500/month - One full-time developer - Not yet met
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx.
$5000/month - Additional full-time developer - Not yet met
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Preliminary internal testing of Vulkan, and of its integration with the new Avalonia UI, has begun while the remaining features are finished. Of these, SPIR-V support is 90% done; alpha test, multisampling, resolution scaling, shader cache, and of course thorough testing on AMD and Intel GPUs remain.
Below is a comparison of Mario Kart 8 Deluxe running on an AMD RX 5500 XT GPU, first on OpenGL and then on a Vulkan test build.
Will Vulkan include a shader cache?
The main goal right now is getting Vulkan delivered in a good working state. If this is done quickly enough and there's still time left for a shader cache implementation, then a shader cache could be added. Otherwise, it will be left as a follow-up task and implemented after Vulkan is merged into the master branch of the emulator. To clarify, our ETAs for these Patreon goal features are the estimated time to develop each feature and submit the PR (pull request) for public testing & review. Actual merge dates into the main build may vary and do not have ETAs.
Will Vulkan include async shader compilation?
No, for the same reason it is not implemented in OpenGL. The async shader compilation feature implemented in other emulators comes with compromises, like visual glitches while shaders compile, some of which persist until emulation is stopped. It also does not fully eliminate stutter, as some shaders cannot be compiled asynchronously. For those reasons, the current Ryujinx GPU emulation developers are not interested in implementing it. We are not opposed to an external contributor implementing this feature, however.
Will Vulkan support resolution scaling?
Yes. The goal is having feature parity with OpenGL upon release.
What kind of improvement can I expect?
AMD and Intel GPU users should expect a significant improvement in compatibility, performance and, especially for Intel GPU users, a drastic reduction in graphical glitches. NVIDIA users can expect a small performance improvement in a few titles (we will defer saying which ones until after the Vulkan implementation is properly optimized and tested). All GPU vendors should benefit from faster shader compilation and less stuttering on Vulkan.
Little help here? - Implement shader HelperThreadNV
By early April Monster Hunter Rise was nearly perfect graphically, with one exception: some distant objects flickered, making for a distracting experience. See below for an example of what this looked like.
This issue was caused by an incorrect instance count from a compute shader that was not receiving the proper data from an SSBO. To address this, support for gl_HelperThreadNV, a shader built-in from an NVIDIA OpenGL extension, was implemented (it now uses gl_HelperInvocation, which is supported by all GPU vendors), allowing the helper threads involved in shuffle operations to pass along the needed data correctly.
After the update, no more flicker!
One important item to note is that adding this extension raised the minimum OpenGL version from 4.4 to 4.5. Some (much) older GPUs may not meet this new requirement, but odds are that if you have one of these older GPUs, your PC is not up to the task of emulating Switch games regardless.
Implemented on (#2163) by gdkchan
Hold it right there! - Hold reference for render targets in use
This update changed the behavior of render targets in use to prevent them from falling out of the auto delete cache. This bug was discovered while developing Vulkan (on which a crash could occur if a freed texture was used). While no OpenGL-specific bug had been identified, it is possible that this change could make the emulator less likely to crash since textures are reused.
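The idea of holding a reference can be sketched as a small least-recently-used cache that skips any entry whose reference count is above zero when it evicts. This is a minimal illustration under stated assumptions, not Ryujinx's actual code: the class name AutoDeleteCache and its add/acquire/release methods are hypothetical, and plain ids stand in for real texture objects.

```python
from collections import OrderedDict


class AutoDeleteCache:
    """Hypothetical LRU-style cache that never evicts entries still in use."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # texture id -> reference count
        self.deleted = []  # ids evicted so far (stand-in for freed textures)

    def add(self, tex_id):
        # (Re)insert as most recently used, keeping any existing ref count.
        self.entries[tex_id] = self.entries.get(tex_id, 0)
        self.entries.move_to_end(tex_id)
        self._evict()

    def acquire(self, tex_id):
        # Bound as a render target: hold a reference so it cannot be freed.
        self.entries[tex_id] += 1

    def release(self, tex_id):
        # No longer in use; it becomes eligible for eviction again.
        self.entries[tex_id] -= 1
        self._evict()

    def _evict(self):
        # Walk from least to most recently used, skipping referenced entries.
        for tex_id in list(self.entries):
            if len(self.entries) <= self.capacity:
                break
            if self.entries[tex_id] == 0:
                del self.entries[tex_id]
                self.deleted.append(tex_id)
```

With a capacity of two, a render target that was acquired survives eviction even though it is the oldest entry; the next unreferenced entry is freed instead.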
Implemented on (#2156) by gdkchan
Improve shader global memory to storage pass
On NVIDIA GPUs, shaders can access GPU memory directly by calculating, within the shader, the address to be accessed. Graphics APIs in general, however, do not allow this. Instead, the buffer that the shader will access must be bound beforehand, and the shader then accesses a region inside the bound buffer, rather than freely reading and writing at any GPU virtual memory address.
For this reason, the emulator needs to find where the base address used by the shader is located, find the buffer that the address belongs to so it can be bound beforehand, and then turn the raw GPU virtual memory access into a buffer access in the generated shader.
The process of finding the memory access base address is complicated and may fail. For this reason, there is a fallback for the cases where it fails, which simply checks every possible location in the shader itself to find the base address and the buffer being accessed. This process is very slow and is better avoided.
This change allows the shader translator to find the buffer in more cases, and not need to use the fallback as often. The improvements realized here include reduced code size, improved shader code execution speed, and better buffer management (as there is no need to sync/write track unused buffers).
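Once a base address has been recovered, matching it to a buffer is essentially a range lookup. The sketch below assumes a hypothetical list of buffers that is sorted by GPU virtual address and non-overlapping; it returns which buffer to bind and the offset that a translated shader would use in place of the raw address. It illustrates only the lookup, not how the base address itself is found.

```python
from bisect import bisect_right
from typing import NamedTuple


class Buffer(NamedTuple):
    gpu_va: int  # base GPU virtual address (hypothetical field)
    size: int    # size in bytes


def find_buffer(buffers, address):
    """Return (buffer index, offset inside buffer) for a raw GPU VA, or None.

    Assumes `buffers` is sorted by gpu_va and the ranges do not overlap.
    """
    bases = [b.gpu_va for b in buffers]
    # Index of the last buffer whose base is <= address.
    i = bisect_right(bases, address) - 1
    if i >= 0 and address < buffers[i].gpu_va + buffers[i].size:
        return i, address - buffers[i].gpu_va
    return None  # not covered by any known buffer
```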
Implemented on (#2200) by gdkchan
Fix sub-image copies on Intel GPUs
Intel iGPUs have notoriously produced incorrect output in the emulator, if they even survive the game’s boot process. This update, custom-tailored to help Intel iGPUs while leaving other vendors’ GPUs alone, fixes some graphical glitches by creating a special path that passes storage handles rather than view handles, and adds the view’s base layer and level to the source & destination layers and levels.
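The layer and level adjustment can be pictured as plain offset arithmetic: a view covers a sub-range of its storage texture, so a copy addressed through the storage handle has to add the view's base layer and level. The TextureView fields and function names below are hypothetical, chosen only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class TextureView:
    storage_handle: int  # handle of the underlying storage texture
    first_layer: int     # view's base layer inside the storage
    first_level: int     # view's base mip level inside the storage


def to_storage_coords(view, layer, level):
    """Translate a view-relative (layer, level) into storage-relative ones."""
    return view.storage_handle, view.first_layer + layer, view.first_level + level


def copy_args(src, src_layer, src_level, dst, dst_layer, dst_level):
    """Build (handle, layer, level) for both sides of a storage-level copy."""
    return (
        to_storage_coords(src, src_layer, src_level),
        to_storage_coords(dst, dst_layer, dst_level),
    )
```

Passing view handles with view-relative coordinates would rely on the driver to apply these offsets itself; addressing the storage directly makes the offsets explicit.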
Even though Vulkan is on the way, it doesn’t hurt to improve things on OpenGL in the meantime for our Intel users, right? See the comparisons below.
Mario Kart 8 Deluxe
Captain Toad’s Treasure Tracker
Fixed on (#2198) by gdkchan
Divide & Conquer - HwCapabilities: Divide Intel into IntelWindows and IntelUnix
Anyone following the Switch emulation scene closely is probably already aware that Intel iGPUs don’t have the best driver support, and that Intel's Windows OpenGL support is particularly woeful. For this reason, the emulator has some specific workarounds to make up for the shortcomings of the Intel driver. As Linux support for Intel iGPUs is far more robust, these workarounds should not be applied there, lest they create problems where there weren’t any to begin with.
This change divides up Intel device identification by whether they are detected in Windows or in Linux/UNIX. This way the emulator can ensure the best experience possible no matter which OS the Intel iGPU is running on. “But Ryujinx doesn’t work on Intel iGPUs in Windows anyway!” you say? Keep reading...
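As a rough sketch of how such a split might look, the vendor can be classified from the driver's GL_VENDOR string together with the host OS, with the Intel workarounds gated on the Windows variant only. The enum, function names, and string matching below are assumptions for illustration and are not taken from HwCapabilities.

```python
import sys
from enum import Enum


class GpuVendor(Enum):
    UNKNOWN = "unknown"
    AMD = "amd"
    NVIDIA = "nvidia"
    INTEL_WINDOWS = "intel_windows"
    INTEL_UNIX = "intel_unix"


def parse_vendor(vendor_string, platform=sys.platform):
    """Classify a GL_VENDOR-style string; Intel is split by host OS."""
    v = vendor_string.lower()
    if "nvidia" in v:
        return GpuVendor.NVIDIA
    if "amd" in v or "ati technologies" in v:
        return GpuVendor.AMD
    if "intel" in v:
        if platform.startswith("win"):
            return GpuVendor.INTEL_WINDOWS
        return GpuVendor.INTEL_UNIX
    return GpuVendor.UNKNOWN


def needs_intel_windows_workarounds(vendor):
    # Only the Windows Intel OpenGL driver gets the Intel-specific workarounds.
    return vendor is GpuVendor.INTEL_WINDOWS
```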
Implemented on (#2219) by A-w-x
Only enable clip distance if written to on shader
This update changes clip distances to only be enabled if the vertex shader actually writes to them. Until now, Intel iGPUs specifically had a problem with clip distances that manifested as flashing triangles, notably in first-party titles like Animal Crossing: New Horizons and Mario Kart 8 Deluxe.
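One way to picture the change: while translating the shader, record which clip distance outputs are written, and enable only those planes on the host API. The output-write list format and the function names below are hypothetical; the comment about undefined results reflects OpenGL's rule for enabled clip distances that the shader never writes.

```python
def clip_distance_mask(output_writes):
    """Return a bitmask of the clip distances the vertex shader writes.

    `output_writes` is a hypothetical list of (attribute, index) pairs
    collected while translating the shader.
    """
    mask = 0
    for attribute, index in output_writes:
        if attribute == "gl_ClipDistance":
            mask |= 1 << index
    return mask


def enabled_clip_planes(output_writes, max_clip_distances=8):
    # Enable only the planes that are actually written; in OpenGL, results
    # are undefined for an enabled clip distance the shader never writes.
    mask = clip_distance_mask(output_writes)
    return [i for i in range(max_clip_distances) if mask & (1 << i)]
```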
With this change, our Intel iGPU users can enjoy a huge quality-of-life upgrade on games that used to suffer from flashing triangles. See below for a comparison of Animal Crossing: New Horizons on an Intel iGPU.
Implemented on (#2217) by gdkchan
Reduce allocation during SSA construction
SSA construction was allocating an array twice for each basic block; it now reuses a single array instead. This also removes DefMap's dependency on Register, which makes extending SSA construction to handle LocalVariable easier.
Operand.GetRegister was also marked for aggressive inlining, because it wasn't being inlined.
Implemented on (#2162) by FICTURE7
Improve StoreToContext emission
This change hoists StoreToContext out of the dynamic branch fast & slow paths and into their predecessor. This reduces register pressure, code size, and compile time a little, because less work is thrown down the pipeline.
What is register pressure, you ask?
In technical terms, register pressure describes how many values need to be kept in the host CPU's registers at the same time; when there are more live values than available registers, some have to be spilled to memory, which is slower. As an analogy, imagine that the CPU is an assembly line with a conveyor belt full of objects that must be sorted, assembled, and delivered as a finished product. High register pressure, in this case, would mean an overloaded conveyor belt full of junk, slowing down the process so workers can catch up, sort the items, and assemble them. Reducing register pressure would mean that the objects on the conveyor belt are more spread out, easing the load on workers and enabling them to assemble the product in time.
TL;DR: reducing register pressure is good.
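Going back to the hoisting change itself, the transformation can be sketched on a toy IR: stores that both the fast and slow paths begin with are emitted once in the predecessor, just before the branch. The tuple-based instruction format and the function name are hypothetical, and the real pass may differ in exactly which stores it moves.

```python
def hoist_common_stores(predecessor, fast_path, slow_path):
    """Move StoreToContext ops that start both paths into the predecessor.

    Blocks are lists of (op, *operands) tuples; the predecessor's last
    instruction is assumed to be the branch. Returns how many were hoisted.
    """
    hoisted = []
    while (fast_path and slow_path
           and fast_path[0][0] == "StoreToContext"
           and fast_path[0] == slow_path[0]):
        hoisted.append(fast_path.pop(0))
        slow_path.pop(0)
    # Slice assignment inserts the stores just before the final branch.
    predecessor[-1:-1] = hoisted
    return len(hoisted)
```

Each hoisted store is now emitted once instead of twice, which is where the code size and compile time savings come from.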
Implemented on (#2155) by FICTURE7
Fix CRC32 instruction when constant values are used as input
There was a bug with the CRC32 intrinsic where the JIT did not force a copy when a constant value was used as input for this instruction. In debug builds, this caused an assert, as this instruction does not support immediate operands on x86; in release builds, it simply generated invalid code.
This update fixes the issue by forcing the value to be copied to a register, and then replacing the operation input with the register.
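The fix follows a common JIT legalization pattern, sketched below on a hypothetical tuple-based IR: any constant input to a Crc32 operation is first copied into a fresh register, and the operation then uses that register instead. On x86, the CRC32 instruction only accepts a register or memory source, which is why the copy is needed. The Const/Reg types and the legalize_crc32 function are illustrative names, not ARMeilleure's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Reg:
    name: str


def legalize_crc32(instr, new_temp):
    """Force constant inputs of a Crc32 op into registers.

    `instr` is a hypothetical (op, dest, *sources) tuple and `new_temp`
    returns a fresh register. Returns the list of instructions to emit.
    """
    op, dest, *sources = instr
    emitted = []
    fixed_sources = []
    for src in sources:
        if op == "Crc32" and isinstance(src, Const):
            temp = new_temp()
            emitted.append(("Copy", temp, src))  # e.g. mov temp, imm
            fixed_sources.append(temp)
        else:
            fixed_sources.append(src)
    emitted.append((op, dest, *fixed_sources))
    return emitted
```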
Fixed on (#2183) by gdkchan
Slaying a giant - PPTC vs. giant ExeFS
Some games with a large ExeFS (looking at you, Monster Hunter Rise) have been misbehaving with PPTC due to some limitations in the PPTC implementation. This update changes PPTC to handle a JitCache of any size and optimizes many aspects of its operation.
Boot-time PPTC retranslations, which occur after a PPTC code update in the emulator or after the PPTC cache is purged, have also been optimized. The emulator now takes into account the user’s free physical memory (instead of just the number of CPU cores) when calculating the number of threads to use for parallel translation.
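A heuristic of this kind can be sketched as taking the smaller of the core count and the number of threads the free memory can support. The per-thread memory figure below is a made-up placeholder, and the exact formula Ryujinx uses is not described here; this only shows the shape of such a calculation.

```python
import os

# Hypothetical figure for illustration only; not the emulator's real value.
ASSUMED_BYTES_PER_THREAD = 512 * 1024 * 1024  # 512 MiB


def translation_threads(free_physical_bytes, cpu_cores=None,
                        bytes_per_thread=ASSUMED_BYTES_PER_THREAD):
    """Pick how many threads to use for parallel PPTC retranslation.

    Capped both by the core count and by how many threads the free
    physical memory can support; always at least one.
    """
    if cpu_cores is None:
        cpu_cores = os.cpu_count() or 1
    by_memory = free_physical_bytes // bytes_per_thread
    return max(1, min(cpu_cores, by_memory))
```

On a 12-core machine with plenty of free memory, the core count is the limit; with little free memory, fewer threads are used so translation does not exhaust RAM.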
Implemented on (#2168) by LDj3SNuD
Add inlined on translation call counting
As part of a campaign to improve CPU emulator JIT function lookup, the following changes were made to call counting:
- Added reusable structures for on-translation counting.
- Added on-translation call counting.
- Changed call counting to be associated with nodes (i.e. the heads of translations) instead of edges (i.e. branches out of translations). Previously, if none of the incoming branches to a translation were hinted for rejit, it would never tier up, even though it could be hot.
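Node-based counting can be sketched as a counter keyed by the guest address at the head of each translation: every entry into the translation increments it, regardless of which branch jumped there, and the translation is queued for rejit once it reaches a threshold. The class, the threshold value, and the queue below are hypothetical.

```python
from collections import defaultdict

# Hypothetical threshold; the real tier-up threshold is not stated here.
REJIT_THRESHOLD = 100


class CallCounter:
    """Counts calls per translation head (node) rather than per branch edge."""

    def __init__(self, threshold=REJIT_THRESHOLD):
        self.threshold = threshold
        self.counts = defaultdict(int)
        self.rejit_queue = []

    def on_enter(self, guest_address):
        # Conceptually emitted at the head of each translation, so every
        # entry counts no matter which branch (hinted or not) jumped here.
        self.counts[guest_address] += 1
        if self.counts[guest_address] == self.threshold:
            self.rejit_queue.append(guest_address)  # queued exactly once
```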
Implemented on (#2190) by FICTURE7
Account: add Custom User Profiles support
Ryujinx’s second Patreon feature goal, Custom User Profiles support, was released this month! This allows customization of the default profile, addition of other profiles, the option of custom images, and separate save files for each profile used.
Implemented on (#2274) by AcK77
Allow DRAM size to be increased from 4GB to 6GB
This update, which has been added as an option under Hacks (as there are no known retail units with 6GB DRAM), allows increasing the amount of the emulated Switch memory from 4GB to 6GB.
The amount of memory available for the application is increased from 3.2GB to 4.8GB; the remaining memory is reserved for the system and, as with HLE, it is almost entirely unused.
There are six configurations in total, but only two of them are useful for games; the others distribute the memory differently, giving more memory to applets or to the system, which is not very helpful for the emulator.
This change is only useful for mods that require more memory than the Switch possesses, such as the Monster Hunter Rise 4K resolution mod. Enabling this option does not offer higher performance in standard configurations without mods.
Implemented on (#2174) by gdkchan
nifm/ssl: Implement GetCurrentNetworkProfile and stub Ssl Service
One thing most emulators shouldn’t do is try to connect to online services from the original vendor (Nintendo, in this case); for this reason, many network-related services in the emulator had never been addressed. The lack of these services may seem trivial, as the emulator is used offline most of the time (except by our LDN users!), but it meant that some games relying on them, even in a minor capacity such as uploading scores to an online leaderboard, would crash on launch or in menus, making gameplay impossible.
This update implements several of these missing network services while effectively informing the game that an internet connection is not available. On top of that, some other SSL-related services were cleaned up and partially stubbed, enabling even more games to progress further in the boot process, with many becoming playable.
Implemented on (#2186) by AcK77
Initial support for the new 12.x IPC system
Nintendo rolled out the new 12.0 system update in April, and with it came a new IPC system (named TIPC).
This implements initial support for 12.x and renames applicable references in preparation for handling the new SM command IDs.
Implemented on (#2182) by Thog
Miria: The Death of OpenTK 3
For as long as many can remember, controller support has been somewhat hit & miss in the emulator. Many XInput controllers, such as the Xbox 360 controller, “plug n’ played” without issues, while other controllers, including Nintendo’s own Pro Controller, suffered from a range of maladies, such as being unable to map the right analog stick or simply not being detected by the emulator at all.
Many of these issues were attributed directly to the use of OpenTK 3 for input support. While not a terrible piece of software, OpenTK 3 is considered ancient by current standards, and it showed. This update, code-named Miria, completely removes OpenTK 3 from the emulator, replacing it with custom bindings (also developed by Thog), OpenTK 4 for the OpenGL & OpenAL interfaces, and SDL2 for controller support.
Among other benefits, these changes bring something new to Ryujinx: native motion support (without the need for third-party utilities like DS4Windows or BetterJoy)! Input configuration is vastly improved, with the ability to hot-swap controllers while the emulator is running and native support for all console controllers.
Implemented on (#2194) by Thog
Focus! - Return focus from controller applet after completion
One particularly irritating bug that had existed in Switch emulation since the beginning was a softlock in Mario Kart 8 Deluxe when attempting to play local multiplayer with more than one controller connected (this could also happen when opening the applet by pressing + in character select). All this time, there was a workaround: pressing L+R at the title screen on any controller other than Player 1’s avoided the softlock completely.
Thanks to some of the R&D that went into the sneak peek in last month’s progress report, riperiperi was able to identify & implement a working fix that resolves the softlock once and for all by returning focus to the main window after a controller applet’s invocation has completed.
Implemented on (#2218) by riperiperi
Amadeus: Allow out of bound read on empty delay lines
A widely acclaimed classic made its way onto the Switch in April. On its release day, FEZ, an indie puzzle-platformer, crashed on launch in the emulator. This turned out to be due to an issue with the audio renderer’s delay lines and how their data is read. On a Switch console, the data would be read from a work buffer address provided by the game; in the emulator, these buffers are not used at all.
This update increases the size of the work buffer to provide a place for this data to go.
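A delay line with no samples has nothing valid to read from. One way to picture the fix, assuming a simple circular buffer, is to always reserve at least one slot in the backing buffer so that reads on an empty delay line land in padding that holds silence. The DelayLine class below is a hypothetical illustration, not the Amadeus implementation.

```python
class DelayLine:
    """Circular delay line with a padded backing buffer (sketch)."""

    def __init__(self, delay_samples):
        self.delay = delay_samples
        # Reserve at least one slot so a zero-length line still has
        # something in bounds to read; that slot always holds 0.0.
        self.buffer = [0.0] * max(1, delay_samples)
        self.position = 0

    def read(self):
        # Oldest sample for a non-empty line; silence for an empty one.
        return self.buffer[self.position]

    def write(self, sample):
        if self.delay == 0:
            return  # nothing to store in an empty delay line
        self.buffer[self.position] = sample
        self.position = (self.position + 1) % self.delay
```

Calling read() and then write() once per sample delays the signal by `delay_samples`; an empty line simply outputs zeros instead of reading out of bounds.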
And now FEZ is playable!
Implemented on (#2223) by Thog
Can you hear me now? - Amadeus: Fix low pass base gain related issues on delay effect in mono
Now that FEZ was playable, something strange could be heard when comparing the game to original hardware: upon entering a room, volume levels increased, making the sound louder. On original hardware, the opposite was true; upon entering a room, volume levels decreased.
As it turns out, a single missing set of parentheses had caused a gain related bug to reverse volume level changes for mono sounds using delay effects. Adding the parentheses resolved the issue, and now FEZ plays and sounds exactly like it should.
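The exact expression isn't given here, but the class of bug is easy to show. The two functions below use a made-up gain formula purely to illustrate how dropping one set of parentheses silently changes the result through operator precedence.

```python
def wet_gain_buggy(base_gain, lowpass_amount):
    # Missing parentheses: evaluates as (base_gain * 1.0) - lowpass_amount.
    return base_gain * 1.0 - lowpass_amount


def wet_gain_fixed(base_gain, lowpass_amount):
    # Intended: scale the base gain by the remaining (non-filtered) portion.
    return base_gain * (1.0 - lowpass_amount)
```

With a base gain of 0.5 and a low-pass amount of 0.25, the buggy version yields 0.25 while the fixed one yields 0.375; both compile and run, which is what makes this kind of bug easy to miss.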
Implemented on (#2224) by Thog
If I wanted your input, I’d ask for it - Initialize hid inputs on activation to avoid spurious inputs
For quite some time now, certain games have exhibited issues with inputs randomly spamming themselves in the emulator, making gameplay difficult if not impossible in some cases. A few notable examples were Crash Bandicoot 4, Mega Man 11, and Balan Wonderworld. In these games either directional input or button spam made playing the game a chore and, in Mega Man 11’s case, simply impossible to play.
While Black Legend's compatibility was being tested, some unsolicited input caused undesired movement of the character in-game. When starting a new game, however, these random inputs manifested as something new: after the user entered the desired name in the software keyboard applet, a text field in the game visibly filled with junk.
See below for what this looked like in the emulator:
As you can see, what was typed into the software keyboard applet was “mad”, but after accepting the input and closing the applet, a bunch of random keys added themselves into the field. This was a reproducible event and afforded newly-minted Ryujinx team developer Caian a reliable way to track down & troubleshoot the issue.
It was discovered that games were filling the HID shared memory with random garbage data as a placeholder. To mitigate this, HID inputs are now initialized on activation. As it turns out, this fix resolves not only the keyboard issue in Black Legend, but also all of the aforementioned unsolicited input issues that had been plaguing many games utilizing Unreal Engine 4.
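A toy model of initializing the input state on activation is shown below: the input section of shared memory is reset when it is activated, so leftover placeholder bytes are never interpreted as button presses. The section size, the button-bitmask layout, and the HidSharedMemory class are all hypothetical, and resetting to zeros is an assumption about what the neutral state looks like.

```python
class HidSharedMemory:
    """Toy model of one input section in HID shared memory (sketch)."""

    SIZE = 0x400  # hypothetical section size in bytes

    def __init__(self, backing):
        self.backing = backing  # bytearray shared with the guest

    def activate(self):
        # Initialize the input state on activation so whatever the game
        # left in this region is never read back as input.
        self.backing[0:self.SIZE] = bytes(self.SIZE)

    def pressed_buttons(self):
        # Hypothetical layout: a 64-bit little-endian button bitmask at offset 0.
        return int.from_bytes(self.backing[0:8], "little")
```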
Implemented on (#2246) by Caian
Ready, set………. - Fix GetClockSnapshot not writing steady clock timepoint
On its release date, Shantae, the recent port of the Game Boy Color game famously made with the help of Modern Vintage Gaming, did not boot in the emulator. A few moments after launch, the emulator would softlock or crash.
The game is supposed to show the "Limited Run Games" logo for a few seconds, which it does by drawing the logo on screen in a loop until that time has passed. To do this, it calls the Switch time service to get the current time before the loop starts, calls it again inside the loop, and calculates the time delta (how much time has passed since it first drew the logo). The bug was that the time was not being written by the time service, so the time delta was always zero and the loop never exited. This could be likened to a sprinter waiting for a starter pistol that is never fired: the race can't begin.
This one-line fix ensures that the timepoint is properly written, and the game now has what it needs to continue.
Note that this only affects steady clock time, which is basically a monotonic timer that counts since the start of the system. For this reason, not many games are affected by it, as there are other means to calculate such time deltas (like reading the CPU generic timer counter system register directly, which is the most used method).
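The failure mode can be reproduced with a toy time service: if the snapshot's steady clock timepoint is never written, every snapshot reads the same placeholder value, the delta stays at zero, and the logo loop never ends. The class and function names below are hypothetical, and the max_frames guard exists only so the broken case terminates in this sketch.

```python
import time


class TimeService:
    """Toy stand-in for the time service's clock snapshot call (sketch)."""

    def __init__(self, write_timepoint=True):
        self.write_timepoint = write_timepoint  # False reproduces the bug

    def get_clock_snapshot(self, snapshot):
        if self.write_timepoint:
            # Steady clock: a monotonic counter, unaffected by user clock changes.
            snapshot["steady_clock_timepoint"] = time.monotonic()
        return snapshot


def show_logo(service, duration, max_frames=1000):
    """Draw the logo until `duration` seconds pass; return frames drawn."""
    start = service.get_clock_snapshot({"steady_clock_timepoint": 0.0})
    frames = 0
    while frames < max_frames:  # guard so the buggy case terminates here
        frames += 1  # "draw" the logo
        now = service.get_clock_snapshot({"steady_clock_timepoint": 0.0})
        elapsed = now["steady_clock_timepoint"] - start["steady_clock_timepoint"]
        if elapsed >= duration:
            break
    return frames
```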
Fixed on (#2249) by gdkchan
Update Pro Controller Image + Trigger View
Shahil-Ayato was not satisfied with his own previous Pro Controller image update and resolved to make a notable improvement.
As you can see below, the new image is much more friendly for button location & identification, and adds the trigger buttons view for easier reference.
Implemented on (#2128) by Shahil-Ayato
Enable updates in portable mode
Last month’s improvement to portable mode operation brought with it a much simpler method of enabling portable operation, and made Ryujinx able to be used in truly portable configurations.
Additionally, this change disabled automatic updating entirely, allowing end users to update the emulator when and where they wished. However, due to popular demand, this update restores the option to use automatic updating in portable mode! Users who do not wish to be prompted for updates need only disable the “Check for Updates” checkbox in the emulator options.
Implemented on (#2181) by jms-c
One of the most common requests we hear from users is for performance improvements, whether it’s someone with a top-of-the-line PC wanting an even higher or uncapped FPS, or someone with a lower-spec computer struggling to reach full speed. We have a project in the works that should bring performance improvements to most titles, on all GPU vendors.
If you want an idea of the performance improvements this new project brings, check out the videos below. Both videos were recorded on a system with the following specs:
AMD Ryzen 9 3900X
16GB 3200MHz RAM
NVIDIA GTX 1070
For now, that's all we have to say. We hope to release it soon, so stay tuned!
New code contributors April 2021:
Thanks to everyone that has supported us so far, be it via Patreon donations, code contributions, testing games in the emulator, or simply being an active member of our community. You’ve helped make this emulator what it is today!
We now have an active Patreon campaign with specific goals and restructured subscriber benefits/tiers, so head on over if you're interested in becoming a patron to help push Ryujinx forward!