Progress Report February 2023
Bye-bye February, you won’t be missed. Does anyone actually like it? Short, cold and dark. Maybe if you live in an upside down part of the world you disagree, but you’d still be wrong!
February marked a couple of exciting events in the lives of Nintendo fans: a Direct, a Pokémon Presents, and a stealth drop of a certified classic; thus proving that if you need to delay a game, simply release a remaster of the old one to plug the gap. Luckily Metroid didn’t offer too much resistance to emulation but we’ll talk about that later. First on the agenda is glancing through our Patreon goals, one of which we’re so close to!
$2000/month - Texture Packs / Replacement Capabilities - Reached, work will begin on this feature if this amount is maintained for one more month.
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Sound good? Moving on…
We’re kickstarting this section with some certified cursed™ gaming. For anyone who’s tried to play the Mario+Rabbids games on Ryujinx, you’ll know that until recently they were both… questionable experiences. The first title had significant performance issues and the newer game, Sparks of Hope, didn’t render much of anything; not unless you spoke gentle words and sacrificed a few goats the week prior. While the performance issues of the first entry got some love, the graphical issues of the second were more challenging to solve.
The first obstacle was to determine why the game sometimes rendered and sometimes didn’t. Sparks of Hope uses a buffer clear to remove texture data from the GPU. By also clearing the CPU-side data whenever a GPU buffer clear happens, the game no longer reads garbage leftover memory when rendering, which removes the random nature of this particular quirk.
Still not quite right though. The game attempts to alias a R8Unorm texture as RGBA8Unorm which was incompatible. Shifting some of our copy dependency rules around is luckily enough to resolve this.
Another title off the ‘cursed’ list! How about another?
The Legend of Zelda: Breath of the Wild is all hot at the moment, primarily of course due to its successor being oh so very close to release. This has triggered a lot of folks asking almost daily “Will it run fine at release?” and to this our answer is always one of two things. Either a crystal ball emoji (a personal favorite), or a lethargic ‘We’ll have to wait and see.’. But we’re trying our hardest to make the chances as high as possible.
Many users on the more recent BoTW updates have reported the mysterious case of the legendary ‘Bike Nuke’. This phenomenon was triggered fairly consistently by activating the Master Cycle Zero rune, but was also sighted at random moments in gameplay.
The problem was looked at a couple of times over the years with the most constructive session leading to the following discord message: “yeah, this is toast”.
However, all it took was a couple of indie titles to come along and break in the exact same way, but this time with a twist. Consistently!
The hardest bugs to fix are the random ones. They aren’t deterministic and even if you do get one to happen, isolating the cause through the hundreds or thousands of actions you performed prior is nigh impossible. This all changes if you can be sure something will break at a specific point and this is exactly what was happening in both “void tRrLM();” and “The Longest Five Minutes”. These games highlighted the need for us to handle cases where the texture size in the cache and in the pool were mismatched, as the older method was clearly insufficient. The solution? We sure hope you aren’t bored of copy dependencies!
And finally, Breath of the Wild. At last, able to re-enact the dream of being in an environmentally-friendly biker gang!
Players of Pokémon Scarlet and Violet with AMD graphics cards will be pleased to hear that two of their issues have been killed with a single change. Starting S/V with any resolution scale other than native would crash on these cards due to an unsupported blit operation, but prior to this game’s release we weren’t quite as aware of how many games this impacted. It turns out that a number of games including Fire Emblem: Engage and TLoZ: Link’s Awakening have also been broken in the exact same way for seemingly a very long time.
A separate path for AMD cards has therefore been created which makes use of the VK_EXT_shader_stencil_export Vulkan extension, which notably Nvidia does not support at the time of writing. Luckily Nvidia and Intel GPUs don’t need to use this safe path so it’s a change that in practice should only affect AMD.
As a bonus this change seems to have also resolved the anomalously low performance AMD GPUs were seeing in Scarlet/Violet. One user even commented on “upgrading” their experience by moving from an RX 6800XT to a GTX 1660, something that on paper probably shouldn’t be happening. Tested with a Ryzen 5 5600X and an RX 580, performance jumped from 33 to 45 FPS with these changes, right in line with the Nvidia equivalent.
Unfortunately we have to move away from areas that AMD can currently follow. Some titles, most prominently Mario Party Superstars (MPS), make use of some operations and extensions that only Nvidia supports. The first of these fixes was mentioned way back on MPS’s release date, with AMD lacking VK_EXT_fragment_shader_interlock on Vulkan and also ARB_fragment_shader_interlock on OpenGL. To this day it means AMD GPUs cannot render certain effects in mini-games such as Spotlight Search, whereas their Nvidia counterparts can.
Further MPS mini-games and certain screens of Luigi’s Mansion 3 make use of so-called ‘Programmable Blending’ which is implemented via microcode on the Switch’s Tegra X1. Graphics APIs on desktops however, such as OpenGL and Vulkan, do not expose such direct functionality, instead opting to provide extensions such as ‘VK_EXT_blend_operation_advanced’. You can check at home how many GPU vendors support this extension and if your GPU is listed among them. For Ryujinx it means that for now only Nvidia has the pleasure, but for future uses, and any Switch emulators on Android devices, Snapdragon Adreno GPU drivers also provide options.
And in Luigi's Mansion...
The kicker is that these advanced blend modes could be emulated with the use of fragment shader interlock… if AMD supported that. Users of these cards will be pleased to know that all hope isn’t completely lost though. AMD could implement support for one or both extensions sometime in the future, or an LLE approach to advanced blending can be implemented. If you’re on Linux, then go and pester those smart folks who develop the RADV driver!
The current implementation uses the Vulkan and OpenGL extensions to match whatever the Switch is doing with microcode operations but this could be done manually. We’ve avoided doing this for the time-being as the complexity and time cost compared to using API extensions is immense. It is, however, on the cards going forward for vendors such as AMD and Intel who may not support, or only support a small subset of blending extensions.
Onto the new release this month of Metroid Prime Remastered; a truly shocking reveal that had everyone born in the 90s produce a simultaneous scream. If we ignore the fact this release likely means Prime 4 is being delayed even further, it was cool to see such a high effort remaster in this day and age. As far as emulating it went, the game booted and was technically playable from start to finish on day 1 but with a few caveats which we’ll get into now.
First up to fix was a nasty crash when using Vulkan that was isolated to the SPIR-V shader generator (as OpenGL was unaffected) when the shader was using input or output indexing.
The next problem was graphical and affected both backends.
While the inclusion of Dark Samus would have been a nice touch, this was being caused by a limitation with how we handled partially mapped textures. Prior to this, partial mapping was supported but not when the start of the texture was unmapped. Any punts as to what Prime Remastered was doing? We’ll give you five guesses.
Unfortunately the fix here created another issue which we’re still in the process of solving. To deal with the unmapped start of textures, a ‘mega-texture’ of sorts needs to be created at certain moments. For anyone who’s played the game on Ryujinx recently, you may be able to infer when these happen due to the large hitch that can occur when going through doors and loading new areas. Rest assured that we’re aware of this and solutions are currently in the pipeline so stay tuned!
Some smaller changes to our GPU emulation this month included:
- Fixing partial updates for textures - resolves random black or garbled textures appearing mainly in UE4 games such as Tony Hawk’s Pro Skater.
- vkCmdSetViewport is no longer called when viewportCount is 0 - resolves some Vulkan validation errors.
- Vulkan 1.2+ is now enforced at instance level and Vulkan 1.1+ is now enforced at device level - Ensures Ryujinx does not attempt to initialize incompatible Vulkan devices.
- Cleanup of Vulkan MemoryAllocator - removes vkGetPhysicalDeviceMemoryProperties from being called repeatedly at runtime and cleans up MemoryAllocator.
- Respect Vulkan spec for VK_KHR_portability_subset vertex stride alignment - alignment now supports any power of 2, instead of being hardcoded to 4.
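To make the version enforcement item above a little more concrete, here is a minimal sketch of that kind of check. It is written in Python purely for illustration and is not Ryujinx's actual code: the bit layout mirrors Vulkan's VK_MAKE_API_VERSION macro, while the helper names (`meets_minimum`, `device_is_usable`) and the decision to ignore the patch level are assumptions made for the example.

```python
def make_api_version(major, minor, patch=0, variant=0):
    """Pack a version number using the same bit layout as VK_MAKE_API_VERSION."""
    return (variant << 29) | (major << 22) | (minor << 12) | patch


def api_version_major(version):
    return (version >> 22) & 0x7F


def api_version_minor(version):
    return (version >> 12) & 0x3FF


def meets_minimum(reported, required):
    # Assumption for this sketch: only major.minor matter, patch is ignored.
    reported_pair = (api_version_major(reported), api_version_minor(reported))
    required_pair = (api_version_major(required), api_version_minor(required))
    return reported_pair >= required_pair


# Minimums taken from the list above: 1.2+ for the instance, 1.1+ for the device.
MIN_INSTANCE_VERSION = make_api_version(1, 2)
MIN_DEVICE_VERSION = make_api_version(1, 1)


def device_is_usable(instance_version, device_version):
    """Hypothetical helper: True only if both minimum versions are satisfied."""
    return (meets_minimum(instance_version, MIN_INSTANCE_VERSION)
            and meets_minimum(device_version, MIN_DEVICE_VERSION))
```

Rejecting a device up front like this is what keeps an emulator from trying to initialize hardware that would only fail later on.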
We end the GPU section on another remaster and another Zelda game. Skyward Sword HD took a little love this month with the resolution of a Vulkan-specific bug in one of the later-game dungeons. OpenGL allows primitive restart on all topology types while by default Vulkan would prefer it only be used on strip and fan. Luckily, by utilizing VK_EXT_primitive_topology_list_restart we can expand this supported topology list and match the OpenGL behavior.
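For anyone wondering what primitive restart actually does on a list topology, the sketch below may help. It is a simplified CPU-side model in Python, not how a GPU or Ryujinx implements it: the function names are made up for the example, and 0xFFFF is used as the restart marker because that is the special value for 16-bit indices.

```python
RESTART_INDEX = 0xFFFF  # restart marker for 16-bit index buffers


def assemble_triangle_list(indices, restart_index=RESTART_INDEX):
    """Build triangles from an index list, honouring primitive restart."""
    triangles = []
    pending = []
    for index in indices:
        if index == restart_index:
            # Restart: throw away any incomplete triangle and begin again.
            pending = []
            continue
        pending.append(index)
        if len(pending) == 3:
            triangles.append(tuple(pending))
            pending = []
    return triangles


def assemble_without_restart(indices):
    """Same assembly, but the restart marker is treated as an ordinary index."""
    return [tuple(indices[i:i + 3]) for i in range(0, len(indices) - 2, 3)]
```

With the index stream `[0, 1, 2, 3, 0xFFFF, 4, 5, 6]`, the restart-aware version yields the triangles `(0, 1, 2)` and `(4, 5, 6)`, while the other one tries to draw a triangle that references vertex 65535. That second case is the sort of mismatch that produces broken geometry when a game expects restart to work on lists.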
Implementing support for things like filters and anti-aliasing techniques is a little like asking a child to improve the Mona Lisa. Whatever tools you provide them, they’re going to take the biggest brush and create a mess. They’ll think it looks amazing, everyone else will shake their heads and cry. Unfortunately for us, it’s been a popular request for many years and it does have some genuine use cases. Below we’ll outline what’s currently available, what it does, and when it’s best to use each option or leave it alone. This can be some quite pixel-peepy stuff so we recommend you try everything out and see what you like and don’t like!
Aliasing is caused by everything on your screen being broken down into square pixels. Eventually even curved or circular edges need to be squares somewhere. At high enough resolutions you can barely see this; at lower resolutions the so-called “stair-casing” is very obvious. Anti-aliasing (AA) attempts to smooth these edges through a variety of techniques such as blurring and edge detection to make jagged surfaces appear smooth.
Ryujinx now offers two anti-aliasing techniques:
- Fast Approximate Anti-Aliasing (FXAA)
- Subpixel Morphological Anti-Aliasing (SMAA)
FXAA was one of the earliest forms of AA developed by engineers at Nvidia. Its goal was to be fast and functional, but not particularly amazing at edge detection. As such it tends to over-blur even non-edges and has really only been provided due to its simplicity.
SMAA is similar in concept to FXAA but uses much better edge detection in its shader. This means that it can more clearly define where in the image to apply the blur and ideally leave more of the screen in sharp focus. SMAA itself breaks down into 4 subsections: Low, Medium, High and Ultra. These sub-options define certain parameters in the SMAA shader such as edge thresholds and how many AA passes it makes.
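To give a rough feel for what an edge threshold means, here is a heavily simplified sketch of luma-based edge detection in Python. It is not the SMAA shader: real SMAA adds local contrast adaptation, pattern searching and blending passes, all of which are skipped here. The function names and the threshold values in the usage note are assumptions chosen for illustration.

```python
def luma(rgb):
    """Perceived brightness of an (r, g, b) pixel with components in 0..1."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def detect_edges(image, threshold):
    """Return a (left_edge, top_edge) pair of booleans for every pixel.

    A pixel has a left (or top) edge when its brightness differs from the
    neighbour on that side by at least `threshold`.
    """
    height, width = len(image), len(image[0])
    lum = [[luma(pixel) for pixel in row] for row in image]
    edges = []
    for y in range(height):
        row = []
        for x in range(width):
            left_delta = abs(lum[y][x] - lum[y][x - 1]) if x > 0 else 0.0
            top_delta = abs(lum[y][x] - lum[y - 1][x]) if y > 0 else 0.0
            row.append((left_delta >= threshold, top_delta >= threshold))
        edges.append(row)
    return edges


def count_edges(edges):
    """Total number of edge flags set, handy for comparing thresholds."""
    return sum(int(left) + int(top) for row in edges for left, top in row)
```

Running this on the same image with a threshold of 0.1 and then 0.02 shows the trade-off: the lower threshold also flags subtle contrast changes, so more of the image gets smoothed, while the higher one only reacts to hard edges.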
On a modern GPU the cost of FXAA and SMAA, even at Ultra, is negligible. So realistically we do recommend SMAA Ultra if you’d like some level of AA on those particularly jagged games. On the flip side, we don’t recommend enabling this for pixel-art titles, or games whose art-style is designed around sharp edges. We will judge you for it!
Whenever a piece of content doesn’t exactly match the resolution of your monitor or TV, there needs to be some form of scaling in order to make that piece of content fill your screen. If no scaling was applied then you’d simply get black bars around the edges of the screen where no data was present. Your GPU or monitor usually does this automatically as is shown with a handy infographic in the Nvidia control panel if you were to display a 1080p image on a 4K screen.
As most Switch games aren’t even 1080p, let alone 4K, we need to scale them to the size of the program window, or even to the size of your monitor when playing in fullscreen.
There are many ways to scale an image. Some common ones you may have heard of include: Bilinear, Bicubic, Nearest Neighbour and Lanczos. All have their strengths and weaknesses which is why a lot of this comes down to personal preference. Ryujinx currently supports the following three scaling filters:
- Bilinear filtering (current default) is usually what the Switch itself will use to scale images to the output monitor and hence should be most accurate to real hardware output. However some view it as a little blurry, especially at lower resolutions.
- Nearest Neighbour is a very basic technique that simply replaces every output pixel with the “nearest” real input pixel. As such it creates a very blocky and aliased final image. This can however be a bonus when scaling pixel-art or retro titles such as Celeste and the GameBoy NSO emulator as no attempt will be made to smooth any edges.
- AMD FidelityFX™ Super Resolution 1.0 is a filter designed by AMD to take a lower resolution image and upscale it to a higher resolution. Note that only FSR 1.x is usable here as FSR 2.x makes use of temporal data such as motion vectors in a similar fashion to DLSS. When FSR is selected, a slider will also appear in settings which controls how much sharpening is applied. 100 = maximum, 0 = minimal sharpen.
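The difference between the first two filters in this list is easy to show with a small sketch. The Python below scales a greyscale image stored as nested lists; it is a textbook CPU version for illustration only, not the shader Ryujinx runs, and the function names are invented for the example. FSR is left out as it is far more involved.

```python
def scale_nearest(src, out_w, out_h):
    """Nearest neighbour: each output pixel copies one source pixel."""
    in_h, in_w = len(src), len(src[0])
    return [[src[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def _source_coord(out_index, in_size, out_size):
    # Map the centre of an output pixel back into source space and clamp.
    coord = (out_index + 0.5) * in_size / out_size - 0.5
    return min(max(coord, 0.0), in_size - 1)


def scale_bilinear(src, out_w, out_h):
    """Bilinear: each output pixel blends the four surrounding source pixels."""
    in_h, in_w = len(src), len(src[0])
    out = []
    for y in range(out_h):
        fy = _source_coord(y, in_h, out_h)
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            fx = _source_coord(x, in_w, out_w)
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            top = src[y0][x0] * (1 - wx) + src[y0][x1] * wx
            bottom = src[y1][x0] * (1 - wx) + src[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

Stretching the two-pixel row `[[0, 100]]` to four pixels gives `[0, 0, 100, 100]` with nearest neighbour and `[0, 25, 75, 100]` with bilinear. The first keeps the hard step that pixel-art relies on, the second invents in-between values, which is where the softer look comes from.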
To run through the same comparison as with AA the same scene will be used with no AA filters applied.
Scaling filters can be used to produce an image that better suits your personal preference or style of game you’re playing. Even though it's mostly subjective, we’ve made a little table to highlight the intended use-case of each and what to try and avoid.
Fairly short one this month. The first stages of the more complex graphical fixes were upstreamed this month by moving gl_Layer to the vertex shader if geometry shaders are unsupported. This allows some UE4 games to begin rendering on self-compiled macOS builds.
The method used is a little different to that in macOS1, as there we have geometry shader emulation to worry about too. As previously mentioned though, ideally MoltenVK will natively support those before everything is ready, in which case no issues should arise.
An updater script is also now included in macOS releases, as attempting to replace your own program while running, as is currently done on Windows and Linux, can invalidate the code signing on Apple systems. This is mainly a stopgap until a better solution can be found and isn’t functional at the moment anyway, as the macOS releases are not a part of our main build pipelines.
Tying into the last section, many Mac users have reported some fairly nasty screen tearing which hasn’t been reported on Windows or Linux. When searching for why this would only affect seemingly Apple devices, we discovered that in Avalonia windows we’d forgotten to call the device VSync at render time, resulting in tearing on systems where the driver couldn’t save you. Nvidia, AMD and Intel all force VSync by default if the program doesn't enforce it and so this went unnoticed for a fair while! This also slightly improves the micro-stutters on macOS as there is now some semblance of refresh rate sync.
As far as service changes go, LoadOpenContext in the account services was implemented in February and allows some multi-game collections such as Prinny Presents NIS Classics Volume 1: Phantom Brave: The Hermuda Triangle Remastered / Soul Nomad & the World Eaters (so very long…) to head in-game. It can also resolve crashes in multi-game collections when attempting to head back to the game selection pages.
Some games, such as Kingdom Rush, were crashing when requesting an unknown NPadID type. Adding additional checks to determine whether received IDs are even valid resolves those particular crashes and allows Kingdom Rush to be played.
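The idea behind that fix can be sketched in a few lines. This Python example is illustrative only and does not reflect Ryujinx's real code: the ID values (0 to 7 for players, 0x10 for "other", 0x20 for handheld) follow the commonly documented HID enum and should be treated as an assumption here, as should the helper names.

```python
# Assumed set of known Npad IDs: players 1-8, "other" and handheld.
VALID_NPAD_IDS = frozenset(range(8)) | {0x10, 0x20}


def is_valid_npad_id(raw_id):
    """True if the ID requested by the game is one we know how to handle."""
    return raw_id in VALID_NPAD_IDS


def filter_supported_ids(requested_ids):
    """Drop unknown IDs instead of letting them crash later lookups."""
    return [raw_id for raw_id in requested_ids if is_valid_npad_id(raw_id)]
```

Validating at the point where the game hands over its list means an unexpected value is skipped rather than being used as a lookup key further down the line.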
And last, but very much not least, here is an image from code contributor Lostromb after they gave our audio upsampler a SIMD optimization kick.
Below is a table which shows the method used vs. the time spent on audio upsampling, comparing the old (baseline) method with the new (SIMD) methods.
Well that was a long one. For a shorter month. How does that even work?
We’d like to once again thank everyone who supports us every month on Patreon, contributes code to us on GitHub and those who help other users out with troubleshooting and bug reporting in our Discord! We couldn’t do it without you.
As mentioned right at the start, we’re sitting barely above the texture replacement Patreon incentive and if that figure is maintained until the next progress report is released, work will begin!
Until next we meet…