Progress Report January 2022
New year, new month, and new progress! We hope everyone has been enjoying the new year as much as we have. 2022 has already been quite the adventure, with a brand new main series Pokémon title, Pokémon Legends: Arceus. Alongside that big release, we’ve been hard at work on a large number of GPU updates and bug fixes this month.
Patreon Goals:
Amiibo Emulation - merged into the main build in March 2021.
While compatibility is close to perfect, there are still some improvements to come for Amiibo, which can be tracked on the associated GitHub issue here: https://github.com/Ryujinx/Ryujinx/issues/2122
Custom User Profiles - merged into the main build in April 2021.
Vulkan GPU Backend - still in progress; a public test build has been released.
ARB Shaders - Goal reached in April 2021. Work is ongoing alongside Vulkan, so please bear with us a little longer while we bring this update to a state we are happy with.
ARB shaders will further reduce stuttering on the first run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.
$2000/month - Texture Packs / Replacement Capabilities - hovering around this level!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
ETA once the goal is sustained: ~3-4 weeks
$2500/month - One full-time developer - Almost there!
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx.
$5000/month - Additional full-time developer - Not yet met
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
So without further ado let’s get into this month's progress!
Vulkan Progress:
You people are suckers for this section aren’t you…
Well then, let’s dig into some of the juicier changes, fixes and additions to the Vulkan backend!
Final Fantasy VII had a rather… unique visual glitch on AMD cards:
If you haven’t played FF7 in a while, it isn’t meant to look like that. Luckily, this is included in our list of fixes; a minor issue remains, but it looks much better:
Professor Layton shared a similar issue:
Became:
Direct SPIR-V (Vulkan’s intermediate shader language) compilation is disabled by default in the main pull request download to make testing easier, but that doesn’t mean it’s been forgotten in this report! gdkchan has been hard at work fixing a mountain of bugs in the shader backend, and it’s starting to pay some handsome dividends.
SPIR-V shaders are much faster to compile than the GLSL shaders that are currently used on OpenGL. Check out the comparison below (both without any cache):
OpenGL GLSL Vs. Vulkan SPIR-V:
Pretty cool, right? It’s worth mentioning that the OpenGL video uses fully multithreaded compilation, while Vulkan uses only single-threaded compilation! That’s the power of SPIR-V, and we may have more to share about parallel compilation in the next report.
Breath of the Wild is our first case study. Many of our more adventurous users used this game to check whether they had managed to enable SPIR-V, thanks to some, well… easy-to-spot problems!
Before (spooky):
After:
Next let’s check out a few games that straight up didn’t boot using SPIR-V before some recent changes:
Shin Megami Tensei V:
Monster Hunter Rise:
Paper Mario:
Are you all sufficiently Vulkan’ed out for a bit? We hope you’re not too tired to continue, because life outside of Vulkan was also getting some major attention!
GPU
Add support for render scale to vertex stage.
Games occasionally read textureSize in the vertex stage to tell the fragment shader what size a texture is, without querying it there. Previously, scales were not present in the vertex shader to correct these sizes, so games passed the raw upscaled texture size to the fragment shader, which was incorrect behaviour. There are two downsides to note: first, the fragment and vertex support buffer descriptions must be identical, so the full-size scales array must be defined when used; second, the fragment texture count must be updated when vertex shader textures are used.
Fixes render scale causing a weird offset bloom in Super Mario Party and Clubhouse Games (Clubhouse Games still has a pixelated look in a number of its games due to something else it does in the shader). This also fixes a regression where line artefacts would appear when upscaling games such as Hyrule Warriors: Age of Calamity.
Super Mario Party
Before:
After:
Hyrule Warriors Age of Calamity
Before:
After:
Implemented by riperiperi in #2763
Texture Sync, incompatible overlap handling, data flush improvements.
This was a big one, so grab your popcorn. This update aimed to solve a range of issues caused by texture modification and data flushing, primarily by handling data flushes on a per-handle rather than per-view basis, and by synchronising flushes with syncpoint increments. It also introduced a new backend method, so it's not currently compatible with Vulkan.
Part 1: Syncing Texture Flush
This change has been in the pipeline (wink wink) for quite some time. When texture flush via memory tracking was first added, our CPU emulation and various other components were considerably slower. The GPU ran so closely in lockstep with the CPU that when the game tried to access the data, the draw had almost certainly completed and the data was there, purely by chance. This did not last: over time, people started encountering various issues where textures would flush before their data was fully ready, causing white water in BotW (milk water):
Rainbow lighting in Splatoon:
This was made much worse by the addition of backend multithreading and Vulkan, so much so that people thought there was a regression. There wasn't; as that one meme said… “Being faster than light means you can only live in milk and rainbows”.
To fix this, we need to look at what is even happening in all this flushing and textures and other words I had to look up before writing this. When a game flushes texture data, there is usually some sort of notification to let the system know whether the data it wants is even there yet. If this system isn’t used, it’s all a roll of the dice and can end up as a race between the CPU and GPU. While that sounds cool on paper, it’s a nightmare in practice and ultimately undefined behaviour, so some changes were made to the backend to ensure that, given a race, the GPU always wins.
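To make the idea concrete, here is a minimal Python sketch of waiting on a completion signal before reading data back. `Syncpoint` and `flush_texture` are hypothetical names, and a real GPU syncpoint is a hardware counter (emulated in Ryujinx's C# code), not a Python condition variable; the sketch only shows the ordering: the reader blocks until the writer has signalled that its work is done.

```python
import threading


class Syncpoint:
    """Hypothetical model of a GPU syncpoint: a counter the GPU increments
    as it finishes work, which the CPU side can wait on."""

    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def increment(self):
        # Called by the "GPU" thread when a batch of work has completed.
        with self._cond:
            self._value += 1
            self._cond.notify_all()

    def wait(self, target, timeout=None):
        # Block until the counter reaches target; returns False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self._value >= target, timeout)


def flush_texture(sync, required_value, read_data):
    """Read texture data back only once the GPU has signalled completion,
    instead of hoping the draw happened to finish first."""
    if not sync.wait(required_value, timeout=1.0):
        raise TimeoutError("GPU work never completed")
    return read_data()
```

Because the reader waits on the counter rather than on timing luck, a faster CPU or a multithreaded backend can no longer read half-finished data.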
Part 2: Flush ordering for incompatible data
As of right now, Ryujinx follows a core assumption with the texture cache, where a use of a texture should be valid in its own layout + format (imagine this as shape + size).
This core assumption keeps important texture data alive while saving time by not flushing or loading “garbage” data. Sounds great right? Well here’s the kicker: this rule wasn't fully established before and as always there are exceptions to every rule…
The issue is that incompatible textures only had the potential to be deleted when a new overlap appeared, and any checks only happened when a texture was created. If both textures existed in the cache at the same time, they could flush separately... in any order.
This is where the new rules come into play. Put simply, if a texture is written to and other textures try to use its memory, their data is considered invalid. This means only one can live at a time, so data flushes always use the latest available information. Problem solved! Note that these rules were already in place before; they were just enforced on creation rather than on each use. This is a sweeping change that affects every game.
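A heavily simplified Python model of the "only one can live at a time" rule might look like the following. It tracks whole byte ranges rather than per-handle regions, and `Texture` and `TextureCache` are hypothetical stand-ins, not Ryujinx's classes.

```python
from dataclasses import dataclass


@dataclass
class Texture:
    name: str
    start: int  # first byte of guest memory covered
    end: int    # one past the last byte covered
    valid: bool = True

    def overlaps(self, other):
        return self.start < other.end and other.start < self.end


class TextureCache:
    def __init__(self):
        self.textures = []

    def add(self, tex):
        self.textures.append(tex)
        return tex

    def write(self, tex):
        # Enforced on every use, not just on creation: the texture being
        # written becomes the only valid owner of its memory.
        tex.valid = True
        for other in self.textures:
            if other is not tex and other.overlaps(tex):
                other.valid = False

    def flush_candidates(self, start, end):
        # Only valid textures may flush data back to guest memory, so a
        # stale overlapping texture can never overwrite newer data.
        probe = Texture("probe", start, end)
        return [t.name for t in self.textures if t.valid and t.overlaps(probe)]
```

Writing `b` after `a` over shared memory leaves only `b` eligible to flush, so the flush order no longer matters.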
Part 3: Flushing host incompatible formats
Switching gears slightly, there is occasionally a texture format or relationship used which isn't fully supported. Two such examples are:
- ASTC compressed textures, which are not supported on desktop GPUs (ironically enough, except for Intel iGPUs).
- BCn compressed 3D textures, which are not supported by OpenGL (but can be supported by Vulkan).
Ryujinx supports these formats by converting them to a supported, uncompressed format on the CPU. However, this means that data cannot be accessed directly on the GPU, which is quite important for… you know, rendering stuff. As anyone could guess, this isn’t ideal and was causing a whole host (my pun game is on fire!) of issues.
Life is Strange: True Colours, and potentially other UE4 games, use ASTC textures for characters and environments:
BotW draws into a compressed 3D BCn texture to use for the blue dissolve teleportation animation:
Before, Link would just disappear immediately, as Ryujinx could not move the data for this texture. The fix is not the fastest, but it is fully compatible, covers cases that were completely broken before, and may in the future allow support for platforms that don't support BCn, such as mobile hardware.
Life is Strange:
BotW:
So after that rather lengthy section let’s look at some pretty pictures together.
You’ll need to head to Lon Lon Ranch for your milk now:
Splatoon 2 won’t play itself and splatter the whole map with rainbows (unsure if this is a W or an L):
We all learned a lesson here. I myself now know that if you constantly shout “CEMU MILK WATER GX2DRAW DONE” at riperiperi he would likely solve the climate crisis if it got you to shut up! And in a way, he did. Hyrule’s water runs clean again.
Implemented by riperiperi in #2971
Fix sampled multisample texture size
The width and height of render target and copy textures are already pre-multiplied by the driver for multisample textures. Shader-sampled textures on the pool, however, are not pre-multiplied. This change multiplies their size by the multisample size so that they match existing textures in the cache, and also gives the texture the correct size (as the TextureCreateInfo passed to the backend has the width and height divided by the number of samples).
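The size bookkeeping above boils down to simple arithmetic, sketched here in Python. The `SAMPLE_LAYOUT` table (how many samples sit along X and Y for each sample count) is an assumption modelled on a common NVIDIA-style layout, and the function names are hypothetical.

```python
# Assumed mapping from sample count to (samples in X, samples in Y).
SAMPLE_LAYOUT = {1: (1, 1), 2: (2, 1), 4: (2, 2), 8: (4, 2), 16: (4, 4)}


def cache_size(width, height, samples, pre_multiplied):
    """Size used to match a texture against others in the cache.

    Render targets and copy textures arrive already multiplied by the
    driver; textures sampled from the pool do not, so multiply them here.
    """
    if pre_multiplied:
        return width, height
    sx, sy = SAMPLE_LAYOUT[samples]
    return width * sx, height * sy


def host_size(width, height, samples):
    """Size handed to the backend: the cache size divided by the samples."""
    sx, sy = SAMPLE_LAYOUT[samples]
    return width // sx, height // sy
```

For example, a 4x MSAA render target stored as 2560x1440 and a pool texture described as 1280x720 with 4 samples now resolve to the same cache size, and the backend still receives 1280x720.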
Fixes rendering on Okami HD.
Before:
After:
Implemented by gdkchan in #2984
Implement IMUL, PCNT and CONT shader instructions, fix FFMA32I and HFMA32I
Ryujinx is capable of running a variety of homebrew applications, though some run better than others. melonDS, a Nintendo DS emulator, introduced us to the IMUL, PCNT and CONT shader instructions, which we weren’t aware of before; the last two are similar to the existing PBK/BRK and SSY/SYNC pairs. While implementing these, the FFMA32I instruction was fixed along the way so that its third operand uses the destination register rather than "SrcC", which does not exist for this instruction. HFMA32I had a similar issue and was also missing from the instruction table, so both problems were remedied.
Implemented by gdkchan in #2972
Fix adjacent 3d texture slices being detected as Incompatible Overlaps
The Texture Sync update brought big changes, but one issue that came up caused the Xenoblade games to have odd colour grading. Adjacent slices of 3D textures were being detected as incompatible overlaps, so the rendered 3D texture data was lost for most slices.
Implemented by riperiperi in #2993
Fix render target clear when sizes mismatch
On OpenGL and Vulkan, when the bound render targets have different sizes, rendering only happens within the intersection of all their sizes. On the GPU, this clipping is controlled by the ScreenScissorState pair of registers. These registers were mostly ignored before, but for clears that can cause issues: if render targets of different sizes are bound and the game tries to clear one of them with a screen scissor size matching that target, OpenGL would clip the clear to the smallest size and not clear the entire region. This was fixed by unbinding all other render targets to avoid host clipping, and then using a custom scissor region calculated from the screen scissor and user scissor 0.
Fixes the screen not being entirely cleared in Pathway.
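The custom scissor region is essentially a rectangle intersection. Here is a minimal Python sketch, assuming the screen scissor and user scissor 0 are both plain axis-aligned rectangles; `Rect` and `clear_region` are hypothetical names, and unbinding the other render targets is a separate step not shown.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    # Overlap of two axis-aligned rectangles, or None if they are disjoint.
    x0, y0 = max(a.x, b.x), max(a.y, b.y)
    x1 = min(a.x + a.width, b.x + b.width)
    y1 = min(a.y + a.height, b.y + b.height)
    if x1 <= x0 or y1 <= y0:
        return None
    return Rect(x0, y0, x1 - x0, y1 - y0)


def clear_region(screen_scissor: Rect, user_scissor: Optional[Rect]) -> Optional[Rect]:
    # The clear is limited by the screen scissor, and additionally by the
    # user scissor when one is enabled.
    if user_scissor is None:
        return screen_scissor
    return intersect(screen_scissor, user_scissor)
```

With only the target being cleared still bound, the host no longer clips the clear to the smallest render target, and this rectangle alone decides what gets cleared.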
Before:
After:
Implemented by gdkchan in #2994
Add capability for BGRA formats
This adds a new capability called SupportsBgraFormat. On OpenGL it is always false, as the API has no support for BGRA texture formats. However, it will be set to true on Vulkan, which allows us to use those formats there without needing to swap the components ourselves in the fragment shader output. The main goal here is reducing the difference between the Vulkan branch and the master branch, which makes reviewing much easier.
Implemented by gdkchan in #3011
Stop using glTransformFeedbackVaryings and use explicit layout on the shader
On OpenGL, there are two ways to specify what should be written to the transform feedback buffers when the feature is enabled. The first, and the one we currently use, is passing the names of the shader outputs to be written to the glTransformFeedbackVaryings function. The newer method is specifying it directly in the shader using layout qualifiers, and this change implements the latter. The reason is that Vulkan only supports the latter: there is no "TransformFeedbackVaryings" function in Vulkan to specify that information outside of the shader. In fact, the code for this change was mostly pulled from the Vulkan branch. The main advantage is reducing differences between Vulkan and master, which makes review easier and lets us use the same method on both APIs.
One limitation of this approach is that it's not possible to, for example, write the same output into multiple buffers (although it may be possible to create multiple outputs and copy the value). But since games also have to specify the transform feedback layout, they should be bound by the same limitations.
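To show what "explicit layout on the shader" means, this Python sketch generates GLSL declarations using the `xfb_buffer`, `xfb_offset` and `xfb_stride` layout qualifiers (GLSL 4.40 / ARB_enhanced_layouts). The helper, the output names and the assumption that every component is a 32-bit float are illustrative only; Ryujinx derives the real layout from the game's transform feedback state.

```python
# Number of 32-bit float components per supported GLSL type.
COMPONENTS = {"float": 1, "vec2": 2, "vec3": 3, "vec4": 4}


def xfb_declarations(buffer, outputs):
    """Emit GLSL declarations placing each output at an explicit byte offset
    in one transform feedback buffer (hypothetical helper)."""
    lines = []
    offset = 0
    for name, glsl_type in outputs:
        lines.append(
            f"layout(xfb_buffer = {buffer}, xfb_offset = {offset}) out {glsl_type} {name};"
        )
        offset += COMPONENTS[glsl_type] * 4  # 4 bytes per float component
    # The stride is the total size of one captured vertex in the buffer.
    lines.insert(0, f"layout(xfb_buffer = {buffer}, xfb_stride = {offset}) out;")
    return lines
```

Because each output names its buffer and offset directly, the same shader text works on both OpenGL and Vulkan; the trade-off is that one output maps to exactly one location.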
Implemented by gdkchan in #3012
Fix deadlock for GPU counter report when 0 draws are done
A few games on the Nintendo Switch use what’s called conditional rendering, where draws are performed or skipped depending on the result of an earlier GPU query, such as an occlusion query counter. On Ryujinx, a rare bug could occur where reporting a counter for a region containing 0 draws deadlocked the GPU. If this write overlapped with a tracking action, the GPU could end up waiting on something it was only meant to do in the future, so it would simply get stuck. Previously, such a report completed immediately and wrote the result to (tracked) guest memory from the backend thread. When backend threading is enabled, the backend thread cannot be allowed to trigger read actions that wait on the GPU, as it can end up waiting on itself and never advancing. In the case of backend multithreading's SyncMap, it would try to wait for a backend sync object that did not yet fully exist: the sync object existed according to the GPU and tracking, but had not yet been created by the backend. The fix is to queue the 0-draw event just like any other, with its _bufferMap value simply forced to 0, so that it is flushed with the other events on the counter queue. This fixes issues in games that use conditional rendering, such as Super Mario Odyssey, Mario Kart 8 and Splatoon 2.
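The core of the fix, queuing the zero-draw report instead of writing it immediately, can be modelled in a few lines of Python. `CounterQueue` and its methods are hypothetical; the real queue lives in Ryujinx's C# GPU code and also manages host query objects and sync, which this sketch leaves out.

```python
from collections import deque


class CounterQueue:
    """Hypothetical model: counter reports are queued and written to guest
    memory in order when the queue is flushed."""

    def __init__(self):
        self.pending = deque()
        self.draws_since_report = 0

    def draw(self):
        self.draws_since_report += 1

    def report(self, address, host_query_result):
        # A region with zero draws is queued like any other event; its value
        # is simply forced to 0 instead of being written immediately.
        value = 0 if self.draws_since_report == 0 else host_query_result
        self.pending.append((address, value))
        self.draws_since_report = 0

    def flush(self, guest_memory):
        # All writes happen here, in submission order, so nothing is written
        # to tracked memory from the backend thread out of turn.
        while self.pending:
            address, value = self.pending.popleft()
            guest_memory[address] = value
```

Since the zero-draw report now follows the same path as every other report, it can no longer make the backend wait on a sync object that does not exist yet.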
Implemented by riperiperi in #3019
Add support for BC1/2/3 decompression (for 3D textures)
The ginormous texture sync update added support for flushing incompatible overlaps that use unsupported compression formats. However, only the BC4 and BC5 compression formats were supported. This extends support to the BC1, BC2 and BC3 formats, fixing broken textures in games that use those formats with 3D textures on OpenGL. Vulkan does not have this issue, as it supports 3D compressed formats. Other changes include:
- Added a new "Supports3DTextureCompression" capability, always false on OpenGL but meant to be set to true on Vulkan.
- Changed the Capabilities property on GpuContext to return the struct by ref, and changed the Capabilities struct properties to readonly fields, both to avoid copies.
- Removed the Bc1Rgb formats. They were unused, and in fact fairly useless, since the only difference between the RGB and RGBA variants is that alpha is ignored, which can be done by setting alpha to one in the swizzle.
- Optimized the existing BC4 and BC5 decompressors: BC4 is about 2.5x faster and BC5 about 2.1x faster (tested with a randomly generated 256x256x2 3D texture).
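For readers curious what BC1 decompression actually involves, here is a Python sketch that decodes a single 8-byte BC1 block into 16 RGBA texels, following the standard block layout (two RGB565 endpoints, then 2-bit indices). The function names are hypothetical, and the integer rounding used for the interpolated colours is an approximation: Ryujinx's C# decoder and GPU hardware may round differently.

```python
import struct


def rgb565_to_rgb888(c):
    # Expand 5/6/5-bit channels to 8 bits by replicating the high bits.
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def decode_bc1_block(block):
    """Decode one 8-byte BC1 block into 16 RGBA texels (4x4, row-major)."""
    c0, c1, indices = struct.unpack("<HHI", block)
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    palette = [p0 + (255,), p1 + (255,)]
    if c0 > c1:
        # Four-colour mode: two extra colours at 1/3 and 2/3 between endpoints.
        palette.append(tuple((2 * a + b) // 3 for a, b in zip(p0, p1)) + (255,))
        palette.append(tuple((a + 2 * b) // 3 for a, b in zip(p0, p1)) + (255,))
    else:
        # Three-colour mode: the midpoint plus transparent black.
        palette.append(tuple((a + b) // 2 for a, b in zip(p0, p1)) + (255,))
        palette.append((0, 0, 0, 0))
    # Each texel uses 2 bits, starting from the least significant bits.
    return [palette[(indices >> (2 * i)) & 0x3] for i in range(16)]
```

BC2 and BC3 reuse this colour block and add a separate 8-byte alpha block in front of it.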
Fixes text in Tales of Vesperia.
Before:
After:
Fixes explosions in Xenoblade Chronicles 2.
Before:
After:
Implemented by gdkchan in #2987
Fix res scale parameters not being updated in vertex shader
Previously, render scale arrays were not updated when the scales in the flat array were technically the same but the start index for the vertex scales had changed. This fixes the issue by updating the scales in the support buffer when the vertex stage has bindings and the fragment stage binding count has changed since the last render scale update.
Implemented by riperiperi in #3046
Add timestamp to 16-byte/4-word semaphore releases.
The Legend of Zelda: Breath of the Wild had a long-standing bug in Ryujinx where the game treated 20fps as full speed, and running above that made it go faster than its native frame rate. This incorrect behaviour stumped developers for a long time, as the cause was strange. It turned out the game was reading a ulong 8 bytes after a semaphore release: this is the timestamp it uses for its performance calculations. Ryujinx now writes this timestamp for 16-byte (4-word) semaphore releases, and only for those.
This fixes BotW being capped at 20fps all the time (now it only does this when the game runs too slowly).
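The memory layout described above can be sketched with Python's `struct` module. `semaphore_release` is a hypothetical helper, a `bytearray` stands in for guest memory, and the timestamp units are an assumption; the part taken from the text is that a 4-word release carries a 64-bit timestamp 8 bytes after the payload.

```python
import struct


def semaphore_release(memory, address, payload, four_words, timestamp):
    """Write a semaphore release into a bytearray standing in for guest memory.

    A 1-word release writes only the 32-bit payload. A 4-word (16-byte)
    release also writes a 64-bit timestamp 8 bytes after the payload, which
    is the value BotW reads back for its frame timing.
    """
    struct.pack_into("<I", memory, address, payload)
    if four_words:
        struct.pack_into("<Q", memory, address + 8, timestamp)
```

Writing the timestamp only for the 16-byte form avoids clobbering memory after a plain 4-byte release.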
Implemented by riperiperi in #3049
CPU/HLE/Kernel
ffmpeg: Add extra checks and error messages
Some games use H.264-encoded video, which Ryujinx decodes using an FFmpeg context. If the system did not have the correct packages installed, Ryujinx would crash with a null value error, which of course wasn’t ideal!
This adds some error checks and logging to inform users if they do not have the required packages installed, and most importantly prevents a ‘random’ crash.
Implemented by Ac_K in #2951
CPU - Implement FCVTMS (Vector)
We’re still finding games both new and old that put ARMeilleure (Ryujinx’s CPU dynamic recompiler) through its paces.
This change implemented the FCVTMS vector CPU instruction which allows games such as XCOM 2 to now boot. The struggle against endless CPU instructions continues…
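For reference, the semantics of FCVTMS (floating-point convert to signed integer, rounding toward minus infinity) can be sketched per lane in Python. This assumes 32-bit lanes, as in the 4S arrangement; saturation on overflow and NaN converting to 0 follow the usual AArch64 conversion rules, and the function names are illustrative only.

```python
import math

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def fcvtms_lane(x, lo=INT32_MIN, hi=INT32_MAX):
    """Convert one float to a signed integer, rounding toward minus infinity.

    Out-of-range values saturate to the integer limits and NaN becomes 0.
    """
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return hi if x > 0 else lo
    return max(lo, min(hi, math.floor(x)))


def fcvtms_vector(lanes):
    # The vector form applies the scalar conversion to every lane.
    return [fcvtms_lane(x) for x in lanes]
```

For example, `fcvtms_vector([1.7, -1.2])` gives `[1, -2]`: rounding toward minus infinity differs from truncation for negative values.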
Implemented by Saldabain in #2973
Update to LibHac 0.15.0
LibHac updates are usually followed with a large list of bug fixes and new games that will now boot due to the improved accuracy of the filesystem!
However, this time the new version was the equivalent of a spring clean with some reorganisation of the code and some minor changes. These changes are aimed at making future updates in the filesystem code much more seamless for our developers, and ultimately our users too.
Implemented by Thealexbarney in #2986
sfdnsres: Implement NSD resolution
Implements NSD resolution when it is requested by a couple of networking-related calls, ‘GetAddrInfoRequest’ and ‘GetHostByNameRequest’.
This is but one of many networking fixes in this report!
Implemented by Thog in #2962
Return error on DNS resolution when guest internet access is disabled
When gdkchan implemented a lot of network fixes last month in #2936 (yes, the one that lets you all watch YouTube!), it didn’t come without a blood sacrifice. As it turns out, some games, most notably Crash Bandicoot 4, try to connect to servers very early in their boot process. Prior to this fix, the game would crash immediately if the guest network option was disabled, as its DNS lookup would fail and it would error out.
This change restores the old behaviour of returning an error on DNS resolution when the setting is disabled, which allows Crash to boot successfully again.
Implemented by gdkchan in #2983
sfdnsres: Block communication attempt with NPLN servers
It’s been all over the internet lately that Nintendo are replacing their ageing ‘NEX’ server system with new ‘NPLN’ servers! Maybe Smash can get rollback next time…
Some games such as Monster Hunter Rise were among the first to make partial use of this new system and more games will of course soon follow. This change simply adds the new servers to the internal DNS blocked list.
Implemented by Thog in #2990
account: Rework LoadIdTokenCache to auto generate a random JWT token
Many Switch servers use JWTs (json web tokens) for authentication. JWTs are a simple and standardized way to pass information between servers without storing it in a database.
LoadIdTokenCache now auto-generates a random JWT, which improves Ryujinx’s accuracy when this call is used and brings it closer to the hardware implementation.
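As an illustration of what a "random JWT" involves, the sketch below builds a token from three base64url segments (header, payload, signature) using only the Python standard library. The claim set and the use of random bytes as the signature are assumptions for illustration; they are not necessarily what Ryujinx generates, and such a token would not pass signature verification.

```python
import base64
import json
import secrets
import time


def b64url(data: bytes) -> str:
    # JWT segments use unpadded base64url encoding (RFC 7515/7519).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def random_jwt(lifetime_s: int = 3600) -> str:
    """Build a structurally valid JWT with random claims and signature."""
    header = {"alg": "RS256", "typ": "JWT"}
    now = int(time.time())
    payload = {"jti": secrets.token_hex(16), "iat": now, "exp": now + lifetime_s}
    signature = secrets.token_bytes(256)  # random bytes, so it will not verify
    return ".".join([
        b64url(json.dumps(header, separators=(",", ":")).encode()),
        b64url(json.dumps(payload, separators=(",", ":")).encode()),
        b64url(signature),
    ])
```

A caller that only parses the token's structure or claims will accept it, which is the point of generating one rather than returning nothing.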
Implemented by Thog in #2991
bsd: Revamp API and make socket abstract
As some of you know and some of you don’t, Ryujinx is a project that is over 4 years old now and as such some of the codebase hasn’t seen the light of day for quite some time. Think about where you were 4 years ago!
As networking fixes were all the rage in early January it was time to venture once more unto the breach and back into the API and socket functions. The list of changes, updates and modernisations here is quite extensive but some highlights include:
- The socket implementation was separated from the IClient class (allowing for possible native implementation of the sockets in the future if needed)
- The IPC code of IClient was revamped to use more modern memory APIs
And my personal favourite:
- “...Probably more that I missed”
Implemented by Thog in #2960
ssl: Implement SSL connectivity
SSL, or for our readers who don’t have a background in networking, ‘Secure Sockets Layer’ is a protocol for establishing encrypted links between networked computers (this is the same protocol that gives you the little padlock on https sites!).
Some applications, on the Switch and by extension on Ryujinx, require SSL-authenticated connections to boot or display content. These include some games and, most notably, applications such as Twitch, which can now function correctly.
Implemented by Thog & InvoxiPlayGames in #2961
Fix return type mismatch on 32-bit titles
A larger change a few months ago optimized tail merges in the CPU recompiler. After it, a minor issue could occur on 32-bit titles where the return type did not match the function's actual return type, because the address was 32-bit rather than 64-bit. This would then trigger an assert on the copy and cause mayhem!
This change resolves the assert and the problems causing it.
Implemented by gdkchan in #3000
kernel: Fix deadlock when pinning in interrupt handler
Even the best of us make small mistakes that turn out to be quite major. A single misplaced call to leave a critical section was causing deadlocks in certain games, such as DoDonPachi Resurrection, and possibly others too.
Luckily all that was needed was a basic rejig of only 2 lines of code and this was swiftly corrected!
Implemented by Thog in #2999
GUI/MISC
Add Cheat Manager
Cheating is such an integral part of video games that one of our developers felt it should be integral to Ryujinx too. Cheats already technically worked before this change, but they were always applied as a blanket: users could not toggle them at runtime or select which cheats they wanted active from a large list, something the Switch can do via cheat managers.
This change implements a cheat manager of our own that will parse your cheat files and allow these to be enabled selectively at runtime. Try it out in-game with Actions -> Manage Cheats (just make sure you have a valid cheat file placed correctly first!).
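A cheat manager mostly needs to group a cheat file into named entries and decide which ones run. The sketch below assumes the Atmosphère-style text format, where `[Name]` starts a cheat and `{Name}` starts an always-on master code, each followed by lines of hex opcodes. The function names and dictionary layout are hypothetical, and the opcodes in any example input are made up.

```python
import re

# A header line: "[Name]" for a normal cheat or "{Name}" for a master code.
SECTION = re.compile(r"^\s*([\[{])(.*?)([\]}])\s*$")


def parse_cheats(text):
    """Group a cheat file into {name: {"master": bool, "code": [opcodes]}}."""
    cheats = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SECTION.match(line)
        if match:
            current = match.group(2)
            cheats[current] = {"master": match.group(1) == "{", "code": []}
        elif current is not None:
            cheats[current]["code"].extend(line.split())
    return cheats


def active_code(cheats, enabled):
    # Master codes always run; other cheats only when toggled on at runtime.
    code = []
    for name, cheat in cheats.items():
        if cheat["master"] or name in enabled:
            code.extend(cheat["code"])
    return code
```

Toggling a cheat at runtime then just means changing the `enabled` set and rebuilding the active code list.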
Implemented by emmaus in #2964
Implement analog stick range modifier
No controller is perfect, regardless of how hard PlayStation owners will try and convince you, and so over time their analog sticks are subject to wear and tear just like everything else. Deadzone adjustment can help to mitigate drifting of the sticks, but just like humans in old age, sometimes these old controllers just can't quite reach the same maximums as they used to be able to.
Range modification allows the controllers' “maximum” input to be reached earlier in the axis to help old or strangely designed controllers to input full directions. Games like Super Smash Bros. Ultimate require full input to be reached in order to consistently dash and so this change also helps even brand new controllers perform such techniques more easily.
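One plausible way to combine a deadzone with a range modifier is to rescale the stick's radial distance so that the configured range already counts as full deflection. This Python sketch is an assumption about the general technique, not Ryujinx's actual input code; `deadzone` and `stick_range` are hypothetical parameter names.

```python
import math


def apply_stick(x, y, deadzone=0.1, stick_range=1.0):
    """Map a raw stick position (each axis in [-1, 1]) to game input.

    Positions inside the deadzone read as centred. The remaining travel is
    rescaled so that reaching stick_range (e.g. 0.85 on a worn stick)
    already counts as full deflection; anything beyond is clamped.
    """
    if stick_range <= deadzone:
        raise ValueError("stick_range must be larger than deadzone")
    magnitude = math.hypot(x, y)
    if magnitude <= deadzone:
        return 0.0, 0.0
    scaled = min((magnitude - deadzone) / (stick_range - deadzone), 1.0)
    # Keep the original direction, replacing only the distance from centre.
    return x / magnitude * scaled, y / magnitude * scaled
```

With `stick_range=0.85`, a stick that can only physically reach 85% of its travel still produces a full-strength input, which is what techniques like dashing in Smash depend on.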
Implemented by MutantAura in #2783
Closing words:
So far 2022 has already been quite the eventful year for us! The final section of this report may have bored you silly with network jargon but it did allow a Nintendo Direct x Ryujinx crossover!
It’s been a while since we talked about our UI rewrite in Avalonia but we’d like to assure everyone progress is still going strong and there’s even been time to make some of it quite fancy!
We’d like to thank everyone for their continued support and we hope to be able to bring you more (on-time) gossip next month!