Progress Report December 2021


Happy new year everyone! Ryujinx wrapped up the final month of 2021 with a blizzard of bug fixes, GPU improvements, HLE updates, code cleanup, N64 emulation(!) and finally, general system stability improvements to enhance the user's experience.

Patreon Goals:

Amiibo Emulation - merged into the main build in March 2021. While compatibility is now almost perfect, there are still some improvements to come for Amiibo, which can be tracked on the associated GitHub issue here: https://github.com/Ryujinx/Ryujinx/issues/2122

Custom User Profiles - merged into the main build in April 2021.

Vulkan GPU Backend - still in progress; a public test build has been delivered. A lot is being worked on.

ARB Shaders - Goal reached in April 2021. As seen from last month's progress report, work on ARB shaders has been going smoothly.

ARB shaders will further reduce stuttering on the first run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.

$2000/month - Texture Packs / Replacement Capabilities - Almost there!

This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.

ETA once the goal is reached: ~3-4 weeks

$2500/month - One full-time developer - Not yet met

This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx.

$5000/month - Additional full-time developer - Not yet met

This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.

Vulkan Progress:

December has seen a lot of progress on our Vulkan implementation.

First, as you may have noticed, we have been telling our users not to expect immediate improvements on Vulkan compared to OpenGL on NVIDIA graphics cards, with the exception of a few graphical fixes thanks to features that Vulkan supports but OpenGL does not. But thanks to recent performance improvements on the backend, a few titles are starting to outperform NVIDIA OpenGL. For example, The Legend of Zelda: Breath of the Wild is up to 25% faster in some areas.

OpenGL:

Vulkan:

There are a few other games that have seen improvements on NVIDIA with Vulkan as well.

NVIDIA was not the only vendor that saw improvements; Intel also had great performance uplifts in the past month, thanks to optimizations made on the backend and to several improvements to their Windows drivers, which happened to fix bugs that affected Ryujinx too.

One example of this is Mario Kart 8 Deluxe, which still runs at about 30 FPS on OpenGL (varies a bit depending on the area and the view).

OpenGL:

Vulkan:

Below you can see a graph with tests on a few more games.

The games are Mario Party Superstars, Super Mario Odyssey, Luigi's Mansion 3 and Animal Crossing, respectively. The tests were performed on a laptop with an Intel i5 8300H CPU and integrated Intel UHD Graphics 630 GPU, and 16 GB of DDR4 RAM.

Above we can see a similar graph, this time tested with an AMD GPU. The tests were performed on a PC with an RX 570, Intel i5 7500 and 16 GB of DDR4 RAM.

Please note that the values on the graphs are percentages in relation to the target frame rate, not frame rate values. So 110 (%) there means that it can run at slightly above the intended frame rate. For a game like Animal Crossing that targets 30 fps, that means it can run at 33 fps. Please also note that performance on those games is dependent on several factors, including the area of the game and the current number of on-screen elements. Animal Crossing performance, for example, depends on the number of houses and other objects on the island. Our main goal here is to show how it compares to the current OpenGL backend on the same hardware and in the same spot.
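As a quick illustration of how to read the graphs, the conversion from a graph value to a frame rate can be sketched as below. The function name is ours and purely illustrative; only the 110% and 30 fps figures come from the text above.

```python
def percent_to_fps(percent_of_target: float, target_fps: float) -> float:
    """Convert a graph value (percent of the target frame rate) into frames per second."""
    return target_fps * percent_of_target / 100.0


# Animal Crossing targets 30 fps, so a graph value of 110 corresponds to 33 fps.
print(percent_to_fps(110, 30))  # 33.0
```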

Thanks to Harone for performing these tests.

We also had fixes for graphical glitches on the Vulkan backend in December. First, alpha test was implemented, which fixed the lack of transparency on a few games, including Mega Man 11 and New Pokémon Snap. The issue causing models on New Super Mario Bros U Deluxe to have black borders was fixed. Some texture corruption issues have been fixed, and Splatoon 2 can now load and run consistently on NVIDIA. A bug causing Shin Megami Tensei III Nocturne to render nothing other than the playable character was fixed. We are also working to fix issues that affect both Vulkan and OpenGL (like some texture related issues), but Vulkan is the most affected.

There are still some performance issues, like low frame rates on Super Mario Odyssey on AMD and overall low performance on Linux with NVIDIA and AMD, that we plan to investigate.

Please look forward to more improvements in upcoming progress reports!

Now, let's get into this month's progress!

GPU

Fix FLO.SH shader instruction with an input of 0

An oversight in the implementation of this shader instruction made it produce the wrong result for one specific input value (zero). The implementation was corrected to fix this error, which was found using the shader fuzzer that we mentioned in previous progress reports.

While we haven't found any game with visual improvements caused by this change, it's pretty likely that something was affected by it, since this instruction is often used for shader thread operations.

Implemented by gdkchan in #2876

Implement remaining shader double-precision instructions

Most games use 32-bit single precision floating point instructions for performance reasons, as 64-bit double precision instructions are slower and best avoided. There are, however, a few games that make use of those instructions, and one of them is World War Z.

Thanks to the shader fuzzer, we now have a way to test the shader translator without needing to actually launch a game and wait until the part where it uses a given shader with missing/broken instructions. Now we can just generate a shader with those instructions and use it for testing. This approach allowed us to easily implement all the missing double precision instructions, allowing the game to render correctly.

The implementation includes DMNMX (Double Min/Max), DSET (Double Set), and DSETP (Double Set Predicate), plus two double-precision operations of the MUFU (Multi-function) instruction: RCP64H (Reciprocal 64-bit high half) and RSQ64H (Reciprocal square root 64-bit high half). Finally, this fixes the immediate operands on all double-precision instructions. Before, the immediate was interpreted as the upper 20 bits of a float value converted to double, when it should be the upper 20 bits of a double value.
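To make the immediate operand issue more concrete, here is a minimal sketch of the two interpretations. It is not the actual translator code: the function names are hypothetical, and it only assumes that the 20-bit immediate fills the most significant bits of the value, with the remaining low bits set to zero.

```python
import struct


def imm20_as_double(imm20: int) -> float:
    """Treat the immediate as the upper 20 bits of a 64-bit double (corrected behaviour)."""
    bits = (imm20 & 0xFFFFF) << 44
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def imm20_as_float_then_double(imm20: int) -> float:
    """Treat the immediate as the upper 20 bits of a 32-bit float, then widen (old behaviour)."""
    bits = (imm20 & 0xFFFFF) << 12
    return struct.unpack("<f", struct.pack("<I", bits))[0]


imm = 0x3FF00  # upper 20 bits of the double 1.0 (0x3FF0000000000000)
print(imm20_as_double(imm))             # 1.0
print(imm20_as_float_then_double(imm))  # 1.875
```

The same 20 bits decode to different values because floats and doubles split their bits differently between exponent and mantissa.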

This allows World War Z to progress further, but it still can’t progress past the menu due to a few other errors.

Implemented by gdkchan in #2845

Move texture anisotropy check to SetInfo

Some games on Ryujinx with heavy texture/sampler counts (notably Unreal Engine 4 titles) can lose performance when anisotropic filtering is not set to Auto. Rather than calculating this for every sampler, this change calculates whether a texture can force anisotropy when its info is set, and exposes the value via a public boolean. This should improve performance on games with heavy texture/sampler counts.

Implemented by riperiperi in #2843

Fix SUATOM and other texture shader instructions with RZ dest

This is another shader issue found with our fuzzer. The shader translator would produce invalid code if the shader contained a SUATOM (Surface Atomic) instruction with an RZ destination register. This has been fixed and now a valid shader is produced for this instruction encoding too.

Implemented by gdkchan in #2885

Add support for releasing a semaphore to DmaClass

If you've tried Undertale on this emulator before, you may have noticed that on some specific sections of the game, it would slow down a lot, to the point of being unplayable. This was not the only affected game: several other OpenGL games had a weirdly similar pattern, where they would pause for about 10 seconds and then continue.

The fact that it would pause and then continue after this specific amount of time was an indication that the OpenGL driver was waiting for something, but whatever it was waiting for did not happen, and it would just give up after 10 seconds. It turns out that the driver was waiting for a semaphore release operation that did not happen. With the operation properly implemented, the freezes no longer happen and Undertale runs at the correct speed.
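The wait-then-give-up pattern described above can be sketched with a toy model. Everything here is hypothetical (the class, the function, and the shortened timeout); it only illustrates why a missing release turns into a fixed-length stall rather than a permanent hang.

```python
import threading


class GuestSemaphore:
    """Hypothetical stand-in for the semaphore the waiter is blocked on."""

    def __init__(self) -> None:
        self._released = threading.Event()

    def release(self) -> None:
        self._released.set()

    def wait(self, timeout_s: float) -> bool:
        # True if the release arrived in time, False if the waiter gave up.
        return self._released.wait(timeout_s)


def run_copy(sem: GuestSemaphore, release_on_completion: bool) -> None:
    # ... the copy work itself would happen here ...
    if release_on_completion:
        sem.release()


# Release operation missing: the waiter stalls for the full timeout
# (10 seconds in the report, shortened here), then carries on.
stuck = GuestSemaphore()
run_copy(stuck, release_on_completion=False)
print(stuck.wait(timeout_s=0.05))  # False

# Release operation implemented: the wait returns immediately.
fixed = GuestSemaphore()
run_copy(fixed, release_on_completion=True)
print(fixed.wait(timeout_s=0.05))  # True
```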

This also fixed some graphical issues, such as thumbnails missing from save games on some visual novels, and a softlock on a specific level of the game Record of Lodoss War: Deedlit in Wonder Labyrinth.

Implemented by riperiperi in #2926

Fix for texture pool not being updated when it should + buffer texture fixes

This one is a batch of fixes for buffer texture related issues, but the most notable one was the black vertex explosions in some Unreal Engine 4 games. The cause was an incorrect buffer texture being bound, due to it missing some changes to the texture pool (the region of memory where GPU texture information is kept).

One of the affected games was Dragon Quest XI S; see the screenshots below for a comparison.

Before:

After:

Also fixes black textures in Balan Wonderworld Demo...

...and flickering black textures in SnowRunner.

Before:

After:

Implemented by gdkchan in #2911

Fix I2M texture copies when line length is not a multiple of 4

The Switch GPU has an engine called Inline-To-Memory (I2M) that is used to push data to GPU memory. The data is submitted on the command buffer, at a granularity of 4 bytes. That's because the command buffer data is divided into 4-byte values.

This actually imposes a limit on the data that is submitted using this method. Since the data is divided into 4-byte values, all the data submitted must be padded to align to 4 bytes. For textures, it means that each line of the texture must have its size in bytes aligned to 4. If we take an RGBA8 texture, which is a very common format, we can see that it is already naturally aligned to 4, since in this format each component takes 8 bits (1 byte), and it has 4 components (red, green, blue and alpha). But if we take a format like R8 (still 1 byte per component, but only one component), then the format no longer naturally aligns. Depending on the width of the texture, we may have a line size that is not a multiple of 4.

The issue here is that the emulator was simply ignoring the padding, and assuming that the data was supposed to go into the next line. This would create some sort of staircase effect where all the lines of the textures were misaligned. The fix was simply taking this padding into account and skipping it.
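The difference between skipping and ignoring the padding can be sketched as follows. This is an illustrative model rather than the emulator's code; the function names are ours, and the padding byte is shown as `.` so that it is easy to spot in the output.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def copy_lines_skipping_padding(data: bytes, line_length: int, line_count: int) -> bytes:
    """Corrected behaviour: each submitted line starts on a 4-byte boundary."""
    src_stride = align_up(line_length, 4)
    out = bytearray()
    for y in range(line_count):
        start = y * src_stride
        out += data[start:start + line_length]
    return bytes(out)


def copy_lines_ignoring_padding(data: bytes, line_length: int, line_count: int) -> bytes:
    """Old behaviour: padding bytes are treated as the start of the next line."""
    return data[:line_length * line_count]


# A 3-pixel-wide R8 texture with two lines, each padded with one byte ('.').
submitted = b"abc.def."
print(copy_lines_skipping_padding(submitted, 3, 2))  # b'abcdef'
print(copy_lines_ignoring_padding(submitted, 3, 2))  # b'abc.de'
```

In the second result, every line after the first is shifted by the accumulated padding, which is the staircase effect described above.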

To see what it looks like in practice, we can take a look at Cat Girl Without Salad, one of the affected games.

Before:

After:

Pay attention to the subtitle text.

You might be wondering why only the text was affected in this game. The reason is simple: the font uses a texture with the R8 format, since it only requires one color channel (the text only has a single solid color, after all). This format, as explained above, uses 1 byte per pixel rather than a multiple of 4, so depending on the texture width, it could trigger the bug.

Implemented by gdkchan in #2938

Fix DMA copy fast path line size when xCount < stride

This fixes an issue related to texture copies. In some specific cases, the copy could be out of bounds, causing a crash. The specific case triggering this was a linear texture, where the copy region width was less than the stride (amount of bytes per line) of the texture.
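The report does not spell out the exact fix, so the following is only a sketch of the general idea, under our own assumption about where the out-of-bounds access comes from: when the copied width is smaller than the stride, the last line only needs `x_count` bytes, so sizing the region as `stride * y_count` can reach past the end of the buffer. All names are hypothetical.

```python
def linear_copy_span(x_count: int, y_count: int, stride: int) -> int:
    """Bytes spanned by y_count lines of x_count bytes each in a linear texture."""
    if y_count == 0:
        return 0
    # The last line contributes only x_count bytes, not a full stride.
    return stride * (y_count - 1) + x_count


def copy_linear(src: bytes, x_count: int, y_count: int, stride: int) -> bytes:
    """Copy the region line by line into a tightly packed result."""
    if linear_copy_span(x_count, y_count, stride) > len(src):
        raise ValueError("copy region exceeds the source buffer")
    out = bytearray()
    for y in range(y_count):
        start = y * stride
        out += src[start:start + x_count]
    return bytes(out)


# 3 lines of 5 bytes in a texture with an 8-byte stride. The backing buffer
# ends right after the last copied byte, so it holds 21 bytes, not 3 * 8 = 24.
source = bytes(range(21))
print(linear_copy_span(5, 3, 8))  # 21
print(list(copy_linear(source, 5, 3, 8)))
```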

It was causing random crashes on the YouTube app for the Switch, and might also affect a few other OpenGL games.

Implemented by gdkchan in #2942

Flip scissor box when the YNegate bit is set

GPUs support a feature called "scissor" that does what the name would suggest: it cuts one region of the output image, or more precisely, it restricts the rendering to the region specified by the scissor rectangle. Anything outside that region is simply not rendered.

On OpenGL games and apps, it was causing issues because the coordinates of the scissor rectangle were inverted, so the region being cut was completely incorrect. This is because there is a register controlling if the origin point is at the top or the bottom of the image, and since that register was being ignored, it was using the wrong origin in some cases.
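Flipping a scissor rectangle between a top-left and a bottom-left origin is a small calculation, sketched below. The function and parameter names are illustrative only and are not taken from the emulator's code.

```python
def flip_scissor_y(y: int, height: int, target_height: int) -> int:
    """Return the rectangle's Y origin measured from the opposite edge of the target."""
    return target_height - (y + height)


def scissor_rect(x: int, y: int, width: int, height: int,
                 target_height: int, y_negate: bool) -> tuple:
    """Apply the origin flip only when the (hypothetical) y_negate flag is set."""
    if y_negate:
        y = flip_scissor_y(y, height, target_height)
    return (x, y, width, height)


# A 100-pixel-tall strip at the top of a 720-pixel-tall target becomes a strip
# starting at y = 620 when the origin is at the other edge.
print(scissor_rect(0, 0, 1280, 100, 720, y_negate=True))   # (0, 620, 1280, 100)
print(scissor_rect(0, 0, 1280, 100, 720, y_negate=False))  # (0, 0, 1280, 100)
```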

This fixes menus being cut off in the YouTube app.

Before:

After:

Also fixes the in-game UI in Bloons TD 5.

Before:

After:

Implemented by gdkchan in #2941

Fix A1B5G5R5 texture format and Add support for the R4G4 texture format

Nintendo recently released a Nintendo 64 emulator on the Switch for NSO users. The emulated games were not working as they require the JIT service (which is not implemented), but in December we started working on the changes required to get it up and running, making Ryujinx the first-ever Nintendo Switch emulator to be able to boot and run this official Nintendo 64 emulator. While not complete yet, there is an open PR if you want to give it a try. What's more, running it also revealed a few graphical issues. This is one of them.

First, the A1B5G5R5 format was incorrect on the OpenGL backend, which caused the textures to have the wrong colors. It was a pretty easy fix: we just had to change the OpenGL format and invert the texture swizzle.

Before:

After:

It also had another issue caused by a missing texture format.

Before:

As you can see, the buttons are not being rendered on the HUD. They use the R4G4 texture format, which was not implemented before. It is very similar to the much older L4A4 texture format (4 bits of luminance and 4 bits of alpha). However, while this format was once supported by OpenGL, it has since been deprecated, so we can't use it anymore. Instead, we need to convert the data on the CPU to a compatible format. Vulkan does support the format, so no conversion will be required once the change makes its way to the Vulkan branch too.

After:

With those fixes, the game is now rendered properly.

One can see why those textures are using this format. They only have a single color, in addition to transparency, so a format with one channel for color and one for transparency is ideal. And using 4 bits per channel is probably a choice made due to the memory limitations of the Nintendo 64.
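The CPU-side conversion mentioned earlier can be sketched roughly as below. This is not the emulator's converter: the target format (two 8-bit channels) and the nibble order (R in the low nibble, G in the high nibble) are assumptions made for the sake of the example.

```python
def expand_nibble(value: int) -> int:
    """Scale a 4-bit value (0..15) to the full 8-bit range (0..255)."""
    return value * 17  # same result as (value << 4) | value


def convert_r4g4_to_r8g8(data: bytes) -> bytes:
    """Expand each packed 4+4-bit texel into two 8-bit channels."""
    out = bytearray()
    for texel in data:
        r = texel & 0xF  # assumption: R lives in the low nibble
        g = texel >> 4   # assumption: G lives in the high nibble
        out += bytes((expand_nibble(r), expand_nibble(g)))
    return bytes(out)


print(list(convert_r4g4_to_r8g8(b"\xF0\x0F\x3A")))  # [0, 255, 255, 0, 170, 51]
```

Multiplying by 17 maps 0 to 0 and 15 to 255, so fully transparent and fully opaque values stay exact after the expansion.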

Please note that this emulator has its own emulation issues that happen on the Switch as well, and therefore those issues will also happen on Ryujinx. So if it doesn't look like the game running on a real Nintendo 64, it might be an NSO emulator issue rather than a Ryujinx one.

Implemented by gdkchan in #2955 and #2956

Force crop when presentation cached texture size mismatches

Before, the presentation texture size was used to find a matching texture in the cache, but after that it was no longer used; the cached texture size was used instead. The problem is that, due to size alignment, the cached texture might actually be larger than the presentation size, leading to gaps when the texture is presented. The fix is relatively simple: we crop the texture based on the presentation size before showing it on the screen.
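A minimal sketch of the cropping step, assuming the crop always starts at the top-left corner (the report does not say). The type and function names are hypothetical, and the 736-pixel height is just an example of a size that has been aligned up.

```python
from typing import NamedTuple


class Crop(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def presentation_crop(cached_w: int, cached_h: int,
                      present_w: int, present_h: int) -> Crop:
    """Limit the presented region to the presentation size, never past the cached texture."""
    return Crop(0, 0, min(present_w, cached_w), min(present_h, cached_h))


# The cached texture is taller than the 1280x720 image the game asked to present.
print(presentation_crop(1280, 736, 1280, 720))  # Crop(left=0, top=0, right=1280, bottom=720)
```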

This solves alignment issues that the Nintendo Switch Online Nintendo 64 emulator, Super Mario Sunshine, Hades and maybe a few other Vulkan games had.

Before:

After:

Implemented by gdkchan in #2957

HLE/Kernel/CPU

kernel: Improve GetInfo readability and update to 13.0.0

This one is mostly refactoring. It does not have any effect on games, but makes the code easier to read and makes it up to date with the changes on the latest version of the official kernel.

Implemented by Thog in #2900

Implement UHADD8 instruction

This implements a missing 32-bit CPU instruction required by a few games. This specific instruction is used by No More Heroes and No More Heroes 2. According to our testers, the games are not yet playable, but can now boot further with this implementation.
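For reference, UHADD8 (Unsigned Halving Add 8) adds the four unsigned bytes of two 32-bit registers lane by lane and halves each sum, so no lane can overflow. The following is a plain model of that behaviour, not the emulator's JIT implementation.

```python
def uhadd8(a: int, b: int) -> int:
    """Model of UHADD8: per-byte unsigned add, each sum shifted right by one."""
    result = 0
    for lane in range(4):
        shift = lane * 8
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        # The 9-bit sum is halved, so it always fits back into 8 bits.
        result |= ((x + y) >> 1) << shift
    return result


print(hex(uhadd8(0xFF01FE02, 0x01FF0204)))  # 0x80808003
```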

Implemented by piyachetk in #2908

Implement CSDB instruction

This is a 32-bit instruction required by the recently released Monster Rancher games. On the Switch CPU, it does nothing since the CPU is quite old and this instruction was not yet supported there, so the implementation was very simple.

Both games are now playable.

Implemented by gdkchan in #2927

Update to LibHac v0.14.3

LibHac is a .NET library that reimplements some parts of the Nintendo Switch operating system, also known as Horizon OS. Ryujinx uses LibHac for its file system. This updates the LibHac dependency to version 0.14.3, which brings many improvements to Ryujinx’s file system. It makes the emulator all the more accurate while also allowing some games to boot that didn’t before.

Most notably, this update adds support for NCAs with sparse partitions and fixes an issue related to games that do not contain an NCA data partition (this one was actually a Ryujinx issue, not a LibHac issue). Both of those allowed some games to work for the first time.

As an example, we have Ruined King: A League of Legends Story working.

(The red glitch is caused by resolution scaling).

Another game using the sparse storage is Lost in Random, which also now works thanks to this update.

As an example of a game without a data partition, we have Fire Emblem Shadow Dragon and the Blade of Light, which is an emulated NES game with the ROM embedded in the executable, which is why it has no data partition.

It was technically possible to run the title before, by first unpacking it and loading as an unpacked game (which was the standard way of running games in the early days, before Ryujinx had support for XCI and NSP). Now it works properly without unpacking, like the other games.

Implemented by Thealexbarney in #2925

Remove PortRemoteClosed warning

The emulator automatically logs some result codes returned by the kernel as a warning, because some of them might indicate an error. Some of those result codes are returned under normal operation, however, instead of being actual errors. We already filter some of those results so they are not printed as a warning, but PortRemoteClosed was missing from that list. We have now added it too, which removes the warning. We had a few users ask what was wrong because they were seeing the warning multiple times, so removing it also solves this issue.

Implemented by gdkchan in #2928

Fix bug causing an audio buffer to be enqueued more than once

This was a small oversight that would cause an audio buffer to be enqueued more than once. It was caused by a variable not being incremented, which would cause the same buffer to be picked multiple times. In addition to making the same audio data play more than once, it was also messing up buffer release, which would cause the backend to become starved as not enough audio data was coming in, causing terrible audio crackling in some specific cases.

This was affecting the YouTube app, although some games are probably affected as well. With it fixed, the audio now plays perfectly on this app.

Note that this bug affected all audio backends.

Implemented by gdkchan in #2940

Fix GetAddrInfoWithOptions and some sockets issues

This is one of the functions related to DNS resolution. This specific function can be used to get an IP address from a host name, which is later used to connect with the servers.

While the function was already implemented, one of the variants of the function was not correct, as it was writing the result values at the wrong location. The end result was that applications calling the function would return an "empty" result, as if the resolution produced no IP addresses.

The fix was just re-arranging the fields to have them written at the correct location.

This allowed the Switch YouTube app to work for the first time on an emulator. As we have shown before, it had a number of graphical issues, but they have all been fixed and the app works just like it would on a real console.

And just in case you're wondering, 360° videos and live streams are also working.

The YouTube app is one of the easiest among the ones that require Internet access to get working on an emulator, because it does not use the SSL service (instead it uses some SSL library internally), and also because it does not require the Web Applet. Other apps that we have tried make use of both the SSL service and Web Applets (neither is currently implemented).

Most games that have some online capability require a Nintendo account to function. Since we don't have one on the emulator, they don't get very far even with those services implemented.

All that said, it's still nice to see this app working, and a great milestone for Ryujinx to have network functionality working in this capacity. We also have an SSL implementation in the works and will share more about it in the next progress report.

Note: A new "Enable guest Internet access" option was added in the settings (system tab). The YouTube app only works with this option enabled. When enabled, it indicates to the game/app that there is a network connection available. Some applications will assume that there is no network connection if it is disabled.

Implemented by gdkchan in #2936

Use minimum stream sample count on SDL2 audio backend

On titles where the buffer size was changed constantly, a lot of audio crackling could be heard when using the SDL2 audio backend. That was caused by the output audio stream being re-created every time the backend received a new buffer size that was not divisible by the old one.

Now, the stream is only re-created if the buffer length is lower than the old one. This avoids the massive slowdown and audio crackling that was caused by the old approach. Affected games include Final Fantasy VII, Animal Crossing, Pokémon Sword/Shield (on videos only), the GTA Trilogy, the YouTube app when watching live streams, and likely more. With the change, the audio now plays perfectly on those titles.
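The two re-creation rules can be compared with a simplified model. The function names and the sequence of buffer sizes are made up for illustration, and the model assumes the stream takes on the new size whenever it is re-created.

```python
from typing import Callable, Sequence


def recreate_if_not_divisible(new_samples: int, current_samples: int) -> bool:
    """Old rule: re-create when the new size is not divisible by the old one."""
    return new_samples % current_samples != 0


def recreate_if_smaller(new_samples: int, current_samples: int) -> bool:
    """New rule: re-create only when the new size is lower than the old one."""
    return new_samples < current_samples


def count_recreations(sizes: Sequence[int], rule: Callable[[int, int], bool]) -> int:
    """Count how often the output stream would be re-created for a run of buffer sizes."""
    current = sizes[0]
    recreations = 0
    for size in sizes[1:]:
        if rule(size, current):
            current = size
            recreations += 1
    return recreations


sizes = [480, 360, 480, 360, 240, 480]  # hypothetical title that keeps changing its buffer size
print(count_recreations(sizes, recreate_if_not_divisible))  # 4
print(count_recreations(sizes, recreate_if_smaller))        # 2
```

With the new rule the stream settles on the smallest size seen so far, so a title that alternates between sizes stops triggering re-creations.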

Implemented by gdkchan in #2948

hid: A little cleanup

More refactoring, improves the code, but has no visible effect on games.

Implemented by Ack77 in #2950

kernel: Implement thread pinning support

This adds support for 8.x thread pinning changes and implements the SynchronizePreemptionState syscall.

This is based on reverse engineering of kernel 13.x, and likely fixes a few softlocks that games could have.

Implemented by Thog in #2840

am: Stub SetMediaPlaybackStateForApplication

This specific function is used by the YouTube app to make the operating system aware that there is a video playing. We have stubbed it to avoid unimplemented service related crashes with the "Ignore missing services" hack disabled.

Implemented by Ack77 in #2952

friend: Stub IsFriendListCacheAvailable and EnsureFriendListAvailable

More stubs, those functions are used by the Super Bomberman R Online game. The game now no longer crashes with ignore missing services disabled, but is still not playable, most likely due to online functionality related issues.

Implemented by Ack77 in #2949

GUI/MISC

misc: Migrate usage of RuntimeInformation to OperatingSystem

This changes how the current operating system is checked in our code base. As an example, a Windows check was previously written as “if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))” but is now written as "if (OperatingSystem.IsWindows())".

This results in much cleaner and simpler code, as there’s much less clutter being added.

Implemented by Thog in #2901

misc: Fix alsoft.ini being present on Linux releases

An oversight caused alsoft.ini to be included in the Linux releases when it shouldn’t have been, because it’s not supported on Linux.

Implemented by Thog in #2902

Remove usage of Mono.Posix.NETStandard across all projects, Remove unused empty Ryujinx.Audio.Backends project, Remove debug configuration and schema

Thog was hard at work removing a lot of legacy files from our source code, as they weren’t being updated and were not required anymore. Some cleanup was also done to make the code base much cleaner.

Implemented by Thog in #2920, #2919 and #2906

Using more intense lossless compression

This makes our assets smaller by using more intense lossless compression via the tool OptiPNG.

Implemented by Mou-ikkai in #2811

UI - Add Volume Controls + Mute Toggle (F2)

A long-requested feature was the ability to control the volume of a game through the emulator instead of using the operating system's volume settings.

With this update, users can now change the emulator's volume without needing to mess with any OS settings. Note that the default level is always 100%, and the volume resets to this default once you close the emulator.

Implemented by saldabain in #2871

Closing words

We’d like to thank everyone who contributed in any way to this project whether it be through code contributions, testing, or by being a patron. We can’t even begin to show how encouraging it is to see everyone be excited by our hard work and everyone’s dedication to our project. This past year has been a rough ride for everyone all around, and from all of us here at Ryujinx, we once again wish you a Happy New Year and an amazing 2022!