First Patreon Goal Feature—Amiibo Emulation—is Here & the January 2021 Progress Report

As promised after reaching the Patreon goal last month, Amiibo emulation has arrived! While not yet in the main build, anyone is free to test Ryujinx's Amiibo emulation now; we expect this feature to be finalized within days. Before we dive into the progress report, it's time for a quick update on our Patreon goals.

Patreon Goals Update

Wow! The outpouring of support since we introduced Patreon goals has been nothing short of amazing. THANK YOU! Aside from the Amiibo emulation goal mentioned above, in just the last few weeks we reached two more goals: custom user profile support, and the addition of a Vulkan GPU backend. Although the custom user profile goal was reached in January, this feature is roughly 4 weeks away, because the developer handling it has been working fervently to deliver Amiibo. For the Vulkan GPU backend, we are aiming for an early to mid April arrival. Our current remaining unmet Patreon goals are:

$1500/month (almost there!) - ARB Shaders
ETA once goal is reached: ~3-4 weeks
ARB shaders will further reduce stuttering on first-run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.

As you can see, we underestimated the amount of support we were going to receive in such a short time span. We'll be adding more goals soon!

January 2021 Progress Report

We kicked off the new year with a month full of bug fixes & advancements in performance and compatibility. There were also a few key updates in the UI & applets, thanks largely to an influx of new contributors. We’re excited and encouraged to see a more active development community for the emulator, and look forward to a productive 2021!

GPU Improvements

Shooting Blindly - Implemented missing texture formats

There are a variety of texture formats present in Switch games, and not all of them have been implemented in the emulator. A game that used one of these missing formats is Psikyo Shooting Stars Alpha, whose title screen and in-game graphics were either distorted or mostly missing.

Below you can see the Psikyo Shooting Stars Alpha title screen and one of the embedded games, with neither looking their best:

With the missing texture formats added, the title screen and most of the embedded games now look as they should.

Other games improved by this change are Psikyo Shooting Stars Bravo and Skygamblers – Afterburner.

Implemented on #1867 by AcK77

Hunting a Monster - Tackling the Monster Hunter Rise Demo

As seen in the screenshot above, the game initially did not render anything other than the UI. After some investigation, using graphics debugging software to examine what the game was drawing, we isolated the problem that was causing the screen to render as solid blue: a shader instruction that was not fully implemented, which caused some parts of the shader code not to be executed.

As is often the case with this kind of bug, fixing it was easy; finding the problem, not so much. But with it fixed, more of the scene is now visible.

Fixed on #1878 by gdkchan

This is better; however, most of the scene is still missing. What's going on? The problem here is that the models are not being drawn at all. By intercepting NVN (the Switch graphics API) function calls made by the game, we discovered that it was making use of indirect draws, which are not used very often; this made them the most likely culprit. Normally, the parameters used for a draw are passed from the CPU and recorded in the command buffer that is sent to the GPU, which means that the draw will always have the same parameters. Indirect draws, on the other hand, read their parameters from GPU memory. This means that the parameters can be written by the GPU itself, and may change between draws. This game writes one of the parameters (the draw count) from a shader, which effectively lets a shader control what is or is not rendered in subsequent draws.
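To make the difference between direct and indirect draws concrete, here is a minimal sketch. The `DrawCommand` type and its fields are hypothetical and only model the idea: a direct draw carries its count in the recorded command, while an indirect draw looks the count up in GPU memory when it executes, so a value written earlier by a shader takes effect.

```python
from dataclasses import dataclass

@dataclass
class DrawCommand:
    """Hypothetical recorded draw; field names are illustrative only."""
    indirect: bool
    draw_count: int = 0      # direct draws: baked in when the command is recorded
    count_address: int = 0   # indirect draws: where to fetch the count from

def resolve_draw_count(cmd: DrawCommand, gpu_memory: bytearray) -> int:
    if not cmd.indirect:
        # Direct draw: the CPU recorded the parameter into the command buffer.
        return cmd.draw_count
    # Indirect draw: the parameter is read from GPU memory at execution time.
    raw = gpu_memory[cmd.count_address:cmd.count_address + 4]
    return int.from_bytes(raw, "little")

gpu_memory = bytearray(64)
direct = DrawCommand(indirect=False, draw_count=10)
indirect = DrawCommand(indirect=True, count_address=16)

# Pretend a compute shader culled the scene and wrote a draw count of 3.
gpu_memory[16:20] = (3).to_bytes(4, "little")
print(resolve_draw_count(direct, gpu_memory))
print(resolve_draw_count(indirect, gpu_memory))
```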

Now that we know what the game is trying to do, the question is: why is it not working? The answer is buffer flushing. On the Switch, command processing happens on the GPU. On the emulator, however, we must do it on the CPU, since the PC's GPU cannot understand those commands, as they are specific to the Switch GPU. Since the draw count for the indirect draw is located in GPU memory, and the emulator reads it from the CPU, we clearly have a problem here. We can solve it by flushing the GPU memory data when the command buffer is read from the CPU.
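The stale-read problem can be sketched with a toy model that keeps two copies of memory: the host GPU's VRAM, where shader writes land, and the guest memory that the emulator's CPU-side command processor reads. The class and method names are hypothetical and do not come from Ryujinx.

```python
class EmulatedGpuMemory:
    """Hypothetical model: host VRAM vs. the guest memory seen by CPU code."""

    def __init__(self, size):
        self.guest = bytearray(size)  # what the emulator's CPU-side code reads
        self.vram = bytearray(size)   # what the host GPU actually wrote

    def shader_write_u32(self, addr, value):
        # Host GPU writes (e.g. from a shader) land only in VRAM.
        self.vram[addr:addr + 4] = value.to_bytes(4, "little")

    def flush(self, addr, size):
        # Copy host GPU results back into guest memory.
        self.guest[addr:addr + size] = self.vram[addr:addr + size]

    def cpu_read_u32(self, addr):
        return int.from_bytes(self.guest[addr:addr + 4], "little")

def read_indirect_draw_count(mem, addr, flush_first):
    if flush_first:
        mem.flush(addr, 4)
    return mem.cpu_read_u32(addr)

mem = EmulatedGpuMemory(64)
mem.shader_write_u32(16, 3)
print(read_indirect_draw_count(mem, 16, flush_first=False))  # stale value
print(read_indirect_draw_count(mem, 16, flush_first=True))   # value the shader wrote
```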

With that in place, we can finally see most of the scene rendered.

Fixed on #1790 by riperiperi

Some textures here look completely corrupt. Looking at the input textures used on one of the draws, we can see one texture that is clearly broken.

The first step to figuring out what is breaking the texture is to log all uses of the texture, where it is located in memory, and any other information that could help track the problem down. Logging the memory mapping eventually revealed the problem: the texture was mapped to a non-contiguous memory region. In other words, the texture data was scattered across different memory regions, but the emulator assumed that it was contiguous, all stored in a single memory region.
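A small model shows why the contiguity assumption corrupts data. In this sketch (the 4 KiB page size and the list-based page table are assumptions for illustration), the correct reader translates every page it touches, while the buggy reader translates only the first page and then reads linearly through whatever happens to follow it in physical memory.

```python
PAGE_SIZE = 0x1000  # assumed page size, for illustration only

def read_virtual(page_table, physical, va, size):
    """Read `size` bytes at virtual address `va`, translating page by page."""
    out = bytearray()
    while size > 0:
        page, offset = divmod(va, PAGE_SIZE)
        chunk = min(size, PAGE_SIZE - offset)
        pa = page_table[page] * PAGE_SIZE + offset
        out += physical[pa:pa + chunk]
        va += chunk
        size -= chunk
    return bytes(out)

def read_assuming_contiguous(page_table, physical, va, size):
    """The buggy assumption: translate only the first page, then read linearly."""
    page, offset = divmod(va, PAGE_SIZE)
    pa = page_table[page] * PAGE_SIZE + offset
    return bytes(physical[pa:pa + size])

# Physical pages: 0 = 'B', 1 = 'A', 2 = unrelated data 'X'.
physical = bytearray(b"B" * PAGE_SIZE + b"A" * PAGE_SIZE + b"X" * PAGE_SIZE)
# The texture's virtual pages 0 and 1 map to physical pages 1 and 0.
page_table = [1, 0]
```

With this mapping, `read_virtual` returns the 'A' page followed by the 'B' page, while `read_assuming_contiguous` returns the 'A' page followed by the unrelated 'X' page, which is the kind of garbage seen in the broken texture.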

With a bit of work, support for textures on non-contiguous regions was added, fixing the broken texture.

Fixed on #1905 by gdkchan

More progress, but we can still see several problems here. For one, the singer on the title screen is not supposed to be rendered completely black. Further investigation revealed that the normal vectors for the model were completely incorrect. Normal vectors are perpendicular to the surface of the model, pointing in the direction the surface is facing, and are used to calculate how much light each point of the surface receives. To sum it up: they are one of the components used to calculate lighting intensity, so it's not surprising that incorrect normals cause the model to be rendered completely black, as if it received no light.
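As an illustration of why bad normals produce black surfaces, here is the Lambertian diffuse term, a common lighting model computed as max(0, N · L). We don't know which lighting model the game itself uses; this only shows the general mechanism. A normal facing away from the light, or a zeroed or garbage normal, yields zero intensity.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    # A zero-length vector cannot be normalized; treat it as "no direction".
    return tuple(c / length for c in v) if length > 0 else (0.0, 0.0, 0.0)

def lambert_intensity(normal, light_dir):
    """Diffuse (Lambertian) term: max(0, N . L) with both vectors normalized."""
    n = normalize(normal)
    l = normalize(light_dir)
    return max(0.0, sum(a * b for a, b in zip(n, l)))

light = (0.0, 1.0, 1.0)
print(lambert_intensity((0.0, 0.0, 1.0), light))   # about 0.707: lit
print(lambert_intensity((0.0, 0.0, -1.0), light))  # 0.0: facing away from the light
print(lambert_intensity((0.0, 0.0, 0.0), light))   # 0.0: broken normal, renders black
```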

The issue is visible in the picture below. The shader was modified to render the normal value calculated on the pixel shader rather than the regular colors that it is supposed to draw, to make debugging easier.

The normal vectors are part of the model vertex buffer data, which means that the problem most likely exists in the place where it is written. In this case, the vertex data is actually written from a compute shader, so it turns out that the princess is in another castle! We can keep examining this capture and try to locate the bug.

Examining the compute shader revealed 2 instruction bugs. The first was related to Store Shared (STS), an instruction that writes data to shared memory. The code was supposed to zero-initialize the memory, clearing its previous contents, with a 64-bit store. However, due to a bug, the shader generated by the emulator was only writing 32 bits. The instruction was using "RZ", or Register Zero, which is basically a constant value of zero. When used with a 64-bit store, it is supposed to write a 64-bit zero value rather than a 32-bit one.
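A simplified model of the correct behavior, assuming that a 64-bit store takes its data from two consecutive 32-bit registers and that RZ is register index 255 (both stated here as assumptions for illustration): RZ must supply a zero for every 32-bit word of the store, not just the first.

```python
RZ = 255  # assumed index of Register Zero, for illustration only

def read_reg(regs, index):
    # RZ always reads as zero.
    return 0 if index == RZ else regs[index]

def store_shared(shared, addr, regs, src, size_bits):
    """STS model: a 64-bit store writes the words of registers src and src+1.

    When the source is RZ, every word is zero. The bug behaved as if RZ
    provided only a single 32-bit word, leaving the upper half untouched.
    """
    for word in range(size_bits // 32):
        index = src if src == RZ else src + word
        value = read_reg(regs, index)
        shared[addr + 4 * word:addr + 4 * word + 4] = value.to_bytes(4, "little")
```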

With that fixed, another bug remained. This one was related to the Integer Set Predicate (ISETP) shader instruction. This instruction compares two numbers using a specified comparator (such as equal, not equal, greater than, etc.) and writes the result into a register. For example, it may be used to check if a given value x is greater than the value y. If it is, then it writes -1 into the result; otherwise it writes 0. Everything up to this point was already correctly implemented; the problem is that the shader was actually using another variant of this instruction, "ISETP.X". The X here stands for extended. This variant is necessary because the Switch GPU does not support 64-bit integer operations in a single instruction, so it breaks the operation down into two 32-bit operations instead, with the extended mode used to continue the operation 32 bits at a time.
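The idea of splitting a 64-bit comparison into two 32-bit steps can be sketched as follows. This is a conceptual model, not the exact hardware semantics of ISETP.X: the first step compares the low words as unsigned values, and the extended step compares the high words (where the sign lives) and falls back to the low-word result only when the high words are equal.

```python
MASK32 = 0xFFFFFFFF

def split64(x):
    """Return the (low, high) 32-bit words of a 64-bit two's complement value."""
    return x & MASK32, (x >> 32) & MASK32

def to_signed32(word):
    return word - (1 << 32) if word & 0x80000000 else word

def less_than_64(a, b, signed=True):
    """64-bit 'a < b' computed from two 32-bit comparisons."""
    a_lo, a_hi = split64(a)
    b_lo, b_hi = split64(b)
    low_less = a_lo < b_lo                    # first step: low words, unsigned
    if signed:                                # the sign lives only in the high word
        a_hi, b_hi = to_signed32(a_hi), to_signed32(b_hi)
    if a_hi != b_hi:                          # extended step: high words decide
        return a_hi < b_hi
    return low_less                           # high words equal: low words decide
```

If only the low words were compared, values such as 2**32 and 1 would be ordered incorrectly, since their low words are 0 and 1.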

With that fixed, the normals are now correct.

Yes, that's what the previous picture was supposed to look like.

So what does it look like now with the normals fixed?

Fixed on #1901 by gdkchan

Well, that's still not correct...

The problem here is caused by the lack of buffer clears. NVN implements buffer clears using the Switch GPU copy engine, which has a feature that allows a constant value to be used as the copy source. This feature was not implemented on the emulator, so buffer clears did not work, and the vertex buffer kept stale values in the normal vectors, causing the issue above.
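The trick of expressing a clear as a copy can be sketched with a hypothetical copy-engine function (its name and parameters are illustrative and are not NVN's or the hardware's interface): when a constant is selected as the source, the "copy" simply repeats that constant across the destination range.

```python
def copy_engine(dst, dst_offset, length, src=None, src_offset=0, const_value=None):
    """Hypothetical copy-engine model.

    With const_value set, the source is a repeated 32-bit constant instead of
    memory, which turns the copy into a fill, i.e. a buffer clear.
    """
    if const_value is not None:
        if length % 4:
            raise ValueError("constant fills are modeled in 32-bit units")
        pattern = const_value.to_bytes(4, "little")
        dst[dst_offset:dst_offset + length] = pattern * (length // 4)
    else:
        dst[dst_offset:dst_offset + length] = src[src_offset:src_offset + length]

buf = bytearray(b"\xab" * 16)                 # stale contents, like the old normals
copy_engine(buf, 0, 16, const_value=0)        # "clear" implemented as a copy
```

Without constant-source support, the clear is a no-op and the stale bytes remain, which matches the symptom described above.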

After that was implemented, the singer now renders correctly.

Fixed on #1902 by gdkchan

Shadows however still look off. They are not supposed to be pitch black like that, so we have more work to do here.

Looking at the textures used on one of the draws already reveals one issue: all the mipmap levels of the texture, except for the base level, are completely black. Further inspection of the base level shows smaller versions of the texture rendered on top of each other. They are clearly supposed to be rendered to the other levels, but aren't. So why is that not working?

To find what is causing the issue, we first need to find the exact operation that the game is using to write to those textures. This can be done by logging all texture accesses done by the GPU. Doing so quickly revealed that the texture was being written to by a compute shader. The problem is that the game was binding the different mipmap levels of the texture, one by one, for compute access. This was not supported on the emulator yet, so it was just rendering all of them to the same base texture.
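A toy texture with separate storage per mip level illustrates the binding problem. The `Texture` class and its `honor_level` flag are hypothetical: binding a specific level for compute should hand the shader that level's storage, while the missing support behaved like always handing out the base level.

```python
def mip_chain(width, height):
    """List the (width, height) of every mip level, down to 1x1."""
    levels = []
    while True:
        levels.append((width, height))
        if width == 1 and height == 1:
            break
        width, height = max(1, width // 2), max(1, height // 2)
    return levels

class Texture:
    def __init__(self, width, height):
        # One byte per texel, one separate buffer per mip level.
        self.levels = [bytearray(w * h) for w, h in mip_chain(width, height)]

    def bind_for_compute(self, level, honor_level=True):
        # honor_level=False models the missing support: every binding aliased
        # the base level, so all downsampled images landed on level 0.
        return self.levels[level if honor_level else 0]

tex = Texture(8, 4)
view = tex.bind_for_compute(2)   # a compute shader writing mip level 2
view[:] = b"\x07" * len(view)
```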

After this was fixed, the texture now renders correctly.

Fixed on #1911 by gdkchan

But, unfortunately, the shadows are still wrong. So what is going on?

The problem was another texture, also written by a compute shader, that was incorrect. The logging setup helped to find the culprit, and examining the shader code quickly revealed the issue. An instruction was incorrectly implemented: namely, the Logical Operation with 3 inputs (LOP3) instruction.
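LOP3 evaluates any 3-input bitwise function from an 8-bit truth table. The sketch below uses the lookup-table convention documented for PTX's `lop3.lut` (selector constants 0xF0, 0xCC and 0xAA for the three inputs); we are assuming the same convention here purely for illustration, not describing the emulator's code. For each bit position, the input bits form a 3-bit index that selects one bit of the table.

```python
MASK32 = 0xFFFFFFFF

# Applying a boolean expression to these constants yields its lookup table.
TA, TB, TC = 0xF0, 0xCC, 0xAA

def lop3(a, b, c, lut):
    """Evaluate an arbitrary 3-input bitwise function given as an 8-bit LUT."""
    result = 0
    for i in range(32):
        index = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((lut >> index) & 1) << i
    return result & MASK32

lut_and_or = (TA & TB) | TC   # table for (a & b) | c, which is 0xEA
```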

With this fixed, we finally have something that looks correct.

Fixed on #1910 by gdkchan

Lastly, there was a very annoying issue making the game unplayable. Basically, the screen would randomly go black during gameplay, then slowly fade away to the correct color. It looked like this:

The most obvious way to debug this issue is to make two frame captures, one with the game rendering correctly and one with the bug, and then compare the two. Using this method, the issue was quickly tracked down: a value in one of the constant buffers used by a compute shader was incorrect, and this would cause other issues later on. Fortunately, fixing this was easy. The bug was caused by an optimization that skips updating the constant buffer data if the emulator thinks that no data was modified. In one specific case, this prevented the data from being updated even though it had been modified.
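The general shape of a "skip the upload if nothing changed" optimization looks like the hypothetical sketch below, which uses a version counter. The optimization is only correct if every write path bumps the version. A write that bypasses change tracking leaves the host copy stale, which is the class of bug described above.

```python
class ConstantBufferCache:
    """Hypothetical model of skipping redundant constant buffer uploads."""

    def __init__(self, size):
        self.guest = bytearray(size)
        self.version = 0            # bumped on every tracked write
        self.uploaded_version = -1  # version last sent to the host GPU
        self.host_copy = bytes(size)
        self.uploads = 0

    def write(self, offset, data):
        self.guest[offset:offset + len(data)] = data
        self.version += 1           # a missed increment would reproduce the bug

    def sync_to_host(self):
        if self.version == self.uploaded_version:
            return                  # skip: nothing changed since the last upload
        self.host_copy = bytes(self.guest)
        self.uploaded_version = self.version
        self.uploads += 1
```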

With this fixed, the bug is gone!

Fixed on #1892 by gdkchan

The game also required a few service functions that were previously not implemented. They are now stubbed, allowing the game to boot fully without the ‘ignore missing services’ option enabled.

Implemented on #1893 and #1876 by AcK77

A Smashing Good Time - Fix compute reserved constant buffer updates

One of the most annoying bugs that haunted Ryujinx for quite some time was the issue of vertex explosions during gameplay. It’s not very enjoyable to be inundated with spiky garbage on the screen every couple of seconds, so many users simply did not play the affected game on Ryujinx. Last month, while chasing down a specific bug in the Monster Hunter Rise Demo that was causing the screen to randomly turn black with white elements, we discovered that the compute reserved constant buffers were reading values from the wrong SSBO. This was addressed by ensuring these buffers update their values from the correct location. A side effect of this fix was discovered almost immediately when testing the PR on other games that use compute and SSBOs. One of these games is Super Smash Bros. Ultimate. Yes, the vertex explosions were resolved by fixing a different issue entirely!

Before:

With the compute reserved constant buffers properly updating, the problem is finally fixed.

Fixed on #1892 by gdkchan

Parallel Processing Prevents Poor Performance - Enable parallel ASTC decoding by default

ASTC textures are large, compressed textures that cannot be displayed natively on most desktop GPUs; they must first be decoded/decompressed. This decompression process is performed by the CPU and, as many ASTC textures are on the larger side and are loaded into VRAM after being decoded, this can take some time to do. This manifests as increased load times or stutter, depending on the game’s coding.

Previously, this process was limited to a single CPU thread, as there was some concern about whether enough CPU time would be left over for the game and the background JIT. But with PPTC (which is now enabled by default), the JIT cost is eliminated after a few runs, removing the need to limit decoding to a single CPU thread. This change splits the ASTC decoding work by the number of host CPU cores (physical or virtual) and runs a processing thread on each. The most obviously affected games are the Monster Hunter Rise Demo (which could not be played in LDN multiplayer without this change), Astral Chain, and Animal Crossing: New Horizons.
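The work-splitting strategy can be sketched with a thread pool: divide the texture's blocks into one contiguous slice per host core, decode the slices concurrently, and stitch the results back together in order. `decode_block` is a placeholder (real ASTC decoding is far more involved), and the names are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def decode_block(block):
    """Placeholder for decoding one ASTC block; inverts bytes so output is checkable."""
    return bytes(b ^ 0xFF for b in block)

def decode_range(blocks, start, end):
    return [decode_block(b) for b in blocks[start:end]]

def decode_parallel(blocks, workers=None):
    """Decode blocks in one contiguous slice per worker (default: one per core)."""
    workers = workers or os.cpu_count() or 1
    chunk = max(1, -(-len(blocks) // workers))   # ceiling division
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(decode_range, blocks, i, min(i + chunk, len(blocks)))
                   for i in range(0, len(blocks), chunk)]
        # Collect in submission order so the decoded texture stays in block order.
        return [out for f in futures for out in f.result()]
```

In CPython, the global interpreter lock prevents pure-Python work like this from actually running in parallel. The sketch only illustrates how the work is divided; in a runtime with truly parallel threads, each slice would decode on its own core.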

Implemented on #1930 by gdkchan

Don’t Forget to Flush - Implement lazy flush-on-read for Buffers (SSBO/Copy)

Shader Storage Buffer Objects (SSBOs for short) are regions of memory that shaders can read from and write to, storing any data they wish. On desktop GPUs, emulators face a common problem: these GPUs have separate, dedicated video memory (VRAM) that is used to store this data, while on the Switch (as on mobile hardware in general, and often on other consoles), this memory is shared between the CPU and GPU. The emulator must behave like the Switch for the game to work correctly, which means the game must have the illusion that this memory is shared when it isn't. One of the key pieces to make this work is memory flushing. The term usually describes writing data that is inside a cache back to main memory, making it available to other hardware components. In this case, it means making the data stored in dedicated GPU memory available to the CPU, which is accomplished by copying it from VRAM to main, CPU-accessible memory.

At this point, it should be a bit more clear how SSBO flush helps. It makes the data that would be otherwise only available to the GPU itself also available to the CPU. This means that the game can access it from the CPU by reading the memory region where the data is located, just like it would on real hardware. Because the Switch GPU is not fully emulated on the host PC GPU, this also becomes useful for the emulator itself as it needs, at times, to read GPU written data from the CPU for Switch GPU emulation.

Lastly, this also brings performance improvements. This might seem counterintuitive, since a buffer flush copies data from VRAM to main memory, which is not exactly a cheap operation: not only because of the copy itself, but also because we may need to wait until the GPU is done writing the data before copying it. Before this change, however, we had to flush the data all the time to ensure that the data in main memory was also correct. This change allows the data to be flushed on demand; that is, it only flushes if and when the CPU needs it. This not only eliminates some useless buffer flushes, but also gives the GPU more time to complete the operation, so by the time the flush is required it may already be done, with no need to wait at all, or the wait is greatly reduced. Unfortunately, this also caused performance regressions in a few games; the reason at this point should be pretty clear.
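The lazy flush-on-read idea can be sketched as a buffer that only marks itself dirty when the GPU writes to it, and performs the VRAM to main memory copy the first time the CPU actually reads. This hypothetical model ignores GPU synchronization (waiting for the GPU to finish writing), which a real implementation also has to handle.

```python
class TrackedBuffer:
    """Hypothetical lazy flush-on-read model."""

    def __init__(self, size):
        self.vram = bytearray(size)
        self.main_memory = bytearray(size)
        self.gpu_dirty = False
        self.flush_count = 0

    def gpu_write(self, offset, data):
        self.vram[offset:offset + len(data)] = data
        self.gpu_dirty = True          # defer the copy instead of flushing now

    def cpu_read(self, offset, size):
        if self.gpu_dirty:             # flush on demand, only when the CPU needs it
            self.main_memory[:] = self.vram
            self.gpu_dirty = False
            self.flush_count += 1
        return bytes(self.main_memory[offset:offset + size])

buf = TrackedBuffer(8)
for i in range(3):
    buf.gpu_write(i, bytes([i + 1]))   # an eager scheme would flush three times here
```

Three GPU writes followed by two CPU reads cost a single flush. An eager flush-after-every-write scheme would pay for three.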

The net effect of this update was to fix vertex explosions and improve performance in all Unity games, fix the flickering in Link's Awakening, fix particle effects that were broken or constantly restarted their animation in many first party games, as well as significantly improve rendering in the Monster Hunter Rise Demo.

Effect Fixes:

Mario Kart 8

Before:

  • Explosion & snow particles:
  • Drift particles:

After:

  • Explosion & snow particles:
  • Drift particles:

Super Mario Odyssey

Before:

  • Birds:
  • Flagpole:

After:

  • Birds:
  • Flagpole:

Link's Awakening (flashing)

Before:

After:

Implemented on #1790 by riperiperi

Add support for shader atomic min/max (S32)

A new Disgaea demo & game were released in January, and with them came the use of a shader function that was not supported in the emulator. This update added support for atomic minimum and maximum operations on signed integers, along with new functions that perform signed atomic min/max using a Compare and Swap (CAS) loop. This fixes broken graphics in the game, as you can see below.
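The CAS-loop technique works like this: read the current value, and if the new value would change it, try to swap it in atomically. If another thread modified the value in the meantime, retry. In this sketch, a lock stands in for the hardware compare-and-swap instruction. The class and function names are hypothetical.

```python
import threading

class AtomicInt32:
    """Simulated memory word; the lock stands in for a hardware CAS."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the value equals `expected`; return the old value."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old

def atomic_min_s32(cell, value):
    """Signed atomic minimum via a CAS loop; returns the previous value."""
    while True:
        current = cell.load()
        if current <= value:
            return current                    # already smaller or equal: no write
        if cell.compare_and_swap(current, value) == current:
            return current                    # our swap won
        # Another thread changed the value in between; retry.

def atomic_max_s32(cell, value):
    """Signed atomic maximum via a CAS loop; returns the previous value."""
    while True:
        current = cell.load()
        if current >= value:
            return current
        if cell.compare_and_swap(current, value) == current:
            return current
```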

Disgaea 6: Defiance of Destiny

Before:

After:

Fixed on #1948 by gdkchan

Let it Mellow - Do not flush multisample textures

On the guest, multisample textures have their width & height multiplied by the number of samples on each axis. When the texture is passed to the host, its actual size is calculated by dividing the width & height by the number of samples in X and Y, and the total number of samples (samples in X * samples in Y) is passed to the host as well. This creates a mismatch during flush, where the texture data the emulator gets back from the host is actually smaller than what it is trying to write to memory. This causes an access violation during the Linear -> Block Linear conversion, as it attempts an out-of-bounds access. Another problem is that the host simply does not support getting data from a multisample texture.
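Some illustrative arithmetic makes the size mismatch concrete. It assumes 4x multisampling laid out as 2 samples in X by 2 in Y, an RGBA8 format (4 bytes per pixel), and that the host hands back one value per host-sized pixel. The function name is hypothetical.

```python
def guest_and_host_sizes(guest_width, guest_height, samples_x, samples_y, bytes_per_pixel):
    """Byte sizes the guest expects vs. what a host read-back would provide."""
    host_width = guest_width // samples_x
    host_height = guest_height // samples_y
    guest_bytes = guest_width * guest_height * bytes_per_pixel
    host_bytes = host_width * host_height * bytes_per_pixel
    return guest_bytes, host_bytes

# A 1280x720 guest surface with 2x2 samples is a 640x360 host texture.
guest_bytes, host_bytes = guest_and_host_sizes(1280, 720, 2, 2, 4)
```

Here the guest layout expects 3,686,400 bytes, but the host data covers only 921,600, so a conversion that walks the full guest layout would read past the end of the host data.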

The fix is to simply disable flushing for multisample textures. There is no reason to expect games to read them from the CPU, and most games don't use multisampling anyway.

This fixed a specific crash on a few games including Super Bomberman R, fault - milestone one, and Leisure Suit Larry: Wet Dreams Don’t Dry.

Fixed on #1973 by gdkchan

Kernel & CPU Improvements

Update KAddressArbiter implementation to 11.x kernel

Parts of the HLE kernel implementation were updated to match version 11.0 of the official kernel (currently the latest version), including changes to the condition variable syscalls. These changes are required for newer games targeting this firmware version to work. At the time of writing, however, no games are known to target this new version.

Implemented on #1851 by gdkchan

Lowering the Bar - Add a simple Pools Limiter

One issue some users have noticed more recently is that the emulator seemed to use a lot of memory during boot for some games, and overall the memory usage seemed to be higher than usual. This update significantly reduces memory usage during boot/initial load, in some extreme cases by up to 90%. Memory usage overall during gameplay was reduced by between 30-50% on average. It also further reduced game boot times by 10-15% on average.

Most games were affected to some degree but below is an average example of the changes that occurred, using Luigi’s Mansion 3 as a reference.

Before:

After:

Implemented on #1830 by LDj3SNuD

A Little TOO Accurate - Lower precision of estimate instruction results to match ARM behavior

A couple of ARM estimate instructions (RECPE and RSQRTE) only produce 8 bits of fraction, with the remaining 15 bits filled with zeros. The equivalent x86 instructions also return an approximation, but with higher precision, so their results differ from ARM's. This led to some amusing, if frustrating, bugs in a few affected games. One of them was Catherine: Full Body, in which the bug would summarily push the player off a ledge to fall to their death, instead of letting them grab on to the seam of the box as they should. To fix the issue, the results are now rounded to nearest, with the fraction limited to 8 bits to match ARM.
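Rounding a single-precision result down to 8 fraction bits can be sketched by manipulating the raw float32 bits: add half of the lowest kept bit, then clear the 15 discarded fraction bits. This is a minimal sketch under stated assumptions. It rounds ties away from zero in magnitude, ignores NaN and infinity handling, and is not the emulator's actual code.

```python
import struct

def round_fraction_to_8_bits(x):
    """Round a float32 to nearest, keeping 8 of its 23 fraction bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 1 << 14                               # half of the lowest kept bit
    bits &= ~((1 << 15) - 1) & 0xFFFFFFFF         # zero the 15 discarded bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# e.g. a high-precision reciprocal estimate of 3.0, reduced to ARM-like precision
estimate = round_fraction_to_8_bits(1 / 3)
```

A carry out of the fraction correctly bumps the exponent, because the sign, exponent and fraction fields are adjacent in the bit pattern.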

Before:

After:

Another affected game was Super Bomberman R with update 2.2 applied (the bug did not occur on the base 1.0.0 version of the game), where the bug sank the players into the floor, leaving them unable to move.

Before:

After:

Other affected games were Slayaway Camp: Butcher’s Cut and Friday the 13th: Killer Puzzle (both had their joystick input stop working after a single move, which did not allow you to progress through a level), and Out of the Box (input would stop working, causing a softlock). These games are also fixed by this update.

Implemented on #1943 by gdkchan

Implement PRFM (register variant) as NOP

A long-standing missing CPU instruction that prevented Edna & Harvey: Harvey’s New Eyes and Deponia from booting has finally been addressed. This update implements PRFM (register variant) as a NOP, and also corrects a typo in the code that referred to the instruction as “pfrm” instead of “prfm”. Both games are now able to boot and go in-game.

Implemented on #1956 by mageven

HID & GUI Improvements

A Bug HIDing in Plain Sight - Update missing sample timestamp in DebugPad

In the spring of last year, the HID shared memory implementation was reworked, fixing a large number of bugs and greatly improving the user input experience. However, the update caused a regression in the few games in the Switch library that need a proper DebugPad, making them crash on boot. These included Ninjin: Clash of Carrots, RICO, Darkwood, Defunct, 1917 – the Alien Invasion, and possibly a few others. To fix it, the second sample timestamp needed to be updated along with the first.

As you can see below, the comical ninja bunny beat ‘em up and other games listed above are once again able to boot.

Implemented on #1873 by mageven

Ooey GUI Code Cleanup - GUI Refactoring: Part 1

The GUI code had essentially not been touched since its original implementation and was in need of a good cleanup. This update improves the organization and overall quality of the code, and also adds a special new section in the UI to acknowledge our Patrons contributing $10/month or more.

Before:

After:

Implemented on #1859 by AcK77

Add Support for Inline Software Keyboard

Some games use a particular method for allowing text input; this may be at a character creation/name screen, a chat window for multiplayer, or an opportunity to name a pet. Most of these games use what is called a software keyboard applet to facilitate the text input. Last year, the emulator was updated with a software keyboard implementation that allowed the user to provide keyboard input to these software keyboard calls.

However, there are a few games that use a different method called an inline software keyboard, and any time this applet was called the emulator would softlock. A few of these games are Monster Hunter Generations Ultimate, Fate/EXTELLA LINK, and Gnosia. Because of this, for Monster Hunter and Gnosia the game could not proceed into gameplay without the use of a save file, and for Fate/EXTELLA LINK, the character could not be renamed. With the inline software keyboard implemented, these games can now be played normally without the need for a save file.

Implemented on #1868 by caian

Add Show Confirm Exit Toggle

Until now, closing the emulator window while playing a game brought up a pop-up window asking whether to stop emulation. In some situations, such as using Ryujinx in a multi-emulator frontend, this pop-up is undesirable. A new option now lets you disable it.

Implemented on #1856 by macabeus

Emulate a circular zone for keyboard analog sticks

While most would agree that a controller is the best input option, some users opt to use a keyboard instead. This gets tricky when analog input comes into play, as a key only offers a 0 or a 1, with no range in between. Because the keyboard's analog stick input was configured as a square zone instead of a circular one, pressing two directional keys at once for a diagonal direction was not working. To fix it, the square zone was changed to a circular zone, and keyboard users can now play their favorite games that require diagonal stick input without issues.
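Mapping digital keys into a circular zone can be sketched as follows (a hypothetical function, not Ryujinx's input code). Pressing two keys produces the vector (1, 1), which lies at the corner of a square zone, outside the unit circle a physical stick can reach. Scaling any vector longer than 1 back onto the circle turns it into a proper diagonal of about (0.707, 0.707).

```python
import math

def keys_to_stick(up, down, left, right):
    """Map four direction keys to an analog stick vector inside the unit circle."""
    x = float(right) - float(left)
    y = float(up) - float(down)
    length = math.hypot(x, y)
    if length > 1.0:                 # only diagonals exceed the circle
        x, y = x / length, y / length
    return x, y

diagonal = keys_to_stick(up=True, down=False, left=False, right=True)
```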

Implemented on #1906 by mageven

Image is Everything - Update Controller Images

The UI input page is a frequent destination for many users due to the need to remap for docked vs. handheld modes, or changing the controller type to match the game’s expected input. For some, this was an exercise in frustration because the controller images themselves were sometimes less than visually appealing, depending on the operating system and theme being used. With the new controller images, no matter which OS or theme you use, the controller is easily visible. Example below taken from Linux.

Before:

After:

Implemented on #1951 by Ayato-Kirishima

Fix some GLXBadDrawable crashes on Linux

While not as popular as Windows, there are still plenty of users who enjoy Ryujinx on Linux. There has been a long-standing Linux client bug where the emulator would crash as emulation was stopped. In titles containing embedded games, the first step to launching an embedded game is to stop emulation, which in turn would cause the emulator to crash. This meant that any multi-game collections or other embedded games were not working on the Linux client. The issue was fixed by slightly changing the order of operations during emulation termination, and adding a dispose function for the GLRender Widget. Now the emulator exits properly and can launch multi-game collections/embedded games in Linux again.

Implemented on #1900 by SeraUQ

TZ: Fix loop condition in GetTZName

The time service has been implemented and working since 2019, but this doesn’t necessarily mean that no bugs are present! Since the Switch firmware 11.0.0 release, some users have experienced incorrect wall time with time zones that incorporate DST (Daylight saving time) rules. Thanks to mageven, a bug was found and fixed in POSIX timezone name parsing logic that was causing it to return too early and fail to properly parse the timezone name.
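For context, POSIX TZ strings such as `EST5EDT` begin with a zone name that is either at least three letters, or a quoted form like `<+03>` containing letters, digits, '+' and '-'. The sketch below parses such a name following the POSIX rules. It is not Ryujinx's `GetTZName`. It only illustrates the kind of loop where an incorrect exit condition makes the parser return before the whole name has been consumed.

```python
import string

NAME_CHARS = string.ascii_letters + string.digits + "+-"

def parse_tz_name(tz, pos=0):
    """Parse a POSIX TZ zone name at `pos`; return (name, next_pos) or None."""
    if pos < len(tz) and tz[pos] == "<":
        end = pos + 1
        while end < len(tz) and tz[end] in NAME_CHARS:
            end += 1
        if end >= len(tz) or tz[end] != ">":
            return None                       # unterminated quoted name
        name, next_pos = tz[pos + 1:end], end + 1
    else:
        end = pos
        # Consume letters until the first non-letter (the UTC offset).
        # Leaving this loop too early would truncate the name.
        while end < len(tz) and tz[end] in string.ascii_letters:
            end += 1
        name, next_pos = tz[pos:end], end
    return (name, next_pos) if len(name) >= 3 else None
```

For example, `parse_tz_name("EST5EDT")` yields `("EST", 3)`, and parsing again from position 4 yields the DST name `("EDT", 7)`.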

Implemented on #1950 by mageven

Stay Alive, Damnit! - Prevent Display Sleep on Windows while running a game

Though not a widespread issue, some Windows users were experiencing the display going to sleep while playing a game after a period of time, even if they were actively using controls. With this update, Windows will not allow the display to sleep while the emulator is actively running a game.

Implemented on #1850 by EliEron

Making it Easy for Newcomers - Initial Setup: Reload keys before verifying firmware

The first time someone installs the emulator, they are prompted to install their keys file. Up until now, after putting the keys file in the right place, the user had to close the emulator and re-open it in order to install a firmware. With this update, closing and reopening the emulator is no longer needed; the user can install firmware immediately after placement of the keys file.

Implemented on #1955 by mageven

Change of default options - Enabled PPTC and Docked mode by default

With PPTC (Profiled Persistent Translation Cache) nearing eight months since it was merged to master, and enough games having been tested to determine its viability as a default option, the decision was made to enable PPTC by default. This way many users who may have not enabled the feature on their own will reap the benefits of reduced load times and smoother game transitions.

Implemented on #1844 by EmulationFanatic

Additionally, as Docked mode offers visual and input option improvements over Handheld, it was also finally configured to be enabled by default.

Implemented on #1953 by Ayato-Kirishima

We want to highlight again how encouraged we are by the new contributors who have stepped up and are helping to improve the emulator.

New code contributors January 2021:

Caian
macabeus
EliEron
PineappleEA
Ayato-Kirishima
steveice10

Thanks to everyone that has supported us so far, be it via Patreon donations, code contributions, testing games in the emulator, or simply being an active member of our community. You’ve helped make this emulator what it is today!

We now have an active Patreon campaign with specific goals and restructured subscriber benefits/tiers, so head on over if you want to help push Ryujinx forward!