September & October 2020 Progress Report

We saw quite a few graphical improvements in September and October, so let's take a look!

GPU emulation improvements:

XMAD bug:

We had a long-standing issue in Super Mario Odyssey where the dithering patterns used as transparency on objects close to the camera, commonly used to reveal whatever is behind them, wouldn't render properly on the emulator. gdkchan found that the issue was due to one of the shader instructions (XMAD) not being properly decoded. After fixing the issue, the dithering pattern now renders correctly.



A nice surprise was that this also fixed lighting in Animal Crossing: New Horizons. Indoor areas used to be darker than they should be in this title, but with the fix they now have correct illumination.

Texture and buffer management improvements:

riperiperi fixed annoying texture and buffer related issues on a few games. One of the affected games was Mario Kart 8 Deluxe, where the Animal Crossing course did not render properly. With the fixes, it now renders much better.

One of the main things the change fixed was texture data not being written back to memory when it was modified by the GPU and later deleted (to free VRAM space). The change also includes some minor optimizations related to GPU texture and buffer management, which bring a small performance improvement to some games.
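The write-back behavior can be pictured with a minimal sketch. The names below (TextureCache, gpu_write, evict) are hypothetical and are not taken from the emulator's code. The sketch only assumes a cache that remembers which textures the GPU has modified and copies them back to guest memory before deleting them.

```python
class TextureCache:
    """Toy model: host-side texture copies backed by guest memory."""

    def __init__(self, guest_memory):
        self.guest_memory = guest_memory  # dict: address -> bytes
        self.textures = {}                # address -> host copy of the data
        self.modified = set()             # addresses the GPU has written to

    def load(self, address):
        # Create the host copy from guest memory on first use.
        if address not in self.textures:
            self.textures[address] = self.guest_memory[address]
        return self.textures[address]

    def gpu_write(self, address, data):
        # A render or copy on the GPU only changes the host copy.
        self.load(address)
        self.textures[address] = data
        self.modified.add(address)

    def evict(self, address):
        # Flush GPU-modified data back to guest memory before deleting it.
        if address in self.modified:
            self.guest_memory[address] = self.textures[address]
            self.modified.discard(address)
        del self.textures[address]


memory = {0x1000: b"old"}
cache = TextureCache(memory)
cache.gpu_write(0x1000, b"new")
cache.evict(0x1000)
```

Without the flush in evict, a later load of the same address would recreate the texture from stale guest memory and the GPU's changes would be lost.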

Texture cache improvements:

Another change by riperiperi brought improvements to some games with depth-related issues. Burnout Paradise was one of those games. It used to have a broken depth of field blur effect, as we can see in the screenshot below.

The issue was that the shader responsible for this effect was not using the correct depth buffer. riperiperi fixed the issue by making the texture cache lookup, which finds textures previously used by the game, less strict. The GPU has a feature called component swizzle that allows re-ordering or replacing the values of the texture color components when they are read in a shader. The change consisted of relaxing the "compatibility" check of this swizzle between two textures, which allowed the emulator to find the correct depth buffer. The game now renders much better:
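As a rough illustration of component swizzle and of what a relaxed compatibility check could look like, here is a small sketch. The function names, the dictionary layout and the exact relaxed rule are assumptions made for this example. The report does not describe the precise rule the emulator uses.

```python
def apply_swizzle(texel, swizzle):
    """Return the RGBA value a shader would read for a given swizzle.

    texel   -- (r, g, b, a) tuple stored in the texture
    swizzle -- four selectors, each one of "R", "G", "B", "A", "0", "1"
    """
    sources = {"R": texel[0], "G": texel[1], "B": texel[2], "A": texel[3],
               "0": 0.0, "1": 1.0}
    return tuple(sources[s] for s in swizzle)


def views_compatible(requested, cached, strict):
    # Hypothetical check. Strict: the swizzles must match exactly.
    # Relaxed: a differing swizzle alone does not disqualify the cached texture.
    if requested["format"] != cached["format"]:
        return False
    return (not strict) or requested["swizzle"] == cached["swizzle"]


depth_texel = (0.25, 0.0, 0.0, 1.0)
identity = apply_swizzle(depth_texel, "RGBA")   # components read as stored
broadcast = apply_swizzle(depth_texel, "RRR1")  # red copied to G and B, alpha forced to 1

cached = {"format": "D32", "swizzle": "RGBA"}
requested = {"format": "D32", "swizzle": "RRR1"}
```

With the strict check, the requested view would not match the cached depth buffer and a different texture would be used. With the relaxed check, the same underlying texture is found.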

Better viewport flipping and depth mode detection:

The main motivator for this change was a bug that caused some games to render upside-down on GPUs without support for the "NV_viewport_swizzle" OpenGL extension. This issue affected all AMD and Intel GPUs, and also some older NVIDIA GPUs that do not support this extension. gdkchan fixed the issue by using a different viewport flipping method that does not require the extension and works properly on all GPUs.

This was not the only change, however. Another related change improved the heuristics that the emulator uses to detect the depth mode set by the game through higher level APIs (such as NVN), so that it can be passed down to the host API (in this case OpenGL, which is the only API we use right now). This fixed graphical issues such as the misplaced blur in The Legend of Zelda: Link's Awakening, and broken shadows in Burnout Paradise.



Vulkan support (no, it’s not what you are thinking):

The Switch is a current console with a fairly powerful mobile GPU. As such, it also supports modern graphics APIs such as Vulkan. While the number of games that make use of Vulkan on the platform is very low, there are still a few games using it. In Ryujinx they did not render anything because the Vulkan driver on the Switch uses GPU commands that are not used by the other APIs, and for this reason they were not supported on the emulator yet. gdkchan reverse engineered the required parts of the driver to implement the missing functionality.

Turok 2 and Doom 64, two games that use the Vulkan API on the Switch, now go in-game with these changes. The changes are also a prerequisite for Super Mario 3D All-Stars to render, since the games in that collection also use the Vulkan API. We'll get to that soon.

1D textures:

A bug in the management of 1D textures caused the fog to not render properly in Monster Hunter Generations Ultimate. The emulator does not “know” when a 1D texture is rendered to, simply because this information is never given to the GPU. For the GPU, a 1D texture and a 2D texture with one line are the same in this case, so it does not need to know whether the texture is 1D or 2D. The emulator would just assume that the texture is 2D. This breaks if the game later attempts to read that texture in a shader: the shader expects a 1D texture, but the texture in the texture cache is 2D.

gdkchan fixed the issue by simply transforming all 1D textures into 2D textures with a single line. This change allows the correct texture to be found on the texture cache.
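The idea of treating every 1D texture as a single-line 2D texture can be sketched as a normalization step applied before the cache lookup. TextureInfo, normalize and find_or_create are hypothetical names for this example, and the real cache matches on much more than the four fields shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextureInfo:
    target: str   # "1D" or "2D"
    width: int
    height: int
    address: int


def normalize(info):
    # Treat every 1D texture as a 2D texture with a single line.
    if info.target == "1D":
        return TextureInfo("2D", info.width, 1, info.address)
    return info


cache = {}


def find_or_create(info):
    # Look up by the normalized description, creating an entry on a miss.
    key = normalize(info)
    return cache.setdefault(key, object())


rendered = find_or_create(TextureInfo("2D", 256, 1, 0x2000))  # used as a render target
sampled = find_or_create(TextureInfo("1D", 256, 1, 0x2000))   # read as 1D in a shader
```

Because both descriptions normalize to the same key, the shader read finds the texture that was rendered to instead of creating a new, empty one.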



2D array ASTC compressed textures:

ASTC is a texture compression format created by Arm and usually supported exclusively on mobile GPUs; desktop GPUs do not support it. For this reason, we need to decompress these textures ourselves and pass the decompressed data to the GPU in a format that it supports. The decompressor had no support for 2D array textures, which meant that only the base array layer was being decoded, and the others were totally blank.

This caused most of the rendering to be completely black on Donkey Kong Country: Tropical Freeze on GPUs without ASTC support.

The fix allows the game to render properly, for the most part.
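In outline, supporting array textures means running the decoder once per layer instead of only on the first one. The sketch below uses a stand-in function instead of a real ASTC decoder and assumes that all layers have the same compressed size and are stored back to back. Both are simplifications for illustration.

```python
def decode_layer_stub(compressed_layer):
    # Stand-in for a real ASTC decoder: it only tags the data as decoded.
    return b"decoded:" + compressed_layer


def decode_array_texture(data, layer_size, layers):
    """Decode every layer of a 2D array texture, not only layer 0."""
    output = []
    for layer in range(layers):
        start = layer * layer_size
        output.append(decode_layer_stub(data[start:start + layer_size]))
    return output


compressed = b"AAAABBBBCCCC"  # three layers, four bytes each
decoded = decode_array_texture(compressed, layer_size=4, layers=3)
```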

BRX improvements:

Shaders support indirect branches on the Switch. In simple terms, this allows the shader program to change the location of the code that is currently being executed to any other location the game wants. This is very complicated to handle on an emulator, since we translate the shader into a format that the user's GPU can understand and execute. There was an issue with the instruction that performs these branches (BRX) that was preventing some games from rendering. After tweaking the heuristics used to find possible branch targets for this instruction, gdkchan fixed the issue and allowed more games to render, such as Koi no Hanasaku Hyakkaen.
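To show why the list of possible targets matters, here is a toy interpreter in which an indirect branch is lowered to a chain of comparisons against targets discovered ahead of time. The program layout and the names used are made up for this example, and the heuristics that find the targets are not modeled at all.

```python
def run(blocks, candidate_targets, entry, register):
    """Interpret a tiny program whose blocks end in "brx" or "exit".

    blocks            -- address -> (value_to_record, terminator)
    candidate_targets -- address of a brx block -> list of known targets
    register          -- run-time value holding the branch destination
    """
    trace = []
    pc = entry
    while True:
        value, terminator = blocks[pc]
        trace.append(value)
        if terminator == "exit":
            return trace
        # The host shader cannot jump to an arbitrary address, so the branch
        # becomes a series of comparisons against the known targets.
        for target in candidate_targets[pc]:
            if register == target:
                pc = target
                break
        else:
            raise RuntimeError("branch target was not discovered")


blocks = {
    0x00: ("start", "brx"),
    0x10: ("case A", "exit"),
    0x20: ("case B", "exit"),
}
targets = {0x00: [0x10, 0x20]}
trace = run(blocks, targets, entry=0x00, register=0x20)
```

If the target list is incomplete, the translated shader has no code path for the missing destination, which is the kind of failure that better target discovery avoids.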

Super Mario 3D All-Stars bugs collection:

Super Mario 3D All-Stars is a collection of three 3D Mario games: Super Mario 64, Super Mario Sunshine and Super Mario Galaxy (Super Mario Galaxy 2 is not included). The games were originally released on the Nintendo 64, GameCube and Wii. Not surprisingly (but perhaps a bit disappointingly for some), these games run on the Switch through Nintendo’s own emulators. There is not much point in playing the collection on a Switch emulator, since these games can already be played on several platforms using emulators for their respective consoles. It is still interesting to see them working, as a test of the emulator’s performance and accuracy.

We had one HLE related issue that prevented the games from booting. A bug in the surface flinger implementation would cause more buffers than were available to be reported to the game, which caused memory corruption and a crash. There was also an NVDEC H264 decoding bug that caused crashes on the launcher. The former was fixed by Thog, while the latter was fixed by gdkchan.

As mentioned before, Super Mario 3D All-Stars uses Vulkan and, as expected from an emulator, it uses some features in an unusual way. These unusual methods helped to uncover bugs in the GPU emulation and assisted us in finding unimplemented shader instructions.

The first issues we noticed were several characters not rendering completely, if at all. Where's Mario’s body? The Toads are also missing...

This bug was caused by an incomplete implementation of a shader instruction. The LDC instruction can be used to load data from a constant buffer (a buffer that can be read, but not modified, by the shader). It also supports indexing, but indexing was not supported on the emulator. As a result, the animation matrices for the animated parts of Mario's body (and the other characters) were being loaded from the wrong place. gdkchan implemented the missing functionality, which made this section render much better.
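The effect of ignoring the index can be seen in a simplified model of the load. The ldc function below is a hypothetical helper that treats the index as a plain byte offset added to the immediate offset. The real instruction has more addressing details than shown, and each "bone" here is a single float instead of a full matrix.

```python
import struct


def ldc(constant_buffer, offset, index=0):
    """Load one 32-bit float from a constant buffer.

    offset -- immediate byte offset encoded in the instruction
    index  -- byte offset taken from a register at run time
    """
    address = offset + index
    return struct.unpack_from("<f", constant_buffer, address)[0]


# Two entries of one float each, standing in for per-bone animation data.
buffer = struct.pack("<2f", 1.0, 2.0)

without_index = ldc(buffer, offset=0)        # index ignored: always entry 0
with_index = ldc(buffer, offset=0, index=4)  # index applied: entry 1
```

When the index is dropped, every bone reads the first entry, which matches the kind of broken character rendering described above.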

But that was just the beginning.

After getting further into the game, we encountered more nightmare fuel, as can be seen in the screenshot below:

What happened to Rosalina? The answer is: a missing shader instruction! The LEA.HI shader instruction was not implemented. gdkchan implemented it and now Rosalina also renders properly:

Great! But it's not over yet...

This is not what a shadow is supposed to look like. This issue was caused by the wrong blend function being used, and a shader bug where it was mapping the wrong register as the output alpha component. gdkchan fixed both issues, and now the shadows render correctly.

And lastly, we observed broken rendering on the terrace:

This was caused by the lack of support for dual source blend (a feature that allows blending together two distinct outputs from a pixel shader). This was also fixed by gdkchan:
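For readers unfamiliar with the feature, the sketch below shows one possible dual source blend equation, where the second shader output is used as a per-channel factor on the framebuffer color. This corresponds to a source factor of one and a destination factor of the second source color. The report does not state which factors the game actually uses, so this choice is an assumption.

```python
def blend_dual_source(src0, src1, dst):
    """Compute out = src0 * 1 + dst * src1 for each color channel."""
    return tuple(s0 + d * s1 for s0, s1, d in zip(src0, src1, dst))


framebuffer = (0.5, 0.5, 0.5)
color = (0.25, 0.0, 0.0)   # first shader output
factor = (0.5, 1.0, 0.0)   # second shader output, used as a blend factor
result = blend_dual_source(color, factor, framebuffer)
```

With regular blending, only one shader output is available to the blend stage, so an equation like this one cannot be expressed.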

Unfortunately, it still wasn’t over. Star Bit interaction didn’t work, which meant that the game was not really playable, as it was not possible to progress past this area. But we will get there soon!

DOOM 64 texture issues:

As previously mentioned, this game uses the Vulkan API and started rendering with the changes to enable Vulkan, but it was still not playable due to a rendering bug:

The issue was caused by one of the variants of the LOP3 shader instruction not being decoded properly. With it fixed, the game now renders correctly:

This issue was fixed by gdkchan.
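LOP3 is a three-input bitwise logic instruction whose operation is selected by an 8-bit truth table. A minimal model of that behavior follows. It only covers the logic itself and not how the different instruction variants encode their operands, which is where the decoding bug was.

```python
def lop3(a, b, c, lut):
    """Apply a 3-input bitwise operation selected by an 8-bit truth table.

    For each bit position, the three input bits form an index from 0 to 7
    (a is the most significant bit), and the result bit is lut[index].
    """
    result = 0
    for bit in range(32):
        index = ((((a >> bit) & 1) << 2)
                 | (((b >> bit) & 1) << 1)
                 | ((c >> bit) & 1))
        result |= ((lut >> index) & 1) << bit
    return result


a, b, c = 0xFF00FF00, 0xF0F0F0F0, 0xCCCCCCCC
and_all = lop3(a, b, c, 0x80)  # truth table for a & b & c
xor_all = lop3(a, b, c, 0x96)  # truth table for a ^ b ^ c
```

A table can be derived by applying the desired expression to the constants 0xF0, 0xCC and 0xAA. For example, 0xF0 & 0xCC & 0xAA gives 0x80. Reading the table or the operands from the wrong place in the instruction silently produces a different logic function.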

More texture management improvements:

Another batch of texture related improvements made by riperiperi improved many games. One of them was proper texture flushing, which allowed texture data to be written back to CPU visible memory when it is read by the game. This fixed missing thumbnails in many games, among other issues.

Below we can see race thumbnails finally working on Mario Kart 8 Deluxe:

In addition to that, it also fixed some game breaking bugs, such as the success meter never moving on Snipperclips, which made the title unplayable. With these changes, it can now be played!

This update also brought significant speed improvements to games that were previously bottlenecked by texture modification checks, such as The Legend of Zelda: Breath of the Wild. Not all games had performance improvements, however; in fact, some are slower after this change. Performance was not the main goal of the update, though. The true goal was increasing emulation accuracy.

Image textures fix:

riperiperi also fixed a bug that prevented Unreal Engine 4 games from working on some GPUs. Image buffers, which Unreal Engine games tend to use quite a lot, were not being bound, and this caused driver crashes. After the fix, more users can now enjoy UE4 games on this emulator.

Note that Unreal Engine games in general still suffer from a lot of texture related bugs and other issues on this emulator, so not all of them are playable.

Transform feedback fixes:

For a long time, SNK Heroines had an issue that made most models T-posed on this emulator. That prevented the game from being playable. The issue can be seen on the screenshot below:

After some investigation, gdkchan discovered that the issue was caused by invalid usage of transform feedback buffers. Fortunately the issue was easy to fix, and the game now renders as it should:

This fix should also help Xenoblade, which was still suffering random grass issues mentioned in the last progress report.

If you pay attention to the frame rate, you will notice it is not great. This game was being slowed down by ASTC texture decompression. gdkchan discovered that the game seemed to do many redundant updates of the texture data, and added a data modification check that would skip the decompression if the data was not modified. This greatly improved performance, and the game is now running close to full speed depending on the hardware.
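The modification check can be sketched as a small cache in front of the decompressor. Comparing hashes of the input is an assumption made for this example. The report only says that decompression is skipped when the data was not modified, not how the modification is detected.

```python
import hashlib


class DecompressionCache:
    """Skip decompression when the compressed data has not changed."""

    def __init__(self, decompress):
        self.decompress = decompress
        self.last_hash = None
        self.last_result = None
        self.decode_count = 0

    def update(self, data):
        digest = hashlib.sha256(data).digest()
        if digest != self.last_hash:
            # The data really changed, so pay for the decompression.
            self.last_result = self.decompress(data)
            self.last_hash = digest
            self.decode_count += 1
        return self.last_result


cache = DecompressionCache(lambda data: data.upper())  # stand-in decoder
first = cache.update(b"texture")
second = cache.update(b"texture")   # redundant update, decoder is not called
third = cache.update(b"changed")
```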

Party time!

Super Mario Party had some texture issues caused by its use of bindless textures. We have already talked about bindless textures quite a few times in past progress reports, but it doesn't hurt to explain them again. The bindless texture feature allows shaders to access textures without the game explicitly binding them. Basically, the game passes a handle to the shader, and the shader program can load the texture information from memory using that handle. The problem is that the handle has no meaning for the host platform; it is only really meaningful for the Switch GPU. Currently we emulate this by doing data flow analysis on the shader at translation time, so that the emulator can figure out where the handle comes from. Knowing that, the emulator can load the handle from memory and bind the appropriate texture. The bindless access in the shader is then replaced with a regular texture access, since we already know where the texture is located and can bind it.
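A heavily simplified version of that data flow analysis is shown below. The instruction tuples and the find_handle_source helper are invented for this example. The real translator works on its own intermediate representation and handles many more cases.

```python
def find_handle_source(instructions, register):
    """Walk backwards to find the constant buffer slot a handle came from.

    instructions -- (op, dest, *operands) tuples that precede the texture
                    access, in program order
    Returns (buffer, offset), or None when the origin cannot be determined.
    """
    for op, dest, *operands in reversed(instructions):
        if dest != register:
            continue
        if op == "mov":
            # A plain copy: keep searching for the source register instead.
            register = operands[0]
        elif op == "ldc":
            # Loaded from a constant buffer: this is the handle's origin.
            return operands[0], operands[1]
        else:
            # Computed in some other way: give up.
            return None
    return None


traceable = [("ldc", "r0", 2, 0x10), ("mov", "r1", "r0")]
untraceable = [("add", "r1", "r2", "r3")]

source = find_handle_source(traceable, "r1")
unknown = find_handle_source(untraceable, "r1")
```

When the search returns nothing, the translator has no texture to bind, which is the situation that requires proper bindless support.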

While this works to some extent, there are some cases that still don't work and still require proper bindless texture support. In Super Mario Party, some shaders would fail to compile simply because the shader translator was not capable of finding where the handle was coming from, and generated an invalid texture access in the shader. gdkchan fixed the invalid access, making the shader compile. Below we can see the results.



Note that it is still not perfect. In fact, properly rendering this game will require a completely different approach to support bindless texture. At least now it is playable!

CPU emulation improvements:

Direct calls + recursion = stack overflow:

LDj3SNuD fixed a regression introduced by the direct call changes. In some games that make use of deep recursive calls, a stack overflow could occur due to the high number of calls. In simple terms, each function call uses some memory. This memory is released when the function returns. A recursive function call is a call to a function that was already called before, but did not yet return (and thus did not release its memory yet). A high number of recursive function calls may quickly exhaust all the memory available to them (stack memory), which causes a stack overflow error and crashes the emulator. The issue was fixed by using a jump instead of a call when a recursive call (with the function calling itself) is detected. With this change, games such as Terraria and Blaster Master Zero 2 are now considered playable.
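The difference between a call and a jump can be demonstrated with a small experiment. This sketch only models the effect on the host stack, using a function whose self-call is the last thing it does. It is not how the emulator's code generator works, and the function names are made up.

```python
import sys


def countdown_with_calls(n):
    # Each self-call becomes a host call, so host stack usage grows with n.
    if n == 0:
        return 0
    return countdown_with_calls(n - 1)


def countdown_with_jumps(n):
    # The self-call is replaced by a jump back to the function entry,
    # so host stack usage stays constant no matter how large n is.
    while True:
        if n == 0:
            return 0
        n = n - 1


depth = sys.getrecursionlimit() * 10

try:
    countdown_with_calls(depth)
    overflowed = False
except RecursionError:
    # Python reports the exhausted call stack as an exception.
    overflowed = True

result = countdown_with_jumps(depth)
```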

32-bit instructions:

LDj3SNuD also implemented several 32-bit instructions required by the Super Mario Galaxy game included in Super Mario 3D All-Stars: namely the UMAAL, VABD, VABDL, VADDL, VHADD, VQSHRN and VSHLL instructions. The implementation of these instructions allowed the title to go in-game! We already talked extensively about it here though, so we will spare you the repetition.
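To give an idea of what two of these instructions compute, here are reference models of UMAAL and of a single unsigned 8-bit lane of VHADD. They describe the arithmetic only, not how the emulator generates code for it, and the function names are chosen for this example.

```python
MASK32 = 0xFFFFFFFF


def umaal(rd_lo, rd_hi, rn, rm):
    """Unsigned multiply accumulate accumulate long.

    Computes rn * rm + rd_lo + rd_hi as a 64-bit value and returns the
    new (rd_lo, rd_hi) register pair.
    """
    result = (rn & MASK32) * (rm & MASK32) + (rd_lo & MASK32) + (rd_hi & MASK32)
    return result & MASK32, (result >> 32) & MASK32


def vhadd_u8(a, b):
    """Unsigned halving add for one 8-bit lane: (a + b) >> 1.

    The sum is formed at full width first, so it cannot overflow the lane.
    """
    return ((a & 0xFF) + (b & 0xFF)) >> 1


small = umaal(1, 2, 3, 4)                       # 3 * 4 + 1 + 2
largest = umaal(MASK32, MASK32, MASK32, MASK32)  # the largest possible inputs
```

Even with the largest inputs, the UMAAL result still fits in 64 bits, which is why the instruction never overflows.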

HLE improvements:

Error applet:

Ac_K implemented support for the system error applet. In addition to helping us diagnose issues when games break and call the error applet to show the user information about the error, it also allowed some games to boot further. Some of the affected games are now considered playable.

One example of such a game is REKT:

Ghost Blade HD is also now considered playable:

Motion controls:

emmaus added support for motion controls, which allows using motion as an input method in games that support it. This is especially useful where a game expects it, such as the Splatoon 2 tutorial, which has motion controls enabled with no way to disable them until it is completed. Note that the tutorial can still be finished without motion, but it is considerably more difficult because the camera is locked to a specific angle.

GUI improvements:

Automatic updates:

Checking the website for updates every day can be tedious. For this reason, several users requested an option to check for updates automatically and, if one is found, to download and install it. MelonSpeedruns and DrHacknik heard their pleas and implemented this feature. Now, if an update is available, a prompt offering to install it is shown on launch. This launch-time check can be disabled in the settings.

Closing words, and we need your support!

First, we'd like to thank all our current supporters, and apologize for the delay on this progress report. But on the bright side, less time writing progress reports means more time fixing bugs!

We have several developers interested in working full time on this project. If sufficient Patreon donations make that feasible, we can spend more time working on the emulator and users can get the features they want faster. We will share more details about that soon, so stay tuned!

We also have a few sub-projects in the works which will be revealed in due time.