First Patreon Goal Met, and November & December 2020 Progress Report

2021 is finally here, and we spent the final months of last year on a compatibility expansion and bug-fixing spree. But before we look at all the improvements that rounded out the year, we want to take a moment to celebrate and thank everyone who helped us reach our first Patreon goal, Amiibo emulation support, so quickly. We'll be working hard to implement this feature in the emulator ASAP, so stay tuned! Separately, with today's emulator update we have delivered on our promise to list the names of those in the $10 & $20 monthly Patreon tiers in the Help > About section of the emulator; thank you for your support! We've also added a couple of new Patreon goals and lowered the threshold for our Vulkan goal. These tasks are already on our to-do list; however, meeting the Patreon goals below gives us the resources to tackle each respective feature immediately.

(GOAL MET) - Amiibo Emulation

In-progress, ETA 3-4 weeks:

Allows emulated scans of a selected Amiibo, which subsequently unlock exclusive content in games that support this function.

$1000/month (almost there!) - User Profile Support

ETA once goal is reached: ~1 month:

This will allow the creation of multiple user profiles, the option to use a custom name instead of the current default name "Player", and the ability to use a custom profile picture instead of the current hardcoded (and very old) Ryujinx logo. We will also attempt to load the official profile pictures from the installed firmware and provide them as an additional option.

$1250/month - Vulkan GPU Backend

ETA once goal is reached: ~6-8 weeks:

This one is a biggie. Vulkan will significantly improve performance and reduce graphical glitches on AMD GPUs and Intel iGPUs. Moreover, SPIR-V (the Vulkan shader format) compiles faster than GLSL (the OpenGL shader language), so even without a disk shader cache there would be significantly less stutter on first run than with OpenGL.

$1500/month - ARB Shaders

ETA once goal is reached: ~3-4 weeks:

ARB shaders will further reduce first-run stuttering by improving shader compilation speed on NVIDIA GPUs when using the OpenGL API.

And now the progress report!

GPU improvements:

Buffer to texture copies size fix

A bug in buffer-to-texture copies caused some Unreal Engine 4 games to crash. The emulator was copying too much data, which corrupted memory by overwriting unrelated data. Fixing this allowed a few Unreal Engine 4 games to boot further, with the tested titles now reaching menus; unfortunately, they still crash when attempting to start a new game.

One such game is Remothered: Broken Porcelain, which renders quite well, as we can see in the screenshot above. The Bravely Default II demo also saw improvements, but neither game is playable just yet.

Fixed on #1670 by gdkchan.
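The report doesn't describe the exact miscalculation, so the following is only a hypothetical illustration of how a row-by-row (pitch-linear) copy can end up touching too much memory. The function names and the stride/width layout are assumptions for the sketch, not the emulator's code: every row except the last spans a full stride, but the last row only needs its visible bytes.

```python
def copy_region_size(width: int, height: int, bytes_per_pixel: int, stride: int) -> int:
    """Bytes spanned by a pitch-linear region (hypothetical helper).

    Every row but the last spans a full `stride`; the last row only needs its
    visible `width * bytes_per_pixel` bytes. Rounding the last row up to a full
    stride (stride * height) would touch memory that belongs to something else.
    """
    if width <= 0 or height <= 0:
        return 0
    row_bytes = width * bytes_per_pixel
    if stride < row_bytes:
        raise ValueError("stride must be at least one row of pixels")
    return stride * (height - 1) + row_bytes


def copy_rows(src: bytes, dst: bytearray, width: int, height: int,
              bytes_per_pixel: int, stride: int) -> None:
    """Copy only the visible bytes of each row; padding and trailing bytes stay untouched."""
    needed = copy_region_size(width, height, bytes_per_pixel, stride)
    if len(src) < needed or len(dst) < needed:
        raise ValueError("buffers are smaller than the region being copied")
    row_bytes = width * bytes_per_pixel
    for y in range(height):
        start = y * stride
        dst[start:start + row_bytes] = src[start:start + row_bytes]
```

For a 100x10 region of 4-byte pixels with a 512-byte stride, this spans 5008 bytes, whereas `stride * height` would claim 5120 and overwrite 112 bytes past the region.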

Make sure to disable rasterizer discard before clear

The Switch GPU supports a feature called rasterizer discard, where the GPU simply discards the output produced by triangle rasterization. This also affects clears, which means that attempting to clear a framebuffer with the feature enabled does nothing. Xenoblade uses the feature to prevent draws from rendering into framebuffers while still allowing the shaders to run. It is not supposed to be enabled for clears, but due to a bug in the emulator, rasterizer discard was not being disabled before them.

This caused a lot of "ghost" shadows to be rendered in Xenoblade, as they were not being properly cleared. Fortunately, the fix was simple, and the game now renders shadows properly:

Fixed on #1680 by riperiperi.
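The idea behind the fix can be sketched with a toy GPU state object. The class and method names are hypothetical; the point is only that a clear which should ignore rasterizer discard must turn it off for the duration of the clear and then restore whatever the game had set.

```python
from typing import List


class ToyGpuState:
    """Hypothetical model of the state that matters for clears."""

    def __init__(self, width: int, height: int) -> None:
        self.rasterizer_discard = False
        self.framebuffer: List[List[int]] = [[0] * width for _ in range(height)]

    def raw_clear(self, value: int) -> None:
        # Clears honour rasterizer discard, so they do nothing while it is on.
        if self.rasterizer_discard:
            return
        for row in self.framebuffer:
            row[:] = [value] * len(row)

    def clear(self, value: int) -> None:
        # Disable discard just for the clear, then restore the game's setting.
        saved = self.rasterizer_discard
        self.rasterizer_discard = False
        try:
            self.raw_clear(value)
        finally:
            self.rasterizer_discard = saved
```

Restoring the saved value matters: Xenoblade still relies on discard being active for its subsequent draws.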

Shader cache

This is the most highly requested feature we've ever implemented. A disk shader cache allows the shaders used by a game to be saved to disk. On subsequent runs, the emulator no longer needs to compile them while the game is running; instead, all shaders are compiled ahead of time at startup. This eliminates stutters caused by shader compilation, as long as that specific part of the game has already been played before.

We have a separate blog post covering the feature; you can check it out here if you haven't already.

Implemented on #1701 by Thog.
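As a rough sketch of the general idea (not Ryujinx's actual cache format), a disk shader cache can key each guest shader by a hash of its code, persist shaders it hasn't seen before, and compile everything found on disk at startup. The `compile_fn` callback, the SHA-256 key and the `.shader` file layout are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Union


class DiskShaderCache:
    """Minimal sketch: guest shaders persisted by hash and compiled at startup."""

    def __init__(self, directory: Union[str, Path], compile_fn: Callable[[bytes], bytes]):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.compile_fn = compile_fn  # stand-in for the host shader compiler
        self.compiled: Dict[str, bytes] = {}
        self.runtime_compiles = 0  # compiles that happened mid-game (stutters)

    @staticmethod
    def key(guest_code: bytes) -> str:
        return hashlib.sha256(guest_code).hexdigest()

    def preload(self) -> int:
        """At startup, compile every shader seen in earlier sessions."""
        for path in sorted(self.directory.glob("*.shader")):
            self.compiled[path.stem] = self.compile_fn(path.read_bytes())
        return len(self.compiled)

    def get(self, guest_code: bytes) -> bytes:
        k = self.key(guest_code)
        if k not in self.compiled:
            # Cache miss: this is the compile that would stutter during play.
            self.runtime_compiles += 1
            (self.directory / f"{k}.shader").write_bytes(guest_code)
            self.compiled[k] = self.compile_fn(guest_code)
        return self.compiled[k]
```

A second session that calls `preload()` first will find the shader already compiled, so `runtime_compiles` stays at zero for any part of the game played before.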

Force early depth test implementation

Early depth testing is a GPU optimization that performs the depth test before the pixel shader runs. This saves the GPU from doing potentially expensive pixel shader computations for pixels that will not be visible in the rendered scene. The optimization is normally not enabled when it would change the output produced by the pixel shader (for example, when the shader writes its own depth or discards pixels); however, applications may still force it on in those cases. Xenoblade is the only game known to do this.

Implementing the feature fixes interior volumes being incorrectly rendered, as can be seen below:

Implemented on #1755 by riperiperi. Originally discovered by Rodrigo.
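The difference between a forced early test and the usual late test can be sketched for a single pixel. This toy model follows OpenGL's `early_fragment_tests` semantics as an assumption (depth is written before the shader runs, so a later discard can't undo it, and shader-computed depth is ignored); the function and parameter names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Toy pixel shader: given the interpolated depth, return (color, depth_override)
# or None to discard. depth_override=None keeps the interpolated depth.
PixelShader = Callable[[float], Optional[Tuple[str, Optional[float]]]]


def shade_pixel(depth_buffer: List[float], x: int, frag_depth: float,
                shader: PixelShader, force_early: bool, stats: dict) -> Optional[str]:
    """Run one fragment through a 'less than' depth test, early or late."""
    if force_early:
        # Early test: hidden fragments are rejected before the shader runs.
        if frag_depth >= depth_buffer[x]:
            return None
        # Depth is written before shading, so a later discard cannot undo it,
        # and any depth the shader computes is ignored.
        depth_buffer[x] = frag_depth
        stats["shader_runs"] += 1
        result = shader(frag_depth)
        return None if result is None else result[0]

    # Late test: the shader must run first because it may discard or move depth.
    stats["shader_runs"] += 1
    result = shader(frag_depth)
    if result is None:
        return None
    color, depth_override = result
    depth = frag_depth if depth_override is None else depth_override
    if depth >= depth_buffer[x]:
        return None
    depth_buffer[x] = depth
    return color
```

With a discarding shader, the forced early path leaves the new depth in the buffer while the late path leaves it unchanged; that kind of output difference is why the emulator has to honour the game's request rather than pick a mode itself.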

CPU improvements:

Half float optimizations

Half float is a 16-bit floating-point format. It has less precision than the more common 32-bit float format but uses half the memory, so it can be useful where higher precision is not a must. The Switch's Arm CPU supports only a few operations on this format, mainly conversions to and from half-float values. The emulator's implementation of these conversion instructions was optimized to use dedicated x64 instructions, which means that games making use of this functionality will now be a little faster.

Implemented on #1650 by LDj3SNuD.
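To show what these conversions do (and what precision is lost), here is a small sketch using Python's `struct` module, whose `e` format code is IEEE 754 binary16. This is a software stand-in for illustration only; on x64 the F16C extension provides hardware conversion instructions, but the report doesn't say exactly which instructions the emulator uses.

```python
import struct


def to_half_bits(value: float) -> int:
    """Round a float to binary16 and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def from_half_bits(bits: int) -> float:
    """Expand a 16-bit binary16 pattern back to a float (this direction is exact)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]
```

For example, 1.0 encodes as `0x3C00`, the largest finite half is 65504, and 0.1 round-trips as 0.0999755859375, a concrete view of the lower precision mentioned above.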

Floating point accuracy improvements

A bug in one of the floating point instructions caused Luigi’s Mansion 3 to get stuck in a specific part of the game. The bug caused the elevator to keep moving forever when trying to go to a specific floor.

The bug was fixed by properly supporting the flush-to-zero mode on the responsible instructions, and the game can now be progressed through normally.

Fixed on #1630 by LDj3SNuD.
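Flush-to-zero mode (the FZ bit in Arm's floating-point control register) replaces subnormal values, those smaller in magnitude than the smallest normal number, with a zero of the same sign. The sketch below approximates that for a double-precision multiply by flushing both inputs and the result; it ignores finer details such as exactly when the architecture checks for a subnormal result, and the helper names are hypothetical.

```python
import math
import sys

DBL_MIN = sys.float_info.min  # smallest positive *normal* double


def flush_to_zero(x: float) -> float:
    """Replace a subnormal value with a zero of the same sign."""
    if x != 0.0 and abs(x) < DBL_MIN:
        return math.copysign(0.0, x)
    return x


def fmul_ftz(a: float, b: float) -> float:
    """Multiply with flush-to-zero applied to inputs and result (approximation)."""
    return flush_to_zero(flush_to_zero(a) * flush_to_zero(b))
```

`1e-300 * 1e-10` is a tiny but nonzero subnormal under normal IEEE rules, while `fmul_ftz(1e-300, 1e-10)` returns 0.0. Small differences like this can change the outcome of comparisons in game logic, such as deciding whether an elevator has reached its floor.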

New CPU instructions implemented: VFMA, VFNMA, VFNMS and VRINTX

Some missing 32-bit CPU instructions prevented many games from booting. These instructions perform fused multiply-accumulate (FMA) operations and rounding to an integral value. Implementing them made Spirit Hunter: NG, Fairy Fencer F: Advent Dark Force, Cabela's: The Hunt - Championship Edition, STURMWIND EX, Megadimension Neptunia VII, Baldur's Gate and Baldur's Gate II: Enhanced Editions, Planescape: Torment and Icewind Dale: Enhanced Editions, Little Inferno, Prinny: Can I Really Be the Hero?, Prinny 2: Dawn of Operation Panties, Dood!, Human Resource Machine, 7 Billion Humans, TY the Tasmanian Tiger and many more boot further, and many of them are now playable!

Implemented on #1758, #1783, #1762 and #1776 by Sharmander.
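The semantics of these instructions can be sketched in Python. "Fused" means the product and the addition are computed exactly and rounded only once; here `fractions.Fraction` provides the exact intermediate. The sketch uses doubles and finite inputs only (the real instructions also have single-precision forms, exception flags and special-value handling), and the function names simply mirror the mnemonics: VFMA computes d + n*m, VFNMA computes -d - n*m, VFNMS computes -d + n*m, and VRINTX rounds to an integral value using the current rounding mode, assumed here to be the default round-to-nearest-even.

```python
import math
from fractions import Fraction


def _fused(n: float, m: float, addend: float, negate_product: bool) -> float:
    """addend +/- n*m computed exactly, then rounded once to a double."""
    product = Fraction(n) * Fraction(m)
    if negate_product:
        product = -product
    return float(Fraction(addend) + product)


def vfma(d: float, n: float, m: float) -> float:
    return _fused(n, m, d, negate_product=False)   # d + n*m


def vfnma(d: float, n: float, m: float) -> float:
    return _fused(n, m, -d, negate_product=True)   # -d - n*m


def vfnms(d: float, n: float, m: float) -> float:
    return _fused(n, m, -d, negate_product=False)  # -d + n*m


def vrintx(x: float) -> float:
    """Round to an integral value, ties to even; keeps the sign of zero."""
    if not math.isfinite(x):
        return x
    return math.copysign(float(round(x)), x)
```

With `a = 1 + 2**-30`, the unfused `a*a - (1 + 2**-29)` gives 0.0 because the separate multiply rounds away the low bits, while `vfma(-(1 + 2**-29), a, a)` keeps them and returns `2**-60`. That extra precision is why FMA can't simply be emulated as a multiply followed by an add.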

PPTC improvements

PPTC is a feature that helps reduce load times by saving JIT-generated code to disk. The feature was recently improved by fixing a bug that prevented the memory used for JIT compilation from being reclaimed while the feature was active. This reduces memory usage and should fix some out-of-memory errors on systems with less RAM.

Other improvements include better logging and general code refactoring.

Fixed on #1712 and #1814 by LDj3SNuD.

HLE improvements:

VR support

A few Nintendo Switch games support Virtual Reality with the use of the Nintendo Labo kit. The feature was not working on the emulator due to missing OS service functions. Implementing the required functions allowed the feature to be enabled in the games that support it. One example of such a game is The Legend of Zelda: Breath of the Wild, as we can see below:

A few other games support VR, such as Super Mario Odyssey and Captain Toad: Treasure Tracker. There are also a few games that only work in VR mode, such as Spice and Wolf VR. The feature can't be fully enjoyed yet due to the lack of support for the Nintendo Switch console's own motion sensor (the one in the Joy-Cons is already supported, but the six-axis sensor in the unit itself isn't).

Implemented on #1688 by Ac_K.

Audout service fixes

The audio output service (simply called "audout") is used by some games to output a raw audio stream on the Switch. A few missing functions in this service, namely "GetAudioOutBufferCount", "GetAudioOutPlayedSampleCount" and "FlushAudioOutBuffers", prevented some games from progressing past menus. Implementing them made games such as Atelier Shallie: Alchemists of the Dusk Sea DX and Devil May Cry 2 playable.

Implemented on #1725 by Ac_K.

Audio renderer regression fixes

Last year, we announced Amadeus, a large project that rewrote the emulator's entire audio renderer implementation. While the old implementation was based mostly on guesswork and on the client-side implementation (basically, on what games do when communicating with the OS), the new one was based on proper reverse engineering of the service.

The new implementation was complete and fairly accurate, but complete does not necessarily mean bug-free. While it was tested extensively before release, we can't possibly test every game, so a few bugs went unnoticed. Two of them caused out-of-range array accesses within the audio renderer, which made Resident Evil 6 and Shovel Knight: Treasure Trove crash at boot. Both issues are now fixed, and the games are playable once again.

Resident Evil 6 also has significantly better audio now thanks to Amadeus!

Fixed on #1739 and #1742 by Thog.

Save data size fix

A bug in the function that games use to get the maximum size of save data was preventing some games from working: the size was not being properly written to the output. Fixing this issue allowed The Language of Love to boot.

This game is now playable!

Fixed on #1748 by Ac_K.

IPC improvements

On the Switch OS, processes communicate with each other using inter-process communication (IPC). This is how games send requests to services, and how services send their responses back. This was not accurately implemented in the emulator: the services simply read requests directly from game memory and wrote their responses there. There was no concept of a separate process or address space per service. These changes give each service its own guest process (not a real host process, but a process on the HLE OS). In addition, services now use the correct system calls to receive and reply to requests sent by the game.

This change does not have any immediately noticeable improvements for end users, but it does benefit homebrew developers creating custom services; they can now use the emulator to test their code. In the future, this will also allow emulation of the Nintendo Switch OS services.

Implemented on #1458 by gdkchan.
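The core idea, that the client and the service each own separate memory and only the kernel copies messages between them, can be sketched with a toy, synchronous model. Everything here (`GuestProcess`, `ToyKernel`, `send_sync_request`, the one-byte-length message format and the example service) is hypothetical; the real HLE kernel deals with sessions, handles, threads and the actual IPC message layout.

```python
from typing import Callable, Dict, Tuple


class GuestProcess:
    """A process on the emulated (HLE) OS with its own private memory."""

    def __init__(self, name: str, size: int = 256) -> None:
        self.name = name
        self.memory = bytearray(size)


Handler = Callable[[GuestProcess], int]


class ToyKernel:
    """Hypothetical IPC kernel: the only code that touches two address spaces."""

    def __init__(self) -> None:
        self.services: Dict[str, Tuple[GuestProcess, Handler]] = {}

    def register_service(self, server: GuestProcess, handler: Handler) -> None:
        self.services[server.name] = (server, handler)

    def send_sync_request(self, client: GuestProcess, service: str,
                          offset: int, length: int) -> int:
        server, handler = self.services[service]
        if offset + length > len(client.memory) or length > len(server.memory):
            raise ValueError("request does not fit in the message buffers")
        # 1. Copy the request out of the client's memory into the server's.
        server.memory[0:length] = client.memory[offset:offset + length]
        # 2. The server handles it using only its own memory.
        reply_len = handler(server)
        if reply_len > len(server.memory) or offset + reply_len > len(client.memory):
            raise ValueError("reply does not fit in the message buffers")
        # 3. Copy the reply back into the client's buffer.
        client.memory[offset:offset + reply_len] = server.memory[0:reply_len]
        return reply_len


def uppercase_service(server: GuestProcess) -> int:
    """Example service: message = [length byte, payload]; replies uppercased."""
    n = server.memory[0]
    server.memory[1:1 + n] = bytes(server.memory[1:1 + n]).upper()
    return 1 + n
```

Because the service never reads the client's memory directly, a custom (for example, homebrew) service written against this model behaves the same way it would when isolated in its own process.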

Scheduler context switch code rewrite

Context switching is the process of transferring control from one thread to another. The old code had quite a few bugs that caused issues in some games, allowing errant behavior such as the same thread running on two different guest CPU cores at once, as well as race conditions. The rewrite fixed a few intermittent crashes and softlocks caused by these bugs. One of the games that could crash due to this was Bayonetta 2, which is now much more stable.

Fixed on #1786 by gdkchan.
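One invariant a scheduler must keep is that a thread is never selected for two cores at the same time. The greedy selection below is a hypothetical sketch of that invariant only (it ignores preemption, migration and locking, and is not the emulator's algorithm); it assumes that a lower priority value means a higher priority.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class GuestThread:
    name: str
    priority: int                # lower value = higher priority (assumed convention)
    affinity: FrozenSet[int]     # cores this thread is allowed to run on


def select_threads(ready: List[GuestThread], num_cores: int) -> Dict[int, GuestThread]:
    """Pick at most one thread per core, never placing a thread on two cores."""
    chosen: Dict[int, GuestThread] = {}
    taken: Set[str] = set()
    for core in range(num_cores):
        candidates = [t for t in ready if core in t.affinity and t.name not in taken]
        if candidates:
            best = min(candidates, key=lambda t: t.priority)
            chosen[core] = best
            taken.add(best.name)  # this thread is now unavailable to other cores
    return chosen
```

Without the `taken` set, a single high-priority thread allowed on both cores would be picked for both, which is the kind of errant state described above.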

GPU memory allocation speed improvements

Allocating GPU memory in the emulator's implementation of the NVIDIA driver used to be very slow, because it used a linear search to find free memory regions. An optimization that switched to a binary search tree makes allocation much faster. In practice, this cuts the loading time of Fire Emblem: Three Houses (to reach the title screen) by about 20 seconds.

Other games might benefit from this as well, but most games don't have a visible improvement. The amount of improvement is entirely dependent on the way the game manages GPU memory.

Implemented on #1722 by Sharmander.
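To illustrate why a sorted structure helps, here is a hypothetical best-fit allocator that keeps free regions ordered by (size, start) and finds a fitting region with a binary search. Python has no built-in balanced tree, so `bisect` over a sorted list stands in for one: the search is O(log n) instead of a linear scan, although inserting into and removing from a list still shifts elements. Adjacent free regions are not merged, to keep the sketch short.

```python
import bisect
from typing import List, Optional, Tuple


class FreeRegionAllocator:
    """Best-fit allocator with free regions kept sorted by (size, start)."""

    def __init__(self, start: int, size: int) -> None:
        self.free: List[Tuple[int, int]] = [(size, start)]

    def alloc(self, size: int) -> Optional[int]:
        # Binary search for the smallest free region that is large enough.
        i = bisect.bisect_left(self.free, (size, -1))
        if i == len(self.free):
            return None  # nothing fits
        region_size, region_start = self.free.pop(i)
        if region_size > size:
            # Return the unused tail of the region to the free set.
            bisect.insort(self.free, (region_size - size, region_start + size))
        return region_start

    def release(self, start: int, size: int) -> None:
        # Neighbouring free regions are not coalesced in this sketch.
        bisect.insort(self.free, (size, start))
```

As the text notes, how much this helps depends on how many regions a game keeps around; with only a handful of free regions, a linear scan and a tree perform about the same.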

NGCT service functions

The NGCT service (No Good Content for Terra, according to Switchbrew) is responsible for filtering "bad" words and is used by games released in China when such filtering is necessary. One game that makes use of it is Horace, which is now playable thanks to the implementation of the service.

Implemented on #1756 by Ac_K.

Loader improvements

A bug in the loader prevented games with a very large BSS size from being loaded into memory. The bug was caused by using the wrong integer type for those sizes: the emulator used a signed type, so a large size could be interpreted as negative (which is not valid). It was fixed by switching to an unsigned type, which can't be negative, so the correct size is now used.

This fixes failure to load on Hatsune Miku: Project DIVA Mega 39's/Mega Mix (only the US version was affected; the Japanese version was already working before!), Death Mark, Darkest Dungeon, CHAOS CODE - NEW SIGN OF CATASTROPHE, Air Missions: HIND, Doukoku Soushite..., and more.

Most of these games are now playable.

Reported on #1792 by EliEron.

Fixed on #1802 by gdkchan.
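The signed/unsigned mix-up is easy to reproduce with Python's `struct` module. The sketch assumes a 32-bit little-endian size field purely for illustration (the report doesn't state the field width): the same four bytes read as a signed integer produce a negative size once the top bit is set.

```python
import struct


def read_size_signed(raw: bytes) -> int:
    """Interpret a 32-bit little-endian size field as signed (the buggy way)."""
    return struct.unpack("<i", raw)[0]


def read_size_unsigned(raw: bytes) -> int:
    """Interpret the same field as unsigned, so large sizes stay positive."""
    return struct.unpack("<I", raw)[0]
```

A size of `0x90000000` (about 2.25 GiB) reads as -1879048192 when treated as signed, but as 2415919104 when treated as unsigned.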

It's time to wake up!

Some games, like The World Ends with You, require waking the Switch from sleep mode to progress past certain parts. We added a new option in the UI to simulate this, along with the required HLE OS support. The game can now be progressed through with a single click.

We might also add a key binding for that in the future, so that users don't need to exit full screen mode to perform the required action.

Implemented on #1750 by Ac_K.

Frame pacing improvements

Uneven frame times can cause a significant loss in the perceived frame rate of a game. Even if the game is running at, say, 60 fps, it can feel much slower than that if the frames are not presented at the correct time. On top of that, incorrect frame pacing can cause the game to visibly stutter.

This was fixed by using a more precise wait mechanism for frame presentation and VSync signaling. Instead of relying only on host OS waits (which have only millisecond precision on Windows, in the best case), the emulator now uses a mixed method: it waits on the OS for most of the time and then spin-waits for the remaining fraction.
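A mixed wait like the one just described can be sketched as follows: sleep on the OS while plenty of time remains, then busy-wait for the last stretch so that the coarse timer can't make us overshoot the deadline. The function name and the 2 ms spin margin are arbitrary choices for this sketch, not values taken from the emulator.

```python
import time


def precise_wait_until(deadline: float, spin_margin: float = 0.002) -> None:
    """Wait until time.perf_counter() reaches `deadline`.

    The OS sleep covers most of the wait; the final `spin_margin` seconds are
    spent busy-waiting, since the OS wait may be too coarse for that last part.
    """
    while True:
        remaining = deadline - time.perf_counter()
        if remaining <= 0:
            return
        if remaining > spin_margin:
            # Coarse OS wait; leaves the last `spin_margin` seconds for spinning.
            time.sleep(remaining - spin_margin)
        # Otherwise loop again without sleeping (spin) until the deadline passes.
```

To pace 60 fps presentation, a caller would advance its deadline by 1/60 of a second per frame and call `precise_wait_until(deadline)` before presenting. Spinning burns some CPU, which is why it is limited to a small margin instead of the whole wait.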

Another problem was that GPU command processing was not being interrupted for frame presentation (which happens on the same thread, since OpenGL is not multithreading-friendly). This further contributed to the frame pacing issues (two frames could be presented at once) and also left the GPU idle for no reason: at some point the game needs to wait until a frame is presented before reusing its framebuffer, and cannot submit more commands until that happens. This was fixed by interrupting GPU command processing for frame presentation whenever a frame is available.

These changes brought significant improvements to both Xenoblade games available on the Switch, but those are not the only games that benefit from it! Nearly all games feel smoother now.

Before:

After:

Not only are the frame times much more stable now, but the average frame rate is also higher in this title.

Fixed on #1741 by riperiperi.

GUI improvements:

Toggle docked/handheld mode with a key binding

Switch games usually render at a higher resolution and with better quality in docked mode. The reason is simple: in docked mode the device is not running on battery power, which allows slightly higher CPU/GPU clock speeds. This increases their processing speed and makes higher resolutions possible without too much of a compromise on frame rate.

This change allows the mode to be changed by simply pressing a hotkey without the need for navigating the emulator’s options menu.

Implemented on #1685 by SeraUQ.

Toggle docked/handheld mode by clicking on the status bar

A related change lets users toggle between docked and handheld modes by clicking on the status bar, which also shows the currently selected mode.

Implemented on #1726 by Ac_K.

U l t r a  W i d e

We received some user requests to support ultra-wide resolutions in the emulator. This can be used together with mods that change a game's aspect ratio so that it fills ultra-wide screens. A new setting in the UI allows the aspect ratio to be extended to 21:9 and beyond. It also supports 4:3 for those who prefer a more retro look. The default aspect ratio is 16:9, which is the current standard and what the Switch uses. It's also now possible to stretch the image. Previously, the image always filled the window/screen but inserted borders to preserve the aspect ratio; with the new stretch option, it fills all the available space without trying to preserve the aspect ratio.

We highly recommend keeping it on the default 16:9 ratio, as anything else will distort the image without mods. But the option is now there for those that want it!

Implemented on #1777 by Ac_K.
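The difference between fitting with borders and stretching comes down to a small calculation. This hypothetical helper returns the rectangle the image occupies inside the window: with an aspect ratio it scales the image to fit and centres it (bars on the sides or on the top and bottom), and with `None` it stretches to fill the window. Integer division truncates, so a rectangle can be a pixel smaller than the exact ratio would give.

```python
from typing import Optional, Tuple


def render_rect(win_w: int, win_h: int,
                aspect: Optional[Tuple[int, int]]) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of the image inside a win_w x win_h window.

    aspect=None means "stretch": fill the whole window, ignoring the ratio.
    """
    if aspect is None:
        return 0, 0, win_w, win_h
    num, den = aspect
    if win_w * den > win_h * num:
        # Window is wider than the image: full height, bars left and right.
        h = win_h
        w = win_h * num // den
    else:
        # Window is narrower or equal: full width, bars on top and bottom.
        w = win_w
        h = win_w * den // num
    return (win_w - w) // 2, (win_h - h) // 2, w, h
```

A 16:9 image in a 2560x1080 ultra-wide window gets 320-pixel bars on each side, while a 21:9 setting on a 1920x1080 screen leaves bars above and below. That shows why the 21:9 option only fills an ultra-wide screen when a mod makes the game render at that ratio.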

Other:

OpenAL is now distributed with the emulator binaries

A small nuisance for first-time users was the requirement to install OpenAL, which is needed for audio to work in certain games. The emulator also has a SoundIO audio backend (whose library is shipped with the emulator), but it does not work well with all games. Now OpenAL is shipped with the emulator too, so users don't need to install it on their system! On top of that, we are now using a newer version of the OpenAL library, which further improves audio output in some games when this backend is used.

Implemented on #1847 by Thog.

Closing words:

We hope you have enjoyed all the progress we made during the year, and we plan to tackle many more improvements in 2021. There's still a very long way to go! We recently restructured our Patreon tiers & benefits and published a post with more details about those changes; be sure to check it out here if you haven't already!

In the last progress report we also mentioned some sub-projects we have been working on. One of them is Arm64 support. For those not aware, Arm is a CPU architecture that powers most mobile devices. Apple also recently announced a move from x86 to Arm CPUs, and the recent release of the Apple M1 emphasizes the importance of supporting Arm in the future in order to make the emulator available on new platforms.

The work has already begun, and we have made some progress using the Raspberry Pi 4 board as a test platform. Below you can see screenshots of a few games running on the device.

There's still a lot of work to be done, but it's exciting to see these first games booting on a new platform! This should eventually allow Android devices and new Apple Arm devices running macOS to be supported.

Thanks to everyone who has supported us so far, be it via Patreon donations, code contributions, testing games in the emulator, or simply being an active member of our community. You've helped make this emulator what it is today!
We now have an active Patreon campaign with specific goals (one of which was just met... more on that soon!) and restructured subscriber benefits/tiers, so head on over if you want to help push Ryujinx forward!