Progress Report January 2023
Gather back round the fire for another month. This time you can tell us all about your resolutions, both ongoing and already failed.
Compared to 2022 with Legends Arceus, January was far less hectic. Fire Emblem: Engage being playable on day 1 meant that there wasn’t a need for developers to leave whatever they were currently working on to handle damage control. We’re trying to build our day 1 streak back up and Engage will be a fine addition to that collection!
Before we wander further, take a look at our Patreon goals below. We’d like to reiterate once again that any features listed below will eventually be worked on, regardless of the goal being met. A feature simply becomes a priority as soon as its incentive amount is sustained. This, of course, isn’t true for the full-time development goals which, by nature, are dependent on financial backing.
$2000/month - Texture Packs / Replacement Capabilities - getting close!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Moving swiftly onwards…
Resolution scaling is a core draw of emulators when playing both retro and modern games. Escaping the clutches of 480p GameCube games and… 480p Switch games (depression) is just as tantalizing. While our resolution scaler, added all the way back in 2020, is able to scale a significant chunk of the Switch’s library, there were a few games over the years that gave it trouble.
The release of Fire Emblem: Engage highlighted an old issue with the scaler which caused games to reset back to native resolution on certain actions, such as opening menus or scene transitions. The fix actually owes itself to Pokémon Brilliant Diamond and Shining Pearl, released over a year prior, which had the same problem. It is potentially a candidate for the longest authoring-to-merge pull request on the repo.
As this change was originally intended for BDSP, Deltarune and Crash Team Racing, MSAA textures were also un-blacklisted from the scaling algorithm. This allows titles that make use of them, such as Pokémon Mystery Dungeon: Rescue Team DX, Rune Factory 5 and Cruis'n Blast, to finally scale correctly.
Pokémon Mystery Dungeon: Rescue Team DX normally uses a very aggressive blur filter, so the above screenshots were taken using a mod to remove it, allowing the changes to be much more noticeable. Cruis'n Blast is similar in that it also uses some heavy post-processing which can fuzz edges, especially at higher resolutions. Even so, the car and distant detail are much improved!
Koei Tecmo are a studio that simply love to develop some of the most jank and finicky games on the market, especially when it comes to emulating them. Both Hyrule Warriors: Age of Calamity and Fire Emblem: Three Houses would regularly suffer from large slowdowns when looking in certain directions or seemingly at random. Age of Calamity in particular would sometimes slow to a complete crawl when too much was happening at once. Texture caches are extremely important for emulators, as texture creation costs are extraordinarily high. With the addition of a second, more niche, short-duration texture cache, the troublesome cases where a texture’s reference is wiped from the main texture pool while the texture is still in use are heavily mitigated. This was ultimately the cause of many of the major slowdowns in the two games mentioned prior, which should both see healthy performance boosts in scenarios where they used to struggle.
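To make the idea a little more concrete, here is a loose Python sketch of a short-lived "parking" cache. It is an illustration only, not the Ryujinx implementation: the class name, the frame-based lifetime and the `park`/`take`/`purge` methods are all assumptions made up for this example.

```python
class ShortDurationCache:
    """Hypothetical holding area for textures that just lost their pool reference."""

    def __init__(self, lifetime_frames=2):
        # Assumption: entries live for a fixed number of frames.
        self.lifetime_frames = lifetime_frames
        self._entries = {}  # key -> (texture, frame it was parked on)

    def park(self, key, texture, frame):
        # Called when the main pool drops its reference to a texture.
        self._entries[key] = (texture, frame)

    def take(self, key):
        # If the texture is requested again soon, hand it back instead of recreating it.
        entry = self._entries.pop(key, None)
        return entry[0] if entry is not None else None

    def purge(self, frame):
        # Drop anything that has been parked for too long.
        expired = [
            key
            for key, (_, parked_on) in self._entries.items()
            if frame - parked_on >= self.lifetime_frames
        ]
        for key in expired:
            del self._entries[key]
```

The point of the sketch is only that a texture that briefly drops out of the main pool can be picked back up cheaply, rather than paying the full creation cost again.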
A small memory leak on AMD, Intel and Apple (still weird typing that) GPUs was resolved this month. It was caused by old Vulkan swapchains not being destroyed when a new one was created. This was experienced whenever the window was resized and could add up to a large chunk of VRAM over multiple changes. Nvidia was unaffected, as it seems their Vulkan driver was doing some wizardry behind the scenes to recognise a redundant swapchain and destroy it automatically. This was our fault though, and cleaning up after us isn’t really within the scope of the driver. As such, old swapchains are now manually destroyed when a new one is created.
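The shape of the fix can be sketched in a few lines of Python. Nothing here talks to Vulkan: `FakeSwapchain` and `Window` are hypothetical stand-ins, and the counter simply pretends to be VRAM held by live swapchains.

```python
class FakeSwapchain:
    """Stand-in for a swapchain handle; not a real Vulkan binding."""

    live_count = 0  # swapchains currently holding (pretend) VRAM

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.destroyed = False
        FakeSwapchain.live_count += 1

    def destroy(self):
        # Safe to call more than once.
        if not self.destroyed:
            self.destroyed = True
            FakeSwapchain.live_count -= 1


class Window:
    def __init__(self, width, height):
        self.swapchain = FakeSwapchain(width, height)

    def resize(self, width, height):
        old = self.swapchain
        self.swapchain = FakeSwapchain(width, height)
        # The fix: release the old swapchain ourselves instead of
        # hoping the driver notices that it is redundant.
        old.destroy()
```

Without the `old.destroy()` call, every resize would leave one more swapchain alive, which is the leak described above.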
On the topic of VRAM usage and texture caches, our current `AutoDeleteCache` has a hard limit of 2048 entries and tries to keep the most active textures cached while removing older entries. Unfortunately, this doesn’t work too well when we have a small number of very large textures which don’t trigger any of the cache limit safeguards, but do take up a large amount of memory. For example, take the following scene:
Not particularly complex, low number of textures, surely this can’t be an issue? Well, the textures used here are few in number but huge in size; the exact scenario we don’t want. To resolve the large memory use here, we can force a deletion whenever a texture is unmapped and not in a GPU-modified sub-range region. This dramatically lowers the VRAM usage in high-stress scenes such as the above from Witch’s Garden.
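A rough Python sketch shows why an entry-count limit alone doesn't help here. Only the 2048 figure comes from the text above; the class and method names are made up for this example, and the single `gpu_modified` flag is a simplification of the "GPU-modified sub-range" check.

```python
from collections import OrderedDict

MAX_ENTRIES = 2048  # entry limit mentioned above


class Texture:
    def __init__(self, name, size_bytes, gpu_modified=False):
        self.name = name
        self.size_bytes = size_bytes
        # Simplification: one flag instead of per-sub-range tracking.
        self.gpu_modified = gpu_modified


class AutoDeleteCacheSketch:
    """Hypothetical count-limited cache; not the real AutoDeleteCache."""

    def __init__(self, max_entries=MAX_ENTRIES):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def add(self, texture):
        self._entries[texture.name] = texture
        self._entries.move_to_end(texture.name)
        # Count-based safeguard: only kicks in with many textures.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

    def on_unmapped(self, name):
        # Forced deletion: drop an unmapped texture right away,
        # unless the GPU has modified it and the data may still be needed.
        texture = self._entries.get(name)
        if texture is not None and not texture.gpu_modified:
            del self._entries[name]

    def total_bytes(self):
        return sum(t.size_bytes for t in self._entries.values())
```

With three 256 MiB textures the count limit never triggers, so memory only drops once `on_unmapped` is called for a texture the GPU hasn't modified.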
Visual novels are the unit tests of emulators and EVE ghost enemies is another example of this. Character portraits were failing to render due to the ‘Modified’ texture flag not being cleared.
Simply clearing this flag once the texture is modified from the CPU on the GPU thread resolves this bug.
Persona 4 Golden is a true masterpiece in the gaming world and a personal favorite in the series. However, while it sure is an old game at this point, it isn’t THIS old:
While the aesthetic here is somewhat interesting, it certainly wasn’t correct. The cause was isolated to the CSET and CSETP shader instructions, which either had only partial implementations or none at all due to the rarity of their use. Fully implementing CSET and fixing the partial implementation of CSETP completely fixes Persona 4 Golden. Go make history.
As we mentioned a couple of months ago, with our transition to .NET 7 the path toward NativeAOT has been opened to us. For any unaware, NativeAOT allows your C# code to be compiled directly to self-contained native binaries, meaning you can run .NET applications on platforms without JIT permissions. Normally when you run any .NET program, what you are actually running is an “intermediate language” (IL), a form of bytecode which the .NET runtime compiles to your system’s native machine code just-in-time as the program runs. This can introduce additional latency and usually shows itself in slower startup times.
There are, however, limitations to NativeAOT. You cannot make use of the feature when some .NET features are utilized within your codebase; the main offender for Ryujinx being ‘Reflection’. Reflection is complicated, but in the simplest terms possible it allows your program to know stuff about its own code dynamically. Unfortunately this is a no-no for ahead-of-time compiling, and there has been a significant push to remove uses of reflection from Ryujinx for a fair while. This month reflection was removed from the multithreaded GPU abstraction layer (GAL), which is another step toward the goal. Expect to see more changes referencing the removal of reflection, and we hope you now know why it’s important!
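For a feel of the difference, here is a loose Python analogy (not the C# code from the GAL, and the `Renderer` commands are invented for the example). The first function finds a method by name at run time, which is the kind of dynamic lookup an ahead-of-time compiler struggles to see through. The second uses an explicit table where every target is spelled out in the source.

```python
class Renderer:
    """Hypothetical command target used only for this illustration."""

    def __init__(self):
        self.log = []

    def clear(self):
        self.log.append("clear")

    def draw(self):
        self.log.append("draw")


def run_with_reflection(renderer, command_name):
    # Reflection-style: the method is discovered by name while running.
    getattr(renderer, command_name)()


# Explicit table: every callable is visible up front.
COMMANDS = {
    "clear": Renderer.clear,
    "draw": Renderer.draw,
}


def run_with_table(renderer, command_name):
    COMMANDS[command_name](renderer)
```

Both approaches do the same work. The second simply trades a bit of boilerplate for code whose call targets are all known before the program runs.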
Last month we highlighted some changes in the OpenGL backend to allow Pinball FX3 and Sphinx and the Cursed Mummy to render correctly. It took until January for those changes to cascade their way into a Vulkan equivalent. Both of these titles now render correctly on both backends.
Finally for our GPU section: in December we showed Ryujinx running on a Raspberry Pi, and this month you too can try it! We’ve relaxed our Vulkan requirements, which should allow a larger variety of devices and drivers to run Ryujinx if they don’t fully conform to the modern spec.
Don’t expect this change to allow your old Celeron laptop to boot Switch games though. It’s mainly for open-source Vulkan drivers that may lack a feature or two. As stated last month, performance on the Pi is dreadful, but we’re hopeful that newer Qualcomm devices might be interesting in the future.
Of course, these devices are ARM64 based. And Ryujinx is for x86_64 systems right?
Wrong! Make way for an ARM64 backend for our CPU JIT, ARMeilleure. While this doesn’t give every ARM device full hypervisor-style native execution of code, similar to that seen on our macos1 build, lots of instructions are mapped almost 1:1 and should have minimal overhead compared to a similar x86 CPU executing them. Some additional optimizations to ARM bit manipulation and feature detection were also added to slightly improve performance for ARM64 processors.
This contributes to the upstream of the macOS changes as it is required for ARM64 processors to run 32-bit titles such as Mario Kart 8 Deluxe.
This change necessitated a couple of other changes to our JIT architecture. The PPTC, which caches translated CPU code so that it doesn’t need to be translated multiple times, now needs to check your CPU architecture, as an x86 cache is obviously not compatible with ARM systems and vice versa.
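As a minimal sketch of that check, assuming a cache that stores an architecture tag next to its entries (the class, the dictionary layout and the `arch` field are all hypothetical and not the real PPTC file format):

```python
import platform


class TranslationCacheSketch:
    """Hypothetical persistent translation cache with an architecture tag."""

    def __init__(self, arch=None):
        # Default to whatever the host reports, e.g. "x86_64" or "arm64".
        self.arch = arch or platform.machine()
        self.entries = {}  # guest address -> translated host code

    def save(self):
        # Store the architecture alongside the translations.
        return {"arch": self.arch, "entries": dict(self.entries)}

    @classmethod
    def load(cls, blob, host_arch=None):
        host_arch = host_arch or platform.machine()
        cache = cls(host_arch)
        # Only reuse the translations if they were built for this architecture;
        # otherwise start empty and let the JIT translate everything again.
        if blob.get("arch") == host_arch:
            cache.entries = dict(blob["entries"])
        return cache
```

A cache saved on x86_64 loads normally on x86_64, while loading the same blob on an ARM64 host yields an empty cache instead of handing out unusable code.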
Any global state was also removed from the PPTC, which allows applications that make use of the Nintendo Switch’s JIT service, like the NSO N64 emulator, to attach a cache instance to each guest process. This allows sub-processes like the N64 games in the NSO emulator to gain the individual benefits of faster launches.
We’ve already covered some of the changes that originated from the macOS upstream roadmap, such as the ARM64 JIT and the relaxing of Vulkan requirements, but January still had plenty more to dedicate to its own section.
We’d like to preface this section by explaining that 99% of what is mentioned here is already included in the macos1 build. While we’re working on upstreaming the basic changes, our focus is not on fixing the known issues with the first mac release. This is why there’s been a bit of a delay for any further mac releases; realistically, nothing beyond minor fixes and adjustments has advanced what is currently available.
A variety of the MoltenVK workarounds were implemented which cover a lot of ground. MVK portability subsets, shader specialization and ASTC format checks are a few of the highlights, but the pull request linked above lists the rest. Notably, this does not include our geometry shader emulation or transform feedback emulation. We’re hopeful that the MoltenVK team will be figuring out their own geometry shader implementation sometime in the first quarter, which would be preferable to using our own. These changes allow Mario Kart 8 Deluxe to render correctly on self-built releases.
Some smaller changes related to Vulkan/MoltenVK were also merged in January:
- Explicitly enable precise occlusion queries on Vulkan - Fixes ink collision in Splatoon 2/3 in MoltenVK.
- Use volatile read/writes for GAL threading - Resolves some random crashes experienced on ARM CPUs re-ordering command queues.
- Change BitfieldExtract and BitfieldInsert for SPIRV-Cross - Fixes a case where the SPIR-V -> MSL shader conversion could be invalid and crash on MoltenVK.
- Reset queries on same command buffer on Vulkan - Fixes occlusion queries having bogus results on macOS, mostly visible in Super Mario Odyssey and A Hat in Time.
These final two major changes shift back toward CPU and Memory emulation.
The Switch uses a page size of 4KB, while macOS and some other platforms use 16KB or higher. It was therefore necessary to implement support for platforms that lack 4KB granularity. With this change in place, it is no longer required to boot self-compiled builds with Rosetta.
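The core of the mismatch is plain alignment arithmetic. The sketch below is not taken from the Ryujinx memory manager and its function names are made up, but it shows how a single 4KB guest page ends up dragging in a whole 16KB host page.

```python
GUEST_PAGE_SIZE = 4 * 1024   # Switch page size
HOST_PAGE_SIZE = 16 * 1024   # e.g. macOS on Apple Silicon


def align_down(value, alignment):
    # Alignment must be a power of two.
    return value & ~(alignment - 1)


def align_up(value, alignment):
    return (value + alignment - 1) & ~(alignment - 1)


def host_range_for_guest(address, size, host_page_size=HOST_PAGE_SIZE):
    """Smallest host-page-aligned (start, size) range covering a guest range."""
    start = align_down(address, host_page_size)
    end = align_up(address + size, host_page_size)
    return start, end - start
```

For example, the guest page at `0x5000` maps onto the host range starting at `0x4000` with a size of `0x4000`, so changing that one guest page affects three neighbouring guest pages as well. That overlap is what the emulator has to account for on such platforms.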
While the ARM JIT is cool and all, it wasn’t the centerpiece of why Apple Silicon support was so interesting in the first place. Our Apple Hypervisor support is now also fully implemented and, with the above page size changes, is fully on par with macos1 as far as the CPU and kernel side go. There is still a fair bit of work to bring the MoltenVK and GPU side up to snuff, but it’s getting there.
With a lot of these changes in place, we’re hoping that anyone who was interested in macOS contributions should now have a solid base to begin, with most games running at varying levels of performance and graphical fidelity on self-compiled builds.
While we’re waiting for Avalonia 11.0 (required to distribute on FlatHub), January did not slow down as far as changes to our WIP GUI are concerned. We saw a huge number of codebase refactors and improvements, including:
- User profile manager refactor - Uses dialog overlays instead of a new window.
- About window refactor - Uses dialog overlays instead of a separate window.
- TitleUpdateWindow refactoring - Switch to a dialog overlay instead of a separate window to apply updates.
Alongside the visual changes, there were a fair lot of different refactors to improve the readability and modularity of the frontend.
A long-standing Windows bug where the “hide cursor on idle” setting didn’t actually work was fixed this month, alongside a couple of settings window tweaks. The resolution scaling drop-down menu was re-ordered to match GTK and the side nav-bar has been given a little bit of padding so that item selections don’t overlap.
Notifications have also been implemented, which should allow some interesting use-cases in the future. For those unaware, a notification looks like this:
There are still some questions to be answered on where they’d be best utilized to walk the line of informative, but not annoying. Shader compilation warnings are a potential future option here.
Avalonia will now also swallow keyboard TextInput events to avoid the annoying “bell” sound on macOS when using a keyboard as a controller. This change will take the place of a workaround in macos1 that currently breaks text input in a few places.
And finally for GUI, we have a CrowdIn page for translation help! If you’d like to help translate new strings or add your own language to Avalonia then we’d appreciate it. Lots of new translations were already added this month but there’s always someone out there who can expand that.
As mentioned earlier, sometimes changes take… a long time to reach end-users. Someone may have a great idea, write it all up and then get bogged down in other things, have their attention pulled away due to the release of a new Pokémon game, or drop off the face of the planet for a few years. Regardless, way back in late 2020, gdkchan, the project founder, wanted to redesign how Ryujinx handles service implementations and start to decouple the OS HLE project from the kernel. Two years and a multitude of changes later, these goals finally materialized.
The advantages this redesign brings are numerous, but they boil down to being simpler (no more manual read/writes to message buffers), having lower latency (less allocation and copying) and allowing multiple threads to process IPC, which should eventually fix some bugs. Services need to be manually ported to this new system, so there is a lot of transition work to be done in the interim; most services at the moment are not using the new implementation so don’t see any of these benefits yet. We’ll keep you posted on what’s been migrated, and what that fixes, as it happens.
More general changes this month included:
- `GetProgramID` and `GetApplicationProcessID` were implemented in the pm services - required for certain homebrew applications such as EdiZon (unfortunately crashes on boot due to other issues).
- The prepo and lm service libraries were migrated to the new HIPC system and cleaned up along with the sm library.
- A polyphase upsampler was implemented into the audio renderer - Skyward Sword HD makes use of this but we’re currently unsure of any user-end impact on audio.
- PCM24 and PCM8 output was implemented via format conversion - this allows audio output devices that don’t support PCM16 or PCM32 to output in alternative formats.
- RomFS loading now takes process ID into account - fixes issues when multiple processes are attempting to be run simultaneously.
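For the PCM item above, format conversion is mostly a matter of shifting bits. Here is a small Python sketch, assuming signed 16-bit input samples, unsigned 8-bit output (the convention WAV files use) and little-endian packed 24-bit output. These conventions and function names are assumptions for the example and may not match what the audio backend does.

```python
def pcm16_to_pcm8(samples):
    """Convert signed 16-bit samples to unsigned 8-bit bytes."""
    # Keep the top 8 bits, then shift from [-128, 127] into [0, 255].
    return bytes((s >> 8) + 128 for s in samples)


def pcm16_to_pcm24(samples):
    """Convert signed 16-bit samples to packed little-endian signed 24-bit."""
    out = bytearray()
    for s in samples:
        # Widen by shifting into the upper 16 bits of a 24-bit value.
        out += (s << 8).to_bytes(3, "little", signed=True)
    return bytes(out)
```

Going down to 8 bits throws away the low byte of every sample, while going up to 24 bits just pads with zeroes, so neither direction adds any detail that wasn't in the 16-bit source.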
For anyone who uses the command line interface or our headless builds, lots of missing arguments were added in January such as the macroHLE setting, cursor hiding on idle and even which user profile’s information and saves you’d like to load. This is mainly useful for frontend users so if a terminal window scares you, none of this is important!
What is there to say that we haven’t already? We’re already bounding into the new year with not one, but TWO day 1 playable exclusives and we’re hoping that this continues. There are some killer releases this year after all! As per usual, if you’d like to support our efforts in making this a reality, you can find us on Patreon for financial support and GitHub if you’re familiar with 3D-graphics, low-level software engineering and C#/.NET! Outside of direct contribution, testing games and reporting bugs goes a long way in letting the team know what works, what doesn’t, and what should be on the priority list.
As a final comment, LDN3.1 (an update to our multiplayer build) has been released and includes a large performance boost for AMD GPUs in Scarlet/Violet among other changes! Check out the changelog and downloads here.
Bye bye! For now…