Starting to micro-optimize, and 3D artwork
Captain's Log: Stardate 78310.9
First things first
Before doing any kind of subtle optimization, the first step is of course to make sure that you can actually measure whether your so-called "optimizations" make things faster. Often, changes that seem like they should be faster either do nothing or even make things worse.
So to start, I added a simple timing framework to my golden tests, which exercise every physics feature that Anukari supports. Now when I run those 70-or-so tests, each one computes a timing benchmark and stores it in a CSV file along with the current time, the GPU backend that was used for the simulation, some machine ID info, etc.
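The logging itself is trivial; a minimal sketch of the idea (the helper and field names here are hypothetical, not Anukari's actual schema):

```cpp
#include <chrono>
#include <fstream>
#include <string>

// Append one benchmark row per golden test run. Appending (rather than
// overwriting) is what lets the CSV accumulate a history across commits.
void AppendBenchmarkRow(const std::string& csvPath, const std::string& testName,
                        const std::string& gpuBackend, const std::string& machineId,
                        double runtimeMs) {
  const auto now = std::chrono::system_clock::now();
  const auto unixSeconds =
      std::chrono::duration_cast<std::chrono::seconds>(now.time_since_epoch())
          .count();
  std::ofstream csv(csvPath, std::ios::app);
  csv << unixSeconds << ',' << testName << ',' << gpuBackend << ','
      << machineId << ',' << runtimeMs << '\n';
}
```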
This gives me the ability to graph the overall change in runtime across all the golden tests, as well as look at each individual test to see which ones were sped up or slowed down.
I decided for now to just check this CSV file into git. This is really simple, but kind of cool: each time I make a micro-optimization, I commit the code along with the updated profiling stats in the CSV file. So I have a nice history of how much impact each code change had.
Micro-optimizing
While Anukari's physics simulation runs to my satisfaction on Windows (with a decent graphics card), it is still dismayingly slow on macOS. It works for small presets, but there's little margin, and for example in GarageBand you get buffer overruns all the time (I'm not sure why yet, but GarageBand seems to have much more overhead than other DAWs).
One huge advantage of porting the physics to Metal, aside from the speedups I've already seen from optimizing the way the memory buffers are configured, is that I can get meaningful results out of Apple's Metal GPU profiling tools. The most immediate finding is that the simulation is NOT primarily memory-bound; rather, the long pole is ALU saturation. This is pretty surprising, but I did a lot of work early on to make the memory layout efficient and compact, and I've since added a lot of physics features that require real computation, so it checks out.
Thus I've begun micro-optimizing the code. This is a little frustrating, as Apple has a tendency to put all their good Metal information in videos instead of documents, and I hate videos for technical information like this. Fortunately the videos have transcripts, but those aren't perfect: they're obviously machine-generated, and the speakers are often referencing slides I can't see. But as disorganized and poorly-formatted as Apple's guidance here is, it does provide some juicy nuggets.
One thing I'm doing is looking for places where I can replace floats with half-floats, because the ALU has double the bandwidth for halfs. That's pretty obvious, but it's tricky because for most physics features, halfs have far too little precision. But there are opportunities: for example, for the envelope attack duration, I'm pretty sure half precision is still well beyond human perception.
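As a minimal sketch of the idea in Metal Shading Language (which is a C++ dialect); the function and parameter names are hypothetical, not Anukari's actual envelope code:

```cpp
#include <metal_stdlib>
using namespace metal;

// A linear attack ramp computed entirely in half precision. Half gives
// roughly 3 decimal digits, which is plenty for an attack time that the
// human ear can't distinguish to better than a few percent anyway.
half envelope_attack(half t, half attackSeconds) {
  return saturate(t / max(attackSeconds, 0.0001h));
}
```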
Another thing I'm doing is trying to figure out how to get rid of 64-bit longs in the computations. I don't use them much, but the global sample number is a long (32 bits' worth of samples is less than a day at typical sample rates, so a 32-bit counter could easily overflow). I also use longs for my GPU-based PRNG. In both cases I think it will be possible to avoid them, but this will require some thought.
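Two plausible directions, sketched below; these are assumptions about how it might be done, not finished code:

```cpp
#include <cstdint>

// (1) Carry the global sample number as two 32-bit words (uint in MSL). The
// kernel mostly needs the low word; the host bumps the high word whenever the
// low word wraps (about every 27 hours at 44.1 kHz).
struct SampleClock {
  uint32_t lo;
  uint32_t hi;
};

// (2) xorshift32: a decent-quality PRNG whose state and arithmetic are all
// 32-bit, so no 64-bit ALU ops are ever needed.
inline uint32_t Xorshift32(uint32_t& state) {
  state ^= state << 13;
  state ^= state >> 17;
  state ^= state << 5;
  return state;
}
```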
There are a ton of other little things, like reordering struct fields so that Apple's compiler can automatically vectorize the loads, getting rid of a pesky integer mod operation that can take hundreds of cycles, and switching array indexes from unsigned to signed ints. The latter is quite interesting: apparently Apple's fastest memory instructions depend on the guarantee that the offset can be represented by a signed int, and if you use an unsigned int, the compiler can't make that guarantee and has to fall back to something slower.
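To illustrate the first and last of those (hypothetical fields, not Anukari's actual structs):

```cpp
// Before: interleaving types can keep the compiler from fusing loads.
struct BadLayout {
  float x;
  int flags;
  float y;
  float z;
};

// After: the four floats sit together, so the compiler can fetch them as a
// single vectorized float4 load.
struct GoodLayout {
  float x, y, z, w;
  int flags;
};

// Signed indexing: with int the compiler can rule out wraparound and use its
// fastest addressing mode; with uint it can't, and must emit slower code.
float Sum(const float* data, int n) {
  float acc = 0.0f;
  for (int i = 0; i < n; ++i) acc += data[i];  // int, not uint
  return acc;
}
```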
3D Artwork
I've started to reach out to a few 3D artists about redoing all the 3D models for the various instrument components. I'm looking forward to getting professional models, as I think it will make the whole plugin much more beautiful and interesting to look at.
This will also be an opportunity to change things so that the animations are part of the 3D models themselves (using bone skinning) instead of hard-coded into the Anukari engine. This will be an advantage when users start creating (or commissioning) their own 3D models for the engine: they won't be limited to the built-in animations. They can do anything they want.
Anyway, after talking to some artists, I quickly realized that I had vastly under-specified what I actually want things to look like. So I've spent a fair bit of time working on a visual style document, where I include a bunch of reference images for what I want each thing to look like, along with commentary on what I'm going for, etc. In some cases I'm hand-drawing the weird stuff I want because I can't find anything similar online.
This is a lot of work, but it's really fun because I'm finally having to think about exactly what I want things to look like. Today, the MIDI automations, LFOs, etc., are all super simple, and most of them are copied from the same knob asset. But my plan is to make every single kind of automation a unique 3D object, hopefully with a visual design and animation that conveys its purpose, at least to the extent that I can imagine ways to do so.
Weird Filament bug on Metal backend
Captain's Log: Stardate 78280.5
The last couple of days I continued to work through the various bugs that I had noted with the new Filament renderer. In particular, I was looking at the fact that on macOS with the Metal backend, if I changed the render quality settings enough, I could pretty reliably get it into a state with weird graphical glitches like the one below.
I was able to get other weird glitches as well, like everything becoming green-tinted. At first I thought there was some specific configuration that caused it, but eventually I realized that it was kind of random, and really difficult to pinpoint exactly how to reproduce it. Sometimes things would work fine until I moved the camera around a bit, and then it would go corrupt.
The first thing I tried was to strip down my renderer to the absolute basics, getting rid of any code that wasn't absolutely necessary to reproduce the issue. This was disconcerting, because I was able to remove nearly all the renderer code and still get glitches.
The next thing I tried was to reproduce the glitches in one of Filament's demo apps, gltf_viewer. I tried and tried, but nothing I did would cause the glitches.
I started looking for any difference between how the gltf_viewer and Anukari worked. The way we set up the CAMetalLayer is different, so I transplanted my CAMetalLayer code into gltf_viewer, and it worked just fine.
Next I started from the other direction: removing things from gltf_viewer one by one until something weird happened. It took quite a bit of trial and error, but I found that when I removed the code that configured the renderer to clear the pixels before a render pass, I could get weird corruption to happen.
This is pretty weird: in a 3D renderer with a skybox, clearing the color buffer is unnecessary, since the skybox overwrites every pixel anyway. But I went back to Anukari and turned clearing on, and it completely fixed the problem. I don't know exactly what the bug in the Filament Metal backend is, but there's definitely something wrong with it.
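For reference, the fix is tiny; a sketch of what enabling the clear looks like through Filament's ClearOptions API:

```cpp
#include <filament/Renderer.h>

// Force a clear before each render pass, even though the skybox should make
// it redundant. This is the change that made the Metal glitches disappear.
void EnableClear(filament::Renderer* renderer) {
  filament::Renderer::ClearOptions options;
  options.clear = true;                           // clear color at pass start
  options.clearColor = {0.0f, 0.0f, 0.0f, 1.0f};  // opaque black
  renderer->setClearOptions(options);
}
```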
Anyway, there are no more glitches in Anukari on MacOS, and I filed a bug for the Filament folks to see if they'd like to fix it: https://github.com/google/filament/issues/8229.
Lions, tigers, and high-DPI, oh my
Captain's Log: Stardate 78275.5
While continuing to work on making the new 3D rendering engine and associated 2D GUI code more robust, I eventually came to a long-standing TODO: double-check that things work well when the little Auto-Scale Plug-In Window item is un-checked in Ableton Live:
This little obscure option ended up taking me down a multi-day rabbit hole learning about high-DPI handling in Windows.
Taking a step back, the basic history here is that around 2011, Apple introduced the Retina display, which was essentially a computer screen with double the DPI of a normal monitor. Here DPI means "dots per inch," that is, the number of pixels in a row an inch long. It didn't take super long for high-DPI displays to catch on with non-Apple PC manufacturers.
The difficulty with high-DPI displays is that historically, desktop applications tend to have their GUI layouts written using pixels as the main measurement. So, a button might be sized as 64x128 pixels. When most monitors had similar DPIs, this worked reasonably well. However, when that application is run on a monitor where each pixel is physically half as wide, that 64x128 pixel button might be much too small. Perhaps a better example is font rendering, where if a text area is defined in terms of pixels, the font might become so small as to be unreadable on a high-DPI display.
Apple's solution to this problem was clever: by standardizing their Retina displays at exactly double the DPI of a normal display, the pixel dimension calculations in old apps could simply be scaled up by a factor of 2 by the OS. Because this is an exact doubling, even when things like bitmaps are scaled, they come out sharp (just lower resolution). And for things like fonts, or other drawing primitives, the OS-level rendering engine can rasterize things at the full 2x pixel resolution behind the scenes to get high-resolution results.
Microsoft had a more difficult problem to solve. Since Microsoft has much less control over what hardware its OS runs on, Windows has to provide a variety of scaling factors, for compatibility with monitors at various DPIs. In practice this is a 100% to 500% slider that users can adjust to change the size of the Windows GUI:
This brings about two major issues.
First, because this scaling factor is not an integer multiple (like Apple's 2x scale), old apps written before high-DPI monitors were a thing look awful when they're scaled up by the OS. There's just no way around it; things like fonts in particular are just God-awful when you blow up 1 pixel to, say, 1.75 pixels. It's impossible to make this look good. So if you're ever running an older Windows app with this scale set to something that isn't an integer multiple of 100%, this is why it looks blurry and awful. (And this is why the settings menu will recommend 200% or 300% depending on your monitor, because those will at least scale somewhat better.)
The second issue is much more subtle. Because Windows runs on such heterogeneous hardware, it provides different scaling factors for each monitor. Without this, if a user had two displays where one had double the DPI of the other, they could never get apps to appear at a good size on both monitors.
From what I can tell, having come into all this after the fact, Microsoft went through several iterations of poor support for high-DPI monitors. This started with an API for applications to declare their "DPI awareness" at the process level. Basically, by default apps are registered as UNAWARE of DPI issues, and Windows will automatically bitmap-scale them up, so they'll be large enough to read but awfully blurry. The DPI-awareness API allows a newer application to register itself as SYSTEM_AWARE, meaning, "I know how to scale myself up and down in a way that will look sharp, based on a global DPI setting." On a single-monitor system this would largely work, with a big caveat I'll discuss below.
But, not all systems are single-monitor. So Microsoft introduced another DPI-awareness setting called PER_MONITOR_AWARE. This is more complex for an application to handle, because it has to respond to scale changes if, for example, the user drags the window from a low-DPI monitor to a high-DPI monitor. In other words, there's no longer just one global DPI for the application to scale to.
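Concretely, a per-monitor-aware window has to handle the WM_DPICHANGED message; a minimal sketch (the handler name is mine, the message contract is Windows'):

```cpp
#include <windows.h>

// WM_DPICHANGED arrives when the window moves to a monitor with a different
// scale factor. wParam carries the new DPI; lParam points to a suggested
// window rectangle already sized for the new monitor.
LRESULT HandleDpiChanged(HWND hwnd, WPARAM wParam, LPARAM lParam) {
  const UINT newDpi = HIWORD(wParam);  // e.g. 96 = 100%, 120 = 125%
  (void)newDpi;  // the app should re-lay-out its GUI using newDpi / 96.0
  const RECT* r = reinterpret_cast<const RECT*>(lParam);
  SetWindowPos(hwnd, nullptr, r->left, r->top, r->right - r->left,
               r->bottom - r->top, SWP_NOZORDER | SWP_NOACTIVATE);
  return 0;
}
```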
As an absolutely wonderful tidbit of history, the original PER_MONITOR_AWARE setting was actually buggy and failed to account for a bunch of important scenarios. And thus Microsoft had to introduce yet another setting called PER_MONITOR_AWARE_V2 that fixed these issues.
(There's even a super edge-case setting, UNAWARE_GDISCALED, that makes things like fonts in GDI apps look slightly less horrible even though they are bitmap-scaled. But let's ignore that.)
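Putting the whole zoo together, here is how a modern app opts in at startup, using the context API from Windows 10 1703+ (these are the real Win32 identifiers, not Anukari-specific code):

```cpp
#include <windows.h>

int main() {
  // Declare the most capable mode before creating any windows. The other
  // four contexts, for reference:
  //   DPI_AWARENESS_CONTEXT_UNAWARE            (bitmap-scaled, blurry)
  //   DPI_AWARENESS_CONTEXT_SYSTEM_AWARE       (one global scale factor)
  //   DPI_AWARENESS_CONTEXT_PER_MONITOR_AWARE  (the buggy v1)
  //   DPI_AWARENESS_CONTEXT_UNAWARE_GDISCALED  (GDI text smoothing hack)
  SetProcessDpiAwarenessContext(DPI_AWARENESS_CONTEXT_PER_MONITOR_AWARE_V2);
  // ... create windows, run the message loop ...
  return 0;
}
```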
Okay, so this is already all a huge mess, and obviously very complicated for Windows applications to deal with. It is much more complex than Apple's simple "blow everything up 2x" strategy.
But wait, there's more!™
Yes, actually things get more complicated, and this is where the history starts to tie in to problems for Anukari. You see, Anukari will primarily be used as a VST plugin in host applications (DAWs) like Ableton Live. Now, think about this: a host application like Ableton Live itself will have some DPI-awareness setting for its own GUI. But VST plugins are DLLs loaded into the host process, and those VST plugins might have different DPI-awareness. So what if the host is DPI-aware and the user loads an old VST plugin that is not DPI-aware?
Yes, this is an actual nightmare, and Windows added an API to handle it: it is now possible to declare DPI-awareness not just for a process, but for a specific thread within that process. This means that different threads can have different DPI-awareness, and the way it works is that when a native OS window is created, it inherits the DPI-awareness of the thread that created it. So now each process, thread, and window in a Windows application has its own associated DPI-awareness, drawn from the list of five different awareness modes.
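A sketch of how a host might use this to wrap an old plugin (the window class and function name are hypothetical; the API calls are the real ones):

```cpp
#include <windows.h>

// A window inherits the DPI-awareness of the thread that creates it, so a
// host can create a legacy plugin's window as UNAWARE without changing its
// own process-wide awareness.
HWND CreateLegacyPluginWindow() {
  DPI_AWARENESS_CONTEXT previous =
      SetThreadDpiAwarenessContext(DPI_AWARENESS_CONTEXT_UNAWARE);
  HWND hwnd = CreateWindowExW(0, L"PluginWindowClass", L"Plugin",
                              WS_OVERLAPPEDWINDOW, CW_USEDEFAULT, CW_USEDEFAULT,
                              640, 480, nullptr, nullptr, nullptr, nullptr);
  SetThreadDpiAwarenessContext(previous);  // restore the host's context
  return hwnd;
}
```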
(I am pretty sure that Gary Larson did a comic about Satan welcoming a software engineer to hell, where their job was to deal with DPI-awareness in Windows applications. "Your room is right in here, Maestro.")
So, back to the little Auto-Scale Plug-In Window menu item in Ableton Live. This is a per-VST-plugin setting, and it is on by default. What "on" means is that Ableton will set up the main thread for the VST plugin with a DPI-awareness of UNAWARE, and will let the OS scale it up in an ugly way. That's right, Ableton makes new, fancy, DPI-aware plugins look terrible by default.
Disabling this setting makes Ableton set up the main thread for the VST plugin in a way that is DPI-aware, allowing the VST plugin to scale itself in a way that looks nice and sharp. But, obviously, the plugin needs to actually be DPI-aware for this to work. If not, it will render weird, possibly only drawing its GUI to a part of the window, unscaled, with black bars around it.
In the case of Anukari, most of the GUI scaled itself up and looked good, but the 3D renderer did not scale up. Actually, something weirder happened: the 3D renderer window was scaled to the proper size, but the viewport within it, to which the 3D scene was drawn, was not scaled up, and thus occupied only a sub-portion of the window.
This was weird to me, because I thought my code was taking the DPI scaling into account. It turns out that it was, but there was a deeper, more demented issue. I joked about hell before with the Windows DPI-awareness APIs, but this issue turned out to be in the much worse category of eldritch driver-bug horrors.
After banging my head against the wall, I finally found that the Vulkan API, when given a window with correct pixel dimensions for a high-DPI monitor, was generating a swap chain for that window with pre-scaling dimensions. In concrete terms: in a DPI-aware context with 125% scaling, if I give Vulkan a 1250x1250 pixel window, it gives me back a 1000x1000 pixel swap chain, and refuses to give one with the correct number of pixels even if I try to force it. Which is flat-out broken.
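You can see the symptom directly in the surface capabilities query (a sketch; the helper is mine, the Vulkan calls are standard):

```cpp
#include <vulkan/vulkan.h>

// With mismatched per-window DPI-awareness, currentExtent comes back
// pre-scaled (1000x1000 for a 1250x1250 window at 125%), and the swap chain
// gets clamped to it no matter what extent you request.
bool SwapChainCanMatchWindow(VkPhysicalDevice gpu, VkSurfaceKHR surface,
                             uint32_t windowWidthPx, uint32_t windowHeightPx) {
  VkSurfaceCapabilitiesKHR caps{};
  vkGetPhysicalDeviceSurfaceCapabilitiesKHR(gpu, surface, &caps);
  return caps.currentExtent.width == windowWidthPx &&
         caps.currentExtent.height == windowHeightPx;
}
```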
Now, why am I seemingly the only person to have noticed this? It's a huge gaping hole that breaks any Vulkan application. Well... this bug only happens if the Vulkan client code has a per-thread DPI-awareness set that does not match the native window's DPI-awareness. In other words, the NVIDIA driver doesn't correctly handle per-window DPI-awareness. Let me spell that out: the NVIDIA driver has not been updated to work correctly with a Windows API that was introduced 8 years ago in 2016. I reported the bug to NVIDIA but doubt I'll get a response, as I found reports of similar issues with OpenGL from 4 years ago with no resolution.
Fortunately there is a simple workaround, which is to change the rendering thread's DPI-awareness to that of the window it is rendering to, and the Filament folks were quick to accept my PR to implement this. It's not perfect, because an application could technically request Vulkan swap chains for multiple windows that have different DPI-awareness contexts, but... it will do.
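The workaround itself is essentially a one-liner (roughly the shape of what went into the Filament PR; exact integration details elided):

```cpp
#include <windows.h>

// Before creating the swap chain, match the rendering thread's DPI-awareness
// to that of the window it will render into, so the driver sees a consistent
// context and hands back a swap chain at the window's true pixel size.
void MatchThreadDpiAwarenessToWindow(HWND window) {
  SetThreadDpiAwarenessContext(GetWindowDpiAwarenessContext(window));
}
```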