devlog > graphics
Complications from custom 3D models
While working on the new 3D model assets for Anukari, one "little" TODO that I filed away was to make the mouse interactions with the 3D objects work correctly in the presence of custom models. This includes things like mouse-picking or box-dragging to select items.
In the first revision, because the 3D models were fixed, I simply hard-coded each entity type's hitbox via a cylinder or sphere, which worked great. However, with custom 3D models this is no longer tenable. The old hitboxes did not necessarily correspond to the shape or size of the custom models. This became really obvious and annoying quite quickly as we began to change the shapes of the models.
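Those hard-coded hitboxes amount to simple analytic intersection tests. Here's a sketch of the sphere case in Python (the shipping code is not Python, and these names are my own, not Anukari's):

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Return the distance along the ray to the first sphere hit, or None.

    `direction` must be normalized. Solves the quadratic
    |origin + t*direction - center|^2 = radius^2 for the smallest t >= 0.
    """
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * e for d, e in zip(direction, oc))
    c = sum(e * e for e in oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0:
        return None  # ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / 2.0
    if t < 0:
        t = (-b + math.sqrt(disc)) / 2.0  # ray origin is inside the sphere
    return t if t >= 0 else None
```

Mouse-picking is then just a matter of unprojecting the cursor into a world-space ray and testing it against every entity's sphere or cylinder, keeping the nearest hit. It works great right up until the hitbox shape no longer matches the model.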
This was one of those problems that seems simple, and spirals into something much more complicated.
My first idea was to use Google Filament's mouse picking feature, which renders entity IDs to a hidden buffer and then samples that buffer to determine which entity was clicked. This has the advantage of being pixel-perfect, but it uses GPU resources to render the ID buffer, and it also requires the GUI thread to wait a couple frames for the Renderer thread to do the rendering and picking. Furthermore, it doesn't help at all with box-dragging, as this method is not capable of selecting entities that are visually obscured by closer entities. But in the end, the real killer for this approach was the requirement for the GUI thread to wait on the Renderer thread to complete a frame or two. This is doable, but architecturally problematic, due to the plans that I have to simplify the mutex situation for the underlying data model -- but that's a story for another day.
My second idea was to write some simple code to read the model geometry and approximate it with a sphere, box, or cylinder, and use my existing intersection code based on that shape. But I really, really don't want to find myself rewriting the mouse-picking code again in 3 months, and I decided that this approach just isn't good enough -- some 3D models would have clickable areas that were substantially different from their visual profile.
So finally I decided to just bite the bullet and use the Bullet Physics library for collision handling. It supports raycasting, which I use for mouse-picking, and generalized convex collision detection, which I use for frustum picking. The documentation for Bullet sucks really hard, but with some help from ChatGPT it wasn't too bad to get up and running. The code now approximates the 3D model geometry with a simplified 42-vertex convex hull, which is extremely fast for the collision methods I need, and approximates even weird shapes quite well (I tried using the full un-approximated 3D geometry for pixel-perfect picking, but it was too slow). I'm very happy with the results, and it seems that pretty much any 3D model someone can come up with will work well with mouse interactions.
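In Bullet the raycast side of this is handled by `btCollisionWorld::rayTest` with a `ClosestRayResultCallback` against `btConvexHullShape` objects. The underlying geometric idea is worth seeing on its own: a convex hull is an intersection of halfspaces, and a ray can be clipped against those halfspaces directly. A Python sketch of that clipping (not Anukari's code, and a simplification of what Bullet actually does):

```python
def ray_convex_hit(origin, direction, planes, t_max=1e9):
    """Clip a ray against a convex volume given as an intersection of
    halfspaces. Each plane is (normal, offset), with the inside defined
    by dot(normal, p) <= offset. Returns the entry distance along the
    ray, or None if the ray misses the volume entirely.
    """
    t_enter, t_exit = 0.0, t_max
    for n, off in planes:
        denom = sum(a * b for a, b in zip(n, direction))
        dist = sum(a * b for a, b in zip(n, origin)) - off
        if abs(denom) < 1e-12:
            if dist > 0:
                return None  # parallel to this plane and outside it
            continue
        t = -dist / denom
        if denom < 0:
            t_enter = max(t_enter, t)  # crossing into the halfspace
        else:
            t_exit = min(t_exit, t)    # crossing out of the halfspace
        if t_enter > t_exit:
            return None  # the entry/exit interval became empty: a miss
    return t_enter
```

With only 42 vertices per hull, the corresponding plane count stays small, which is why this stays fast even with many entities in the scene.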
The things that made this a week-long job rather than a 2-day job were the ancillary complications. The main issue is that while the old hard-coded hitboxes were fixed at compile-time, the new convex hull hitboxes are only known by the Renderer thread, and can dynamically change when the user changes the 3D skin preset. This introduced weird dependencies between parts of the codebase that formerly did not depend on the Renderer. I ended up solving this problem by creating an abstract EntityPicker interface which the Renderer implements, so at least the new dependencies are only on that interface rather than the Renderer itself.
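The shape of that interface is roughly this (sketched in Python for brevity; the method names are my invention, not Anukari's actual API):

```python
from abc import ABC, abstractmethod

class EntityPicker(ABC):
    """Narrow interface that the Renderer implements. Data-model code
    depends only on this abstraction, not on the Renderer itself, so
    geometry-dependent queries don't drag in rendering dependencies.
    """

    @abstractmethod
    def pick(self, ray_origin, ray_direction):
        """Return the ID of the closest entity hit by the ray, or None."""

    @abstractmethod
    def entity_aabb(self, entity_id):
        """Return ((min_x, min_y, min_z), (max_x, max_y, max_z)) for the
        entity's current hull, which can change with the 3D skin preset."""
```

The nice property is that tests can supply a trivial fake implementation with fixed boxes, without standing up a real renderer.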
An example here is when the user copies and pastes a group of entities. The data model code that does this has a bunch of interesting logic to figure out where the new entities should go, in order to avoid overlapping them with any existing entities. It's a tricky problem because we want them to go as close as possible to where the user is looking, but have to progressively fall back to worse locations if the best locations are not available. Anyway, this requires being able to query the AABBs of existing entities, which is now geometry-dependent.
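The core of that placement logic is an AABB overlap test plus an ordered list of fallback positions. A minimal sketch (illustrative only; Anukari's real candidate-generation logic is more involved than a flat list):

```python
def aabbs_overlap(a, b):
    """Axis-aligned box overlap test; each box is ((min...), (max...))."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))

def find_free_spot(size, preferred, fallbacks, existing):
    """Try candidate positions from best to worst, returning the first
    one where a box of `size` overlaps no existing entity's AABB."""
    for pos in [preferred] + fallbacks:
        box = (pos, tuple(p + s for p, s in zip(pos, size)))
        if not any(aabbs_overlap(box, e) for e in existing):
            return pos
    return None  # caller must fall back to an overlapping placement
```

The geometry-dependence is all in `existing`: those AABBs now have to come from the renderer's convex hulls rather than compile-time constants.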
Another example is when creating new entities. This is similar to copying and pasting, but the requirement that the entity appear near where the user clicked is less flexible. A final example is rotating entities, where the rotational sphere radius needs to be known, as well as the desired diameter for the "rotation cage" that appears around the entity to indicate it's being rotated.
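Deriving that rotational sphere from the hull is straightforward: it's the distance from the rotation center to the farthest hull vertex. A sketch (the margin factor is an assumption of mine, not a value from Anukari):

```python
import math

def rotation_sphere_radius(vertices, center):
    """Radius of the smallest center-anchored sphere enclosing all hull
    vertices, i.e. the distance to the farthest vertex."""
    return max(math.dist(v, center) for v in vertices)

def cage_diameter(vertices, center, margin=1.15):
    """Diameter for the rotation cage, padded by a margin so the cage
    doesn't visually intersect the model it surrounds."""
    return 2.0 * rotation_sphere_radius(vertices, center) * margin
```

Since the hull is only 42 vertices, this is cheap enough to recompute whenever the skin preset changes.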
Anyway, it took a few days but finally I think I have all of these use-cases working correctly. Fortunately I had unit tests for all this stuff, so that helped a lot. This is a pretty nice milestone, since I think this is the last "heavy lift" for the new 3D model configurability feature.
As usual, there are still a few fiddly details that I need to address. The biggest one is that it's a little slow when you delete 1,000 entities. This is an edge case, but it is noticeable and irritates me. I think I know what I'll do to speed it up, but we'll see.
Starting to micro-optimize, and 3D artwork
Captain's Log: Stardate 78310.9
First things first
Before doing any kind of subtle optimization, the first thing is of course to make sure that you can actually measure whether your so-called "optimizations" make things faster. Often things that seem like they should be faster either don't do anything, or even make things worse.
So to start, I added a simple timing framework to my goldens tests, which exercise every physics feature that Anukari supports. Now when I run those 70-or-so tests, each one computes a timing benchmark and stores it in a CSV file along with the current time, the GPU backend that was used for the simulation, some machine ID info, etc.
This gives me the ability to graph the overall change in runtime across all the golden tests, as well as look at each individual test to see which ones were sped up or slowed down.
I decided for now to just check this CSV file into git. This is really simple, but kind of cool: each time I make a micro-optimization, I commit the code along with the updated profiling stats in the CSV file. So I have a nice history of how much impact each code change had.
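The whole framework is basically an append-only CSV writer. Something in this spirit (the column set here is illustrative, not Anukari's exact schema):

```python
import csv
import platform
import time
from pathlib import Path

def record_benchmark(csv_path, test_name, seconds, backend):
    """Append one benchmark row to the CSV that gets committed to git
    alongside the optimization it measures."""
    path = Path(csv_path)
    new_file = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["timestamp", "test", "seconds",
                             "backend", "machine"])
        writer.writerow([int(time.time()), test_name, f"{seconds:.6f}",
                         backend, platform.node()])
```

Committing the data with the code means every optimization commit carries its own before/after evidence, which is about as lightweight as a profiling history can get.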
Micro-optimizing
While Anukari's physics simulation runs to my satisfaction on Windows (with a decent graphics card), it is still dismayingly slow on macOS. It works for small presets, but there's little headroom, and in GarageBand, for example, you get buffer overruns all the time (I'm not sure why yet, but GarageBand seems to have a lot more overhead than other DAWs).
One huge advantage of porting the physics to Metal, aside from the speedups I've already seen from optimizing the way the memory buffers are configured, is that I can get meaningful results out of Apple's Metal GPU profiling tools. The most immediate finding is that the simulation is NOT primarily memory-bound, and rather the long pole is ALU saturation. This is pretty surprising, but I did a lot of work early on to make the memory layout efficient and compact, so it is believable. And I've also added a lot of physics features that require computation, so it checks out.
Thus I've begun micro-optimizing the code. This is a little frustrating, as Apple has a tendency to put all their good Metal information in videos instead of documents, and I hate videos for technical information like this. Fortunately they have transcripts, but that's not perfect, as they're obviously machine-generated, and also the speakers are often referencing slides I can't see. But as disorganized and poorly-formatted as Apple's guidance here is, they do provide some juicy nuggets.
One thing I'm doing is looking for places where I can replace floats with half-floats, because the ALU has double the throughput for halfs. That's pretty obvious, but it's tricky because for most physics features halfs have far too little precision. But there are opportunities: for the envelope attack duration, for example, I'm pretty sure half precision is still far beyond human perception.
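It's easy to sanity-check that intuition numerically: IEEE 754 half precision has an 11-bit significand, so the worst-case relative rounding error is about 2^-12 ≈ 0.02%, far below any audible difference in an attack time. Python's `struct` module can round-trip through the half format directly:

```python
import struct

def to_half(x):
    """Round-trip a Python float through IEEE 754 half precision
    (struct format 'e')."""
    return struct.unpack("e", struct.pack("e", x))[0]

def half_relative_error(x):
    """Relative error introduced by storing x as a half-float."""
    return abs(to_half(x) - x) / abs(x)
```

The flip side is the tiny range (max finite half is 65504) and coarse spacing for large values, which is exactly why most of the physics state can't make this swap.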
Another thing I'm doing is trying to figure out how to get rid of 64-bit longs from the computations. I don't use them much, but the global sample number is a long (32 bits worth of samples is less than 1 day, so it could easily overflow). I also use longs for my GPU-based PRNG. In both cases I think it will be possible to avoid them, but this will require some thought.
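The overflow arithmetic that forces the long, and one possible way around it (splitting the counter into 32-bit halves on the CPU side — an idea I'm sketching here, not necessarily the fix I'll land on):

```python
SAMPLE_RATE = 48_000  # illustrative; the real rate depends on the host

def hours_until_overflow(bits):
    """How long a signed counter of `bits` bits lasts at 48 kHz before
    it wraps."""
    return (2 ** (bits - 1)) / SAMPLE_RATE / 3600

def split64(n):
    """Split a 64-bit sample number into two 32-bit halves that can be
    passed to a kernel without native 64-bit integer support."""
    return (n >> 32) & 0xFFFFFFFF, n & 0xFFFFFFFF

def join32(hi, lo):
    """Recombine the halves (e.g. on the CPU, or lazily where needed)."""
    return (hi << 32) | lo
```

A signed 32-bit counter wraps in roughly half a day of continuous audio, which is why the naive "just use an int" answer doesn't fly for a plugin that may sit in a live session all day.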
There are a ton of other little things, like reordering struct fields so that Apple's compiler can automatically vectorize the loads, getting rid of a pesky integer mod operation that can take hundreds of cycles, and switching array indexes from unsigned to signed ints. The latter is quite interesting: apparently Apple's fastest memory instructions depend on the guarantee that the offset can be represented by a signed int, and if you use an unsigned int the compiler can't make that guarantee and has to fall back to something slower.
3D Artwork
I've started to reach out to a few 3D artists about redoing all the 3D models for the various instrument components. I'm looking forward to getting professional models, as I think it will make the whole plugin much more beautiful and interesting to look at.
This will also be an opportunity to change things so that the animations are part of the 3D models themselves (using bone skinning) instead of hard-coded into the Anukari engine. This will be an advantage when users start creating (or commissioning) their own 3D models for the engine: they won't be limited to the built-in animations. They can do anything they want.
Anyway after talking to some artists, I quickly realized that I have vastly under-specified what I actually want things to look like. So I've spent a fair bit of time working on a visual style document, where I include a bunch of reference images for what I want each thing to look like, along with commentary on what I'm going for, etc. In some cases I'm hand-drawing the weird stuff I want because I can't find anything similar online.
This is a lot of work, but it's really fun because I'm finally having to think about exactly what I want things to look like. Today, the MIDI automations, LFOs, etc, are all super simple and most of them are copied from the same knob asset. But my plan is to make every single kind of automation a unique 3D thing, hopefully with a visual and animation that conveys its purpose, at least to the extent that I can imagine ways to do so.
Weird Filament bug on Metal backend
Captain's Log: Stardate 78280.5
The last couple of days I continued to work through the various bugs that I had noted with the new Filament renderer. In particular, I was looking at the fact that on macOS with the Metal backend, if I changed the render quality settings enough, I could pretty reliably get it into a state with weird graphical glitches like the one below.
I was able to get other weird glitches as well, like everything becoming green-tinted. At first I thought there was some specific configuration that caused it, but eventually I realized that it was kind of random, and really difficult to pinpoint exactly how to reproduce it. Sometimes things would work fine until I moved the camera around a bit, and then it would go corrupt.
The first thing I tried was to strip down my renderer to the absolute basics, getting rid of any code that wasn't absolutely necessary to reproduce the issue. This was disconcerting, because I was able to remove nearly all the renderer code and still get glitches.
The next thing I tried was to reproduce the glitches in one of Filament's demo apps, the gltf_viewer. I tried and tried and nothing I did would cause the glitches.
I started looking for any difference between how the gltf_viewer and Anukari worked. The way we set up the CAMetalLayer is different, so I transplanted my CAMetalLayer code into gltf_viewer, and it worked just fine.
Next I started from the other direction: removing things from gltf_viewer one by one until something weird happened. It took quite a bit of trial and error, but I found that when I removed the code that configured the renderer to clear the pixels before a render pass, I could get weird corruption to happen.
This is pretty weird, since in a 3D renderer when you have a skybox, clearing the color channel is unnecessary since you'll be overwriting the entire thing with skybox pixels anyway. But I went back to Anukari and turned on clearing, and it completely fixed the problem. I don't know what the bug is in the Filament Metal backend exactly, but there's definitely something wrong with it.
Anyway, there are no more glitches in Anukari on MacOS, and I filed a bug for the Filament folks to see if they'd like to fix it: https://github.com/google/filament/issues/8229.