More workarounds for Apple
Captain's Log: Stardate 78573.2
Automatic Bypassing Workaround
While testing the new AnukariEffect plugin in various DAWs for compatibility, I found that it was doing some very strange stuff in GarageBand (and Logic Pro, which seems to share the same internals). I had noticed weird stuff in GarageBand before, even with the instrument plugin, and had a TODO to do a deep dive, so I figured that now was as good a time as any to finally get the plugin working well with Apple's DAWs.
What I had seen in the past with the Anukari (instrument) plugin was that sometimes the physics simulation would inexplicably stop working. I had seen this at GarageBand startup, but also after it had been open for a while. I couldn't see any reason in Anukari's logs for the problem, and occasionally it would just start working again. But this was fairly rare and I hadn't had time to find a way to reproduce it.
But with the AnukariEffect plugin, this was happening constantly. Since it was easy to reproduce, I pretty quickly found out that GarageBand will simply stop calling into the plugin's ProcessBlock function, which is where audio processing happens, and, in Anukari's case, where the physics simulation runs.
It turns out that GarageBand is extremely aggressive about this. It has some heuristics about when a plugin is no longer producing audio, and at that time it will stop calling into the plugin, to save CPU/power. For example, for an instrument, if it hasn't received MIDI input in a while it might be automatically bypassed. And for an effect, if the track is not playing or the effect is not receiving audio input, it will be automatically bypassed.
This is reasonable behavior, and other DAWs do it too, but for example a VST3 plugin (as opposed to an AudioUnit) can report its number of "tail samples" as kInfiniteTail; in other words, the plugin can state "I might keep generating audio samples forever even without input." VST3 plugins can also set their sub-type to Generator, which likewise tells the DAW that they might continue to generate audio without input. (Note that an AudioUnit can be a generator at the top level, but instruments/effects can't also be generators. Which is a pretty big oversight.)
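For reference, this is roughly what that looks like in raw VST3 terms (a minimal sketch using the SDK's own names; Anukari actually goes through JUCE, which as far as I know maps an infinite getTailLengthSeconds() to the same thing):

    // Minimal sketch: report an infinite tail so the host knows the plugin may
    // keep producing audio with no input. (Raw VST3 SDK names; the real plugin
    // goes through JUCE's wrapper.)
    #include "public.sdk/source/vst/vstaudioeffect.h"
    #include "pluginterfaces/vst/ivstaudioprocessor.h"

    class InfiniteTailProcessor : public Steinberg::Vst::AudioEffect
    {
    public:
        Steinberg::uint32 PLUGIN_API getTailSamples() override
        {
            return Steinberg::Vst::kInfiniteTail;  // "I might keep generating audio forever."
        }
    };

    // The factory sub-category string can also advertise the plugin as a
    // generator, e.g. Steinberg::Vst::PlugType::kFxGenerator ("Fx|Generator").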
And other DAWs that do aggressive automatic bypassing, like Cubase and Nuendo, provide an option to disable the feature. But of course having a knob like this is anathema to Apple, especially in GarageBand, and thus it cannot be disabled.
Anyway, for an instrument or effect plugin running in an Apple DAW, aggressive automatic bypassing is just a fact of life. And if that plugin is a continuous physics simulation like Anukari, this is a huge problem, because the simulation part of the plugin will become unresponsive, and furthermore, weird discontinuous things may happen if it is bypassed and un-bypassed at inopportune moments.
So as usual for working with Apple, the solution to Apple's oversimplification of the problem is to push more complexity into the non-Apple software: Anukari can now detect that it has been automatically bypassed, and will seamlessly transfer ownership of the physics simulation to a background thread. When DAW processing resumes, it seamlessly transfers ownership back. This is optional (but highly recommended), so users who really need to save power can disable it.
This really is much more complicated than I'd like, partly because Apple doesn't provide any indication that the plugin is bypassed. From what I can tell there's no notification whatsoever, except that ProcessBlock stops getting called. So detecting this condition requires a keepalive timer and a background thread that monitors it. Once the monitor detects that ProcessBlock hasn't been called for too long, it begins running the simulation directly. Then when ProcessBlock resumes being called, it detects that the keepalive is fresh again and stops.
There are some very tricky details in doing all this reliably on the real-time audio thread without priority-inversion issues from a mutex. The keepalive timer is an atomic, and the monitoring thread never acquires the mutex unless the keepalive is stale. This does mean that the audio thread has to acquire the mutex for each audio block, but the atomic keepalive timer guarantees that the mutex is essentially never contended, and on all the platforms where Anukari will run, an uncontended mutex acquisition is simply an atomic CAS operation. (This is a great tip I learned from Fabian Renn-Giles in his excellent ADC23 talk.)
There is one moment where the audio thread's mutex acquisition could be contended, which is when the automatic bypass is being lifted. The monitoring thread may be holding it while running the simulation itself. This is not a big deal though, because the monitoring thread releases the mutex after simulating each small sample block. The audio thread will try to acquire the mutex, fail, and return a silent buffer. But in doing so it will update the keepalive timer, and next time it runs it will acquire the mutex without contention. The reason this dropped block is not a big deal is that we're coming back from being bypassed anyway -- this just adds a few samples of latency before audio starts. No problem.
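Boiled down, the handoff looks something like this (a stripped-down sketch with made-up names, thresholds, and block sizes, not the actual Anukari code):

    // Sketch of the bypass-detection handoff between the audio thread and the
    // monitoring thread. Names and numbers are illustrative only.
    #include <algorithm>
    #include <atomic>
    #include <chrono>
    #include <cstdint>
    #include <mutex>
    #include <thread>

    class BypassGuard
    {
    public:
        // Called by the DAW on the real-time audio thread for every block.
        void processBlock(float* buffer, int numSamples)
        {
            touchKeepalive();

            // Normally uncontended: the monitor only takes the mutex when the
            // keepalive is stale, so this is just an atomic CAS on the fast path.
            if (!simMutex.try_lock())
            {
                // The monitor is mid-block while we come back from bypass. Drop
                // this one block (silence); the freshly updated keepalive makes
                // the monitor back off before our next call.
                std::fill(buffer, buffer + numSamples, 0.0f);
                return;
            }

            runSimulation(buffer, numSamples);
            simMutex.unlock();
        }

        // Runs on a background (non-real-time) thread.
        void monitorLoop(std::atomic<bool>& shouldExit)
        {
            while (!shouldExit.load())
            {
                if (nowNanos() - lastKeepaliveNanos.load() < staleThresholdNanos)
                {
                    // The DAW is still calling processBlock; just keep watching.
                    std::this_thread::sleep_for(std::chrono::milliseconds(10));
                    continue;
                }

                // The host has stopped calling us: run the simulation ourselves,
                // one small block at a time, releasing the lock between blocks so
                // the audio thread can take over the moment it resumes.
                {
                    std::lock_guard<std::mutex> lock(simMutex);
                    runSimulation(scratch, scratchSamples);
                }
                // (A real version would also pace these blocks against the wall
                // clock so the simulation advances at roughly real-time speed.)
            }
        }

    private:
        void touchKeepalive() { lastKeepaliveNanos.store(nowNanos()); }

        static int64_t nowNanos()
        {
            using namespace std::chrono;
            return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
        }

        void runSimulation(float*, int) { /* advance the physics, render audio */ }

        static constexpr int64_t staleThresholdNanos = 250'000'000;  // say, 250 ms
        static constexpr int scratchSamples = 128;

        std::atomic<int64_t> lastKeepaliveNanos { 0 };
        std::mutex simMutex;
        float scratch[scratchSamples] = {};
    };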
There's one last detail, which is that while all the above complexity keeps the physics simulation running, so the user can continue to interact with the plugin, there will not be any audio output while it's bypassed. This cannot be fixed, and could be a little confusing. So Anukari now displays a pulsating "BYPASSED" message on the master output level meter when it is in this state. And that message has a tooltip explaining how the DAW is doing potentially annoying things.
Less Waste Still Makes Haste
In my previous post Waste Makes Haste I wrote about how Anukari has to run a spin loop on a single GPU core to convince MacOS to actually clock up the GPU so that it performs well enough for Anukari to function.
That workaround continues to be extremely effective. However, while testing multiple Anukari instances, I realized that each instance was running its own spin loop on the GPU, so e.g. 4 instances would run 4 spin loops. Running the one spin loop is pretty stupid, but it gets the job done and is well worth it. Running 4 spin loops, though, is purely wasteful, since only one is needed to keep the GPU clocked up.
Fixing this requires coordination among all Anukari audio threads within the same process. Somehow a single audio thread needs to run the spin loop, and the others need to just do regular audio processing. But if the first thread running the loop is e.g. bypassed, another thread needs to pick up the work, and so on.
I ended up devising another overly-complicated solution here, which is to use another shared atomic keepalive timer. Each audio thread checks it periodically to see if it has expired, and if so, attempts a CAS to update it. If that CAS fails, it means some other thread got to it first. If the CAS succeeds, it means that this thread now owns the spin loop and needs to keep updating the keepalive. There are some other details, but this algorithm turned out to be mercifully easy to get right with just a couple of CAS operations and a nano timer. (And it doesn't even require that each thread sees only unique nano timestamps!)
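Roughly, the shape of it is this (a simplified sketch with made-up names; the shared keepalive is a single process-wide atomic that every instance's audio thread can see):

    // Sketch of the spin-loop election. One atomic keepalive is shared by all
    // Anukari instances in the process; whichever audio thread wins the CAS
    // owns the GPU spin loop until it stops refreshing the keepalive.
    #include <atomic>
    #include <chrono>
    #include <cstdint>

    static std::atomic<int64_t> spinKeepaliveNanos { 0 };  // process-wide

    static int64_t nowNanos()
    {
        using namespace std::chrono;
        return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
    }

    static constexpr int64_t kSpinExpiryNanos = 100'000'000;  // e.g. 100 ms

    // Called periodically from each instance's audio thread. Returns true if
    // this thread is responsible for keeping the GPU spin loop alive.
    bool shouldOwnSpinLoop(bool ownedLastTime)
    {
        const int64_t now = nowNanos();
        int64_t last = spinKeepaliveNanos.load();

        if (ownedLastTime)
        {
            // Current owner: refresh the keepalive and carry on spinning.
            spinKeepaliveNanos.store(now);
            return true;
        }

        if (now - last < kSpinExpiryNanos)
            return false;  // Someone else is tending the spin loop.

        // The keepalive expired (the owner was bypassed or went away). Race to
        // claim it; exactly one thread's CAS succeeds.
        // (The real version has a few more details, e.g. handling an old owner
        // that comes back after being bypassed while someone else took over.)
        return spinKeepaliveNanos.compare_exchange_strong(last, now);
    }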
An alternative solution would be to have an entirely separate thread run the GPU spin loop, instead of having an audio thread be responsible for tending to it. This could also be a good solution. However it would require its own tricky details so that the spin loop would pause when all audio threads were bypassed. And also it would require initializing some Metal state that each audio thread already initializes anyway. I will probably keep the current solution unless it proves unreliable, in which case I'll move to this alternative.
Mouse Cursor Hell
The last workaround I spent time on this past week was making custom mouse cursors work well in GarageBand and Logic.
Since Anukari has a somewhat sophisticated 3D editor, custom mouse cursors are very useful for helping make it clear what is happening. So for example, when the user drags the right mouse button to rotate the camera, the mouse cursor changes to a little rotation icon for the duration of the drag, and then goes back to being a pointer when the button is released.
Or, well, it goes back to being a pointer in every DAW except GarageBand and Logic, because of course Apple is doing something fucking weird with the mouse cursor in their DAWs. Humorously, as I investigated this issue, I discovered that the mouse cursor often gets stuck in GarageBand/Logic even without plugins, and that users have been complaining about this for at least 10 years. One user in a forum post basically said, "don't worry about the busted mouse cursors so much, you just get used to it." So Apple has been ignoring an obvious mouse cursor bug for a decade. Sounds about right.
Anyway, I narrowed down the problem to the fact that changing the mouse cursor using [NSCursor set] inside of a mouseDown or mouseUp event sometimes doesn't work. Err, it does work, in the sense that the call succeeds, and if you call [NSCursor currentCursor] it will return the one you just set. But visually the cursor will not change.
I tried about a billion things, and ultimately ended up with a workaround that force-sets the mouse cursor inside the next mouseMove or mouseDrag event (described in more detail here on the JUCE forums). This is not perfect, but it's pretty good, and much better than no workaround at all.
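In JUCE terms, the workaround boils down to something like this (a rough sketch with made-up names; the only platform-specific part is a tiny Objective-C++ helper that just calls [NSCursor set] again):

    // Sketch of the cursor workaround: remember that a cursor change happened
    // during mouseUp, then force the native cursor again on the next
    // mouseMove/mouseDrag, where GarageBand/Logic actually honor it.
    #include <juce_gui_basics/juce_gui_basics.h>

    class EditorComponent : public juce::Component
    {
    public:
        void mouseUp(const juce::MouseEvent&) override
        {
            // This "works" (currentCursor reports the new cursor) but the
            // visible cursor doesn't change in GarageBand/Logic.
            setMouseCursor(juce::MouseCursor::NormalCursor);
            cursorRefreshPending = true;
        }

        void mouseMove(const juce::MouseEvent&) override { refreshCursorIfPending(); }
        void mouseDrag(const juce::MouseEvent&) override { refreshCursorIfPending(); }

    private:
        void refreshCursorIfPending()
        {
            if (!cursorRefreshPending)
                return;
            cursorRefreshPending = false;
            forceNativeCursorRefresh();
        }

        void forceNativeCursorRefresh()
        {
            // In the real fix this lives in an Objective-C++ (.mm) file and just
            // re-sends "set" to the desired NSCursor, e.g. [[NSCursor arrowCursor] set].
        }

        bool cursorRefreshPending = false;
    };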
Yikes... as you can tell I'm pretty sick of dealing with compatibility with Apple's DAWs. But I'm not done yet. There are two more Apple-specific issues that I'm aware of, which hopefully I can address over the next week.