GPU memory corruption is fun
Captain's Log: Stardate 77667
Sigh. My fix for the determinism issue described in my last Captain's Log entry was correct, but I accidentally committed a line of code left over from what turned out to be an implementation dead end, and it caused me a lot of grief over the last day. I had originally planned to initialize the PRNG seeds in one place, but later realized it had to be done in a different place. I moved the initialization code, but I failed to remove the OpenCL memory map code from the original location. And that map was done using CL_MAP_WRITE_INVALIDATE_REGION.
The problem is that the invalidate flag lets the OpenCL implementation optimize things: it can assume that the calling process will overwrite the entire mapped region, so it doesn't need to fill that region with the buffer's current contents first. You're going to overwrite it all anyway. That's faster! But if you map it that way and don't actually overwrite it... you're gonna have a bad time.
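To make the hazard concrete, here is a toy model in plain C. It is not real OpenCL code: ToyBuffer, toy_map, toy_unmap and the two TOY_MAP_* modes are made-up stand-ins for a device buffer, clEnqueueMapBuffer, clEnqueueUnmapMemObject and the CL_MAP_WRITE / CL_MAP_WRITE_INVALIDATE_REGION flags. It also assumes a driver that stages the mapping in host memory, which is only one possible implementation. Treat it as a sketch of the idea, not of what any particular driver does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SEED_COUNT 4

/* Hypothetical stand-in for a device buffer plus a host staging area. */
typedef struct {
    uint32_t device[SEED_COUNT];   /* "GPU" memory holding the PRNG seeds */
    uint32_t staging[SEED_COUNT];  /* host-visible memory handed out by map */
} ToyBuffer;

typedef enum { TOY_MAP_WRITE, TOY_MAP_WRITE_INVALIDATE } ToyMapMode;

/* Plain write mode copies the current device contents into staging first.
   Invalidate mode skips that copy, so staging keeps whatever was in it. */
uint32_t *toy_map(ToyBuffer *b, ToyMapMode mode) {
    assert(b != NULL);
    if (mode == TOY_MAP_WRITE) {
        memcpy(b->staging, b->device, sizeof b->device);
    }
    return b->staging;
}

/* Unmap writes the whole staging region back to the device. */
void toy_unmap(ToyBuffer *b) {
    assert(b != NULL);
    memcpy(b->device, b->staging, sizeof b->device);
}

/* Models a leftover "map, write nothing, unmap" code path.
   Returns 1 if the seeds are still intact afterwards, 0 if not. */
int seeds_survive_empty_map(ToyMapMode mode) {
    const uint32_t seeds[SEED_COUNT] = {11u, 22u, 33u, 44u};
    ToyBuffer b;
    memcpy(b.device, seeds, sizeof seeds);
    /* Fixed garbage here; on real hardware the contents are unspecified. */
    memset(b.staging, 0xAB, sizeof b.staging);

    (void)toy_map(&b, mode);   /* map, but never write to the pointer */
    toy_unmap(&b);

    return memcmp(b.device, seeds, sizeof seeds) == 0;
}
```

In this model, seeds_survive_empty_map(TOY_MAP_WRITE) returns 1 and seeds_survive_empty_map(TOY_MAP_WRITE_INVALIDATE) returns 0. In the toy the garbage is a fixed byte pattern; with a real driver it could be anything, and it could change from run to run.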
Sadly, because the GPU memory that I was corrupting held the PRNG seeds, it didn't cause an obvious bug. It just made the randomness nondeterministic, which is really hard to catch! Fortunately, I had added a test to make sure that repeatedly toggling the simulation pause feature does not affect the audio output. That test failed for reasons that took me a day to understand, and it helped me catch this nasty issue.
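For the curious, a test of that kind can be sketched roughly like this. Everything here is hypothetical and much simpler than a real simulation: ToySim, the xorshift32 generator, and especially the clobber_on_resume flag, which is just an invented stand-in for "some bug disturbs the PRNG state when the pause state changes". The idea is to render the same seed twice, once straight through and once with pause toggles sprinkled in, and require the output to match exactly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature simulation: one PRNG drives the "audio". */
typedef struct {
    uint32_t rng;           /* PRNG state, seeded once at init */
    int paused;
    int clobber_on_resume;  /* made-up bug switch, for demonstration only */
} ToySim;

static uint32_t xorshift32(uint32_t *s) {
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

void toy_sim_init(ToySim *sim, uint32_t seed, int clobber_on_resume) {
    sim->rng = seed ? seed : 1u;  /* xorshift must not start at zero */
    sim->paused = 0;
    sim->clobber_on_resume = clobber_on_resume;
}

void toy_sim_set_paused(ToySim *sim, int paused) {
    if (sim->paused && !paused && sim->clobber_on_resume) {
        sim->rng = 0xDEADBEEFu;  /* stand-in for corrupted seed memory */
    }
    sim->paused = paused;
}

/* Writes one raw sample if running; returns samples written (0 or 1). */
size_t toy_sim_step(ToySim *sim, uint16_t *out) {
    if (sim->paused) return 0;
    *out = (uint16_t)(xorshift32(&sim->rng) >> 16);
    return 1;
}

/* Renders n samples, pausing and resuming every toggle_every ticks
   (0 means never pause). */
void toy_render(uint32_t seed, int clobber, size_t n,
                size_t toggle_every, uint16_t *out) {
    ToySim sim;
    size_t written = 0, tick = 0;
    toy_sim_init(&sim, seed, clobber);
    while (written < n) {
        if (toggle_every != 0 && tick % toggle_every == 0) {
            toy_sim_set_paused(&sim, 1);
            (void)toy_sim_step(&sim, &out[written]);  /* paused: no-op */
            toy_sim_set_paused(&sim, 0);
        }
        written += toy_sim_step(&sim, &out[written]);
        tick++;
    }
}

/* The test itself: returns 1 if pausing left the output untouched. */
int pause_toggle_is_transparent(uint32_t seed, int clobber) {
    uint16_t straight[256], toggled[256];
    toy_render(seed, clobber, 256, 0, straight);
    toy_render(seed, clobber, 256, 3, toggled);
    return memcmp(straight, toggled, sizeof straight) == 0;
}
```

With the bug switch off, pause_toggle_is_transparent(1, 0) returns 1. With it on, pause_toggle_is_transparent(1, 1) returns 0, because the very first sample already differs. A byte-for-byte comparison like this is strict on purpose: corrupted seeds don't crash anything, so "the output is not identical" may be the only symptom there is.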
I keep thinking I'm done with the tests, and then I keep running into weirder/subtler issues. Which is frustrating, but I just need to keep telling myself that every bug I find this way is a bug that wasn't discovered by a user, in a live music production setting... 😄