Bad memory access on the GPU: pain
Captain's Log: Stardate 77823.5
I've noticed a pattern in my devlog updates, where frequently a day of triumph is followed by a day of pain. Today was a day of pain.
I set out this morning to clean up and thoroughly test the new envelope follower feature, and immediately ran into a weird problem where ONLY the tests would fail, with OpenCL error -44: "Invalid Program" (CL_INVALID_PROGRAM). When I ran the binary myself, it worked just fine. The error was especially weird because bad OpenCL code that produces error -44 normally also comes with build info strings containing the compiler errors, but this time there was no build info at all. Just the -44 error with no explanation whatsoever.
First I ruled out anything that might differ between the user binary and the test binary. I disabled OpenCL code obfuscation, compared the initialization logic, etc. None of that helped.
Then I noticed that some of the tests that executed OpenCL code succeeded. Eventually I narrowed it down: once the simulator fuzz test had run, all subsequent OpenCL tests would fail. This was really weird, because I'm very careful to completely destroy and reinitialize the OpenCL environment between tests, precisely so that they can't influence one another.
I banged my head against the wall quite a bit, finding several small bugs, each time thinking AHA! but none of them solved the problem. Finally, I discovered and fixed the bug that was actually responsible: the C++ code was writing an invalid data structure that caused the OpenCL code to access unallocated memory.
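For illustration, here is a minimal sketch of the kind of host-side check that catches this class of bug before the data ever reaches the GPU. The log doesn't say what the invalid structure looked like, so the `SliceRef` layout and the `slices_in_bounds` helper are hypothetical stand-ins, not the actual code:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout (an assumption for this sketch): each entry tells the
// kernel which slice of a shared device buffer it is allowed to read.
struct SliceRef {
    std::uint32_t offset;  // index of the first element in the buffer
    std::uint32_t count;   // number of elements in the slice
};

// Returns true only if every slice lies entirely inside a buffer holding
// `buffer_len` elements. The comparison is arranged so that offset + count
// is never computed, which avoids unsigned wraparound.
bool slices_in_bounds(const std::vector<SliceRef>& slices, std::size_t buffer_len) {
    for (const SliceRef& s : slices) {
        if (s.offset > buffer_len) return false;
        if (s.count > buffer_len - s.offset) return false;
    }
    return true;
}
```

Running a check like this right before the structure is uploaded turns a silent out-of-bounds read on the device into an ordinary, debuggable failure on the host.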
The crazy thing here is that for some reason, this issue persisted even after the OpenCL environment had been torn down; the entire process had to be terminated before it cleared. This would appear to be a driver bug.
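Since only process termination cleared the bad state, one way to keep a failure like this from poisoning later tests is to run each GPU test in its own process. This is a sketch of that idea for POSIX systems, not something the test suite described here actually does; `run_isolated` is a hypothetical helper:

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <functional>

// Runs `test_body` in a forked child process and returns its exit code,
// or -1 if the fork or wait failed or the child was killed by a signal.
int run_isolated(const std::function<int()>& test_body) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // Child: run the test, then exit immediately without running the
        // parent's atexit handlers or static destructors.
        _exit(test_body());
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For this to help, the parent must not initialize OpenCL itself before forking, since the child would inherit that state. All OpenCL setup and teardown would need to happen inside `test_body`.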
Anyway, despite the pain, I am THRILLED that the simulator fuzz test caught this. It was extremely subtle, and would have been 1,000x harder to track down if it happened on a user's machine.