Golden testing with audio clips
Captain's Log: Stardate 77642.8
Today I finished another substantial chunk of unit tests. I'm getting to be pretty satisfied with the test coverage, and I've found and fixed a couple more small bugs. I do think I'll add unit tests for a couple more modules, but it's getting a bit boring so I decided to take a slight detour to work on my "golden testing" idea.
Golden testing is something we did a lot at Google for huge pipelines. The basic idea is that you run a pipeline with some kind of predefined input to produce an output. You then thoroughly verify that output dataset through manual inspection, and save it as a "golden" example of what the pipeline should do, given that input. Later, after making changes to the pipeline, you run it again and compare its output to the golden dataset. If it's identical, you know that your changes didn't break anything. If it has diffs, you then manually explain them -- either you caught a bug, or maybe you decide the new output is better, in which case you make the new output the golden dataset for future comparison.
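As a generic illustration of that workflow (not the actual setup used at Google or in Anukari), here is a minimal Python sketch. It stores a pipeline's output as canonical JSON lines and diffs later runs against that file. The names `check_golden` and `_canonical_lines`, the JSON-lines format, and the `update` flag are all hypothetical choices made for this example.

```python
import difflib
import json
from pathlib import Path


def _canonical_lines(records):
    # One JSON line per record with sorted keys, so the same data always
    # serializes identically and diffs point at individual records.
    return [json.dumps(r, sort_keys=True) for r in records]


def check_golden(records, golden_path, update=False):
    """Compare pipeline output against a previously approved golden file.

    Returns a list of unified-diff lines; an empty list means the output is
    identical to the golden. With update=True, the current output (which you
    have verified by hand) is saved as the new golden instead.
    """
    path = Path(golden_path)
    actual = _canonical_lines(records)
    if update:
        path.write_text("".join(line + "\n" for line in actual))
        return []
    expected = path.read_text().splitlines()
    return list(difflib.unified_diff(expected, actual, "golden", "actual", lineterm=""))
```

A typical flow would be to call `check_golden(..., update=True)` once after reviewing the output manually, then call it without `update` on every later run and read any returned diff lines to decide whether they reveal a bug or a deliberate improvement.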
The nice thing about this form of regression testing is that it is highly sensitive -- it's really good at catching bugs that cause changes you didn't intend. The drawback is that it can also be sensitive to things that don't matter, so you can waste time checking irrelevant diffs.
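One common way to soften that trade-off is to compare with an explicit tolerance. This is an assumption here, not something this post says Anukari does. The sketch below uses a hypothetical `first_mismatch` helper that reports the first sample differing by more than `tolerance`. Setting `tolerance=0.0` gives the maximally sensitive exact comparison, and a small nonzero value ignores floating-point noise.

```python
def first_mismatch(actual, golden, tolerance=0.0):
    """Describe the first sample that differs from the golden by more than
    `tolerance`, or return None if the buffers match.

    tolerance=0.0 is an exact comparison (maximally sensitive). A small
    nonzero tolerance ignores rounding noise, at the cost of possibly
    hiding very small real changes.
    """
    if len(actual) != len(golden):
        return f"length differs: {len(actual)} vs golden {len(golden)}"
    for i, (a, g) in enumerate(zip(actual, golden)):
        if abs(a - g) > tolerance:
            return f"sample {i}: {a!r} vs golden {g!r} (|diff| = {abs(a - g):.3g})"
    return None
```

Choosing the tolerance is the design decision: too tight and you are back to chasing meaningless diffs, too loose and a genuine regression can slip through.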
Anyway, I realized this kind of test could be fantastic for a VST plugin like Anukari. Here's how it works: I save a preset file containing a mass/spring system with some property I want to test, for example a really simple preset with just a single mallet connected to a single mass with a mic. Then I save a small MIDI file that plays the preset a little bit. Finally, a unit test loads the preset, renders the audio from the MIDI clip, writes it out as a .flac file, and compares the audio data with the "golden" file.
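Anukari's real test harness isn't shown here, so the following is a self-contained toy sketch of the same render-and-compare loop. Everything in it is a hypothetical stand-in. `render_clip` simulates one damped mass on a spring, struck by a mallet impulse at each note-on, and a list of `(seconds, velocity)` pairs plays the role of the MIDI file. Because Python's standard library has no FLAC codec, the audio is written as 16-bit WAV using the `wave` module instead.

```python
import array
import math
import sys
import wave

SAMPLE_RATE = 48000


def render_clip(freq_hz, damping, note_ons, num_samples, sample_rate=SAMPLE_RATE):
    """Toy physics: a unit mass on a spring, struck by a mallet at each note-on.
    The 'mic' simply records the mass's displacement every sample."""
    omega = 2.0 * math.pi * freq_hz
    k = omega * omega                      # stiffness giving the requested pitch
    dt = 1.0 / sample_rate
    hits = {}
    for t, vel in note_ons:                # (time in seconds, velocity 0..1)
        n = round(t * sample_rate)
        hits[n] = hits.get(n, 0.0) + vel * omega  # sized so peak displacement ~ vel
    x = v = 0.0
    out = []
    for n in range(num_samples):
        v += hits.get(n, 0.0)              # mallet impulse
        v += (-k * x - damping * v) * dt   # spring and damping forces
        x += v * dt                        # semi-implicit (symplectic) Euler
        out.append(x)
    return out


def to_pcm16(samples):
    # Clamp to [-1, 1] and quantize to signed 16-bit integers.
    return array.array("h", (round(max(-1.0, min(1.0, s)) * 32767) for s in samples))


def write_wav(path, pcm, sample_rate=SAMPLE_RATE):
    data = array.array("h", pcm)
    if sys.byteorder == "big":
        data.byteswap()                    # WAV stores little-endian samples
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(data.tobytes())


def read_wav(path):
    with wave.open(path, "rb") as w:
        data = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()
    return data


def run_golden_test(rendered, out_path, golden_path):
    """Write the render to out_path (so it can be listened to) and report
    whether it is sample-for-sample identical to the golden file."""
    write_wav(out_path, to_pcm16(rendered))
    return read_wav(out_path) == read_wav(golden_path)
```

In this sketch, approving a golden just means listening to `out_path` and copying it over `golden_path`. Note that comparing after 16-bit quantization only flags changes large enough to alter at least one quantized sample.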
The thing that's so cool about this is how easy it is to make a bunch of tests for all the different physics features. Once I've listened to each little test and saved its golden file, I can make all kinds of changes to the simulation code, and when the tests pass, I know with huge confidence that I haven't screwed anything up: every one of those test cases still sounds correct.
This should make working on performance optimization much easier, since I'll never have to fear that I screwed up the physics. And it will also make porting to CUDA and Metal much easier, because if the audio sample output from each of those backends matches the golden data, I know for sure that the port is working correctly.
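One caveat worth planning for is that different GPU backends may not produce bit-identical floating-point results, for example because of different operation ordering or fused multiply-add contraction. Under that assumption, a cross-backend check could compare each backend's render to the golden samples within a small tolerance and report the worst error. The `compare_backends` function and the backend callables are hypothetical, and real CUDA or Metal renders would sit behind them.

```python
import math


def compare_backends(golden, backends, tolerance=1e-5):
    """Compare each backend's render against the golden samples.

    `backends` maps a name to a zero-argument callable returning float
    samples (hypothetical wrappers around, e.g., CPU, CUDA, or Metal renders).
    Returns {name: (passed, max_abs_error)}. A tolerance of 1e-5 is about
    -100 dB relative to full scale.
    """
    results = {}
    for name, render in backends.items():
        samples = render()
        if len(samples) != len(golden):
            results[name] = (False, math.inf)  # wrong length is always a failure
            continue
        max_err = max((abs(a - g) for a, g in zip(samples, golden)), default=0.0)
        results[name] = (max_err <= tolerance, max_err)
    return results
```

For instance, calling it with hypothetical `render_cpu` and `render_metal` callables would flag any backend whose worst-case error exceeds the threshold. The reported maximum error also helps distinguish rounding noise from a genuinely broken port.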
So despite this being kind of a boring test thing, I'm really excited about being able to move faster again. And this was motivated by a recent bug -- I broke modulation for sensors, and didn't notice for weeks. With a good set of golden tests, that will never happen again.