The chaos monkey lives
In the last couple of days I finally got around to building the "chaos monkey" that I've wanted to have for a long time. The chaos monkey is a script that randomly interacts with the Anukari GUI with mouse and keyboard events, sending them rapidly and with intent to cause crashes.
I first heard about the idea of a chaos monkey from Netflix, who have a system that randomly kills datacenter jobs. This is a really good idea, because you never actually know that you have N+1 redundancy until one of the N jobs/servers/datacenters actually goes down. Too many times I have seen systems that supposedly had N+1 redundancy die when just one cluster failed, because nobody had tested this, and surprise, the configuration somehow actually depends on all the clusters being up. Netflix has the chaos monkey, and at Google we had DiRT testing, where we simulated things like datacenter failures on a regular basis.
But the "monkey" concept goes back to 1983 with Apple testing MacPaint. Wikipedia claims that the Apple Macintosh didn't have enough resources to do much testing, so Steve Capps wrote the Monkey program which automatically generated random mouse and keyboard inputs. I read a little bit about the original Monkey and it's funny how little has changed since then. They had the problem that it only ran for around 20 minutes at first, because it would always end up finding the application quit menu. I had the same problem, and Anukari now has a "monkey mode" which disables a few things like the quit menu, but also dangerous things like saving files, etc.
The Anukari chaos monkey is decently sophisticated at this point. It generates all kinds of random mouse and keyboard inputs, including weird horrible stuff like randomly adding modifiers and pressing keys during a mouse drag. It knows how to move and resize the window (since resizing has been a source of crashes in the past). It knows about all the hotkeys that Anukari supports, and presses them all rapidly. I really hate watching it work because it's just torturing the software.
The chaos monkey has already found a couple of crashes and several less painful bugs, which I have fixed. One of the crashes was something completely I completely didn't expect, and didn't think was possible, having to do with keyboard hotkey events deleting entities while a slider was being dragged to edit the parameters of such entities. I never would have tested this manually because I didn't think it was possible.
The chaos monkey is pretty simple. The biggest challenges were just keeping it from wreaking havoc on my workstation. I'm using pyautogui, which generates OS-level input events, meaning that the events will get sent to whatever window is active. So at the start, if Anukari crashed, the chaos monkey would start torturing e.g. VSCode or Chrome or something. It was horrible, and a couple of times it got loose and went crazy. It also figured out how to send OS-level hotkeys to open the task manager, etc.
Eventually the main safety protection I ended up implementing is that prior to each mouse or keyboard event, the script uses the win32 APIs to query the window under the mouse, and verifies that it's Anukari. There's some fiddly stuff here, like figuring out whether a window has the same process ID as Anukari (some pop-up menus don't have Anukari as a parent window), and some special handling for file browser menus, which don't even share the process ID. But overall I've gotten it to the point where I have let it run for hours on my desktop without worry.
The longest Anukari has run now with the Chaos monkey is about 10 hours with no crashes. Other things looked good too, for example, it doesn't leak memory. I have a few more ideas on how to make the chaos monkey even more likely to catch bugs, but for now I'm pretty satisfied.
Here's a quick video of the chaos monkey interacting with Anukari. Note that during the periods where the mouse isn't doing anything, it's mashing hotkeys like crazy. I'm starting to feel much more confident about Anukari's stability.