devlog

Way more detail than you ever wanted to know about the development of the Anukari 3D Physics Synthesizer [see archive]

Working better on some Radeon chips

Captain's Log: Stardate 79013.9

The issue with Radeon

As discussed in a previous post, I've been fighting with Radeon mobile chips, specifically the gfx90c. The problem originally showed up for a user who had both an NVIDIA and a Radeon chip. Even though they were using the NVIDIA chip for Anukari, something changed in the 0.9.6 release that caused the Radeon drivers to crash internally (i.e. the drivers did not return an error code; they simply aborted the process).

I'd like to eventually offer official support for Radeon chips. That's still likely a ways off, but at the very least I don't want things crashing. Anukari is extremely careful about how it interacts with the GPU, and when a particular GPU is not usable, it should preferably just pick a different GPU, or at the very least show a helpful message in the GUI explaining the situation.

Unfortunately, it was difficult to debug this issue remotely. The user was kind enough to run an instrumented binary, which confirmed that Anukari was calling clBuildProgram() with perfectly valid arguments and that the call was simply aborting. I really needed to run Anukari under a debugger to learn more.

So I found out what laptop my bug-reporting user had, and ordered an inexpensive used Lenovo Ideapad 5 on eBay. I've had to buy a lot of testing hardware, and I've saved thousands of dollars by buying it all second-hand or refurbished. In this case it did take two attempts, as the first Ideapad 5 I received was super broken. But the second one works just fine.

Investigation

After getting the laptop set up and running Anukari under the MSVC debugger, I immediately saw debug output like this just prior to the driver crash:

LLVM ERROR: Cannot select: 0x1ce8fdea678:
ch = store 0x1ce8fe462a8, 0x1ce8fde8fb8, 0x1ce8fe470b8,
  undef:i32
  0x1ce8fde8fb8: f32,ch = load 0x1ce8fdd7638, 0x1ce8fde8b80,
  undef:i64
    0x1ce8fde8b80: i64 = add 0x1ce8fcbc600, Constant:i64<294>
      0x1ce8fcbc600: i64,ch,glue = LD_64 
        TargetExternalSymbol:i64'arguments', Register:i64 %noreg, 
        TargetConstant:i32<0>, TargetConstant:i32<4>,
        TargetConstant:i32<4>, TargetConstant:i32<8>,
        TargetConstant:i32<34>, TargetConstant:i32<1>, 0x1ce8a26ec90
        0x1ce8fce97f8: i64 = TargetExternalSymbol'arguments'
        0x1ce8fcc3148: i64 = Register %noreg
        0x1ce8fcbc330: i32 = TargetConstant<0>
        0x1ce8fcbbfe8: i32 = TargetConstant<4>
        0x1ce8fcbbfe8: i32 = TargetConstant<4>
        0x1ce8fcbbe08: i32 = TargetConstant<8>
        0x1ce8fcbc768: i32 = TargetConstant<34>
        0x1ce8fcc1f58: i32 = TargetConstant<1>
      0x1ce8fde8b08: i64 = Constant<294>
    0x1ce8fcc2c20: i64 = undef
  0x1ce8fe470b8: i32 = add FrameIndex:i32<30>, Constant:i32<294>
    0x1ce8fdea420: i32 = FrameIndex<30>
    0x1ce8fe47040: i32 = Constant<294>
  0x1ce8fdea498: i32 = undef

First of all, I want to call out AMD on their exceptionally shoddy driver implementation. It's just absurd that they'd allow a compilation error internal to the driver to abort the whole process. Clearly in this case clBuildProgram() should return CL_BUILD_PROGRAM_FAILURE, and the program log (the compiler error text) should be filled with something helpful: at a minimum the raw LLVM output, but preferably something more readable. This is intern-level code, in a Windows kernel driver. Wow.

After reading through this carefully, all I could really make of it was that LLVM was unable to find a machine instruction to read data from this UpdateEntitiesArguments struct in addrspace=7 and write it to memory in addrspace=1. From context, I could guess that addrspace=1 is private (thread) memory, and addrspace=7 is whatever memory the kernel arguments are stored in. I had a harder time understanding why it couldn't find such an instruction. I thought maybe it had to do with an alignment problem, but wasn't sure.

This struct contains a number of fields, and I couldn't tell from the error which field was the problem. So I just used a brute-force approach and commented out most of the kernel code, and added code back in slowly. It compiled fine until I uncommented a line of code like float x = arguments.field[i]. I did some checking to ensure that field was aligned in a sane way, and after confirming that, I came to the conclusion that the gfx90c chip simply does not have an instruction for loading memory from addrspace=7 with a dynamic offset. In other words, the gfx90c appears to lack the ability to address arrays in argument memory with a non-constant offset.
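As a rough illustration of the pattern described above (not Anukari's actual kernel), here is a minimal sketch of a kernel that indexes an array inside a by-value struct argument with a run-time index. The struct layout, NUM_COEFFS, the indices and out buffers, and the kernel name are all hypothetical. The #ifndef block is only a shim so that the same source also compiles as plain C for host-side testing; under OpenCL C the qualifiers and get_global_id() are built in.

```c
/* Shim: lets this OpenCL C sketch also compile as plain C on the host. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
static size_t fake_gid = 0;  /* work-item id used by the host-side stub */
static size_t get_global_id(unsigned int dim) { (void)dim; return fake_gid; }
#endif

#define NUM_COEFFS 8  /* hypothetical small, constant array size */

/* Hypothetical stand-in for UpdateEntitiesArguments: a struct passed
   by value as a kernel argument, containing a small fixed-size array. */
typedef struct {
    int   count;
    float field[NUM_COEFFS];
} UpdateEntitiesArguments;

__kernel void update_entities(UpdateEntitiesArguments arguments,
                              __global const unsigned int *indices,
                              __global float *out)
{
    size_t gid = get_global_id(0);

    /* The index is only known at run time... */
    unsigned int i = indices[gid] % NUM_COEFFS;

    /* ...so this load needs a dynamic offset into argument memory.
       This is the kind of line that triggered the "Cannot select" abort. */
    float x = arguments.field[i];

    out[gid] = x;
}
```

A constant index such as arguments.field[3] would not need a dynamic offset, which fits the observation that the rest of the kernel compiled fine.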

Which, as far as I can tell, means that the gfx90c really doesn't support OpenCL properly. Every other OpenCL implementation I've used can do this, including NVIDIA, Intel Iris, Apple, and even newer Radeon chips like the gfx1036. I don't see anything in the OpenCL specification that would indicate that this is a limitation.

But even assuming that it's somehow within spec for an OpenCL implementation not to support this feature, obviously aborting in the driver is completely unreasonable behavior. Again, this is a really shoddy implementation, and when people ask why Anukari doesn't yet officially support Radeon chips, this is the kind of reason that I point to. The drivers are buggy, and worse, they are inconsistent across the hardware.

The (very simple) workaround

Anyway, I have very good (performance) reasons for storing some small constant-size arrays (with dynamic indexes) in kernel arguments, but those reasons really apply more to the CUDA backend. So I made some simple changes to Anukari to store these small arrays in constant device memory, and the gfx90c implementation now works just fine.
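A minimal sketch of what such a change could look like, assuming the small array simply moves out of the argument struct and into a separate __constant buffer. The names (field, gain, indices, out, NUM_COEFFS) are hypothetical and this is not Anukari's actual code. As before, the #ifndef block is only a shim so the sketch also compiles as plain C.

```c
/* Shim: lets this OpenCL C sketch also compile as plain C on the host. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define __constant
static size_t fake_gid = 0;  /* work-item id used by the host-side stub */
static size_t get_global_id(unsigned int dim) { (void)dim; return fake_gid; }
#endif

#define NUM_COEFFS 8  /* hypothetical small, constant array size */

/* The by-value argument struct now holds only scalars (hypothetical). */
typedef struct {
    float gain;
} UpdateEntitiesArguments;

__kernel void update_entities(UpdateEntitiesArguments arguments,
                              __constant const float *field,
                              __global const unsigned int *indices,
                              __global float *out)
{
    size_t gid = get_global_id(0);
    unsigned int i = indices[gid] % NUM_COEFFS;

    /* The dynamic index now addresses constant device memory
       rather than kernel argument memory. */
    out[gid] = arguments.gain * field[i];
}
```

On the host side, the array would then live in an ordinary read-only buffer (for example one created with clCreateBuffer and bound with clSetKernelArg) rather than being copied in as part of the argument struct.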

Given that I recently upgraded my primary workstation to a very new AMD Ryzen CPU, I now have two Radeon test chips: the gfx90c in the Ideapad 5, and the gfx1036 that's built into my Ryzen. The Anukari GPU code appears to work flawlessly on both, though it doesn't perform all that well on either. Next up will be more testing of the Vulkan graphics, which have also been a pain point in the past on Radeon chips.

by Evan at 7/19/2025, 9:48:15 PM
Tags: gpu bug radeon
