I agree with most of your post and fled the AMD ecosystem some time ago because of the machine learning situation, but AMD's problem seemed to be more the firmware bugs and the memory management of compute shaders than the higher-level libraries.
The obvious solution to this one would be not to use ROCm. ROCm has always been a bit of a train wreck for small users, and it doesn't seem to do anything special anyway. The way forward would be something more like Vulkan, which the server that today's link points to seems to be using. The existence of a badly managed software package doesn't really imply that users have to use it; they can use an alternative.
It would be nice if AMD sorted itself out, though. The NVIDIA driver situation on Linux is painful, and if AMD can reliably run LLMs without the hardware locking then I'd much rather move back to using their products.