I couldn't come up with a single failure mode the agent with a gpt5.x model behind it couldn't one shot. I created socket overruns.. dangling file descriptors.. badly configured systemd units.. busted route tables.. "failed" volume mounts..
Had to start creating failures of internal services the models couldn't have been trained on and it was still hard to have scenarios it couldn't one shot.