When they get stuck, I find adding debug that the model can access helps. + Sometimes you need to add something into the prompt to tell it to avoid some approach at a point.
Please tell me what CPU it is. I would give it a try. I doubt strongly a very well documented CPU can't be emulated by writing the code with modern AIs.
Interesting. When I had Claude write a language transpiler it always checked that tests passed before declaring a feature ready for PR. There was never a case where it gave up on achieving that goal.