[1] as the author knows (“MLX uses Metal to compile tensor operations for this accelerator. Somewhere in that stack, the computations are going very wrong”) there’s lots of soft- and firmware in-between the code being run and the hardware of the neural engine. The issue might well be somewhere in those.