They can't comprehend that for anything more complicated than that, the code might compile, but logical errors and failures to implement the spec start piling up.
Grok 4 Fast told me its own internal system prompt has rules against autonomous operation, so that might have something to do with it. I am having decent results with it, though.