Most of those commits since the last few months are thanks to Codex reviews (but the code is not AI generated): 5.5 since it came out, and 5.4 etc before that, almost always on Extra High because it's for a framework that underlies the other stuff I do so I want make to sure everything's correct.
Sometimes I have to run multiple passes on the same task: I rarely continue any session beyond 4-5 prompts to avoid "bloat" or accumulate "stale context", so sometimes Codex finds different stuff in subsequent reviews of the same file/subsystem.
The project is modular enough where each file can be considered standalone with only 1-2 dependencies, and I already used to write a lot of comments everywhere (something some people laughed at), so maybe that helps the AI along?
I'm taking this, along with my own experience, to mean that the GPTs are cheaper to use for refactors of an existing body of work than they are for creating a new one.
(And perhaps part of that is in the name? These "LLM" contraptions are very good at translation, after all. And tokens seem to relate more to concepts than to specific phrases or words.)