This isn't really much of an excuse given contemporary models, though. My current game project has a GUI editor mode, and it was not at all difficult to set things up so that whenever I run a debug build of the game:

- It opens to the editor mode rather than the gameplay mode on launch

- It makes a .run/ directory next to the executable if one doesn't already exist

- It makes a timestamped directory within .run/ for this current debug run

- It automatically records stdout to stdout.txt and stderr to stderr.txt, and writes a crash.txt if the game crashes, all in the directory for this run

- When the “take debug screenshot” function is invoked (which can be done by pressing F12), it saves a screenshot, timestamped by time since the executable launched, in the directory for this run

- Editor actions and 3D camera movements are recorded to playback.txt in the directory for this run
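
As a rough illustration of the list above, here is a minimal sketch in Python rather than my actual engine code. Every name in it (`start_debug_run`, `Tee`, `capture_output`, `write_crash_report`, `screenshot_path`) is hypothetical, and so are the directory-name format and the choice to tee output instead of redirecting it.

```python
import sys
import traceback
from datetime import datetime
from pathlib import Path


def start_debug_run(exe_dir, now=None):
    """Create .run/<timestamp>/ next to the executable and return its path."""
    now = now or datetime.now()
    # Assumed name format; anything that sorts chronologically would do.
    run_dir = Path(exe_dir) / ".run" / now.strftime("%Y%m%d-%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


class Tee:
    """File-like object that forwards writes to a stream and a log file."""

    def __init__(self, stream, log_path):
        self.stream = stream
        self.log = open(log_path, "a", encoding="utf-8")

    def write(self, text):
        self.stream.write(text)
        self.log.write(text)
        self.log.flush()  # keep the log usable even if the process dies
        return len(text)

    def flush(self):
        self.stream.flush()
        self.log.flush()


def capture_output(run_dir):
    """Mirror stdout/stderr into stdout.txt and stderr.txt for this run."""
    sys.stdout = Tee(sys.stdout, run_dir / "stdout.txt")
    sys.stderr = Tee(sys.stderr, run_dir / "stderr.txt")


def write_crash_report(run_dir, exc):
    """Write the traceback of an unhandled exception to crash.txt."""
    crash_path = run_dir / "crash.txt"
    text = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    crash_path.write_text(text, encoding="utf-8")
    return crash_path


def screenshot_path(run_dir, launch_time, now):
    """Name a screenshot by seconds elapsed since launch."""
    elapsed = now - launch_time
    return run_dir / f"screenshot_{elapsed:09.3f}s.png"
```

In this sketch you would call `start_debug_run` and `capture_output` once at startup, wrap the main loop in a `try`/`except` that calls `write_crash_report`, and pass `time.monotonic()` values to `screenshot_path` when F12 is pressed.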

With all of this in place, I can do a debug build, run the game, do something in the editor, and take one or more screenshots where things went wrong. Then Codex can read the log files and screenshots and try to diagnose the problem. When attempting a fix, it can automatically recompile the debug build and rerun it with a launch option that plays back the latest recording file, which repeats the same sequence of editor actions and camera movements and takes screenshots at the same points in the process. Then it can compare the result to the initial recorded run and see what still needs to be fixed.
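
The record-and-replay loop could look something like the following sketch. The one-JSON-object-per-line layout for playback.txt, the `Recorder` class, and the `latest_run` and `replay` helpers are all assumptions made for illustration, not the format my game actually uses.

```python
import json
from pathlib import Path


class Recorder:
    """Append editor and camera events to playback.txt, one JSON object per line."""

    def __init__(self, path):
        self.path = Path(path)

    def record(self, t, kind, **payload):
        # t is seconds since launch; kind names the action (hypothetical schema).
        event = {"t": round(t, 3), "kind": kind, **payload}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event, sort_keys=True) + "\n")


def latest_run(run_root):
    """Return the newest run directory, relying on sortable timestamp names."""
    runs = [p for p in Path(run_root).iterdir() if p.is_dir()]
    return max(runs, key=lambda p: p.name)


def load_events(path):
    """Yield recorded events in the order they were written."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def replay(path, handlers):
    """Dispatch each event to the handler for its kind; return the event count."""
    count = 0
    for event in load_events(path):
        handlers[event["kind"]](event)
        count += 1
    return count
```

Recording the screenshot requests as events too is what lets a replayed run capture images at the same points as the original, so the two sets of screenshots can be compared pairwise.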

We could be having a GUI renaissance right now, but for various, primarily aesthetic, reasons people are churning out TUIs, and personally I think it's a huge mistake.

For snapshot tests, it seems better to diff a data representation, such as a YAML string, than to diff UIs.
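
A minimal sketch of that idea, assuming the UI state can be dumped to a plain dict. It uses sorted-key JSON from the standard library as a stand-in for YAML, and `snapshot_text` and `snapshot_diff` are hypothetical helper names.

```python
import difflib
import json


def snapshot_text(ui_state):
    """Serialize UI state deterministically so unchanged state yields identical text."""
    return json.dumps(ui_state, indent=2, sort_keys=True) + "\n"


def snapshot_diff(expected, actual):
    """Return a unified diff of two snapshots, or an empty string if they match."""
    return "".join(
        difflib.unified_diff(
            expected.splitlines(keepends=True),
            actual.splitlines(keepends=True),
            fromfile="expected",
            tofile="actual",
        )
    )
```

A test would store the expected text next to the test file and fail whenever `snapshot_diff` returns a non-empty string.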
The whole UI seems better for LLMs to consume, and it also displays nicely in-editor for humans. Test failures essentially become failing screenshot tests, which are really comfortable to review.