upvote
TDD helps a lot, but it’s no guarantee - LLM is smart enough to “fake” the code to pass tests .

I’m working on project - a password manger, where I have full end to end test harnesses - cli client makes changes, sync them to the server and then observe the data in iOS app running in the emulator. More then once I noticed codex just hard coded expected values from test harnesses directly into UI layout in iOS app to make the test pass .

Similar issues in the crypto layer - test were written first , then code was written . During the review I noticed that code was made to just pass the test - check if signature values exists instead of checking if signature is valid. LLM can help with code review as well, but it has to be guided specifically what to look for for. This is with codex 5.4 model

reply
That's a great approach, though I'd also recommend setting up a strong basis for linting, type checking, compilation, etc depending on the language. An LLM given a full test suite and guard rails of basic code style rules will likely do a pretty good job.

I would find it a bit tricky to write a full test suite for a product without any code though. You'd need to understand the architecture a bit and likely end up assuming, or mocking, what helpers, classes, config, etc will be built.

reply
You absolutely can. This is one of recommended directions with agentic coding. But you can go farther and ask llm to write tests too. The review/approve them.
reply
Yes, I mostly do spec driven developement. And at the design stage, I always add in tests. I repeat this pattern for any new features or bug fixes, get the agent to write a test (unit, intergration or playwright based), reproduce the issue and then implement the change and retest etc... and retest using all the other tests.
reply
To expand on the "Yes": the AI tools work extremely well when they can test for success. Once you have the tests as you'd like them, you may want to tell the LLM not to modify the tests because you can run into situations where it'll "fix" the tests rather than fixing the code.
reply
yes. depending on the techstack your experience might be better or worse. HTML/CSS/React/Go worked great, but it struggled with Swift (which I had no experience in).
reply
Yes
reply