I use one model for coding and another for writing tests for that very reason. The setup is surprisingly good at TDD.
I find this fascinating because it's the sort of anthropomorphism that betrays a fundamental misunderstanding of what an LLM is. Language models are not people. You can achieve the same thing with a fresh context window. The only solid technical reason to want a different model is if you find that one model produces better code and another produces better reviews. As far as I know, nobody has rigorously tested this.
I believe the theory isn't that one is better than the other, but that different models would make different mistakes, so you can be more confident in the places where the code and tests agree.
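To put rough numbers on that theory, here is a minimal sketch. Every probability in it is made up for illustration, and the function name `p_undetected_bug` is hypothetical. The only assumption that matters is that tests written by the same model are more likely to share the code's blind spot than tests written by a different model.

```python
def p_undetected_bug(p_bug: float, p_tests_miss_given_bug: float) -> float:
    """Chance that the code has a bug AND the tests still pass.

    This is just P(bug) * P(tests miss it | bug).
    """
    return p_bug * p_tests_miss_given_bug


# Illustrative numbers only, not measurements.
P_BUG = 0.2                   # chance the generated code has a given bug
P_MISS_SAME_MODEL = 0.6       # same model writes the tests: blind spots overlap
P_MISS_DIFFERENT_MODEL = 0.2  # different model: mistakes are less correlated

same = p_undetected_bug(P_BUG, P_MISS_SAME_MODEL)
different = p_undetected_bug(P_BUG, P_MISS_DIFFERENT_MODEL)

print(f"same model:       {same:.2f}")
print(f"different models: {different:.2f}")
```

If the mistakes really are less correlated across models, passing tests say more about the code. Whether that holds in practice is the open question.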
I read that to mean you can arm it with a harness of your own design that tells the user whether the tests pass. An LLM can use this to run tests faster than I could run the same harness myself. You can then add whatever programmatic logic your use case needs and have some degree of certainty that the product at least passed those tests.
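A minimal sketch of such a harness, using Python's standard `unittest` module. The function under test (`add`), the test class, and `run_harness` are all hypothetical names for illustration. The point is only that the harness returns a small structured report that a person or an LLM can read.

```python
import unittest


def add(a, b):
    """Hypothetical code under test (imagine a model wrote it)."""
    return a + b


class AddTests(unittest.TestCase):
    """Hypothetical tests (imagine a different author wrote them)."""

    def test_small_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_negative_numbers(self):
        self.assertEqual(add(-1, 1), 0)


def run_harness(test_case):
    """Run one TestCase class and summarize the outcome as a dict."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    result = unittest.TestResult()
    suite.run(result)
    return {
        "ran": result.testsRun,
        "failed": len(result.failures) + len(result.errors),
        "passed": result.wasSuccessful(),
    }


report = run_harness(AddTests)
print(report)
```

A passing report only means the code satisfied these particular tests, which is the "degree of certainty" described above and nothing stronger.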