I've always thought that writing good tests (unit, integration, or e2e) is harder than the actual coding, maybe by an order of magnitude.

The tricky part of unit tests is coming up with creative mocks and ways to simulate various situations based on the input data, without touching the actual code.
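A minimal sketch of that idea, using Python's `unittest.mock` to simulate a downstream failure without touching the code under test (the `charge` function and gateway are hypothetical, just for illustration):

```python
from unittest.mock import Mock

# Hypothetical service: charge() depends on an external payment gateway.
def charge(gateway, amount):
    try:
        receipt = gateway.pay(amount)
    except TimeoutError:
        return {"status": "retry"}
    return {"status": "ok", "id": receipt}

# Simulate situations through the mock instead of the real gateway.
gateway = Mock()
gateway.pay.return_value = "r-123"          # happy path
assert charge(gateway, 10) == {"status": "ok", "id": "r-123"}

gateway.pay.side_effect = TimeoutError      # simulate a network timeout
assert charge(gateway, 10) == {"status": "retry"}
```

The creative part is knowing which situations (timeouts, malformed responses, partial failures) are worth simulating; the mock mechanics themselves are cheap.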

For integration tests, it's massaging the test data and inputs to hit every edge case of an endpoint.
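One common shape for that massaging is a table of inputs, each row aimed at one edge of the endpoint. A sketch, with the handler stubbed as a plain function (the `create_user` rules here are made up for illustration):

```python
# Hypothetical endpoint handler, stubbed as a function for illustration.
def create_user(payload):
    name = payload.get("name")
    if not isinstance(name, str) or not name.strip():
        return 422            # missing, empty, whitespace, or wrong type
    if len(name) > 64:
        return 422            # over the length limit
    return 201                # created

# One table of inputs designed to hit every edge of the endpoint.
cases = [
    ({"name": "alice"}, 201),   # happy path
    ({"name": ""}, 422),        # empty string
    ({"name": "   "}, 422),     # whitespace only
    ({"name": "x" * 64}, 201),  # exactly at the limit
    ({"name": "x" * 65}, 422),  # one over the limit
    ({}, 422),                  # field missing entirely
    ({"name": 42}, 422),        # wrong type
]
for payload, expected in cases:
    assert create_user(payload) == expected, payload
```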

For e2e tests, it's massaging the data, finding selectors that won't break every time the HTML changes, and winnowing the suite down to the important things to test - since exhaustive e2e tests take hours to run and are a full-time job to maintain. You want to test all the main flows, but also things like handling a back-end system failure - which doesn't get exercised in smoke tests or normal user operations.
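The usual selector fix is to key off a dedicated test attribute (e.g. `data-testid`) rather than classes or structure. A toy demonstration of why, using two revisions of the same markup (the snippets and regexes are illustrative, not a real e2e framework):

```python
import re

# Two revisions of the same page: classes and tags changed, test ids kept stable.
V1 = '<div class="btn btn-primary" data-testid="checkout">Buy</div>'
V2 = '<button class="cta-large" data-testid="checkout">Buy now</button>'

def find_by_testid(html, testid):
    return re.search(rf'data-testid="{testid}"', html) is not None

def find_by_class(html, cls):
    return re.search(rf'class="[^"]*\b{cls}\b[^"]*"', html) is not None

# The test-id selector survives the redesign; the class selector breaks.
assert find_by_testid(V1, "checkout") and find_by_testid(V2, "checkout")
assert find_by_class(V1, "btn-primary") and not find_by_class(V2, "btn-primary")
```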

That's a ton of creativity for AI to handle. You pretty much have to tell it every test and how to build it.

reply
The false comfort usually comes from line coverage. I had a model at 97.2% coverage across 92 tests. Ran equivalence partitioning on it (partition the inputs into classes, test one representative from each) and found 6 real gaps, among them: a search scope testing 1 of 5 fields, a scope checking return type but not filtering logic, and a missing state-machine branch. SimpleCov said the file was covered; the logical input space was not. The technique is old (ISTQB calls it specification-based testing), but the manual overhead made it impractical until recently. Agents made it possible to apply across 60+ models, which is the one thing they have changed for testing so far.
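For readers unfamiliar with the technique, a minimal sketch of equivalence partitioning (the `eligible` rule and the partition names here are hypothetical):

```python
# Hypothetical validator: accepts ages 18..65 inclusive.
def eligible(age):
    return 18 <= age <= 65

# Partition the input domain into equivalence classes and pick one
# representative per class, including the boundaries.
cases = {
    "below_range":    (10, False),
    "lower_boundary": (18, True),
    "in_range":       (40, True),
    "upper_boundary": (65, True),
    "above_range":    (70, False),
}
for name, (value, expected) in cases.items():
    assert eligible(value) == expected, name
```

A line-coverage tool would report 100% after a single call to `eligible`; the partition table is what forces every class of input to be exercised.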
reply
Use TLA+ and have it go back and forth with you to spec out your system behavior, then iterate on it, trying to link the TLA+ spec with the actual code implementing it.
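For anyone who hasn't seen one, a minimal (hypothetical) TLA+ module showing the shape such a spec takes - a counter with an invariant a model checker like TLC can verify and that the implementation would have to preserve:

```
---- MODULE Counter ----
EXTENDS Naturals
VARIABLE count

Init == count = 0
Incr == count' = count + 1
Next == Incr

Spec == Init /\ [][Next]_count

\* Invariant to check with TLC and to mirror in the implementation
Inv == count >= 0
====
```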

Pull out as many pure functions as possible and exhaustively test the input and output mappings.
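For a pure function over a small domain, "exhaustively test" can be literal: enumerate every input and check the mapping. A sketch with a hypothetical `clamp` function:

```python
from itertools import product

# Hypothetical pure function: clamp x into the range [lo, hi].
def clamp(x, lo, hi):
    return max(lo, min(x, hi))

# Exhaustively check the full input/output mapping over a small integer domain.
for x, lo, hi in product(range(-3, 4), repeat=3):
    if lo > hi:
        continue  # skip invalid ranges
    y = clamp(x, lo, hi)
    assert lo <= y <= hi          # result always lands in range
    if lo <= x <= hi:
        assert y == x             # in-range values pass through unchanged
```

No mocks, no fixtures, no selectors - which is exactly why pulling logic out into pure functions makes the testing problem so much cheaper.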

reply
[dead]
reply