Unfortunately, in our experience, tests don’t scale as well as code.

First of all, static tests are very brittle: you rely on selectors, need wait times, and can’t really test much dynamic content (think AI chats/interactions). Then there’s all the infrastructure around it: solving captchas, handling auth, handling email OTP (each of our agents has access to its own inbox), spinning up simulators, and handling video recording and screenshots.

To get stable results we do a lot of harness engineering: we inject trajectories from previous test runs to keep the agent's behavior stable, and we split tests into smaller steps to prevent context overload and decision fatigue.
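
To make the idea concrete, here is a rough sketch of what trajectory injection and step splitting could look like. It is only an illustration under assumptions, not the actual harness: the names `Step`, `Trajectory`, `split_into_steps`, and `build_step_prompt` are hypothetical, and splitting on `;` stands in for whatever real planning is used.

```python
from dataclasses import dataclass


@dataclass
class Step:
    goal: str  # one small, self-contained objective for the agent


@dataclass
class Trajectory:
    step_goal: str      # the goal this recorded run was working on
    actions: list[str]  # actions the agent took, in order
    passed: bool        # whether the run ended in a passing result


def split_into_steps(test_description: str) -> list[Step]:
    # Hypothetical splitter: treat ';' as a step boundary so each prompt
    # carries one small goal instead of the whole test.
    parts = (part.strip() for part in test_description.split(";"))
    return [Step(goal=part) for part in parts if part]


def build_step_prompt(step: Step, history: list[Trajectory], max_examples: int = 2) -> str:
    # Reuse only passing trajectories recorded for the same goal, newest last.
    relevant = [t for t in history if t.passed and t.step_goal == step.goal]
    lines = [f"Goal: {step.goal}"]
    for i, t in enumerate(relevant[-max_examples:], start=1):
        lines.append(f"Previous successful run {i}: " + " -> ".join(t.actions))
    return "\n".join(lines)


steps = split_into_steps("log in; open chat; send a message")
history = [
    Trajectory("log in", ["type email", "enter OTP"], passed=True),
    Trajectory("log in", ["click wrong button"], passed=False),
]
print(build_step_prompt(steps[0], history))
```

Capping the number of injected examples keeps each step's context small, which is the same motivation as splitting the test in the first place.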

On the security side, the product can operate without any access to the codebase: you can just give us a URL or a mobile app build and we will do the testing.

Goodness, I really didn't expect such lazy copy-pasting of responses from a YC company.