I think there are already papers and presentations on integrating these kinds of iterative code understanding/verification loops into harnesses. There may be some advantages over fuzzing alone. But I think the cost-benefit analysis is a lot more mixed/complex than Anthropic would like people to believe. Sure, you need human engineers, but it's not insurmountably hard for a non-expert to figure out.

That's funny, this is how I've been doing security testing on my code for a while now, minus the "taint analysis". Who knew I was ahead of the game? :P

In all seriousness though, it scares me that so many security-focused people still seem not to have learned how LLMs work best for this stuff.

You should always be breaking your code down into testable chunks, with a set of directions for how to chunk it and what to do with each chunk. Anyone just vaguely gesturing at their entire repo and saying "find the security vulns" is not a serious dev/tester; we wouldn't accept that approach in manual secure coding processes/SSDLCs.
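A minimal sketch of that chunking step, assuming a Python codebase and splitting at top-level function/class boundaries with the standard `ast` module. The `DIRECTIONS` text and the `chunk_functions`/`build_prompts` helpers are hypothetical illustrations, not anyone's actual harness. It only builds one prompt per chunk and deliberately leaves out the LLM call, since that depends on whatever model/tooling you use.

```python
import ast
from dataclasses import dataclass

# Hypothetical per-chunk directions; swap in your own review checklist.
DIRECTIONS = (
    "Review only this function. List every input it trusts without validation, "
    "then say whether any of it reaches a file, shell, SQL, or network sink."
)


@dataclass
class Chunk:
    name: str
    start: int   # 1-based line of the def/class keyword
    end: int     # 1-based last line, inclusive
    source: str  # exact source text of the definition


def chunk_functions(source: str) -> list[Chunk]:
    """Split a module into one chunk per top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            segment = ast.get_source_segment(source, node)
            chunks.append(Chunk(node.name, node.lineno, node.end_lineno, segment))
    return chunks


def build_prompts(source: str, path: str) -> list[str]:
    """Pair every chunk with the same fixed directions: one prompt per chunk."""
    return [
        f"{DIRECTIONS}\n\nFile: {path}, lines {c.start}-{c.end}\n\n{c.source}"
        for c in chunk_functions(source)
    ]


if __name__ == "__main__":
    example = (
        "import os\n\n"
        "def read_user_file(name):\n"
        "    return open(os.path.join('/srv/data', name)).read()\n\n"
        "def add(a, b):\n"
        "    return a + b\n"
    )
    for prompt in build_prompts(example, "example.py"):
        print(prompt, end="\n\n---\n\n")
```

Chunking at function boundaries keeps each prompt small and reviewable, but it is exactly the split that hides cross-component bugs, so you would still want separate passes over call chains.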

In a large codebase there will still be bugs in how these components interoperate with each other: bugs involving complex chaining of API logic, or ones with a temporal element. These are the kinds of bugs fuzzers generally struggle to find. I would be a little freaked out if LLMs started to get good at finding them. Everything I've seen so far looks similar to what fuzzers find.