No, writing an advertisement is not weird. What's weird is that it's top of HN. Or really, no, this isn't weird either if you think about it -- people lookin for a gotcha "Oh see, that new model really isn't that good/it's surely hitting a wall/plateau any day now" upvoted it.
It's the flaw in the "given enough eyeballs, all bugs are shallow" argument. Because eyeballs grow tired of looking at endless lines of code.
Machines on the other hand are excellent at this. They don't get bored, they just keep doing what they are told to do with no drop-off in attention or focus.
Would it be cheaper than Claude Mythos doing it? No idea. Maybe, maybe not.
But it’s weird how we’re willing to throw away money to a megacorp to do it with “automation” for potentially just as much if not more as it would cost to just have big bounty program or hiring someone for nearly the same cost and doing it “normally”.
It would really have to be substantially less cost for me to even consider doing it with a bot.
So would I, but it doesn't negate that we, humans, are bad at this. We will get bored and our focus will begin to drift. We might not notice it, we might not want to admit it, but after a few continuous hours we will start missing things.
And if there were, the cost would be more like $20M than 20K.
Having all code reviewed for security, by some level of LLM, should be standard at this point.
I think people forget that it's hard to be clever and tidy 100% of the time. Big programs take a lot of discipline and an understanding of the context that can be really hard to maintain. This is one of several reasons that my second draft or third draft of code is almost always considerably better than the first draft.
The thesis is, the tooling is what matters - the tools (what they call the harness) can turn a dumb llm into a smart llm.
The general approach without LLMs doesn't work. 50 companies have built products to do exactly what you propose here; they're called static application security testing (SAST) tools, or, colloquially, code scanners. In practice, getting every "suspicious" code pattern in a repository pointed out isn't highly valuable, because every codebase is awash in them, and few of them pan out as actual vulnerabilities (because attacker-controlled data never hits them, or because the missing security constraint is enforced somewhere else in the call chain).
Could it work with LLMs? Maybe? But there's a big open question right now about whether hyperspecific prompts make agents more effective at finding vulnerabilities (by sparing context and priming with likely problems) or less effective (by introducing path dependent attractors and also eliminating the likelihood of spotting vulnerabilities not directly in the SAST pattern book).
“PKI is easy to break if someone gives us the prime factors to start with!”