upvote
[work at Mozilla]

I agree that LLMs are sometimes wrong, which is why this new method here is so valuable - it provides us with easily verifiable testcases rather than just some kind of analysis that could be right or wrong. Purely triaging through vulnerability reports that are static (i.e. no actual PoC) is very time consuming and false-positive prone (same issue with pure static analysis).

I can't really confirm the part about "local" bugs anymore though, but that might also be a model thing. When I did experiments longer ago, this was certainly true, esp. for the "one shot" approaches where you basically prompt it once with source code and want some analysis back. But this actually changed with agentic SDKs where more context can be pulled together automatically.

reply
Please, implement "name window" natively in Firefox.

I have to use chrome because the lack of it.

reply
I've seen fairly poor results from people asking AI agents to fill in coverage holes. Too many tests that either don't make sense, or add coverage without meaningfully testing anything.

If you're already at a very high coverage, the remaining bits are presumably just inherently difficult.

reply
Security has had pattern matching in traditional static analysis for a while. It wasn't great.

I've personally used two AI-first static analysis security tools and found great results, including interesting business logic issues, across my employers SaaS tech stack. We integrated one of the tools. I look forward to getting employer approval to say which, but that hasn't happened yet, sadly.

reply
This description is also pretty accurate for a lot of real-world SWEs, too. Local bugs are just easier to spot. Imperfect security boundaries often seem sufficient at first glance.
reply
But you're not a member of Anthropic's Red Team, with access to a specialist version of Claude.
reply
[dead]
reply