"No one bothered to look" is how most vulnerabilities work. Systems development produces code artifacts whose complexity compounds; keeping up with it manually is extraordinarily difficult, as you know. A solution to that problem is big news.

Static analyzers will find all possible copies of unbounded data into smaller buffers (especially when the size of the target buffer is easily deduced). They will then report them whether or not every path to that code clamps the input, which is why this approach doesn't work well in the Linux kernel in 2026.

reply
With a capable static analyzer that is not true. In many common cases it can deduce the possible range of a value from the branch checks along the data-flow path, and if that range fits within the buffer it does not report a finding.
reply
Be specific. Which analyzer are you talking about and which specific targets are you saying they were successful at?
reply
Intrinsa's PREfix static source code analyzer would model the execution of the C/C++ code to determine values which would cause a fault.

IIRC they were using a C/C++ compiler front end from EDG to parse C/C++ code to a form they used for the simulation/analysis.

see https://web.eecs.umich.edu/~weimerw/2006-655/reading/bush-pr... for more info.

Microsoft bought Intrinsa several years ago.

reply
I'm sure this is very interesting work, but can you tell me what targets they've been successful surfacing exploitable vulnerabilities on, and what the experience of generating that success looked like? I'm aware of the large literature on static analysis; I've spent most of my career in vulnerability research.
reply
PREfix wasn't designed specifically for finding exploitable bugs - it was aimed somewhere in between Purify (runtime bug detection) and being a better lint.

One of the articles/papers I recall said that the big problem for PREfix when simulating the behaviour of code was the explosion in complexity if a given function had multiple paths through it (e.g. multiple ifs/switch statements). PREfix had strategies to reduce the time spent in these highly complex functions.

Here's a 2004 link that discusses the limitations of PREfix's simulated analysis - https://www.microsoft.com/en-us/research/wp-content/uploads/...

The above article also talks about Microsoft's newer (for 2004) static analysis tools.

There's a Netscape engineer endorsement in a CNet article from when they first released PREfix. See https://www.cnet.com/tech/tech-industry/component-bugs-stamp...

reply
> Not "hidden", but probably more like "no one bothered to look".

Well yeah. There weren't enough "someones" available to look. There are a finite number of qualified individuals with time available to look for bugs in OSS, which means a finite amount of bug-finding capacity available in the world.

Or at least it was. That's what's changing as these models become competent enough to spot and validate bugs: the finite global capacity to find bugs is now increasing, and actual bugs are starting to be dredged up. This year will be very, very interesting if models continue to increase in capability.

reply
I was just thinking about this and what it means for closed source code.

Many people with skin in the game will be spending tokens on hardening the OSS bits they use, maybe even as part of their build pipelines, but if the code is closed you have to pay for that review yourself, which makes you rather uncompetitive.

You could say there's no change there, but the number of people who can run a Claude review and the number of people who can actually review a complicated codebase are several orders of magnitude apart.

Will some of them produce bad PRs? Probably. The battle will be to figure out how to filter them at scale.

reply
I have no doubt that LLMs can be as good at analyzing binaries as at analyzing source code.

An avalanche of 0-days in proprietary code is coming.

reply
> This is something a lot of static analysers can easily find.

And yet they didn't (either no one ran them, or they didn't find it, or they did find it but it was buried in hundreds of false positives) for 20+ years...

I find it funny that every time someone does something cool with LLMs, there's a bunch of takes like this: it was trivial, it's just not important, my dad could have done that in his sleep.

reply
Remember Heartbleed in OpenSSL? That long predated LLMs, but same story: some bozo forgot how long something should/could be, and no one else bothered to check either.
reply
Hey we are the bozos
reply
Let's all get together and self-reflect on the bozo way.
reply
I believe that once the OpenBSD team started cleaning up some of the other gross coding-style stuff as part of their fork into LibreSSL, even fairly simplistic static analysis tools could spot the underlying bugs that caused Heartbleed.
reply
The bug that caused Heartbleed was extremely obvious: read a u16 out of a packet, copy that many bytes of the source packet into the reply packet. If someone put that code in front of you in isolation you would spot it instantly (if you know C). The problem --- this is hugely the case with most memory safety bugs --- is that it's buried under a mountain of OpenSSL TLS protocol handling details. You have to keep resident in your brain what all the inputs to the function are, and follow them through the code.
reply
It's much, much, easier to run an LLM than to use a static or dynamic analyzer correctly. At the very least, the UI has improved massively with "AI".
reply
Most people have no idea how hard it is to run static analysis on C/C++ code bases of any size. There are a lot of ways to do it wrong that eat a ton of memory/CPU time or start pruning things that are needed.

If you know what you're doing you can split the code up in smaller chunks where you can look with more depth in a timely fashion.

reply
And even if that's true (and it frequently is!), detractors usually miss the immense underlying impact of artificial systems with "sleeping dad" capability.

Horizontally scaling "sleeping dads" takes decades, but inference capacity for a sleeping dad equivalent model can be scaled instantly, assuming one has the hardware capacity for it. The world isn't really ready for a contraction of skill dissemination going from decades to minutes.

reply
Most likely no one ran them, given the developer culture.
reply
There’s the classic case of the Debian OpenSSL vulnerability, where technically illegal but practically secure code was turned into superficially correct but fundamentally insecure code in an attempt to fix a bug identified by a (dynamic, in this case) analyzer.
reply