Static analyzers will find all possible copies of unbounded data into smaller buffers (especially when the size of the target buffer is easily deduced). It will then report them whether or not every path to that code clamps the input. Which is why this approach doesn't work well in the Linux kernel in 2026.
IIRC they were using a C/C++ compiler front end from EDG to parse C/C++ code to a form they used for the simulation/analysis.
see https://web.eecs.umich.edu/~weimerw/2006-655/reading/bush-pr... for more info.
Microsoft bought Intrinsa several years ago.
One of the articles/papers I recall was that the big problem for PREfix when simulating the behaviour of code was the explosion in complexity if a given function had multiple paths through it (e.g. multiple if's/switch statements). PREfix had strategies to reduce the time spent in these highly complex functions.
Here's a 2004 link that discusses the limitations of PREfix's simulated analysis - https://www.microsoft.com/en-us/research/wp-content/uploads/...
The above article also talks about Microsoft's newer (for 2004) static analysis tools.
There's a Netscape engineer endorsement in a CNet article when they first released PREfix. see https://www.cnet.com/tech/tech-industry/component-bugs-stamp...
Well yeah. There weren't enough "someones" available to look. There are a finite number of qualified individuals with time available to look for bugs in OSS, resulting in a finite amount of bug finding capacity available in the world.
Or at least there was. That's what's changing as these models become competent enough to spot and validate bugs. That finite global capacity to find bugs is now increasing, and actual bugs are starting to be dredged up. This year will be very very interesting if models continue to increase in capability.
Many people with skin in the game will be spending tokens on hardening OSS bits they use, maybe even part of their build pipelines, but if the code is closed you have to pay for that review yourself, making you rather uncompetitive.
You could say there's no change there, but the number of people who can run a Claude review and the number of people who can actually review a complicated codebase are several orders of magnitude apart.
Will some of them produce bad PRs? Probably. The battle will be to figure out how to filter them at scale.
An avalanche of 0-day in proprietary code is coming.
And yet they didn't (either noone ran them, or they didn't find it, or they did find it but it was buried in hundreds of false positives) for 20+ years...
I find it funny that every time someone does something cool with LLMs, there's a bunch of takes like this: it was trivial, it's just not important, my dad could have done that in his sleep.
If you know what you're doing you can split the code up in smaller chunks where you can look with more depth in a timely fashion.
Horizontally scaling "sleeping dads" takes decades, but inference capacity for a sleeping dad equivalent model can be scaled instantly, assuming one has the hardware capacity for it. The world isn't really ready for a contraction of skill dissemination going from decades to minutes.