Very likely not nearly as well, unless there are many open source libraries in use and/or the language+patterns used are extremely popular. The really huge win for something like the Linux kernel and other popular OSS is that the source appears in the training data, a lot. And many versions. So providing the source again and saying "find X" is primarily bringing into focus things it's already seen during training, with little novelty beyond the updates that happened after knowledge cutoff.
Giving it a closed source project containing a lot of novel code means it only has the language and it's "intuition" to work from, which is a far greater ask.
The llms know about every previous disclosed security vulnerability class and can use that to pattern match. And they can do it against compiled and in some cases obfuscated code as easily as source.
I think the security engineers out there are terrified that the balance of power has shifted too far to the finding of closed source vulnerabilities because getting patches deployed will still take so long. Not that the llms are in some way hampered by novel code bases.
Do the reports include patterns that could be matched against decompiled code, though? As easily as they would against proper source? I find it a bit hard to believe.
Same thing applies to humans: the better someone knows a codebase, the better they will be at resolving issues, etc.
Simply because the source code contains names that were intended to communicate meaning in a way that the LLM is specifically trained to understand (i.e., by choosing identifier names from human natural language, choosing those names to scan well when interspersed into the programming language grammar, including comments etc.). At least if debugging information has been scrubbed, anyway (but the comments definitely are). Ghidra et. al. can only do so much to provide the kind of semantic content that an LLM is looking for.
But I guess I'm not the first one to have that idea. Any references to research papers would be welcome.
I don't know if it's correct, but it speculated that it's part of a command line processor: https://chatgpt.com/share/69d19e4f-ff2c-83e8-bc55-3f7f5207c3...
Now imagine how much more it could have derived if I had given it the full executable, with all the strings, pointers to those strings and whatnot.
I've done some minor reverse engineering of old test equipment binaries in the past and LLMs are incredible at figuring out what the code is doing, way better than the regular way of Ghidra to decompile code.