Very much aligns with my experience. For me this is the most unsatisfying thing about AI-based workflows in general, they miss stuff humans would never miss.
All the time I wonder what am I missing that's right nearby? It's remarkable how many times I have to ask Claude code to fully ingest something before it actually puts it into context. It always tries to laser through to target it's looking for, which is often not what you want it to look for, at least not all you want it to look for. Getting these models to open up their field of vision is tough.
I have found that the “pro” approach is much more holistic and able to tackle rather “creative” problems that require very careful design and the overall artifact is tight and self-consistent. — Claude Code by comparison is incredible in exploration and targeted implementation but indeed is not great at seeing the forest.
But what if a coding agent was prompted to be more curious during development? Like a human developer, make mental notes of alternatives to try out and chase suspicious looking code which may seem unrelated to the task at hand. It could even spawn rabbit hole agents in parallel.
Taking a step back, this probably highlights major hazard with the increased usage of LLMs for coding, which is that everyone's style of work is going to converge because most code will be written by the 2-3 most popular models using the same system prompts.
I've had to explicitly direct the machine to read existing sibling code and follow the specific idioms and patterns in use.
> All the time I wonder what am I missing that's right nearby?
Add to the prompt "use coding conventions of the file which you are currently editing". That gets the machine (Opus and Sonnet at least) to go over the nearby code and occasionally mention something obvious.The wrong thing got fixed for copy.fail, because people jumped to blame AF_ALG.
[ed.: yes it's the same authencesn issue. https://github.com/V4bel/dirtyfrag/blob/892d9a31d391b7f0fccb... it doesn't say authencesn in the code, only in a comment, but nonetheless, same issue.]
[ed.2: the RxRPC issue is separate, this is about the ESP one]
The RxRPC one is definitely a different root cause (although caused by a very similar mistake).
For the ESP one it's a bit harder to tell. I don't think the wrong thing was fixed, just that there was a very similar bug in almost the same spot. Could be wrong about that though.
It's absolutely the same issue in authencesn/ESP. There's another one in RxRPC that is AIUI completely unrelated.
I bet that with a slightly looser prompt/harness, the LLM could have found these twin bugs too.
Yet at the same time, I also think that if the human researcher had manually scanned the code, he'd have noticed these bugs too.
FWIW I do think LLMs are great tools for finding vulnerabilities in general. Just that they were visibly not optimally applied in this case.
I think LLMs are great for vulnerability discovery, but you need to not skimp on the legwork and understanding what even you just found there.
They do not get bored like a human but they are trained on human language and replicate the same traits, such as laziness, and expressing boredom or annoyance (even if obviously they do not experience anything at all). It’s actually a lot of effort to get them to engage with things at a deeper level without skipping corners
> This finding was AI-assisted, but began with an insight from Theori researcher Taeyang Lee, who was studying how the Linux crypto subsystem interacts with page-cache-backed data.
There's no question that we live in the world where LLM AI was involved in finding the copy fail vulnerability at this specific time, and it's completely normal for people to see a vulnerability and then look closer and find related vulnerabilities or a deeper root cause, but there's no need to adopt an extreme "without AI LLM we don't find these vulnerabilities" position.
That is a very difficult fact pattern to which to attach the conclusion "LLMs have sabotaged security research" (my paraphrase).
Also I see you jumping around a lot to the defense of LLMs when I don’t think anyone is really attacking them. Maybe cool it a bit.
So like I said, just chill out.
Is there a counterfactual where you would say it explored well enough, besides both vulnerabilities published as one?
If there's a root cronjob that runs a world readable binary, you could modify it in the page cache and exploit it that way.
Modifying the page cache is a really strong primitive with countless ways to exploit it.
So it seems surprising to me that you can call it when the out fd is not writable? But I didn't retain the information about the vulnerability, so I'm missing something. There was something about copy on write, IIRC?
Also, NB, I said permission check, not mode check. The input fd to splice can and will be open for only reading quite often. Doesn't mean the kernel can't still do a write permission check.
(Except I didn't say that here. Oops. Getting confused with my posts.)
But splice is a more or less a generalization of sendfile, and sendfile is often used for webserving where the serving process does not have ownership of the documents it is serving. It doesn't make sense to limit splice such that it can't do the task it was built for. Maybe splice should just not write to the input fd? :P
Not really, splice(2) is actually more limited, it's an optimisation for reading and writing data between files and pipes without needing to make copies.
sendfile(2) works with any fds because it just exists to remove a fair bit of the copy overhead when doing a userspace read/write loop, but it does actually do a copy.
But apparently we can't be trusted with the page cache…
Maybe the kernel using supervisor-read-only flags could be made to work, only issue then is what happens if something does in fact need to write…
That's why is very very important to just step out and use saved time to go for a walk, to a park, sit on a bench, listen do birds, close eyes and zoom out.
The state we are in is actually brilliant.