upvote
> When your workflow consists of asking questions and getting answers immediately, you don't get to see what's nearby.

Very much aligns with my experience. For me this is the most unsatisfying thing about AI-based workflows in general, they miss stuff humans would never miss.

All the time I wonder what am I missing that's right nearby? It's remarkable how many times I have to ask Claude code to fully ingest something before it actually puts it into context. It always tries to laser through to target it's looking for, which is often not what you want it to look for, at least not all you want it to look for. Getting these models to open up their field of vision is tough.

reply
It’s interesting to compare how the agentic search performs, with these targeted reads and lots of tool calls in the stream, versus the older but still valid paradigm of using a high-reasoning model like GPT-X-pro and feeding in all the relevant files at once with no tools.

I have found that the “pro” approach is much more holistic and able to tackle rather “creative” problems that require very careful design and the overall artifact is tight and self-consistent. — Claude Code by comparison is incredible in exploration and targeted implementation but indeed is not great at seeing the forest.

reply
Do you think this is inherent or an artifact of prompting? Curiosity and side quests leads to higher token usage and longer time to finish, so I could understand why current harnesses and system prompts would not encourage that sort of thing.

But what if a coding agent was prompted to be more curious during development? Like a human developer, make mental notes of alternatives to try out and chase suspicious looking code which may seem unrelated to the task at hand. It could even spawn rabbit hole agents in parallel.

Taking a step back, this probably highlights major hazard with the increased usage of LLMs for coding, which is that everyone's style of work is going to converge because most code will be written by the 2-3 most popular models using the same system prompts.

reply
I've seen something similar, solutions generated feel very pythonic or javaesque in languages that are neither Python nor Java (C, Rust, Ruby)

I've had to explicitly direct the machine to read existing sibling code and follow the specific idioms and patterns in use.

reply

  > All the time I wonder what am I missing that's right nearby?
Add to the prompt "use coding conventions of the file which you are currently editing". That gets the machine (Opus and Sonnet at least) to go over the nearby code and occasionally mention something obvious.
reply
deleted
reply
No, unless I'm misreading it it's the *same* root cause: high 32 bits of Extended ESN in IPsec == authencesn module/cipher mode.

The wrong thing got fixed for copy.fail, because people jumped to blame AF_ALG.

[ed.: yes it's the same authencesn issue. https://github.com/V4bel/dirtyfrag/blob/892d9a31d391b7f0fccb... it doesn't say authencesn in the code, only in a comment, but nonetheless, same issue.]

[ed.2: the RxRPC issue is separate, this is about the ESP one]

reply
There are two vulnerabilities here.

The RxRPC one is definitely a different root cause (although caused by a very similar mistake).

For the ESP one it's a bit harder to tell. I don't think the wrong thing was fixed, just that there was a very similar bug in almost the same spot. Could be wrong about that though.

reply
(you probably wrote this while I was editing my post.)

It's absolutely the same issue in authencesn/ESP. There's another one in RxRPC that is AIUI completely unrelated.

reply
Or a follow up prompt: "find similar classes of bugs". Once the actual case has been layed out finding like bugs isn't too hard. I hear you on the creativity bit. Like any tool, AI can put blinders on. Using it to augment without it fully taking over your workflow is tough.
reply
Not just like any tool though. Interacting with agents can be incredibly boring and frustrating in a way that I personally do not experience with other technology
reply
Just on a side note. Negative synergy does not seem so uncommon with machine learning. We did some research maybe 10 yrs ago an human/ML based duplicate detection (for a municipal support ticket system) . Research showed that pure AI and pur human outperformed co-working. Human oversight often e.g. overcorrected machine work. I think it is a nice HCI problem to solve actually to amplify creativity and unique skills in such processes. Particularly if they can be to some degree repetitive and tiresome.
reply
I don't follow. LLMs spotted these bugs in the first place. You seem to be saying that these discoveries are indications that they're bad for vulnerability discovery.
reply
From what I understand, the copy fail bug was found by researcher who noticed something weird and then using AI to scan the codebase for instances where that becomes a problem.

I bet that with a slightly looser prompt/harness, the LLM could have found these twin bugs too.

Yet at the same time, I also think that if the human researcher had manually scanned the code, he'd have noticed these bugs too.

FWIW I do think LLMs are great tools for finding vulnerabilities in general. Just that they were visibly not optimally applied in this case.

reply
They could also have found all these things at the same time - and are slow-rolling the disclosures.
reply
I don't think the copy.fail people understood the issue they found, as is evident by the heavy focus on AF_ALG/aead_algif, which is essentially "innocent" as we're seeing here.

I think LLMs are great for vulnerability discovery, but you need to not skimp on the legwork and understanding what even you just found there.

reply
Right but without the LLM the bug doesn't get found at all.
reply
That's not necessarily true. Who's to say the security researchers wouldn't have found it if they'd searched the code manually?
reply
It's an AI security firm! You might just as productively ask "why did all the other engineers who ever looked at this code not find it, and why was Theori the one to actually surface it?".
reply
I’m hardly going to simp for LLM tools but the fact that the bug existed and no one had reported it seems proof positive no one was about to find it without them
reply
It would have taken a LOT longer but often this kind of manual search is so tedious people just don't do it. LLMs don't get bored.
reply
> LLMs don't get bored

They do not get bored like a human but they are trained on human language and replicate the same traits, such as laziness, and expressing boredom or annoyance (even if obviously they do not experience anything at all). It’s actually a lot of effort to get them to engage with things at a deeper level without skipping corners

reply
Safer to assume at least one of NSA, Mosad and a few others were sitting on it for years.
reply
Yes, I agree. I'm not the GP poster.
reply
No, they did not. Careful of falling for the psychosis.

> This finding was AI-assisted, but began with an insight from Theori researcher Taeyang Lee, who was studying how the Linux crypto subsystem interacts with page-cache-backed data.

https://xint.io/blog/copy-fail-linux-distributions

reply
Theori is an AI security research firm.
reply
You appear to want to die on the hill of "This vulnerability would never have been found if we lived in a world without LLM AI" which is a very strange hill to die on.

There's no question that we live in the world where LLM AI was involved in finding the copy fail vulnerability at this specific time, and it's completely normal for people to see a vulnerability and then look closer and find related vulnerabilities or a deeper root cause, but there's no need to adopt an extreme "without AI LLM we don't find these vulnerabilities" position.

reply
It's weird to say I want to "die on this hill" because that's not even something I believe. There was nothing especially difficult about this particular vulnerability. My only observation that nobody did find it before, then an LLM security firm went out looking for Linux LPEs, and thus it was discovered.

That is a very difficult fact pattern to which to attach the conclusion "LLMs have sabotaged security research" (my paraphrase).

reply
The finding started with human intuition and was assisted by an LLM. You can yell "AI sec firm" 1000 times. A human got it started. You shouldn't die on that hill.
reply
It seems as though this issue occurred to him, then he used their tool ("Xint Code") to analyze the codebase for instances of it.
reply
I don’t think that’s what the OP is saying at all, just that using LLMs needs to be a cooperative research process.

Also I see you jumping around a lot to the defense of LLMs when I don’t think anyone is really attacking them. Maybe cool it a bit.

reply
From the thread that ensued I feel comfortable that my interpretation of the comment (or rather, my confusion about it) was in fact germane.
reply
Germane or not the knee-jerk reactions related to LLMs are getting ridiculous and it seems like it’s the same people throwing down at a moments notice and then chalking it up to a misunderstanding.

So like I said, just chill out.

reply
It’s incredible humans spot stuff like this. I guess even more incredible that LLMs can do it!
reply
Right. Finding the bug is in itself a win. It seems we’re jumping from that spend-electricity-to-find-bugs win to arguing about how some things around it are not quite good or comfy.
reply
It’s very hard to see a root vuln similar to, but not the same as, another discovered by AI, as a lesson about AI not exploring.

Is there a counterfactual where you would say it explored well enough, besides both vulnerabilities published as one?

reply
Evidence or are you just riffing?
reply
These are all page cache poisoning attacks (dirtyfrag, copyfail, dirtypipe). Maybe the page cache should have defense-in-depth measures for SUID binaries?
reply
SUID mitigations have nothing to do with the vulnerability itself - just the exploit.

If there's a root cronjob that runs a world readable binary, you could modify it in the page cache and exploit it that way.

Modifying the page cache is a really strong primitive with countless ways to exploit it.

reply
splice() should maybe generally refuse to operate on things you can't write to.
reply
splice is documented to return EBADF if "One or both file descriptors are not valid, or do not have proper read-write mode."

So it seems surprising to me that you can call it when the out fd is not writable? But I didn't retain the information about the vulnerability, so I'm missing something. There was something about copy on write, IIRC?

reply
"proper read-write mode" for the input fd is reading only. The exploit is writing to the splice() input fd.

Also, NB, I said permission check, not mode check. The input fd to splice can and will be open for only reading quite often. Doesn't mean the kernel can't still do a write permission check.

(Except I didn't say that here. Oops. Getting confused with my posts.)

reply
OK, I may likely have too much sleep debt to understand, but given the bug is that splice can write to the input fd, you're suggesting maybe splice should only let you use an input fd if the process has access to write to it?

But splice is a more or less a generalization of sendfile, and sendfile is often used for webserving where the serving process does not have ownership of the documents it is serving. It doesn't make sense to limit splice such that it can't do the task it was built for. Maybe splice should just not write to the input fd? :P

reply
> But splice is a more or less a generalization of sendfile

Not really, splice(2) is actually more limited, it's an optimisation for reading and writing data between files and pipes without needing to make copies.

sendfile(2) works with any fds because it just exists to remove a fair bit of the copy overhead when doing a userspace read/write loop, but it does actually do a copy.

reply
Yes, it'd curtail splice() usage quite heavily. Maybe too much.

But apparently we can't be trusted with the page cache…

Maybe the kernel using supervisor-read-only flags could be made to work, only issue then is what happens if something does in fact need to write…

reply
Aren’t you just saying “don’t write bugs?”
reply
True! Building protections (e.g. physical pages in the page cache are not writeable 100% of the time) just for executables has of course countless circumventions as well (e.g. config files). Yeah, there is probably not that much to be done there, actually. Looking at some of the diffs it seems to me like the kernel makes it really not particularly obvious when/how this goes wrong. E.g. the patch for this is to look at an additional flag on the socket buffer to fix an arbitrary page cache write. This feels rather action at a distance. Logically this of course makes sense, the whole point of splice et al is to feed data from one file-like into another file-like, whatever those ends might be. That erases the underlying provenance of the data.
reply
deleted
reply
> When your workflow consists of asking questions and getting answers immediately, you don't get to see what's nearby.

That's why is very very important to just step out and use saved time to go for a walk, to a park, sit on a bench, listen do birds, close eyes and zoom out.

The state we are in is actually brilliant.

reply