
[-]

It’s interesting to compare how the agentic search performs, with these targeted reads and lots of tool calls in the stream, versus the older but still valid paradigm of using a high-reasoning model like GPT-X-pro and feeding in all the relevant files at once with no tools.

I have found that the “pro” approach is much more holistic and able to tackle rather “creative” problems that require very careful design, and the overall artifact is tight and self-consistent. Claude Code, by comparison, is incredible at exploration and targeted implementation but is indeed not great at seeing the forest.

by ulrikrasmussen5 hours ago|

[-]

Do you think this is inherent or an artifact of prompting? Curiosity and side quests lead to higher token usage and longer time to finish, so I could understand why current harnesses and system prompts would not encourage that sort of thing.

But what if a coding agent was prompted to be more curious during development? Like a human developer, it could make mental notes of alternatives to try out and chase suspicious-looking code that may seem unrelated to the task at hand. It could even spawn rabbit-hole agents in parallel.

Taking a step back, this probably highlights a major hazard with the increased use of LLMs for coding: everyone's style of work is going to converge, because most code will be written by the 2-3 most popular models using the same system prompts.

by lloeki3 hours ago|

[-]

I've seen something similar: generated solutions feel very Pythonic or Java-esque in languages that are neither Python nor Java (C, Rust, Ruby).

I've had to explicitly direct the machine to read existing sibling code and follow the specific idioms and patterns in use.

by dotancohen2 hours ago|

[-]

  > All the time I wonder what am I missing that's right nearby?

Add to the prompt "use coding conventions of the file which you are currently editing". That gets the machine (Opus and Sonnet at least) to go over the nearby code and occasionally mention something obvious.

[-]

No, unless I'm misreading it, it's the *same* root cause: high 32 bits of Extended ESN in IPsec == authencesn module/cipher mode.

The wrong thing got fixed for copy.fail, because people jumped to blame AF_ALG.

[ed.: yes, it's the same authencesn issue. https://github.com/V4bel/dirtyfrag/blob/892d9a31d391b7f0fccb... It doesn't say authencesn in the code, only in a comment, but it's the same issue nonetheless.]

[ed.2: the RxRPC issue is separate, this is about the ESP one]
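For context on "high 32 bits of Extended ESN": as I read RFC 4303, with extended sequence numbers only the low 32 bits of the 64-bit counter travel in the ESP header, while the high 32 bits are never transmitted and are instead appended after the ESP trailer for the ICV computation. Rearranging the buffer into that layout is the job the kernel's authencesn template exists for. The sketch below only illustrates the byte layout; the function names are hypothetical, HMAC-SHA-256 truncated to 128 bits is a stand-in integrity algorithm, and it does not model the kernel's authencesn implementation or the bug itself.

```python
import hashlib
import hmac
import struct


def split_esn(esn: int) -> tuple[int, int]:
    """Split a 64-bit extended sequence number into (high 32 bits, low 32 bits)."""
    if not 0 <= esn < 2**64:
        raise ValueError("ESN must fit in 64 bits")
    return esn >> 32, esn & 0xFFFFFFFF


def icv_input(spi: int, esn: int, payload_and_trailer: bytes) -> bytes:
    """Bytes an ESP ICV covers with ESN enabled (RFC 4303 layout, as I read it).

    On the wire the ESP header carries only SPI || seq_lo. The high 32 bits
    are never sent; they are appended after the ESP trailer purely for the
    integrity computation.
    """
    hi, lo = split_esn(esn)
    on_wire_header = struct.pack("!II", spi, lo)  # network byte order
    return on_wire_header + payload_and_trailer + struct.pack("!I", hi)


def toy_icv(key: bytes, spi: int, esn: int, payload_and_trailer: bytes) -> bytes:
    """Stand-in integrity check: HMAC-SHA-256 truncated to 128 bits."""
    mac = hmac.new(key, icv_input(spi, esn, payload_and_trailer), hashlib.sha256)
    return mac.digest()[:16]


if __name__ == "__main__":
    key, spi, data = b"k" * 32, 0x1234, b"payload+trailer"
    # Same on-wire sequence number (low 32 bits = 7), different high 32 bits.
    a = toy_icv(key, spi, (1 << 32) | 7, data)
    b = toy_icv(key, spi, (2 << 32) | 7, data)
    print(a != b)
```

Two packets with identical on-wire headers still authenticate differently once the hidden high bits differ, which is why those bits have to be spliced into the authenticated data somewhere.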

by firer14 hours ago|

[-]

There are two vulnerabilities here.

The RxRPC one is definitely a different root cause (although caused by a very similar mistake).

For the ESP one it's a bit harder to tell. I don't think the wrong thing was fixed, just that there was a very similar bug in almost the same spot. Could be wrong about that though.

[-]

(you probably wrote this while I was editing my post.)

It's absolutely the same issue in authencesn/ESP. There's another one in RxRPC that is AIUI completely unrelated.

by papascrubs14 hours ago|

[-]

Or a follow-up prompt: "find similar classes of bugs". Once the actual case has been laid out, finding similar bugs isn't too hard. I hear you on the creativity bit. Like any tool, AI can put blinders on. Using it to augment your workflow without letting it take over completely is tough.

by dgellow4 hours ago|

[-]

Not just like any tool, though. Interacting with agents can be incredibly boring and frustrating in a way that I personally do not experience with other technology.

by riedel5 hours ago|

[-]

Just a side note: negative synergy does not seem to be uncommon with machine learning. Maybe 10 years ago we did some research on human/ML-based duplicate detection (for a municipal support ticket system). The research showed that pure AI and pure human both outperformed co-working; human oversight, for example, often overcorrected the machine's work. I think amplifying creativity and unique skills in such processes is actually a nice HCI problem to solve, particularly when the work is to some degree repetitive and tiresome.

by tptacek14 hours ago|

[-]

I don't follow. LLMs spotted these bugs in the first place. You seem to be saying that these discoveries are indications that they're bad for vulnerability discovery.

by firer14 hours ago|

[-]

From what I understand, the copy fail bug was found by a researcher who noticed something weird and then used AI to scan the codebase for instances where that becomes a problem.

I bet that with a slightly looser prompt/harness, the LLM could have found these twin bugs too.

Yet at the same time, I also think that if the human researcher had manually scanned the code, he'd have noticed these bugs too.

FWIW I do think LLMs are great tools for finding vulnerabilities in general. Just that they were visibly not optimally applied in this case.

by aerodexis8 hours ago|

[-]

They could also have found all these things at the same time - and are slow-rolling the disclosures.

[-]

I don't think the copy.fail people understood the issue they found, as is evident by the heavy focus on AF_ALG/aead_algif, which is essentially "innocent" as we're seeing here.

I think LLMs are great for vulnerability discovery, but you can't skimp on the legwork of understanding what you've actually found.

by tptacek14 hours ago|

[-]

Right but without the LLM the bug doesn't get found at all.

by _AzMoo11 hours ago|

[-]

That's not necessarily true. Who's to say the security researchers wouldn't have found it if they'd searched the code manually?

by tptacek11 hours ago|

[-]

It's an AI security firm! You might just as productively ask "why did all the other engineers who ever looked at this code not find it, and why was Theori the one to actually surface it?".

by cp99 hours ago|

[-]

I’m hardly going to simp for LLM tools, but the fact that the bug existed and no one had reported it seems proof positive that no one was about to find it without them.

by UltraSane11 hours ago|

[-]

It would have taken a LOT longer but often this kind of manual search is so tedious people just don't do it. LLMs don't get bored.

by dgellow4 hours ago|

[-]

> LLMs don't get bored

They do not get bored like a human, but they are trained on human language and replicate the same traits, such as laziness and expressing boredom or annoyance (even if they obviously do not experience anything at all). It’s actually a lot of effort to get them to engage with things at a deeper level without cutting corners.

by baq4 hours ago|

[-]

Safer to assume at least one of the NSA, Mossad, and a few others had been sitting on it for years.

[-]

Yes, I agree. I'm not the GP poster.

by parliament3214 hours ago|

https://xint.io/blog/copy-fail-linux-distributions

[-]

No, they did not. Careful of falling for the psychosis.

> This finding was AI-assisted, but began with an insight from Theori researcher Taeyang Lee, who was studying how the Linux crypto subsystem interacts with page-cache-backed data.

by tptacek13 hours ago|

[-]

Theori is an AI security research firm.

by duk3luk311 hours ago|

[-]

You appear to want to die on the hill of "This vulnerability would never have been found if we lived in a world without LLM AI" which is a very strange hill to die on.

There's no question that we live in the world where LLM AI was involved in finding the copy fail vulnerability at this specific time, and it's completely normal for people to see a vulnerability and then look closer and find related vulnerabilities or a deeper root cause, but there's no need to adopt an extreme "without AI LLM we don't find these vulnerabilities" position.

by tptacek10 hours ago|

[-]

It's weird to say I want to "die on this hill", because that's not even something I believe. There was nothing especially difficult about this particular vulnerability. My only observation is that nobody found it before, then an LLM security firm went looking for Linux LPEs, and thus it was discovered.

That is a very difficult fact pattern to which to attach the conclusion "LLMs have sabotaged security research" (my paraphrase).

by Yokohiii10 hours ago|

[-]

The finding started with human intuition and was assisted by an LLM. You can yell "AI sec firm" 1000 times. A human got it started. You shouldn't die on that hill.

by danudey13 hours ago|

[-]

It seems as though this issue occurred to him, then he used their tool ("Xint Code") to analyze the codebase for instances of it.

by ofjcihen10 hours ago|

[-]

I don’t think that’s what the OP is saying at all, just that using LLMs needs to be a cooperative research process.

Also I see you jumping around a lot to the defense of LLMs when I don’t think anyone is really attacking them. Maybe cool it a bit.

by tptacek10 hours ago|

[-]

From the thread that ensued I feel comfortable that my interpretation of the comment (or rather, my confusion about it) was in fact germane.

by ofjcihen10 hours ago|

[-]

Germane or not, the knee-jerk reactions related to LLMs are getting ridiculous, and it seems like it’s the same people throwing down at a moment’s notice and then chalking it up to a misunderstanding.

So like I said, just chill out.

by rayiner10 hours ago|

[-]

It’s incredible humans spot stuff like this. I guess even more incredible that LLMs can do it!

by keybored3 hours ago|

[-]

Right. Finding the bug is in itself a win. It seems we’re jumping from that spend-electricity-to-find-bugs win to arguing about how some things around it are not quite good or comfy.

by refulgentis13 hours ago|

[-]

It’s very hard to see a root vuln similar to, but not the same as, another discovered by AI, as a lesson about AI not exploring.

Is there a counterfactual where you would say it explored well enough, besides both vulnerabilities published as one?

by SubiculumCode10 hours ago|

[-]

Evidence or are you just riffing?

by formerly_proven14 hours ago|

[-]

These are all page cache poisoning attacks (dirtyfrag, copyfail, dirtypipe). Maybe the page cache should have defense-in-depth measures for SUID binaries?

by firer14 hours ago|

[-]

SUID mitigations have nothing to do with the vulnerability itself - just the exploit.

If there's a root cronjob that runs a world-readable binary, you could modify it in the page cache and exploit it that way.

Modifying the page cache is a really strong primitive with countless ways to exploit it.
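To make the "strong primitive" point concrete: on Linux, every open of a file and every shared mapping of it is served from the same page-cache pages, so whatever is in the cached page is what all later readers get, regardless of which fd they use. A minimal sketch, assuming Linux; the function name is hypothetical, and a legitimate write through a writable fd stands in for the unauthorized write the bugs allow.

```python
import mmap
import os
import tempfile


def shared_view_after_write(path: str, new: bytes) -> tuple[bytes, bytes]:
    """Map `path` read-only, then overwrite its first bytes through a second fd.

    Returns (bytes seen through the existing read-only mapping,
             bytes returned by a brand-new read-only open).
    """
    ro = os.open(path, os.O_RDONLY)
    rw = os.open(path, os.O_WRONLY)
    try:
        # ACCESS_READ means MAP_SHARED + PROT_READ: the mapping uses the cached pages.
        view = mmap.mmap(ro, 0, access=mmap.ACCESS_READ)
        try:
            # A plain write() lands in the page cache; writeback to storage comes later.
            os.pwrite(rw, new, 0)
            mapped = view[: len(new)]
        finally:
            view.close()
        fresh = os.open(path, os.O_RDONLY)
        try:
            read_back = os.pread(fresh, len(new), 0)
        finally:
            os.close(fresh)
        return mapped, read_back
    finally:
        os.close(ro)
        os.close(rw)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"#!/bin/sh\necho ok\n")
        path = f.name
    try:
        print(shared_view_after_write(path, b"#!/bin/XX"))
    finally:
        os.unlink(path)
```

Neither reader synced or reopened anything for the mapping; both simply see the cached page. That is why a write into the page cache of a file you can only read is dangerous whether or not the file is SUID.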

[-]

splice() should maybe generally refuse to operate on things you can't write to.

by toast013 hours ago|

[-]

splice is documented to return EBADF if "One or both file descriptors are not valid, or do not have proper read-write mode."

So it seems surprising to me that you can call it when the out fd is not writable? But I didn't retain the information about the vulnerability, so I'm missing something. There was something about copy on write, IIRC?

by eqvinox13 hours ago|

[-]

"proper read-write mode" for the input fd is reading only. The exploit is writing to the splice() input fd.

Also, NB, I said permission check, not mode check. The input fd to splice can and will be open for only reading quite often. Doesn't mean the kernel can't still do a write permission check.

(Except I didn't say that here. Oops. Getting confused with my posts.)

by toast013 hours ago|

[-]

OK, I likely have too much sleep debt to understand this, but given that the bug is that splice can write to the input fd, you're suggesting maybe splice should only let you use an input fd if the process has permission to write to it?

But splice is more or less a generalization of sendfile, and sendfile is often used for webserving where the serving process does not have ownership of the documents it is serving. It doesn't make sense to limit splice such that it can't do the task it was built for. Maybe splice should just not write to the input fd? :P

by cyphar7 hours ago|

[-]

> But splice is more or less a generalization of sendfile

Not really; splice(2) is actually more limited. It's an optimisation for reading and writing data between files and pipes without needing to make copies.

sendfile(2) works with any fds because it just exists to remove a fair bit of the copy overhead when doing a userspace read/write loop, but it does actually do a copy.

by eqvinox12 hours ago|

[-]

Yes, it'd curtail splice() usage quite heavily. Maybe too much.

But apparently we can't be trusted with the page cache…

Maybe having the kernel use supervisor-read-only flags could be made to work; the only issue then is what happens if something does in fact need to write…

by semiquaver12 hours ago|

[-]

Aren’t you just saying “don’t write bugs?”

by formerly_proven14 hours ago|

[-]

True! Building protections just for executables (e.g. physical pages in the page cache not being writable 100% of the time) of course has countless circumventions as well (e.g. config files). Yeah, there probably isn't that much to be done there, actually.

Looking at some of the diffs, it seems to me that the kernel doesn't make it at all obvious when or how this goes wrong. E.g. the patch for this looks at an additional flag on the socket buffer to fix an arbitrary page cache write. That feels rather like action at a distance. Logically it makes sense, of course: the whole point of splice et al. is to feed data from one file-like into another file-like, whatever those ends might be. That erases the underlying provenance of the data.

by varispeed13 hours ago|