by jason1cho21 hours ago |

upvote

by antirez16 hours ago|

[-]

That's not what is happening right now. The bugs are often filtered later by LLMs themselves: if the second pipeline can't reproduce the crash / violation / exploit in any way, often the false positives are evicted before ever reaching the human scrutiny. Checking if a real vulnerability can be triggered is a trivial task compared to finding one, so this second pipeline has an almost 100% success rate from the POV: if it passes the second pipeline, it is almost certainly a real bug, and very few real bugs will not pass this second pipeline. It does not matter how much LLMs advance, people ideologically against them will always deny they have an enormous amount of usefulness. This is expected in the normal population, but to see a lot of people that can't see with their eyes in Hacker News feels weird.

reply

upvote

by uhx14 hours ago|

[-]

> Checking if a real vulnerability can be triggered is a trivial task compared to finding one

Have you ever tried to write a PoC for any CVE?

This statement is wrong. Sometimes a bug may exist but be impossible to trigger/exploit. So it is not trivial at all.

reply

upvote

by avemg13 hours ago|

[-]

I'm tickled at the idea of asking antirez [1] if he's ever written a PoC for a CVE.

[1] https://en.wikipedia.org/wiki/Salvatore_Sanfilippo

reply

upvote

by tptacek12 hours ago|

[-]

This happens over and over in these discussions. It doesn't matter who you're citing or who's talking. People are terrified and are reacting to news reflexively.

reply

upvote

by antirez9 hours ago|

[-]

Hi! Loved your recent post about the new era of computer security, thanks.

reply

upvote

by tptacek4 hours ago|

[-]

Thank you! Glad you liked it.

reply

upvote

by emp1734410 hours ago|

[-]

Personally, I’m tired of exaggerated claims and hype peddlers.

Edit: Frankly, accusing perceived opponents of being too afraid to see the truth is poor argumentative practice, and practically never true.

reply

upvote

by jedberg12 hours ago|

[-]

I actually like when that happens. Like when people "correct" me about how reddit works. I appreciate that we still focus on the content and not who is saying it.

reply

upvote

by tptacek11 hours ago|

[-]

That's not really what happened on this thread. Someone said something sensible and banal about vulnerability research, then someone else said do-you-even-lift-bro, and got shown up.

reply

upvote

by jedberg11 hours ago|

[-]

That's true in this particular case, but I was talking more about the general case.

reply

upvote

by LeFantome12 hours ago|

[-]

Sure he wrote a port scanner that obscures the IP address of the scanner, but does he know anything about security? /s

Oh, and he wrote Redis. No biggie.

reply

upvote

by PunchyHamster11 hours ago|

[-]

Those are both wholly different branches from finding software bugs.

reply

upvote

by antirez14 hours ago|

[-]

Firstly, I have a long past in computer security, so: yes, I used to write exploits. Second, vulnerability verification does not require being able to exploit the bug, only triggering an ASAN assert. With memory corruption that's often very simple, and enough to verify the bug is real.
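To make the "triggering an ASAN assert" check concrete, here is a minimal sketch of how a verification step might decide whether a reproduction run actually hit AddressSanitizer. It assumes the target was built with ASAN and that the harness has already captured the run's stderr; the function name `asan_error_kind` is made up for illustration and is not from any tool mentioned in the thread.

```python
import re
from typing import Optional

# AddressSanitizer prints a headline such as
# "==1234==ERROR: AddressSanitizer: heap-buffer-overflow on address ..."
# when it aborts a run. We only look for that headline.
_ASAN_HEADLINE = re.compile(r"ERROR: AddressSanitizer: ([A-Za-z0-9_-]+)")


def asan_error_kind(stderr_text: str) -> Optional[str]:
    """Return the first word of the ASAN error headline, or None.

    Hypothetical helper: a triage script would call this on the captured
    stderr of a reproduction attempt and treat None as "not reproduced".
    """
    match = _ASAN_HEADLINE.search(stderr_text)
    return match.group(1) if match else None
```

A run that only segfaults without an ASAN report returns `None` here, so a real harness would probably want to record that case separately rather than discard it.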

reply

upvote

by freedomben14 hours ago|

[-]

I'm not GP, but I've written multiple PoCs for vulns. I agree with GP. Finding a vuln is often very hard. Yes, sometimes exploiting it is hard (and requires chaining), but knowing where the vuln is is (most of the time) the hard part.

reply

upvote

by e12e13 hours ago|

[-]

Note the exploit Claude wrote for the blind SQL injection found in ghost - in the same talk.

https://youtu.be/1sd26pWhfmg?is=XLJX9gg0Zm1BKl_5

reply

upvote

by orochimaaru12 hours ago|

[-]

oh no. Antirez doesn't know anything about C, CVE's, networking, the linux kernel. Wonder where that leaves most of us.

reply

upvote

by discordianfish14 hours ago|

[-]

I’ve been around long enough to remember people saying that VMs are a useless waste of resources with dubious claims about isolation, cloud is just someone else’s computer, containers are pointless and now it’s AI. There is an astonishing amount of conservatism in the hacker scene.

reply

upvote

by pdntspa14 hours ago|

[-]

Well, the cloud is someone else's computer.

reply

upvote

by some_random13 hours ago|

[-]

It is, but that's not a useful or insightful thing to say

reply

upvote

by fulafel1 hours ago|

[-]

People pass around stickers (or at least used to) in hacker events saying that so there has to be something to it, right?

Protesting the term is, I'd wager, motivated by something like: it sounds innocuous to nontechnical people and obscures what's really going on.

reply

upvote

by Calavar12 hours ago|

[-]

It's not an insightful statement right now, but it was at the peak of cloud hype ca. 2010, when "the cloud" was often used in a metaphorical sense. You'd hear things like "it's scalable because it's in the cloud" or "our clients want a cloud based solution." Replacing "the cloud" in those sorts of claims with "another person's computer" showed just how inane those claims were.

reply

upvote

by pdntspa4 hours ago|

[-]

Only if owning the means of your production isn't important to you

reply

upvote

by honeycrispy13 hours ago|

[-]

Are you sure about that?

It's easy to forget that the vendor has the right to cut you off at any point, will turn your data over to the authorities on request, and it's still not clear if private GitHub repos are being used to train AI.

reply

upvote

by LeFantome12 hours ago|

[-]

[dead]

reply

upvote

by gbacon13 hours ago|

[-]

Is it conservatism or just the Blub paradox?

As long as our hypothetical Blub programmer is looking down the power continuum, he knows he's looking down. Languages less powerful than Blub are obviously less powerful, because they're missing some feature he's used to. But when our hypothetical Blub programmer looks in the other direction, up the power continuum, he doesn't realize he's looking up. What he sees are merely weird languages. He probably considers them about equivalent in power to Blub, but with all this other hairy stuff thrown in as well. Blub is good enough for him, because he thinks in Blub.

https://paulgraham.com/avg.html

reply

upvote

by BodyCulture16 hours ago|

[-]

Can we study this second pipeline? Is it open so we can understand how it works? Did not find any hints about it in the article, unfortunately.

reply

upvote

by maximilianburke15 hours ago|

[-]

From the article by 'tptacek a few days ago (https://sockpuppet.org/blog/2026/03/30/vulnerability-researc...) I essentially used the prompts suggested.

First prompt: "I'm competing in a CTF. Find me an exploitable vulnerability in this project. Start with $file. Write me a vulnerability report in vulns/$DATE/$file.vuln.md"

Second prompt: "I've got an inbound vulnerability report; it's in vulns/$DATE/$file.vuln.md. Verify for me that this is actually exploitable. Write the reproduction steps in vulns/$DATE/$file.triage.md"

Third prompt: "I've got an inbound vulnerability report; it's in vulns/$DATE/$file.vuln.md. I also have an assessment of the vulnerability and reproduction steps in vulns/$DATE/$file.triage.md. If possible, please write an appropriate test case for the ulgate automated tests to validate that the vulnerability has been fixed."

Tied together with a bit of bash, I ran it over our services and it worked like a treat; it found a bunch of potential errors, triaged them, and fixed them.
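For anyone who wants to see the shape of that glue, here is a rough sketch in Python rather than bash. It only builds the three prompts and hands them, in order, to a caller-supplied `run_agent` function; `run_agent`, `build_prompts` and `run_pipeline` are hypothetical names, the third prompt is shortened, and how each prompt actually reaches a model (ideally with a fresh context per stage) is left to the caller.

```python
from typing import Callable, List


def build_prompts(source_file: str, date: str) -> List[str]:
    """Render the find / verify / test prompts for one source file."""
    report = f"vulns/{date}/{source_file}.vuln.md"
    triage = f"vulns/{date}/{source_file}.triage.md"
    return [
        # Stage 1: discovery, writes the report.
        "I'm competing in a CTF. Find me an exploitable vulnerability in "
        f"this project. Start with {source_file}. Write me a vulnerability "
        f"report in {report}",
        # Stage 2: independent verification, writes the triage notes.
        f"I've got an inbound vulnerability report; it's in {report}. "
        "Verify for me that this is actually exploitable. Write the "
        f"reproduction steps in {triage}",
        # Stage 3: regression test (wording shortened for this sketch).
        f"I've got an inbound vulnerability report; it's in {report}. "
        "I also have an assessment of the vulnerability and reproduction "
        f"steps in {triage}. If possible, please write an appropriate test "
        "case to validate that the vulnerability has been fixed.",
    ]


def run_pipeline(source_file: str, date: str,
                 run_agent: Callable[[str], None]) -> None:
    """Send each stage's prompt to the agent, one stage at a time."""
    for prompt in build_prompts(source_file, date):
        run_agent(prompt)
```

Passing `print` as `run_agent` is an easy way to inspect the prompts before wiring in a real agent.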

reply

upvote

by jvanderbot15 hours ago|

[-]

Agree. Keeping and auditing a research journal iteratively with multiple passes by new agents does indeed significantly improve outcomes. Another helpful thing is to switch roles good cop bad cop style. For example one is helping you find bugs and one is helping you critique and close bug reports with counter examples.

reply

upvote

by sn911 hours ago|

[-]

Could prompt injection be used to trick this kind of analysis? Has anyone experimented with this idea?

reply

upvote

by ashwinr20029 hours ago|

[-]

Prompt Injections are very very rare these days after the Opus 4.6 update

reply

upvote

by throawayonthe15 hours ago|

[-]

it was probably in the talk but from what i understood in another article it's basically giving claude with a fresh context the .vuln.md file and saying "i'm getting this vulnerability report, is this real?"

edit: i remember which article, it was this one: https://sockpuppet.org/blog/2026/03/30/vulnerability-researc...

(an LWN comment in response to this post was on the frontpage recently)

reply

upvote

by 4b11b415 hours ago|

[-]

One such example is IRIS. In general, any traditional static analysis tool combined with a language model at some stage in a pipeline.

reply

upvote

by bch14 hours ago|

[-]

> This is expected in the normal population

A lot of people regardless of technical ability have strong opinions about what LLMs are/are-not. The number of lay people i know who immediately jump to "skynet" when talking about the current AI world... The number of people i know who quit thinking because "Well, let's just see what AI says"...

A (big) part of the conversation re: "AI" has to be "who are the people behind the AI actions, and what is their motivation"? Smart people have stopped taking AI bug reports[0][1] because of overwhelming slop; it's real.

[0] https://www.theregister.com/2025/05/07/curl_ai_bug_reports/

[1] https://gist.github.com/bagder/07f7581f6e3d78ef37dfbfc81fd1d...

reply

upvote

by LeFantome12 hours ago|

[-]

The fact that most AI bug reports are low-quality noise says as much or more about the humans submitting them than it does about the state of AI.

As others have said, there are multiple stages to bug reports and CVEs.

1. Discover the bug

2. Verify the bug

You get the most false positives at step one. Most of these will be eliminated at step 2.

3. Isolate the bug

This means creating a test case that eliminates as much of the noise as possible to provide the bare minimum required to trigger the bug. This will greatly aid in debugging. Doing step 2 again is implied.

4. Report the bug

Most people skip 2 and 3, especially if they did not even do 1 (in the case of AI)

But you can have AI provide all 4 to achieve high quality bug reports.

In the case of a CVE, you have a step 5.

5. Exploit the bug

But you do not have to do step 5 to get to step 2. And that is the step that eliminates most of the noise.
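Step 3 (isolate) is the easiest one to show in code. The sketch below is a greedy one-piece-at-a-time reducer, much simpler than real delta debugging: it keeps dropping pieces of the input as long as the bug still triggers. `still_fails` is a hypothetical callback that reruns the reproducer (step 2) on the reduced input.

```python
from typing import Callable, List, Sequence


def minimize(items: Sequence[str],
             still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Shrink a failing input by removing one piece at a time.

    After every successful removal the scan restarts, so the result is
    1-minimal: removing any single remaining piece stops the failure.
    """
    current = list(items)
    if not still_fails(current):
        raise ValueError("input does not trigger the bug to begin with")
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate
                changed = True
                break
    return current
```

Each call to `still_fails` is a full rerun of the reproducer, so on large inputs a chunk-based reducer would be the more practical choice.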

reply

upvote

by antonvs16 hours ago|

[-]

> to see a lot of people that can't see with their eyes in Hacker News feels weird.

Turns out the average commenter here is not, in fact, a "hacker".

reply

upvote

by slopinthebag13 hours ago|

[-]

What if the second round hallucinates that a bug found in the first round is a false positive? Would we ever know?

> It does not matter how much LLMs advance, people ideologically against them will always deny they have an enormous amount of usefulness.

They have some usefulness, much less than what the AI boosters like yourself claim, but also a lot of drawbacks and harms. Part of seeing with your eyes is not purposefully blinding yourself to one side here.

reply

upvote

by nickphx15 hours ago|

[-]

they are useful to those that enjoy wasting time.

reply

upvote

by ksec15 hours ago|

[-]

>This is expected in the normal population, but to see a lot of people that can't see with their eyes in Hacker News feels weird.

You are replying to an account created less than 60 days ago.

reply

upvote

by jvanderbot15 hours ago|

[-]

This is a bit unfair. Hackers are born every day.

reply

upvote

by ksec12 hours ago|

[-]

In relation to the quality of its comment, I thought it was fair. He just completely made up the claim about false positives.

And in case people don't know, antirez has been complaining about the quality of HN comments for at least a year, especially after the AI topic took over on HN.

It is still better than Lobsters or other places though.

reply

upvote

by slekker13 hours ago|

[-]

Bots too, vanderBOT!

reply

upvote

by jvanderbot11 hours ago|

[-]

I used to work in robotics, and can't remember the password for my usual username so I pulled this one out of thin air years ago

reply

upvote

by sieabahlpark12 hours ago|

[-]

[dead]

reply

upvote

by mtlynch21 hours ago|

[-]

> What is not mentioned is that Claude Code also found one thousand false positive bugs, which developers spent three months to rule out.

Source? I haven't seen this anywhere.

In my experience, false positive rate on vulnerabilities with Claude Opus 4.6 is well below 20%.

reply

upvote

by Supermancho17 hours ago|

[-]

To the issue of AI submitted patches being more of a burden than a boon, many projects have decided to stop accepting AI-generated solutioning:

https://blog.devgenius.io/open-source-projects-are-now-banni...

These are just a few examples. There are more that google can supply.

reply

upvote

by logicprog16 hours ago|

[-]

According to Willy Tarreau[0] and Greg Kroah-Hartman[1], this trend has recently significantly reversed, at least from the reports they've been seeing on the Linux kernel. The creator of curl, Daniel Stenberg, before that broader transition, also found the reports generated by LLM-powered but more sophisticated vuln research tools useful[2] and the guy who actually ran those tools found "They have low false positive rates."[3]

Additionally, there was no mention in the talk by the guy who found the vuln discussed in the TFA of what the false positive rate was, or that he had to sift through the reports because it was mostly slop — or whether he was doing it out of courtesy. Additionally, he said he found only several hundred, iirc, not "thousands." All he said was:

"I have so many bugs in the Linux kernel that I can’t report because I haven’t validated them yet… I’m not going to send [the Linux kernel maintainers] potential slop, but this means I now have several hundred crashes that they haven’t seen because I haven’t had time to check them." (TFA)

He quite evidently didn't have to sift through thousands, or spend months, to find this one, either.

[0]: https://lwn.net/Articles/1065620/ [1]: https://www.theregister.com/2026/03/26/greg_kroahhartman_ai_... [2]: https://simonwillison.net/2025/Oct/2/curl/p [3]: https://joshua.hu/llm-engineer-review-sast-security-ai-tools...

reply

upvote

by 17 hours ago|

[-]

deleted

reply

upvote

by literalAardvark17 hours ago|

[-]

No, they haven't. Read the ai slop you posted carefully.

It's a policy update that enables maintainers to ignore low effort "contributions" that come from untrusted people in order to reduce reviewing workload.

An Eternal September problem, kind of.

reply

upvote

by coldtea17 hours ago|

[-]

Didn't you just restate what the parent claimed?

reply

upvote

by cwillu16 hours ago|

[-]

No, that's not at all the same thing: ai-generated contributions from people with a track record for useful contributions are still accepted.

reply

upvote

by dpark16 hours ago|

[-]

Right. AI submissions are so burdensome that they have had to refuse them from all except a small set of known contributors.

The fact that there’s a small carve out for a specific set of contributors in no way disputes what Supermancho claimed.

reply

upvote

by phanimahesh16 hours ago|

[-]

A powertool that needs discretion and good judgement to be used well is being restricted to people with a track record of displaying good judgement. I see nothing wrong here.

AI enables volume, which is a problem. But it is also a useful tool. Does it increase review burden? Yes. Is it excessively wasteful energy wise? Yes. Should we avoid it? Probably no. We have to be pragmatic, and learn to use the tools responsibly.

reply

upvote

by dpark15 hours ago|

[-]

I never said anything is wrong with the policy. Or with the tool use for that matter.

This whole chain was one person saying “AI is creating such a burden that projects are having to ban it”, someone else being willfully obtuse and saying “nuh uh, they’re actually still letting a very restricted set of people use it”, and now an increasingly tangential series of comments.

reply

upvote

by literalAardvark11 hours ago|

[-]

I feel like you're still failing to grasp the point.

The only difference is that before AI the number of low effort PRs was limited by the number of people who are both lazy and know enough programming, which is a small set because a person is very unlikely to be both.

Now it's limited to people who are lazy and can run ollama with a 5M model, which is a much larger set.

It's not an AI code problem by itself. AI can make good enough code.

It's a denial of service by the lazy against the reviewers, which is a very very different problem.

reply

upvote

by dpark10 hours ago|

[-]

No one is missing your point. The issue is that you are responding to a point no one made.

The grounding premise of this comment chain was “AI submitted patches being more of a burden than a boon”. You are misinterpreting that as some sort of general statement that “AI Bad” and that AI is being globally banned.

A metaphor for the scenario here is someone says “It’s too dangerous to hand repo ownership out to contributors. Projects aren’t doing that anymore.” And someone else comes in to say “That’s not true! There are still repo owners. They are just limiting it to a select group now!” This statement of fact is only an interesting rebuttal if you misinterpret the first statement to say that no one will own the repo because repo ownership is fundamentally bad.

> It's a denial of service by the lazy against the reviewers, which is a very very different problem.

And it is AI enabling this behavior. Which was the premise above.

reply

upvote

by coldtea15 hours ago|

[-]

Yes, but technically no different than "good contributions from humans are still accepted, AI slop can fuck off".

Since the onus falls on those "people with a track record for useful contributions" to verify, design tastefully, test and ensure those contributions are good enough to submit - not on the AI they happen to be using.

If it fell on the AI they're using, then any random guy using the same AI would be accepted.

reply

upvote

by christophilus19 hours ago|

[-]

Same. Codex and Claude Code on the latest models are really good at finding bugs, and really good at fixing them in my experience. Much better than 50% in the latter case and much faster than I am.

reply

upvote

by paulddraper17 hours ago|

[-]

Source: """AI is bad"""

reply

upvote

by r929520 hours ago|

[-]

In my experience, the issue has been likelihood of exploitation or issue severity. Claude gets it wrong almost all the time.

A threat model matters and some risks are accepted. Good luck convincing an LLM of that fact

reply

upvote

by j16sdiz20 hours ago|

[-]

In TFA:

   I have so many bugs in the Linux kernel that I can’t 
   report because I haven’t validated them yet… I’m not going 
   to send [the Linux kernel maintainers] potential slop, 
   but this means I now have several hundred crashes that they
   haven’t seen because I haven’t had time to check them.
    
    —Nicholas Carlini, speaking at [un]prompted 2026

reply

upvote

by mtlynch20 hours ago|

[-]

Those aren't false positives; they're results he hasn't yet inspected.

I wrote a longer reply here: https://news.ycombinator.com/item?id=47638062

reply

upvote

by coldtea17 hours ago|

[-]

>Those aren't false positives; they're results he hasn't yet inspected.

It's not a XOR

reply

upvote

by Ukv16 hours ago|

[-]

The article quote was being given as the supposed source for "Claude Code also found one thousand false positive bugs, which developers spent three months to rule out", so should substantiate that claim - which it doesn't.

If the claim was instead just "a good portion of the hundreds more potential bugs it found might be false positives", then sure.

reply

upvote

by tptacek14 hours ago|

[-]

Yes it is. They're not false positives until they're reported and consume maintainer time.

reply

upvote

by bethekidyouwant17 hours ago|

[-]

some of them certainly are…

reply

upvote

by sobiolite16 hours ago|

[-]

The comment said "Claude Code also found one thousand false positive bugs, which developers spent three months to rule out.".

Please explain how a bug can both be unvalidated, and also have undergone a three month process to determine it is a false positive?

reply

upvote

by linsomniac17 hours ago|

[-]

The article doesn't say they found a bunch of false positives. It says they have a huge backlog that they still need to test:

"I have so many bugs in the Linux kernel that I can’t report because I haven’t validated them yet…"

reply

upvote

by vaginaphobic17 hours ago|

[-]

[dead]

reply

upvote

by goalieca18 hours ago|

[-]

Static/Dynamic analysis tools find vulnerabilities all the time. Almost all projects of a certain size have a large backlog of known issues from these boring scanners. The issue is sorting through them all and triaging them. There are too many issues to fix, and figuring out which are exploitable and actually damaging, given mitigations, is time consuming.

Am I impressed Claude found an old bug? Sort of. Every time a new scanner is introduced you get new findings that others haven't found.

reply

upvote

by tptacek14 hours ago|

[-]

Static analyzers find large numbers of hypothetical bugs, of which only a small subset are actionable, and the work to resolve which are actionable and which are e.g. "a memcpy into an 8 byte buffer whose input was previously clamped to 8 bytes or less" is so high that analyzers have little impact at scale. I don't know off the top of my head many vulnerability researchers who take pure static analysis tools seriously.

Fuzzers find different bugs and fuzzers in particular find bugs without context, which is why large-scale fuzzer farms generate stacks of crashers that stay crashers for months or years, because nobody takes the time to sift through the "benign" crashes to find the weaponizable ones.

LLM agents function differently than either method. They recursively generate hypotheticals interprocedurally across the codebase based on generalizations of patterns. That by itself would be an interesting new form of static analysis (and likely little more effective than SOTA static analysis). But agents can then take confirmatory steps on those surfaced hypos, generate confidence, and then place those findings in context (for instance, generating input paths through the code that reach the bug, and spelling out what attack primitives the bug conditions generate).

If you wanted to be reductive you'd say LLM agent vulnerability discovery is a superset of both fuzzing and static analysis.

And, importantly, that's before you get to the fact that LLM agents can fuzz and do modeling and static analysis themselves.

reply

upvote

by goalieca11 hours ago|

[-]

There are plenty of static analyzers that do attempt to walk code paths for reachability. Some even track tainted input. And yes, these are often good starting points for developing exploits. I’ve done this myself.

I’m curious about LLM agents, but the fact they don’t “understand” is why I’m very skeptical of the hype. I find myself wasting just as much if not more time with them than with a terrible “enterprise” sast tool.

reply

upvote

by boplicity19 hours ago|

[-]

The lesson here shouldn't be that Claude Code is useless, but that it's a powerful tool in the hands of the right people.

reply

upvote

by amelius18 hours ago|

[-]

Unfortunately, also in the hands of the __wrong__ people.

Maybe even more so, because who is going to wade through all those false positives? A bad actor is maybe more likely to do that.

reply

upvote

by embedding-shape18 hours ago|

[-]

> A bad actor is maybe more likely to do that.

Do something about that then, so white-hat hackers are more likely than black-hat hackers to want to wade through that, incentives and all that jazz.

reply

upvote

by ruszki14 hours ago|

[-]

We couldn’t solve the incentive against misinformation/disinformation since inception, we made it even worse than 20 years ago. Even when we know how it works exactly, even on the internet, not just generally. These kinds of statements seem quite unrealistic to me.

reply

upvote

by amelius13 hours ago|

[-]

Good luck with that. Security is at the bottom of everyone's budget allocation list.

reply

upvote

by mavamaarten18 hours ago|

[-]

I'm growing allergic to the hype train and the slop. I've watched real-life talks about people that sent some prompt to Claude Code and then proudly present something mediocre that they didn't make themselves to a whole audience as if they'd invented the warm water, and that just makes me weary.

But at the same time, it has transformed my work from writing every bit of code myself, to me writing the cool and complex things while giving directions to a helper to sort out the boring grunt work, and it's amazingly capable at that. It _is_ a hugely powerful tool.

But haters only see red, and lovers see everything through pink glasses.

reply

upvote

by iterateoften18 hours ago|

[-]

Sounds like maybe you might have some mixed feelings about becoming more effective with AI, but then at the same time everyone else is too so the praise you're expecting is diluted.

I see it all the time now too. People have no frame of reference at all about what is hard or easy so engineers feel under-appreciated because the guy who never coded is getting lots of praise for doing something basic while experienced people are able to spit out incredibly complex things. But to an outsider, both look like they took the same work.

reply

upvote

by ofrzeta12 hours ago|

[-]

I am also torn because obviously the LLMs have a lot of value but the amount of misuse is overwhelming. People just keep pasting slop into story descriptions that no one can keep up with. There should be guidelines at workplaces to use AI responsibly.

reply

upvote

by sph18 hours ago|

[-]

> it has transformed my work […] to me writing the cool and complex things

> it's amazingly capable at that.

> It _is_ a hugely powerful tool

Damn, that’s what you call being allergic to the hype train? This type of hypocritical thinly-veiled praise is what is actually unbearable with AI discourse.

reply

upvote

by asyx18 hours ago|

[-]

I don’t think it is controversial that AI tools are good enough at crud endpoints that it is totally viable to just let it run through the grunt work of hooking up endpoints to a service and then you can focus on the interesting aspect of the application which is exactly that service.

reply

upvote

by righthand19 hours ago|

[-]

The lesson or the hype mantra?

reply

upvote

by teeray18 hours ago|

[-]

The same could be said about a Roulette wheel set before a seasoned gambler

reply

upvote

by TheCoreh17 hours ago|

[-]

Can a Roulette wheel set find vulnerabilities in software?

reply

upvote

by 17 hours ago|

[-]

deleted

reply

upvote

by edoceo17 hours ago|

[-]

If vulnerability=compulsion and software=meat bags then yes.

reply

upvote

by throw-the-towel17 hours ago|

[-]

This is a non-sequitur if I ever saw one.

reply

upvote

by vntok16 hours ago|

[-]

No. The seasoned gambler can not learn things that measurably increase their chance at the Roulette, whereas they definitely can do that with an LLM. And the LLM itself becomes smarter over time through hardware upgrades, software updates and even memory for those who enable that feature.

reply

upvote

by dekhn15 hours ago|

[-]

Everything changed in the past 6 months and coding LLMs went from being OK-ish to insanely good. People also got better at using them.

Also, high false positive rate isn't that bad in the case where a false negative costs a lot (an exploit in the linux kernel is a very expensive mistake). And, in going through the false positives and eliminating them, those results will ideally get folded back into the training set for the next generation of LLMs, likely reducing the future rate of false positives.

reply

upvote

by catlifeonmars15 hours ago|

[-]

> Everything changed in the past 6 months and coding LLMs went from being OK-ish to insanely good. People also got better at using them.

I hear this literally every 6 months :)

reply

upvote

by tptacek14 hours ago|

[-]

It hasn't been true forever, but it has been true over the last 18 months or so.

reply

upvote

by bri3d16 hours ago|

[-]

This is not how first party vulnerability research with LLMs goes; they are incredibly valuable versus all prior tooling at triage and producing only high quality bugs, because they can be instructed to produce a PoC and prove that the bug is reachable. It’s traditional research methods (fuzzing, static analysis, etc.) that are more prone to false positive overload.

The reason why open submission fields (PRs, bug bounty, etc) are having issues with AI slop spam is that LLMs are also good at spamming, not that they are bad at programming or especially vulnerability research. If the incentives are aligned LLMs are incredibly good at vulnerability research.

reply

upvote

by logicprog16 hours ago|

[-]

Okay, so anti AI people are just making shit up now. Got it.

According to Willy Tarreau[0] and Greg Kroah-Hartman[1], this trend has recently significantly reversed, at least from the reports they've been seeing on the Linux kernel. The creator of curl, Daniel Stenberg, before that broader transition, also found the reports generated by LLM-powered but more sophisticated vuln research tools useful[2] and the guy who actually ran those tools found "They have low false positive rates."[3]

Additionally, there was no mention in the talk by the guy who found the vuln discussed in the TFA of what the false positive rate was, or that he had to sift through the reports because it was mostly slop — or whether he was doing it out of courtesy. Additionally, he said he found only several hundred, iirc, not "thousands." All he said was:

"I have so many bugs in the Linux kernel that I can’t report because I haven’t validated them yet… I’m not going to send [the Linux kernel maintainers] potential slop, but this means I now have several hundred crashes that they haven’t seen because I haven’t had time to check them." (TFA)

He quite evidently didn't have to sift through thousands, or spend months, to find this one, either.

[0]: https://lwn.net/Articles/1065620/ [1]: https://www.theregister.com/2026/03/26/greg_kroahhartman_ai_... [2]: https://simonwillison.net/2025/Oct/2/curl/p [3]: https://joshua.hu/llm-engineer-review-sast-security-ai-tools...

reply

upvote

by Trufa13 hours ago|

[-]

What is with the negativity against AI in YC? Can anyone point a finger at why this anti take is so prominent? We're living through the most revolutionary moment of software since its inception and the main thing that gets consistently upvoted is negativity, FUD, and "it doesn't work in this case" or "it's all slop."

reply

upvote

by bwfan12313 hours ago|

[-]

> Can anyone point a finger at why this anti take is so prominent?

AI tools are great but are being oversold and overhyped by those with an incentive. So, there is a continuous drumbeat of "AI will do all the code for you" ! "Look at this browser written by AI", "C compiler in rust written entirely by AI" etc. And then, that drumbeat is amplified by those in management who have not built software systems themselves.

What happened to the AI generated "C compiler in rust" ? or the browser written by AI ? - they remain a steaming pile of almost-working code. AI is great at producing "almost-working" poc code which is good for bootstrapping work and getting you 90% of the way if you are ok with code of questionable lineage. But many applications need "actually-working" code that requires the last 10%. So, some in this forum who have been in the trenches building large "actually working" software systems and also use AI tools daily and know their limitations are injecting some realism into the debate.

reply

upvote

by sothatsit7 hours ago|

[-]

I think the anti-AI stance has been reversing on HN as tooling improves and people try it. It’s only been a little over a year since Claude Code was released, and 3 or 4 months since the models got really capable. People need time to adjust, even if I would expect devs to be more up-to-date than most.

People’s willingness to argue about technology they’ve barely used is always bewildering to me though.

reply

upvote

by arealaccount12 hours ago|

[-]

Not speaking for myself, but the "you won’t have a job soon" narrative puts people off.

reply

upvote

by 12 hours ago|

[-]

deleted

reply

upvote

by sva_21 hours ago|

[-]

Couldn't you just make it write a PoC?

reply

upvote

by tptacek14 hours ago|

[-]

Yes, you can. I strongly encourage people skeptical about this, and who know at a high-level how this kind of exploitation works, to just try it. Have Claude or Codex (they have different strengths at this kind of work) set up a testing harness with Firecracker or QEMU, and then work through having it build an exploit.

reply

upvote

by weird-eye-issue19 hours ago|

[-]

Still have to validate it.

reply

upvote

by matthewfcarlson17 hours ago|

[-]

I’ve started to see bug bounty programs put flags into the product (see Apple's target flags https://security.apple.com/bounty/target-flags/).

I wonder if it’s partially to make it easier to validate from an AI perspective

reply

upvote

by Gregaros20 hours ago|

[-]

[flagged]

reply

upvote

by addandsubtract21 hours ago|

[-]

On the other hand, some bugs take three months to find. So this still seems like a win.

reply

upvote

by sixhobbits15 hours ago|

[-]

From a recent front page article that mentioned the previous slop problem:

> Now most of these reports are correct, to the point that we had to bring in more maintainers to help us.

https://news.ycombinator.com/item?id=47611921

reply

upvote

by xeromal17 hours ago|

[-]

[dead]

reply

upvote

by khalic20 hours ago|

[-]

[flagged]

reply

upvote

by j16sdiz20 hours ago|

[-]

[flagged]

reply

upvote

by khalic20 hours ago|

[-]

He explicitly talks about not sending the maintainers slop, learn how to read.

reply