Anthropic's open-source framework for AI-powered vulnerability discovery

(github.com)

506 points by binyu 21 hours ago

by tptacek 21 hours ago

The thing about things like this is that they're shop jigs. You can buy a crosscut sled if you really want to, but most woodworkers just make their own.

It was a different situation 2 years ago, when there was significant cost to building your own harness (but then: you probably weren't doing AI vuln research 2 years ago). Today, I think your best bet is to look at something like this for ideas, and then just ask for your own, to fit your own work style, with your own interface, your own notion of target and effort specification, and your own alerting.

by redfloatplane 20 hours ago

"Shop jigs" is a great way to put it. I think a lot of software has gone from being made for general use to extremely individualised use. Before the Age of AI, it took so much human effort to write something that solved your problem that you might often go the extra mile so that others could re-use it. Now, it takes almost no effort, so the software stays ungeneralised. Some of the incentive has changed, I think. Most of the time I no longer share the things I've been building[0] because, for one thing, they simply couldn't possibly benefit anyone else, and for another, if others need something like it, they can build exactly the thing they want instead of having to extend or modify mine. Like a jig!

0: https://redfloatplane.lol/blog/17-why-share/ (and related posts, I guess)

by colmmacc 19 hours ago

Unless it is very specific to a proprietary product, craftspeople take their jigs with them from job to job, building up a personal library over a career. As a software developer I've always had a well-tuned IDE and shell config in a safe place.

Something I think about a lot is: what is the equivalent for today's software builders using AI tools? How do we make these harnesses exportable and portable? You might think employers would be against this, preferring to make it more costly to leave. But I actually think most will favor this because it makes people more productive more quickly. But we have to find ways to normalize it and show that there are no security leaks in the process (like the kind that might make it into a set of personal steering prompts).

by tptacek 17 hours ago

Just nerding out here, not rebutting, but when you say "craftspeople take their jigs with them from job to job" --- sort of. Sometimes. I think if you put a woodworker in a position where they were obliged to build a new miter sled or assembly table, they might actually be thrilled. You make a tool, you use it for a while, you build up a mental list of things you'd like to improve about it, that you'd do differently if you got a do-over; now you have an excuse to do it.

by happyopossum 16 hours ago

This, for like 37 things in my workshop right now.

by aquajet 18 hours ago

Using something like pi helps. I've made my own dotfiles for skills/extensions I like and can install them just like my normal dotfiles

https://github.com/anishthite/agent-dotfiles

by djfergus 13 hours ago

"Humor When you finish a job — completing a task, answering a question, fixing a bug, shipping a feature — end your final message with one short funny line. A quip, a dad joke, a wry observation, a playful self-roast. One line. No emoji spam. Make it land, then shut up."

what's the purpose of this? just fun, or does it cause some desired behaviour?

by ClikeX 10 hours ago

> does it cause some desired behaviour?

Fun is desirable.

by agravier 17 hours ago

I've imported and adapted my personal agentic dev framework for my team relatively successfully (as I've kept it relatively harness-independent), but it requires actually owning it: vibed, bloated, or conceptually inconsistent stuff bites a lot when porting things over.

by worldsayshi 18 hours ago

> craftspeople take their jigs with them from job to job

Except for software gigs the software typically belongs to the customer so you'd need to rewrite it every time...

by ClikeX 9 hours ago

Depends. With all the web agencies I've made, the only code that belonged to customers was the actual website part. Any of the "jigs" that we made for our workflow was not part of that.

And contractually, any code I made was my employer's if I made it during office hours. Some even claimed any code I might write during my employment that would be "competitive". Luckily, there was a massive difference between what I would do in my own time and what they did.

by borski 18 hours ago

Depends. If you are a contractor, like most craftspeople, your tools are your own.

by ninjalanternshk 17 hours ago

My contracts always state I own tools created or byproducts of the work that don't end up in the work.

by pjmlp 7 hours ago

Only if you are self employed, otherwise it belongs to the agency.

by borski 6 hours ago

Again: it depends. It is all about how the contract is written.

by pjmlp 6 hours ago

I've never seen any other kind of contract in my 50 years.

by krzyk 2 hours ago

I'm curious how that works: do you hand over the tools you wrote, your .bashrc/.zshrc, etc.?

When I'm hired by a company (not as a contractor), they wipe the hard drive when I leave (well, I also do it myself before I hand it over sometimes). So they don't get the tools (I take them with me; it would be a waste to lose them).

by borski 5 hours ago

You're definitely right for most agencies; most will let you use it in a portfolio or something, but not necessarily retain the rights to the work.

Some agencies do, however; it's dependent on the contract specifics.

by jaxn 18 hours ago

i have been thinking about this from a different direction: how do we make these shared within a company in a way that increases the productivity floor of the team/department/company. Sure, they can still be extended/enhanced by individuals, but we don’t need everyone configuring mcps, building institutional memory, etc.

for me, it’s not about the cost to leave, it’s about lowering the cost of onboarding and change.

by beezlewax 11 hours ago

No effort? You are really drinking the AI marketing soup with that one.

"It takes less effort for some parts of the software development life cycle" would be more correct.

by andhug 19 hours ago

That’s an interesting way to say “code quality in the age of ai has gone out the window”

by drtz 19 hours ago

Are you suggesting that performing a specific task without unnecessary abstractions is indicative of poor quality?

by jorl1719 hours ago

This is exactly it.

I've said many times that I believe "using the computer will transparently involve having it write and run code for you" (and if you're not technical you won't even know it!). What you're saying goes in that direction as well.

I feel that it's often better for us to create purpose-built tools for our lives, and with every model release, the complexity of those tools grows.

These are really personal tools: they solve a problem that other people might have, but are very tied to your own specific way of working, and would be hard to explain or adapt to someone else. So: shop jigs.

I have about 10 custom scripts and programs that are like this -- I haven't felt like this since college! Back then I had all the time in the world to customize my setup...now I have agents!

In a way, I want to show this to all my friends, but whenever I mentally trace how that would go, I realize they wouldn't really understand a bunch of the quirks they have, because they are _my_ quirks. They're reasonably complex pieces of tech that solve my problems very well, which are themselves particular versions of broader problems, and which I (at least for now) have no interest in supporting.

It's so clear we're heading in this direction, and yet so many people still believe code will be for the elites. Maybe production-code...As for the rest, I think soon your mom and dad are going to have their computer running code it wrote to serve them. Security-wise it's scary, but it's exciting to think about!

by ashdksnndck 14 hours ago

Sure it’s possible for anyone to build a harness if they had the inclination, but most people don’t have the inclination to do that.

And even if you did… I spent months refining AI workflows that were just obsoleted by ultracode.

by Npovview 12 hours ago

Just as Python is a batteries-included language, we similarly need batteries-included harnesses as well. This is why I don't like minimalist setups like Pi.

by hsaliak 17 hours ago

I’ve been looking for a way to articulate this shift, and your analogy nails it. The value of libraries and infrastructure components in software engineering is eroding fast.

I am sure that in many organizations, teams responsible for this sort of work have fewer and fewer users coming to them.

by tptacek 15 hours ago

Maybe for developer tooling, but on the consumer app side I think it's the opposite: MusicKit is much more valuable than Music.app now, because Claude can one-shot most reasonable things you could ask it to do. I think there's actually more value in ambitious libraries than there was 5 years ago, when any serious use of a library entailed a minimum 5-figure investment of time.

by flir 13 hours ago

I had a pleasant experience one-shotting a dashboard on top of a library designed for building dashboards. Because everything was abstracted away, the chatbot had relatively few places it could get into the weeds. If I'd asked for the same thing from scratch, I think the result would have been more inconsistent, and would have had more bugs.

So I can definitely see the value in a library for constraining the chatbot to some well-worn paths.

by AndrewKemendo 2 hours ago

100% concur and if you dig into any of these tools they are all frameworks and wrappers with prompt injections

by nbardy 10 hours ago

In general this is the way I see open source going.

We won't reuse open source libraries as libraries we import, but as design inspiration for the bespoke tools we make.

It's too cheap to make your own stuff and too expensive to be stuck with someone else's primitives.

But grounding AI Coding in existing tools is incredibly powerful.

by borski 18 hours ago

I agree with this wholeheartedly.

by claud_ia 7 hours ago

[flagged]

by sieabahlpark 20 hours ago

[dead]

by zuzululu 20 hours ago

[flagged]

by ryancw 20 hours ago

As a woodworker, it’s a really nice analogy and beyond anything I’ve seen AI do.

by zuzululu 19 hours ago

No idea why people are so upset. I genuinely thought his use of analogy was the typical AI slop comment I'm used to seeing from ChatGPT.

by Retr0id 19 hours ago

Believe it or not, people have been making analogies since before AI

by timacles 19 hours ago

They used to, they still do, but they used to too.

by zuzululu 19 hours ago

I believe you and you can also believe AI is pretty good at it too.

by ghhhibhc 20 hours ago

It really doesn’t

by zuzululu 20 hours ago

[flagged]

by sermah 20 hours ago

    user: zuzululu
    created: 47 days ago
    karma: 228

ok

by simonw 21 hours ago

I wonder how much this thing costs to run.

https://github.com/anthropics/defending-code-reference-harne... says:

> As a rough guideline, expect ~10K uncached input tokens/min and ~2K output tokens/min per agent. You can scale parallelism up to your account's ITPM limit (roughly 10 agents per 100K ITPM).

My guess would be hundreds of dollars with Opus and thousands of dollars with Mythos.
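The README's per-agent rates make the arithmetic easy to sketch. Below is a minimal Python cost model, assuming the ~10K uncached input and ~2K output tokens per minute per agent quoted above, and caller-supplied per-million-token prices. The $15/$75 values in the example are illustrative placeholders, not quoted pricing for Opus, Mythos, or any other model.

```python
# Rough cost model for a pool of harness agents, using the README's
# rule-of-thumb rates. Prices are per million tokens and are placeholders
# supplied by the caller, not real published pricing.

INPUT_TOK_PER_MIN = 10_000   # ~10K uncached input tokens/min per agent
OUTPUT_TOK_PER_MIN = 2_000   # ~2K output tokens/min per agent


def max_agents(itpm_limit: int) -> int:
    """Agents an input-tokens-per-minute (ITPM) limit can sustain."""
    return itpm_limit // INPUT_TOK_PER_MIN


def hourly_cost(agents: int, input_price_per_mtok: float,
                output_price_per_mtok: float) -> float:
    """Estimated spend per hour for `agents` running in parallel."""
    input_tokens = agents * INPUT_TOK_PER_MIN * 60
    output_tokens = agents * OUTPUT_TOK_PER_MIN * 60
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


if __name__ == "__main__":
    agents = max_agents(100_000)                # 10, matching the README's ratio
    print(agents, hourly_cost(agents, 15, 75))  # prints: 10 180.0
```

At those placeholder prices, ten agents cost about $180 per hour, so a multi-hour scan lands in the hundreds of dollars. Swap in real rates to get a meaningful figure; the README's numbers describe uncached input, so prompt caching would change the input side.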

by nikcub 21 hours ago

It's becoming apparent that it requires more tokens to secure code than it does to write it

May even be an order of magnitude more

by Mtinie 21 hours ago

In all seriousness, wasn’t that always the case? Writing bad code is relatively cheap.

Ensuring code isn’t bad is the expensive part.

by chrisweekly 19 hours ago

Sort of?

The definition of "bad" from a security PoV is rapidly expanding, in light of relatively new capabilities and increasingly cheap access to exploitable vulnerabilities.

by fny 18 hours ago

I don't think the definition of "bad" is expanding. Rather the ability to detect and exploit "bad" is.

by chrisweekly 15 hours ago

fair point. another way of putting it might be to say that, for all extant software, much more of it is "bad" than we realized even a month or two ago -- and the cost to create and maintain "good" software is increasing (even as the naive / surface-level / apparent cost is plummeting)

by kenjackson 14 hours ago

Same thing happened with the growth of the internet. There was a time when there was basically no consideration of buffer overflows.

by tptacek 20 hours ago

For now, maybe, yes? But the most important targets of this kind of work aren't AI outputs; it's legacy code, particularly (but not exclusively) old memory-unsafe code. In those situations the figure of merit isn't the token cost of recreating the target code; it's the cost of finding the same bugs with humans or preexisting tools.

Those costs can be extremely high.

by thisogood 20 hours ago

[dead]

by ath3nd 20 hours ago

Any newly produced AI code is immediately legacy and trash at the same time.

by andai 11 hours ago

There's a parallel between looking for bugs and mining. As models get smarter, they'll find "deeper bugs".

I expect at some point formal verification will become more economical than red teaming. Writing it correctly is more expensive, but it may be cheaper than trying to secure incorrect software.

(Or rather, as hacking incorrect software becomes vastly cheaper, the amount of software worth writing properly will increase.)

I've been thinking, by Dijkstra's standards we have already been vibe coding for almost a century :)

by XCSme 3 hours ago

Not if the original code is secure...

by sam-cop-vimes 9 hours ago

Are AI firms going to charge us to write code, and then charge us even more to secure it?!

by smt88 9 hours ago

Yes, obviously. Infosec has always been plagued by this. How many services make you pay for SSO?

by windexh8er 20 hours ago

Given the slop that's made its way to Github we can see that this is a great profit model. Ship slop and then "fix" slop. What an efficient use of our planet!

by bflesch 21 hours ago

It's weird because why can't they train the AI to simply output secure code?

The basic security flaws with regards to input validation and overflows should never ever be output by an AI. For "security flaws due to bad design" I'll cut them slack until AGI is achieved.

by simonw 20 hours ago

> It's weird because why can't they train the AI to simply output secure code?

The most interesting security bugs have causes that are spread across large codebases, or networks of dependencies.

Training the AI to "output secure code" won't work if it doesn't also have access to the source code of every dependency that it's using... and even then, given current model speeds and prices most developers won't want to wait for an hour on every edit they make while the LLM reasons through all of the dependencies.

by tptacek 19 hours ago

What's destabilizing the industry right now isn't vulnerabilities AI introduces into new code; it's a flood of sev:hi vulnerabilities in existing code, not introduced by AI but discovered by it.

by chrisweekly 19 hours ago

Agreed -- and, compounding the challenge, the flood of _reported_ high-sev CVEs is itself a kind of DDoS attack on maintainers.

by froggit 7 hours ago

> What's destabilizing the industry right now isn't vulnerabilities AI introduces into new code; it's a flood of sev:hi vulnerabilities in existing code, not introduced by AI but discovered by it.

Vulnerability discovery has essentially moved to a "proof of work" computation model with AI that has some similarities to crypto like BTC or ethereum 1.0. I don't see any reason a well funded adversary couldn't use this same process on open-source code to develop exploits. I'm sure AI would be happy to try and create exploits from the results rather than fixes.

This sort of proof of work has a notable difference from crypto in the asymmetric nature of what each side is targeting. In crypto, each miner was attempting to find a solution to the same problem and they would all move on to a new one once a solution is found. However with AI vulnerability scanning, the non-deterministic nature means an adversary is likely to find different vulnerabilities. Even if it doesn't, the adversaries have a different post-discovery workflow (i.e. probably less compute intensive aka cheaper due to only needing one viable exploit to win) than the software maintainers do.

The fact that both the adversary and their target could do all this while running Claude puts Anthropic in a real "Merchant of Death" position.

by tptacek 4 hours ago

This doesn't make sense. Claude isn't creating the vulnerabilities. They've been here the whole time. You just get to know about them now.

by bflesch 8 hours ago

Even before that everybody was getting drowned in shitty reports from automated tools.

The goal of AI-generated code should not be that one needs an AI-based security review tool on top of it, but that the AI-generated code in itself is reasonably secure.

by iammrpayments 11 hours ago

Hello Sam

by ethanmg 15 minutes ago

[flagged]

by bobkb 19 hours ago

I think these audit tools can look beyond just security and can look for compliance audits as well. The ability to audit real targets in staging environments makes it easy to identify issues.

by niros_valtos 16 hours ago

I think that the cost of Opus is already prohibitive, so I'm not sure how that would compare to Mythos. Check this calculator: it shows that a company with 100 devs can hit ~2.5M in token costs annually, which is wild! https://ai-cost-calculator.arnica.io

by Quinner 52 minutes ago

A 100-dev team is going to cost on the order of $25m a year (keep in mind cost is not just their salaries, but also the HR/management org to support a team of that size, benefits, office space, hardware/software). So if you think you get a 10% boost in productivity out of Opus, it's not prohibitive at all.
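The comparison above reduces to a break-even ratio: tool spend pays for itself once the productivity gain, as a fraction of total team cost, exceeds spend divided by team cost. A minimal sketch using the two rough figures from this subthread (the ~2.5M calculator estimate and the ~$25m team cost), assuming both are in the same currency; the function names are illustrative only.

```python
def break_even_gain(annual_tool_cost: float, annual_team_cost: float) -> float:
    """Fractional productivity gain at which tool spend exactly pays for itself."""
    if annual_team_cost <= 0:
        raise ValueError("team cost must be positive")
    return annual_tool_cost / annual_team_cost


def net_benefit(gain: float, annual_tool_cost: float, annual_team_cost: float) -> float:
    """Value of the productivity gain minus what the tooling costs."""
    return gain * annual_team_cost - annual_tool_cost


print(break_even_gain(2_500_000, 25_000_000))    # 0.1, i.e. a 10% gain
print(net_benefit(0.15, 2_500_000, 25_000_000))  # positive: a 15% gain beats 10%
```

With these inputs the break-even point is exactly the 10% boost mentioned above; any gain below that makes the token spend a net loss.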

by pixl9716 hours ago

It's wild, but how much computation, in FLOPs, are those 2.5M in tokens paying for? Might not sound quite as wild using that metric.

by binyu 20 hours ago

Claude workflows in ultra code mode work in a very similar fashion, and they consume a moderate amount of the session usage limit, depending on the complexity of the task. With the API it would probably get expensive quickly, though.

by eranation 16 hours ago

We actually created a calculator to estimate scanning costs (including whether you do this continuously or not) https://ai-cost-calculator.arnica.io

It's an estimate, so it might be wrong, but it gives the ballpark based on our experience. Happy to hear everyone's feedback.

by Terretta 19 hours ago

If you compare to their managed service, that estimate is likely 1/10th of what to expect, depending on the codebase.

But even this larger number, in turn, can be about 1/10th the cost of a formal engagement to discover the type of findings it seems to be going for: things that do not show up from PR reviews or even /security-review without the pre-work steps in the open-source framework guided by an expert. That's not counting the time and delay to figure out how to do that engagement.

Bluntly: if it matters, while this is a month's vibing budget for a single scan, it is also "pennies on the dollar" dirt cheap.

At the same time, its findings still need an expert. Its suggestions may be helpful or actively harmful, depending on the prework quality.

Recommendation to IT department heads: spend a couple grand on this, use the scare page to rustle up the budget to build a relationship with a red team that can find, triage, help remediate if needed, and train your in-house team to be "security minded".

by mmaney1316 hours ago

Just another example of an overextension of technology in a scenario where applying a proper harness would suffice.

Reminiscent of the early days of tax automation where importing a W2 cost hundreds of dollars until people realized typing in 6 boxes worth of data was easy and paying the automation fee ate up their entire tax return.

by Analemma_ 21 hours ago

I mean, you don't need to run it all the time, right? You do it once over your entire existing codebase to start and then once over the diff in your CI/CD pipeline when you make a new change. I'm sure it's not literally that simple but I doubt these need to churn 24/7/365 either.

by xerxes249 21 hours ago

In the Mythos blog post they revealed that they ran the model something like 1,000 times on the same codebase, maybe with slightly different prompts or temperatures. That suggests it will just be pay to win: if the 'attacker' spends more money/tokens than the 'defender', you will eventually be outclassed.

by sofixa 19 hours ago

It's even worse, it's loot box style. Not pay to win, but pay to have the chance to win. The result will always be non-deterministic, so in some cases it can give you what you're looking for on the first try, or it can take 1000 tries.
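The "loot box" framing has a standard back-of-the-envelope model: if each run independently finds a given bug with probability p, the chance of at least one hit in n runs is 1 - (1 - p)^n. A minimal sketch, assuming independent runs and an illustrative p of 1%. Real reruns over the same codebase are correlated, so this is only a rough approximation and likely overstates the value of extra runs.

```python
import math


def p_found(p: float, runs: int) -> float:
    """Chance that at least one of `runs` independent attempts finds the bug."""
    return 1 - (1 - p) ** runs


def runs_needed(p: float, target: float) -> int:
    """Smallest number of independent runs giving at least `target` probability."""
    if not (0 < p < 1 and 0 < target < 1):
        raise ValueError("p and target must be strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - p))


# Illustrative only: a bug that any single run finds 1% of the time.
print(round(p_found(0.01, 100), 3))  # 0.634 after 100 runs
print(runs_needed(0.01, 0.95))       # 299 runs for a 95% chance
```

The same curve applies to both sides, which is the pay-to-win point: whoever can afford more draws pushes their own probability closer to 1.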

by beering 19 hours ago

It’s never not been “loot box style”. None of your past hired security audits were guaranteed to catch all issues?

by vb-8448 21 hours ago

You are supposed to run it on the full codebase before any single PR gets merged.

by jazz9k 21 hours ago

Companies don't make production pushes yearly. For many, it's two-week sprints... and that's one project.

This doesn't make any sense cost-wise. It would be cheaper to just hire a security engineer.

by vessenes 6 hours ago

I agree the cost curve has shifted. But if we take the Mozilla team's Mythos report as a broad baseline, you need to hire something like 10 security engineers to equal the Mythos productivity. Put another way, everyone's underhiring for security by a LOT right now; we've just been lucky enough to see similar underhiring among hackers.

by kolesnikov-arch 10 hours ago

[flagged]

by yalogin1 hours ago

Anthropic realized security and safety are their main value prop compared to the competition. Both Mythos and everything since seem purpose-built to streamline the messaging. It's good, I'm not complaining, but I wonder how much this is intended to showcase what Claude can do rather than to be used as is.

by HarHarVeryFunny 3 hours ago

They seem to be using this to advertise their "Claude Security" product which promises to find vulnerabilities in your software.

This makes for a somewhat amusing set of product offerings given that according to Dario 90% of all software is being AI generated.

Maybe next they can sell something to find the bugs in the security scanner ?

by bwfan123 2 hours ago

> Maybe next they can sell something to find the bugs in the security scanner ?

So, tokens are used to produce sloppy code, and then this thing uses more tokens to fix vulnerabilities in the slop? What's not to like in this business model? Similar to Microsoft's: create an OS which is vulnerable, and then enable business models for antivirus software. Everyone wins.

More seriously, linters are turned off in CI because the amount of time spent chasing false positives is prohibitive.

by lanyard-textile 21 hours ago

>This repo is not maintained and is not accepting contributions.

Hm :)

by Hamuko 20 hours ago

Why isn't Claude maintaining it?

by politelemon 12 hours ago

They must have solved coding that well.

by skeledrew 20 hours ago

They're pretty much saying the efficacy of the tool can be tested by anyone to determine if it's worth purchasing the more polished and up-to-date commercial offering.

by spacebacon 21 hours ago

This one is and should be adapted to every frozen model ASAP.

https://github.com/space-bacon/SRT

Significantly improve every frozen model overnight. LFG.

by baby 18 hours ago

Our experience has been that without a good harness you don't really get much out of codex/claude. And you really need to spend time and energy figuring out why coding agents can't find bugs like you can.

Every week I see bugs (as an auditor) that our own harness (https://zkao.io/) can't find, and we have to figure out pretty interesting techniques in order to make the tool find them. Mind you I'm talking mostly about cryptographic vulnerabilities, not just webapp bugs. So IMO it's going to make a lot of sense for companies to have both their own harness (as tptacek is talking about) and pay for services that focus on making a good harness from experience (and audit firms are going to be the best at doing this, as they see a lot of bugs and can spend time "teaching" their harness about these bugs)

On the other hand, you have to find equally good techniques to triage, because otherwise you just have some machinery that I call "vibe auditing" that produces enough false positives to tire out all the developers (who are already overwhelmed with crappy AI submissions in bug bounties and other AI tools that review all of their PRs).

At the end of the day, when your harness doesn't return any bug, you're left wondering "does it mean there's no bugs?" We're basically back in this reputation game, where you want to use the best tool, or the best team (that knows what the best tools are), and need to figure out which one is.

by richardbarosky 21 hours ago

To be sure, security is an amazing AI/LLM use case. A huge swath of the work is pattern matching known security issues against stuff that's very precise to analyze -- programming language text.

Something that stands out is that for the strongest use cases, AI companies will prefer to sell the technique as a service rather than its raw output. For use cases where the output is less valuable, tokens are sold. If AI tokens were so magical in creating new value in developing software applications generally, they wouldn't be selling tokens directly. They'd hoard the tokens and use them to dominate SaaS software in any industry they want.

The same way as someone selling an expensive course in the stock market is signaling that they have more to gain by selling the course rather than taking their knowledge and making money in the stock market directly.

by dgellow 21 hours ago

> The same way as someone selling an expensive course in the stock market is signaling that they have more to gain by selling the course rather than

Or they want to diversify

> If AI tokens were so magical in creating new value in developing software applications generally, they wouldn't be selling tokens directly.

That requires building and selling a whole product they have little experience with, competing with their own customers. Not a great place for an AI vendor still trying to establish itself. It's a lot of distraction when you already have a lot to deal with in the existing business. And strategically not too valuable.

by kenjackson 13 hours ago

What market is hotter than AI models? Do you think their energy would be better making games or image editing software?

by dgellow 12 hours ago

No, I’m saying the opposite

by Kiro 20 hours ago

> They'd hoard the tokens and use them to dominate SaaS software in any industry they want.

I don't understand this argument. I've run and sold a semi-successful SaaS. The exhausting and frustrating parts are all the things an LLM cannot help you with. Coding the product is not the bottleneck or what grants you success.

by zuzululu 20 hours ago

Good point, but I do think LLMs help with those frustrating parts, even if they can't outright solve them.

by richardbarosky 20 hours ago

> Coding the product is not the bottleneck or what grants you success.

Agree, and I think that's the core of my point.

Not that it's irrational or doesn't make sense to sell tokens for purposes of software dev, but that if tokens were a true game changer for success in software dev, they wouldn't be leading with token sales, the same way they're not leading with token sales for security stuff -- it's more like "Contact Sales".

by hyperpape 20 hours ago

> If AI tokens were so magical in creating new value in developing software applications generally, they wouldn't be selling tokens directly. They'd hoard the tokens and use them to dominate SaaS software in any industry they want.

This doesn't follow at all. Anthropic's revenue is growing 10x year over year selling tokens. Their tokens can be super magical, let them enter established industries and displace incumbents, and get 100% annual growth in those industries, and they would still be better off prioritizing selling tokens, because it's a great business.

What your argument shows is that there are limits. Their tokens are not quite powerful enough to make infinite money instantly in every area of software. Admittedly, that does seem true.

by morpheos137 18 hours ago

kind of funny that tokens don't prompt and steer themselves. it's almost as if the value still lies with the human holding the tool.

by latentsea 7 hours ago

They kinda do though, that's sort of how agents work. At least that's how it's always felt to me.

by morpheos137 4 hours ago

what is doing the steering is the weights of the words that came before in context. there is no agent or agency. if your problems need median effort and are well represented in shape in the corpus, then agents may work well. true innovation is impossible without careful prompting, wherein the agent becomes an associative engine (kind of a smart search engine) and you, the human, become the manager of the process.

by skybrian 21 hours ago

Maybe, but an alternative argument is that building an ecosystem is more valuable in the long run.

We started out with many companies forbidding their employees to use remote LLMs on their source code because of security concerns. Now many companies are starting to believe that they must analyze all their source code with remote LLMs because of security concerns. When trusting Anthropic becomes normalized, that means they can sell more services that require access to the source code.

by Melatonic 20 hours ago

Surprised we haven't gotten an integrated "Metasploit" AI update that calls and messages a ton of people in a company and, once it finds someone who might be vulnerable, lets a human red teamer take over or guide it more by hand.

by therealdrag0 17 hours ago

Isn’t this analogous to saying that if farming equipment is so productive, why doesn’t John Deere hoard all the tractors and do the farming themselves?

by derf_ 19 hours ago

> If AI tokens were so magical in creating new value in developing software applications generally, they wouldn't be selling tokens directly.

If hardware were so magical in creating new value generally, TSMC would be designing the chips instead of selling fabrication as a service.

That is what US chip companies used to do, by the way (back when there was silicon in Silicon Valley, before they got their lunch eaten by Taiwan). If TSMC had to design all of the chips they fabricate now, they would be doing a lot less business. Conversely, if any other company that wanted to design a chip had to build their own cutting-edge fab first, NVIDIA would not exist.

by energy123 21 hours ago

They can only do that if they're a monopoly, which they're not

by DrewADesign 20 hours ago

> They can only do that if they're a monopoly, which they're not

Why do you say that? I reckon lots and lots of companies that sell software aren’t monopolies. Having competition, even stiff competition, isn’t anathema to running a business.

by energy123 20 hours ago

You said "They wouldn't be selling tokens directly ... They'd hoard them"

But they can't do that because they aren't monopolies.

by DrewADesign 18 hours ago

> You said

Just to clarify, I’m not the person you initially replied to.

> "They wouldn't be selling tokens directly ... They'd hoard them" But they can't do that because they aren't monopolies.

Hoarding them (not selling any of them, but instead using them internally and selling the products created with them) doesn’t at all seem like it would require a monopoly.

by dclavijo 19 hours ago

Slightly off topic: it seems that someone is on a dead/flag rampage, killing all the good links to GitHub in this post. Why?

by majicDave 19 hours ago

It will always be easier to find a single hole than it will be to seal every one. The hackers have all the same tools, so this is an arms race that cannot be won.

by napoleond 18 hours ago

It seems clear that LLMs significantly change threat model math, but this observation alone does not explain how or why; the asymmetry that you’re describing is a property of pre-LLM software as well.

by DrewADesign 16 hours ago

Same ratio of imbalance, just with matching multipliers distributed to each side, and everybody is probably worse off because of it. I'd cite post-LLM-ATS hiring and job hunting as an example.

by lateral_cloud 18 hours ago

Defenders have context that attackers don't though.

by leetrout 1 hour ago

Ran this last night and it correctly identified an SQL injection that could allow cross-tenant data access via Snowflake. It burnt A LOT of tokens to get there.

Like others I suspect this is exactly what they are going to paywall with product features going forward.

by bobkb 19 hours ago

Very interesting.

I have been working on and using a similar tool for a while now:

https://github.com/bobinson/vulture

I have been struggling with false positives while using Claude + MCP as a poor man’s audit tool. Over the last few days I've gotten better results with NVIDIA-hosted models.

by cpard 17 hours ago

It’s clear that Anthropic is now building harnesses for specific use cases and turning them into products.

This is the equivalent of Claude Design but for security.

Different harness, different packaging and obviously different distribution because the persona is different.

It’s funny because from all the posts I’ve read from companies reporting on Mythos, everyone is building their own harness for it.

Cisco even published a specification for one.

But Anthropic is the one who has figured out how to package and distribute this. Great GTM!

by ElijahLynn 16 hours ago

This post is misleading and so is the GitHub org. Anthropics vs Anthropic.

by Zetaphor 13 hours ago

That is their actual account. Sadly, we have this discussion every time they post something.

by ElijahLynn 3 hours ago

Oh, bummer. That is really confusing.

by sciencejerk 13 hours ago

This isn't as useful as it sounds unless we know that Claude spends tokens efficiently with this harness.

by madduci 13 hours ago

"This repo is not maintained and is not accepting contributions."

Nice

by newaccount1234418 hours ago

Let's see how it compares to ZAP and Burp. I will test it on https://github.com/SasanLabs/VulnerableApp, which I built under SasanLabs.

by trilogic 21 hours ago

https://github.com/Mainframework/Anthropic-Cybersecurity-Ski...

Be aware: the .py files will not pass antivirus scanning, but basically they do the job.

by sylware 3 hours ago

I don't trust it and I cannot test it (gated by what ng cartel web engines).

by LazyR3nR3n 9 hours ago

This is a good additional tool for security practitioners, to save time when hunting for vulnerabilities.

by bigmattystyles 21 hours ago

I wonder how this sort of product is going over at Coverity and others like it. Proper SAST vendors I mean. Is it an existential threat?

by rms2ds 20 hours ago

If I had to guess, they'll eventually just add it into their own product and hike the prices up to cover tokens lol.

by ElijahLynn 16 hours ago

Anthropics vs Anthropic.

That repo is Anthropics.

This post title should clarify that it is not Anthropic (no "s").

by olcay_ 16 hours ago

Anthropics is Anthropic's user name on GitHub

by edot 16 hours ago

Anthropic, no s, is owned by some Australian guy.

by sumedh 8 hours ago

I wonder if he is using Anthropic's Claude Code to work on his Anthropic GitHub account.

by ElijahLynn 3 hours ago

TIL, thank you

by gulbanana 16 hours ago

`anthropics` is Anthropic's GitHub username.

by SubiculumCode 16 hours ago

The last time this Anthropic GitHub got posted I made a similar comment.

by euroderf 19 hours ago

Is Anthropic still majority French-owned? It would explain a lot about their entire approach to the wider ecosystem.

by Yokohiii 14 hours ago

Seems like you are confusing crack and tar as a healthy breakfast.

by eranation 16 hours ago

If anyone wonders how much it can cost to run scans like this on your entire codebase with SOTA models: https://ai-cost-calculator.arnica.io

tl;dr - not that it's surprising, but it's not cheap, especially if you want to do this continuously.

by extr 20 hours ago

Interesting that it's in Python!

by zoobab 20 hours ago

Open source crap to connect to an LLM blob.

by bartoszcki 20 hours ago

> Anthropic engineers on average ship 8x as much code per quarter

Are they making 8x more features or the same amount just with more code?

by crooked-v 20 hours ago

Going by the issues on their repos, it's 2x features and 6x regressions of bugs that were "already fixed".

by crooked-v 20 hours ago

I still find it so weird that they haven't bought out whoever controls the `anthropic` github username.

by napsterbr 17 hours ago

Or hacked them...

by wslh 20 hours ago

Looking forward to trying this tomorrow (it's late here). Has anyone run it on a real codebase yet? Curious about setup friction, cost, and signal/noise.

by zoobab 20 hours ago

'open source' crap to connect to their LLM blob.
