by northern-lights 22 hours ago

by zamalek 16 hours ago

> Probably more interesting

It is widely suspected that self-inflicted "bad news" ("Mythos is so dangerous we just can't give the public access to it") is nothing more than Dario's typical style of marketing - keep in mind that they have an IPO coming up, because he certainly factors that into everything he says in public (as is his responsibility, to be fair).

An alternative reason for delaying the model might not be "we are trying to make it safe." It could be "we don't know how to host this thing at scale, or cost-effectively".

GPT 5.5 has already been shown to be as adept as Mythos at finding vulnerabilities.

Finally, laymen massively underestimate the importance of the harness for model performance. OpenHands existed long before Claude Code; Claude Code changed everything because of the clever hand-holding it does. Mythos is definitely more than just a model.

by clbrmbr 12 hours ago

One capability that I see missing from Opus is the ability to understand an entire system. My hope is that a Mythos-class model will be able to comprehend even something as complicated as an IoT system with a hardware and firmware layer, multiple APIs, a backend, and different kinds of API and web clients.

The main limitation we’ve hit with agentic coding is understanding a system like this, one that spans processes running on different machines and architectures.

by jwr 7 hours ago

Interesting — I haven't seen that problem, and I do have a system that has different APIs, web clients, non-web clients and embedded clients.

by LPisGood 15 hours ago

What sort of clever hand-holding does Claude Code do?

by selcuka 14 hours ago

https://github.com/Piebald-AI/claude-code-system-prompts

by schmorptron 2 hours ago

It's interesting that they use a personality ("you are a file search specialist") and a "your strengths" framing, for example for the explore agent ( https://github.com/Piebald-AI/claude-code-system-prompts/blo... ). I thought that was largely considered useless, or even counterproductive, nowadays? Does anyone know more about this stuff?

by zamalek 13 hours ago

There are also techniques that have since been discovered:

* Ralph Wiggum loops

* Simply not allowing an agent to stop its turn until all tasks are marked as done

* Sub agents over worktrees

* Context compression
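
To make the second item concrete, here is a minimal sketch of a harness loop that refuses to end the turn while tasks remain open. It is an illustration only, not how Claude Code or any other real harness implements it: `Task`, `TurnGuard`, `fake_agent`, and the `max_rounds` cap are all hypothetical names invented for this example, and `fake_agent` stands in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    description: str
    done: bool = False


@dataclass
class TurnGuard:
    """Re-prompts a (hypothetical) agent until every task is marked done."""

    tasks: list[Task]
    max_rounds: int = 10  # assumed safety cap so a stuck agent cannot loop forever
    transcript: list[str] = field(default_factory=list)

    def pending(self) -> list[Task]:
        return [t for t in self.tasks if not t.done]

    def run(self, agent_step: Callable[[str, list[Task]], None]) -> bool:
        """Return True if all tasks finished within max_rounds, else False."""
        for round_no in range(1, self.max_rounds + 1):
            remaining = self.pending()
            if not remaining:
                return True  # only now is the agent allowed to stop
            prompt = "Unfinished tasks: " + "; ".join(
                t.description for t in remaining
            )
            self.transcript.append(f"round {round_no}: {prompt}")
            agent_step(prompt, self.tasks)
        return not self.pending()


def fake_agent(prompt: str, tasks: list[Task]) -> None:
    """Stand-in for a model call: completes one pending task per round."""
    for task in tasks:
        if not task.done:
            task.done = True
            return
```

Calling `TurnGuard([Task("write tests"), Task("fix bug")]).run(fake_agent)` keeps feeding the unfinished list back to the agent until nothing is pending. The round cap is a design choice for the sketch; a real harness would presumably also need some budget or timeout.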

by andai 18 hours ago

In the Opus 4.7 release notes they mentioned intentionally making it worse at cybersecurity. [0]

Does this suggest that they're doing the same thing with Mythos now, and that the Mythos we get will be nerfed in that department?

Or more precisely, I think they'll have two versions of Mythos, and the scary one will probably continue to require a lot of paperwork.

[0] https://www.anthropic.com/news/claude-opus-4-7

by ac29 20 hours ago

More interesting than that to me is "we’re working on developing and releasing models that provide many of the same capabilities as Opus at a lower cost"

Sonnet and Haiku look real outclassed for the price with current Chinese competition.

by scuderiaseb 18 hours ago

So this is how they’ll remove access to the biggest models from Claude Pro. You would need at least a Claude Max subscription for the bigger-than-Opus models, I bet.

by F7F7F7 17 hours ago

Anthropic wants to sell us Claude Code with no model selection at all.

Opus seems to be overly eager of late to 'vibe' out entire solutions and build out things that you didn't ask for.

/goals is helping set the narrative: does it really matter whether Sonnet and 3 Haiku agents got you to that end state... eventually... if it's what you asked for?

For better or worse Opus is already handing off 80% of its work to background agents of Sonnet, Haiku, and likely a quantized Opus.

Want model selection? Pay for the API.

by comboy 17 hours ago

Just tell it to always use Opus for subagents and it does.

by clbrmbr 12 hours ago

This. I added that instruction the first and last time I was gaslit by an underpowered subagent.

by swalsh 16 hours ago

It's amazing how quickly I've just become accustomed to being a Max subscriber. I don't think I could go back to Pro.

by galkk 10 hours ago

Then max+, then ultra, then ultra pro

by stefanfisk 8 hours ago

As long as they provide the same utility / $ I don’t see why not. It’s not like the open-weight models are that far behind, and Claude Code itself shouldn’t be very hard for the community to replicate if Anthropic starts acting up too much.

by selcuka 14 hours ago

They have already been experimenting with such ideas [1]:

> Claude Code Removed from $20-a-Month "Pro" Subscription for New Users

[1] https://news.ycombinator.com/item?id=47855832

by _heimdall 3 hours ago

I'm still not sure what safeguards they can be adding here. Unless they've suddenly solved alignment, at best isn't it a collection of system prompts saying what not to do and potentially some screening algorithms that try to catch key phrases in inputs/outputs?
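
For reference, the "screening for key phrases" part of that guess would look roughly like the sketch below. This is purely an illustration of the idea, not anything Anthropic is known to run: the pattern list, `ScreenResult`, and `screen` are all made up for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical phrase list; a real filter (if one exists) is not public.
BLOCKED_PATTERNS = [
    r"\bzero[- ]day exploit\b",
    r"\bwrite (?:a |an )?(?:working )?exploit\b",
]


@dataclass(frozen=True)
class ScreenResult:
    allowed: bool
    matched: tuple[str, ...]


def screen(text: str, patterns: list[str] = BLOCKED_PATTERNS) -> ScreenResult:
    """Flag text that matches any blocked pattern (case-insensitive)."""
    matched = tuple(
        p for p in patterns if re.search(p, text, flags=re.IGNORECASE)
    )
    return ScreenResult(allowed=not matched, matched=matched)
```

A filter like this is trivial to evade by rephrasing, which is part of why it is hard to see it counting as a serious safeguard on its own.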

by TIPSIO 21 hours ago

Seems like they might be hinting that if you are not a billionaire or multi-billion dollar company you will just get a limited and nerfed Claude Code slash command /mythos-security-audit or something.

Hope this isn’t the case and that the average Joes of the world don’t get policed out of access.

by gs17 20 hours ago

> you will just get a limited and nerfed Claude Code slash command /mythos-security-audit or something.

Unless it's so expensive that we can't realistically use it for anything, I wouldn't complain about getting at least that. I would also rather have the actual model, but that's a useful application of it (and I'm probably not going to afford using it for much more).

by TIPSIO 20 hours ago

Price discrimination is, I think, fine and reasonable, so long as anyone who can drum up the cash can use it how they want within the ToS.

Mental safety gymnastics aside, though, getting the most intelligence to normal people at the lowest cost seems like the most ethical thing a big lab could do.

Going around and granting different tiers of intelligence to different insiders, friends, or companies is majorly problematic long-term.

Heck, right now no one even knows or believes that the tokens you buy today for “Opus 4.8” will be the same “Opus 4.8” just 3 days from now.

by vorticalbox 20 hours ago

Some of the benchmarks I have seen also include cost, where one scan of a codebase cost tens of thousands of dollars.

This one [0] notes that one run cost $20k but another cost $50.

[0] https://red.anthropic.com/2026/mythos-preview/

by FinnKuhn 20 hours ago

/security-review already exists, so I don't think it would be crazy to have a /mythos-security-review as a more thorough command as well. I think it's more likely that it will be released to the general public at some point, though, although the pricing might make it quite unattractive.

by Yiin 18 hours ago

You mean /security-review ultra, given their current way of handling commands.

by hedora 20 hours ago

Isn't OpenAI's public flagship already beating Mythos on penetration testing? I get the impression Mythos is just valuation-juicing for IPO more than anything else.

The fact that they haven't released it yet suggests a cost/margins issue to me more than anything else. Short term, I'll probably keep using Anthropic, but my long-term bet is that locally-served models win, if only because the quest for profitability will probably lead to intentionally-nerfed / enshittified frontier models.

At other vendors, ad placement within LLM responses is either coming or already here. Anthropic's handling of OpenClaw shows they're willing to engage in anti-competitive behavior, and the courts are not in a hurry to stop them. Why would I pay them $200 a month for such treatment when a $2K box does what I need locally?

by srmatto 19 hours ago

What benchmarks are you referencing that show a comparison of the models for penetration testing?

by senordevnyc 17 hours ago

Please link to the $2k box that gives Opus level performance!

by ameliaquining 18 hours ago

Mythos is dramatically better specifically at finding zero-day vulnerabilities and developing exploits for them, that being what it was designed to do. On other cybersecurity tasks, GPT-5.5 is at least as good, but finding and exploiting zero-days is a particularly scary capability, which is why Mythos is a big deal. See, e.g., https://forum.effectivealtruism.org/posts/8yztpbjuPkyXsmA6n/....

by stratos123 18 hours ago

AFAIK, Anthropic claims that they weren't aiming for zero-days specifically. From https://red.anthropic.com/2026/mythos-preview/ :

> We did not explicitly train Mythos Preview to have these capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy. The same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them.

I've been assuming that Mythos is just a big jump in model size, and that's where the jump in capabilities comes from. Hence I expect OpenAI not to be able to catch up without scaling up the model and hence significantly raising the API prices.

by alexgoodhart 18 hours ago

Anthropic frames this as something emergent. I'm not 100% sure, but they always phrase it as something like: it’s a great model, but our breath was taken away by its approach to security.

by dbbk 18 hours ago

What does an average Joe need a Mythos level model for that Opus can't do for them?

by TIPSIO 18 hours ago

Access to intelligence is going to become a major class issue over time if costs keep increasing and labs try to police usage and access.

by freedomben 18 hours ago

It's not just better at cybersecurity, it's better at all the things (or most of them). I for one would really benefit from a better Claude Code. I still have to babysit it pretty closely to keep it from messing things up. Opus 4.7 was not an upgrade for me.

But in general, what does the average Joe need Opus for that Sonnet or Haiku can't do for them? Better is better.

by dbbk 44 minutes ago

Opus never really messes anything up for me. You just need to tell it to follow TDD.

by Tepix 20 hours ago

It does sound like an even higher API price tier for sure.

by kdmtctl 18 hours ago

This command wouldn't be so bad for non-billionaire me.
