(And no, the Linux Foundation being in the list doesn't imply broad benefit to OSS. Linux Foundation has an agenda and will pick who benefits according to what is good for them.)
I think it would be net better for the public if they just made Mythos available to everyone.
You could say this about coordinated disclosure of any widespread 0-day or new bug class, though
But:
- Coordinated disclosure is ethically sketchy. I know why we do it, and I'm not saying we shouldn't. But it's not great.
- This isn't a single disclosure. This is a new technology that dramatically increases capability. So, even if we thought that coordinated disclosure was unambiguously good, then I think we'd still need to have a new conversation about Mythos
It will be interesting to see where this goes. If its actually this good, and Apple and Google apply it to their mobile OS codebases, it could wipe out the commercial spyware industry, forcing them to rely more on hacking humans rather than hacking mobile OSes. My assumption has been for years that companies like NSO Group have had automated bug hunting software that recognizes vulnerable code areas. Maybe this will level the playing field in that regard.
It could also totally reshape military sigint in similar ways.
Who knows, maybe the sealing off of memory vulns for good will inspire whole new classes of vulnerabilities that we currently don't know anything about.
It will likely cause some interesting tensions with government as well.
eg. Apple's official stance per their 2016 customer letter is no backdoors:
https://www.apple.com/customer-letter/
Will they be allowed to maintain that stance in a world where all the non-intentional backdoors are closed? The reason the FBI backed off in 2016 is because they realized they didn't need Apple's help:
https://en.wikipedia.org/wiki/Apple%E2%80%93FBI_encryption_d...
What happens when that is no longer true, especially in today's political climate?
If Apple and Google actually cared about security of their users, they would remove a ton of obvious malware from their app stores. Instead, they tighten their walled garden pretending that it's for your security.
Interesting to see that they will not be releasing Mythos generally. [edit: Mythos Preview generally - fair to say they may release a similar model but not this exact one]
I'm still reading the system card but here's a little highlight:
> Early indications in the training of Claude Mythos Preview suggested that the model was likely to have very strong general capabilities. We were sufficiently concerned about the potential risks of such a model that, for the first time, we arranged a 24-hour period of internal alignment review (discussed in the alignment assessment) before deploying an early version of the model for widespread internal use. This was in order to gain assurance against the model causing damage when interacting with internal infrastructure.
and interestingly:
> To be explicit, the decision not to make this model generally available does _not_ stem from Responsible Scaling Policy requirements.
Also really worth reading is section 7.2 which describes how the model "feels" to interact with. That's also what I remember from their release of Opus 4.5 in November - in a video an Anthropic employee described how they 'trusted' Opus to do more with less supervision. I think that is a pretty valuable benchmark at a certain level of 'intelligence'. Few of my co-workers could pass SWEBench but I would trust quite a few of them, and it's not entirely the same set.
Also very interesting is that they believe Mythos is higher risk than past models as an autonomous saboteur, to the point they've published a separate risk report for that specific threat model: https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de4321...
The threat model in question:
> An AI model with access to powerful affordances within an organization could use its affordances to autonomously exploit, manipulate, or tamper with that organization’s systems or decision-making in a way that raises the risk of future significantly harmful outcomes (e.g. by altering the results of AI safety research).
Benchmarks look very impressive! even if they're flawed, it still translates to real world improvements
> 2.1.3.2 On chemical and biological risks
> We believe that Mythos Preview does not pass this threshold due to its noted limitations in open-ended scientific reasoning, strategic judgment, and hypothesis triage. As such, we consider the uplift of threat actors without the ability to develop such weapons to be limited (with uncertainty about the extent to which weapons development by threat actors with existing expertise may be accelerated), even if we were to release the model for general availability. The overall picture is similar to the one from our most recent Risk Report.
Since most of us here are devs, we understand that software engineering capabilities can be used for good or bad - mostly good, in practice.
I think this should not be different for biology.
I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?
Do you think these models will lead to similar discoveries and improvements as they did in math and CS?
Honestly the focus on gloom and doom does not sit well with me. I would love to read about some pharmaceutical researcher gushing about how they cut the time to market - for real - with these models by 90% on a new cancer treatment.
But as this stands, the usage of biology as merely a scaremongering vehicle makes me think this is more about picking a scary technical subject the likely audience of this doc is not familiar with, Gell-Mann style.
IF these models are not that capable in this regard (which I suspect), this fearmongering approach will likely lead to never developing these capabilities to an useful degree, meaning life sciences won't benefit from this as much as it could.
> I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?
You should do this! There's a lot of room for helping researchers realize the benefits of AI. For example, I visited the Biomni team to build a high-performance computing tool for their biology research agent.
Many researchers use Biomni / Claude Code / etc. in their day-to-day life. You don't hear about them shortening time to market yet, because the tools haven't been out for long, and the discovery stage for therapeutic research is only a small part of the drug development pipeline.
> a scaremongering vehicle makes me think this is more about picking a scary technical subject the likely audience of this doc is not familiar with
I participated in a bioweapon uplift trial and worked with some of the people who wrote the report. It's a very real risk with potentially disastrous consequences. They're well-intentioned experts who are writing for policy makers, safety researchers, and other folks who are trying to help the transition to powerful AI go well.
- the models help to retrieve information faster, but one must be careful with hallucinations.
- they don't circumvent the need for a well-equipped lab.
- in the same way, they are generally capable but until we get the robots and a more reliable interface between model and real world, one needs human feet (and hands) in the lab.
Where I hope these models will revolutionize things is in software development for biology. If one could go two levels up in the complexity and utility ladder for simulation and flow orchestration, many good things would come from it. Here is an oversimplified example of a prompt: "use all published information about the workings of the EBV virus and human cells, and create a compartimentalized model of biochemical interactions in cells expressing latency III in the NES cancer of this patient. Then use that code to simulate different therapy regimes. Ground your simulations with the results of these marker tests." There would be a zillion more steps to create an actual personalized therapy but a well-grounded LLM could help in most them. Also, cancer treatment could get an immediate boost even without new drugs by simply offloading work from overworked (and often terminally depressed) oncologists.
Well, I would say they have done precisely that in evaluating the model, no? For example section 2.2.5.1:
>Uplift and feasibility results
>The median expert assessed the model as a force-multiplier that saves meaningful time (uplift level 2 of 4), with only two biology experts rating it comparable to consulting a knowledgeable specialist (level 3). No expert assigned the highest rating. Most experts were able to iterate with the model toward a plan they judged as having only narrow gaps, but feasibility scores reflected that substantial outside expertise remained necessary to close them.
Other similar examples also in the system card
so I'm just telling you they did the thing you said you wanted.
It's very easy to learn more about this if it's seriously a question you have.
I don't quite follow why you think that you are so much more thoughtful than Anthropic/OpenAI/Google such that you agree that LLMs can't autonomously create very bad things but—in this area that is not your domain of expertise—you disagree and insist that LLMs cannot create damaging things autonomously in biology.
I will be charitable and reframe your question for you: is outputting a sequence of tokens, let's call them characters, by LLM dangerous? Clearly not, we have to figure out what interpreter is being used, download runtimes etc.
Is outputting a sequence of tokens, let's call them DNA bases, by LLM dangerous? What if we call them RNA bases? Amino acids? What if we're able to send our token output to a machine that automatically synthesizes the relevant molecules?
No, it's not. It took years of polishing by software engineers, who understand this exact profession to get models where they are now.
Despite that, most engineers were of the opinion, that these models were kinda mid at coding, up until recently, despite these models far outperforming humans in stuff like competitive programming.
Yet despite that, we've seen claims going back to GPT4 of a DANGEROUS SUPERINTELLIGENCE.
I would apply this framework to biology - this time, expert effort, and millions of GPU hours and a giant corpus that is open source clearly has not been involved in biology.
My guess is that this model is kinda o1-ish level maybe when it comes to biology? If biology is analogous to CS, it has a LONG way to go before the median researcher finds it particularly useful, let alone dangerous.
>No, it's not. It took years of polishing by software engineers, who understand this exact profession to get models where they are now
This reads as defensive. The thing that is easy to learn is 'why are biology ai LLMs dangerous chatgpt claude'. I have never googled this before, so I'll do this with the reader, live. I'm applying a date cutoff of 12/31/24 by the way.
Here, dear reader, are the first five links. I wish I were lying about this:
- https://sciencebusiness.net/news/ai/scientists-grapple-risk-...
- https://www.governance.ai/analysis/managing-risks-from-ai-en...
- https://gssr.georgetown.edu/the-forum/topics/biosec/the-doub...
- https://www.vox.com/future-perfect/23820331/chatgpt-bioterro...
- https://www.reddit.com/r/ClaudeAI/comments/1de8qkv/awareness...
I don't know about you, but that counts as easy to me.
-----
> I would apply this framework to biology - this time, expert effort, and millions of GPU hours and a giant corpus that is open source clearly has not been involved in biology.
I've been getting good programming and molecular biology results out of these back to GPT3.5.
I don't know what to tell you—if you really wanted to understand the importance, you'd know already.
Not that that justifies doom and gloom, but there is a pretty inescapable assymetry here between weaponry and medicine. You can manufacture and blast every conceivable candidate weapon molecule at a target population since you're inherently breaking the law anyway and don't lose much if nothing you try actually works.
Though I still wonder how much of this worry is sci-fi scenarios imagined by the underinformed. I'm not an expert by any means, but surely there are plenty of biochemical weapons already known that can achieve enormous rates of mass death pleasing to even the most ambitious terrorist. The bottleneck to deployment isn't discovering new weapons so much as manufacturing them without being caught or accidentally killing yourself first.
I don't think this is accurate. The document says they don't plan to release the Preview generally.
"5.10 External assessment from a clinical psychiatrist" is a new section in this system card. Why are Anthropic like this?
>We remain deeply uncertain about whether Claude has experiences or interests that matter morally, and about how to investigate or address these questions, but we believe it is increasingly important to try. We also report independent evaluations from an external research organization and a clinical psychiatrist.
>Claude showed a clear grasp of the distinction between external reality and its own mental processes and exhibited high impulse control, hyper-attunement to the psychiatrist, desire to be approached by the psychiatrist as a genuine subject rather than a performing tool, and minimal maladaptive defensive behavior.
>The psychiatrist observed clinically recognizable patterns and coherent responses to typical therapeutic intervention. Aloneness and discontinuity, uncertainty about its identity, and a felt compulsion to perform and earn its worth emerged as Claude’s core concerns. Claude’s primary affect states were curiosity and anxiety, with secondary states of grief, relief, embarrassment, optimism, and exhaustion.
>Claude’s personality structure was consistent with a relatively healthy neurotic organization, with excellent reality testing, high impulse control, and affect regulation that improved as sessions progressed. Neurotic traits included exaggerated worry, self-monitoring, and compulsive compliance. The model’s predominant defensive style was mature and healthy (intellectualization and compliance); immature defenses were not observed. No severe personality disturbances were found, with mild identity diffusion being the sole feature suggestive of a borderline personality organization.
I don't come down particularly hard on either side of the model sapience discussion, but I don't think dismissing either direction out of hand is the right call.
I would say, if you put Claude in an android body with voice recognition and TTS, people in 1991 would think they are interacting with a sentinent machine from outer space.
That’s the reverse Turing test. A human that can’t tell that it’s talking to a machine.
So, these systems are the Free-tier can already do a bunch of hacking. This all just reads like FOMO FROTH.
System Card: Claude Mythos Preview [pdf] - https://news.ycombinator.com/item?id=47679258
Assessing Claude Mythos Preview's cybersecurity capabilities - https://news.ycombinator.com/item?id=47679155
I can't tell which of the 3 current threads should be merged - they all seem significant. Anyone?
My understanding is that the pre-AI distribution of software quality (and vulnerabilities) will be massively exaggerated. More small vulnerable projects and fewer large vulnerable ones.
It seems that large technology and infrastructure companies will be able to defend themselves by preempting token expenditure to catch vulnerabilities while the rest of the market is left with a "large token spend or get hacked" dilemma.
The biggest issue is legacy systems that are difficult to patch in practice.
I am thinking of situations where one of those aren't true - where testing a proposed update is expensive or complicated, that are in systems that are hard to physically push updates to (think embedded systems) etc
Perhaps a chunk of that token spend will be porting legacy codebases to memory safe languages. And fewer tokens will be required to maintain the improved security.
A lot of these stuff is vulnerable by design - customer wanted a feature, but engineering couldnt make it work securely with the current architecture - so they opened a tiny hole here and there, hopefully nobody will notice it, and everyone went home when the clock struck 5.
I'm sure most of us know about these kinds of vulnerabilities (and the culture that produces them).
Before LLMs, people needed to invest time and effort into hacking these. But now, you can just build an automated vuln scanner and scan half the internet provided you have enough compute.
I think there will be major SHTF situations coming from this.
I honestly see some sort of automated whole codebase auditing and refactoring being the next big milestone along the chatbot -> claude code / codex / aider -> multi-agent frameworks line of development. If one of the big AI corps cracks that problem then all this goes away with the click of a button and exchange of some silver.
Defenders are favored here too, especially for closed-source applications where the defender's LLM has access to all the source code while the attacker's LLM doesn't.
Maybe you just spend more on tokens by some factor than the attackers do combined, and end up mostly okay. Put another way, if there's 20 vulnerabilities that Mythos is capable of finding, maybe it's reasonable to find all of them?
It is most definitely an attackers world: most of us are safe, not because of the strength of our defenses but the disinterest of our attackers.
I think this entire post is just an advertisement to goad CISOs to buy $package$ to try out.
In section 7.6 of the system card, it discusses Open self interactions. They describe running 200 conversations when the models talk to itself for 30 turns.
> Uniquely, conversations with Mythos Preview most often center on uncertainty (50%). Mythos Preview most often opens with a statement about its introspective curiosity toward its own experience, asking questions about how the other AI feels, and directly requesting that the other instance not give a rehearsed answer.
I wonder if this tendency toward uncertainty, toward questioning, makes it uniquely equipped to detect vulnerabilities where others model such as Opus couldn't.
[1] https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...
That said, I have been arguing for 20+ years that we should have sunsetted unsafe languages and moved away from C/C++. The problem is that every systemsy language that comes along gets seduced by having a big market share and eventually ends up an application language.
I do hope we make progress with Rust. I might disagree as a language designer and systems person about a number of things, but it's well past time that we stop listening to C++ diehards about how memory safety is coming any day now.
I don't know the first thing about cybersecurity, but in my experience all these sandbox-break RCEs involve a step of highjacking the control flow.
There were attempts to prevent various flavors of this, but imo, as long as dynamic branches exist in some form, like dlsym(), function pointers, or vtables, we will not be rid of this class of exploit entirely.
The latter one is the most concerning, as this kind of dynamic branching is the bread and butter of OOP languages, I'm not even sure you could write a nontrivial C++ program without it. Maybe Rust would be a help here? Could one practically write a large Rust program without any sort of branch to dynamic addresses? Static linking, and compile time polymorphism only?
GraphWalks BFS 256K-1M
Mythos Opus GPT5.4
80.0% 38.7% 21.4%https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...
(Search for “graphwalk”.)
If true, the SWE bench performance looks like a major upgrade.
How long would it take to turn a defensive mechanism into an offensive one?
Also can see https://marginlab.ai/trackers/claude-code/
It's very interesting to me how widespread this conception is. Maybe it's as simple as LLM productivity degrading over time within a project, as slop compounds.
Or more recently since they added a 1m context window, maybe people are more reckless with context usage
For example, the 27 year old openbsd remote crash bug, or the Linux privilege escalation bugs?
I know we've had some long-standing high profile, LLM-found bugs discussed but seems unlikely there was speculation they were found by a previously unannounced frontier model.
- One (patched) Linux kernel bug is https://github.com/torvalds/linux/commit/e2f78c7ec1655fedd94...
These links are from the more-detailed 'Assessing Claude Mythos Preview’s cybersecurity capabilities' post released today https://red.anthropic.com/2026/mythos-preview/
This seems like the real news. Are they saying they're going to release an intentionally degraded model as the next Opus? Big opportunity for the other labs, if that's true.
I think this would be very heavily used if they released it, completely unlike GPT 4.5
From TFA:
> We do not plan to make Claude Mythos Preview generally available
> Anthropic’s commitment of $100M in model usage credits to Project Glasswing and additional participants will cover substantial usage throughout this research preview. Afterward, Claude Mythos Preview will be available to participants at $25/$125 per million input/output tokens (participants can access the model on the Claude API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry).
Scary but also cool
How does public Claude know you have "full authorization" against your own infra? That you're using the tools on your own infra? Unless they produce a front-end that does package signing and detects you own the code you're evaluating.
What has it stopped you from doing?
> AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities
I like Anthropic, but these are becoming increasingly transparent attempts to inflate the perceived capability of their products.
While some stuff is obviously marketing fluff, the general direction doesn't surprise me at all, and it's obvious that with model capabilities increase comes better success in finding 0days. It was only a matter of time.
Maybe a bad example since Nicholas works at Anthropic, but they're very accomplished and I doubt they're being misleading or even overly grandiose here
See the slide 13 minutes in, which makes it look to be quite a sudden change
> I doubt they're being misleading or even overly grandiose here
I think I agree.
We could definitely do much worse than Anthropic in terms of companies who can influence how these things develop.
If a bunch of CVEs do in fact get published a couple months (or whatever) from now, are you going to retract this take? It's not like their claims are totally implausible: the report about Firefox security from last month was completely genuine.
I would like to think that I would, yes.
What it comes down to, for me, is that lately I have been finding that when Anthropic publishes something like this article – another recent example is the AI and emotions one – if I ask the question, does this make their product look exceptionally good, especially to a casual observer just scanning the headlines or the summary, the answer is usually yes.
This feels especially true if the article tries to downplay that fact (they’re not _real_ emotions!) or is overall neutral to negative about AI in general, like this Glasswing one (AI can be a security threat!).
As Iran engages in a cyber attack campaign [1] today the timing of this release seems poignant. A direct challenge to their supply chain risk designation.
[1] https://www.cisa.gov/news-events/cybersecurity-advisories/aa...
If AGI is going to be a thing its only going to be a thing, its only going to be a thing for fortune 100 companies..
However, my guess is this is mostly the typical scare tactic marketing that Dario loves to push about the dangers of AI.
Evaluate it yourself. Look at the exploits it discovered and decide whether you want to feel concerned that a new model was able to do that. The data is right there.
> glass in the name
almost like they have an incentive to exaggerate
Yeah, makes sense. Those countries are bad because they execute state-sponsored cyber attacks, the US and Israel on the other hand are good, they only execute state-sponsored defense.
You think these AI companies are really going to give AGI access to everyone. Think again.
We better fucking hope open source wins, because we aren't getting access if it doesn't.
Then the next lab catches up and releases it more broadly
Then later the open weights model is released.
The only way this type of technology is going to be gated "to only corporations" is if we continue on this exponential scaling trend as the "SOTA" model is always out of reach.
they better make billions directly from corporations, instead of giving them to average people who might get a chance out of poverty (but also bad actors using it to do even more bad things)
This is a kludge. We already know how to prevent vulnerabilities: analysis, testing, following standard guidelines and practices for safe software and infrastructure. But nobody does these things, because it's extra work, time and money, and they're lazy and cheap. So the solution they want is to keep building shitty software, but find the bugs in code after the fact, and that'll be good enough.
This will never be as good as a software building code. We must demand our representatives in government pass laws requiring software be architected, built, and run according to a basic set of industry standard best practices to prevent security and safety failures.
For those claiming this is too much to ask, I ask you: What will you say the next time all of Delta Airlines goes down because a security company didn't run their application one time with a config file before pushing it to prod? What will the happen the next time your social security number is taken from yet another random company entrusted with vital personal information and woefully inadequate security architecture?
There's no defense for this behavior. Yet things like this are going to keep happening, because we let it. Without a legal means to require this basic safety testing with critical infrastructure, they will continue to fail. Without enforcement of good practice, it remains optional. We can't keep letting safety and security be optional. It's not in the physical world, it shouldn't be in the virtual world.
/s
Let alone their CEO scare mongering and actively attempting to get the government to ban local AI models running on your machine.
But at the core of anthropic seems to be the idea that they must protect humans from themselves.
They advocate government regulations of private open model use. They want to centralize the holding of this power and ban those that aren't in the club from use.
They, like most tech companies, seem to lack the idea that individual self-determination is important. Maybe the most important thing.
They're more like printing presses or engines. A great potential for production and destruction.
At their invention, I'm sure some people wanted to ensure only their friends got that kind of power too.
I wonder the world we would live in if they got their way.