While I don't agree with their actions here, I do think there's sufficient reason to hold that belief.
On some fronts (e.g. security, where you have more experience than I do), I think the challenges are surmountable. But on other fronts (e.g. bio), a single errant actor could plausibly kill millions or billions of people with sufficiently powerful AI. We don't have good defenses here, and those actors do exist.
I still don't agree with these actions, but I do think I agree with their assumptions.
I participated in the internal uplift test for Sonnet 3.7, and even then, one non-expert got huge uplift from the model [1]. So I'd consider those evals a lower bound on the capabilities that can be elicited from a model.
The team behind Biomni, a biomedical agent that's widely used by researchers, has found consistent gains between models [2]. I trust them, because I visited them to build their HPC tool [3], which the model is quite capable of using – more so than most grad students. They also care about real usage from real people.
SecureBio also has some public evals [4], which have continued to show increasing uplift.
And while synthesis monitoring is a part of the solution, I think you might underestimate how much goes under the radar. See the Reedley lab incident for an example [5].
Is Anthropic still effectively throttling beneficial biomedical research? Yes! And so is OpenAI. But the underlying capability is still genuinely dual-use.
[1]: See page 25 in https://www-cdn.anthropic.com/9ff93dfa8f445c932415d335c88852...
[2]: Their benchmark has a preprint at https://www.biorxiv.org/content/10.64898/2026.05.12.724604v1...
[3]: https://x.com/phylo_bio/article/2029233694775624096
[5]: Public report for the Reedley lab incident is at https://chinaselectcommittee.house.gov/sites/evo-subsites/se... – but there are also many news articles about it