So while Anthropic's marketing may be hype, there just wasn't much left to find, a point he makes in the blog post.
Whether it's a big step forward for other kinds of projects is difficult to tell, but this highlights that everybody should be using AI code review tools to audit their existing code today, and not everybody is.
What it highlights is that Mythos doesn't seem much better than other LLM-driven tooling at finding security issues, which was the strongest claim Anthropic made in the first place.
Funnily enough that was while Dario Amodei was their research director.
More seriously, so far I haven't seen much indication that Mythos is more than Opus with a security-focused code analysis harness. That said, the fact that it can find these bugs in an automated fashion is the more important takeaway outside of the hype.
I’m curious what the error rate is on the detections, because none of that means much if it is wrong 90% of the time and we are only hearing about the examples that are useful marketing.
I remember when OpenAI was saying GPT-2 was too dangerous to release.
If I’m not mistaken, after the media cycle, he lost his job for breaking confidentiality.
That was the opposite of marketing, Google really didn’t get how to turn this into a product until ChatGPT happened.
The other guy worked on Google's AI safety team where one would expect he'd have a basic grasp of how the technology works before making outlandish claims.
They claim the huge advance is in exploiting the bugs.
The other alternative is that Curl is simply secure enough that there was far less to find than in other projects.
The marketing is not intentional.
Evidence: ten years ago, when I interviewed at Baidu AI with Andrew Ng and Dario, Dario struck me as the kind of person who is pure-hearted to the point of being ideological. Given Dario's successful career so far, that essence has gradually hardened into a conviction, amplified by a purposely built team around him that shares his ideology.
Humans are creatures of convenience, and a rare few are true masters of it: they reshape their mental models without a hint of contradiction in their own minds.
Mythos put Anthropic back into the White House's good graces. It also branded Anthropic as badass, something their softer image probably needed to win government contracts.
Maybe it wasn’t marketing. But the product’s configuration, and how Anthropic talked about and released it, sure as hell played beautifully. (The timing, while Musk and Altman are distracted with each other, also couldn’t have been better.)
Things change when you’re running a business like Anthropic, especially as the CEO. You have a responsibility to shareholders, and you just need to play the game.
Anthropic chose a great angle: focus on professionals / enterprise, safety, etc. Both can stem from a genuine desire to make great technology, while business purposes require you to position yourself a bit "better" than reality.
Just look at what their strategy is with Mythos, it’s almost perfection: the “it’s not ready to be released to the public” angle hits all the marks: they care about responsibility / safety, they have “the best” model, and “LLMs are dangerous, but we, as the guardians, can be trusted”. This also helps the industry as a whole with regulation: if they’re being constrained, China will develop even more dangerous models.
This is a result of how smart people treat business, it’s PR perfection, especially given how much the whole industry is talking about it.
(Yes, they fail in other PR areas, but that’s a different discussion)
That's an odd definition of "intentional". Evolution has filtered for people with certain views and the marketing has just emerged from their actions. ... So?
A deadly virus (naturally occurring one let's say) wasn't created intentionally. Evolution selected for it. It's still bad and kills people. Doesn't make it nice because of lack of intention.
Whether the person doing the marketing was sincere about it or not is immaterial, since marketing is experienced almost entirely by the people consuming it, and not the people communicating it. What matters is if the audience is sincerely concerned by the message, and it's transparently the case that they were sincerely concerned by it.
Curl uses all sorts of tools, including AI tools, to find bugs. According to the article, these tools have found hundreds of bugs, including a dozen CVEs.
Mythos found one vulnerability. That means Mythos is just another tool, not the revolution it claims to be.
It is common that when a new tool is introduced, a bunch of bugs are found, with diminishing returns after that. Mythos finding one vulnerability is consistent with what I would expect from a major update to an existing tool, which is what Mythos is relative to existing LLM-based solutions.
And it is not overkill; the proof is that it found that vulnerability. That's like saying a new version of a static analyzer with new rules is "overkill" because it found only one more bug than the previous version. Deciding whether it is overkill is more about context: using a very expensive model like Mythos on little-used, non-critical software is overkill, but for Curl, it absolutely isn't.
If Mythos found loads of vulnerabilities in Firefox but not in Curl, I wouldn't say that's because Mythos is so good, but rather that with the release of Mythos they did some testing that could have been done before, using the same tools Curl has used.
> Once the end-to-end pipeline is in place, it’s trivial to swap in different models when they become available. Building this pipeline early helped us find a number of serious bugs using publicly-available models, and it also helped us hit the ground running when we had the opportunity to evaluate Claude Mythos Preview. In our experience, model upgrades increase the effectiveness of the entire pipeline: the system gets simultaneously better at finding potential bugs, creating proof-of-concept test cases to demonstrate them, and articulating their pathology and impact.
That helps us understand how much of Mythos is hype and how much is real.
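The quoted post's point about swapping models is easy to see in code: if the pipeline stages (find a bug, reproduce it, explain it) are fixed and the model is just a parameter, an upgraded model improves every stage at once. Here is a minimal sketch of that structure; `run_pipeline`, the stage prompts, and the toy stand-in model are all hypothetical illustrations, not the actual pipeline from the post.

```python
from typing import Callable

# A "model" is just a function from prompt to response, so any backend
# (or any model upgrade) can be dropped in without changing the pipeline.
Model = Callable[[str], str]

def run_pipeline(model: Model, code: str) -> dict:
    """Run the three stages the quoted post describes with one model."""
    finding = model(f"Find a potential bug in:\n{code}")
    poc = model(f"Write a proof-of-concept test case for: {finding}")
    impact = model(f"Explain the pathology and impact of: {finding}")
    return {"finding": finding, "poc": poc, "impact": impact}

# Trivial stand-in model for demonstration: echoes the first prompt line.
def toy_model(prompt: str) -> str:
    return prompt.splitlines()[0]

report = run_pipeline(toy_model, "int len = strlen(s);")
print(sorted(report.keys()))  # ['finding', 'impact', 'poc']
```

Swapping `toy_model` for a stronger model is a one-argument change, which is why "model upgrades increase the effectiveness of the entire pipeline" in the quoted experience.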
I've seen this exact chain of events, nearly word for word, multiple times before.
> Over the last few months, we have stopped getting AI slop security reports in the #curl project. They're gone.
> Instead we get an ever-increasing amount of really good security reports, almost all done with the help of AI.
> They're submitted in a never-before seen frequency and put us under serious load.
> I hear similar witness reports from fellow maintainers in many other Open Source projects.
> Lots of these good reports are deemed "just bugs" and things we deem not having security properties.
[1]: https://www.linkedin.com/posts/danielstenberg_hackerone-shar...
I've been running my own security scanning software for this (disclaimer: now starting a company @ zeroquarry.com), and from what I've seen there's huge value in prompts plus adversarial LLM review. Without adversarial review you get garbage (as this blog points out, roughly 4 out of 5 findings are nonsense), and with a good prompt you can use almost any "near frontier" model, as long as the prompt provides the guardrails and the model doesn't refuse in an overly strict way.
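The adversarial-review idea above can be sketched in a few lines: a "finder" model proposes candidate findings, and a second model prompted to attack each claim filters out the ones it can refute. This is a minimal illustration under assumptions, not the commenter's actual tool; the `Finding` type, the reviewer callback, and the stub reviewer are all hypothetical, and in practice each callback would be a real LLM API call.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Finding:
    location: str  # e.g. "file.c:line"
    claim: str     # the finder model's description of the issue

def adversarial_filter(
    findings: list[Finding],
    reviewer: Callable[[Finding], bool],
) -> list[Finding]:
    """Keep only findings the adversarial reviewer fails to refute."""
    return [f for f in findings if reviewer(f)]

# Stub reviewer standing in for a second LLM prompted to attack each
# claim; here it simply models "most raw findings are nonsense" by
# accepting only one kind of claim.
def stub_reviewer(f: Finding) -> bool:
    return "overflow" in f.claim

candidates = [
    Finding("lib/url.c:120", "possible integer overflow in length check"),
    Finding("lib/http.c:88", "unused variable (not a security issue)"),
]
survivors = adversarial_filter(candidates, stub_reviewer)
print([f.location for f in survivors])  # ['lib/url.c:120']
```

The design point is that the filter is independent of either model, so the finder and the reviewer can be different models with different prompts.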
About as subtle as a personal injury lawyer's billboard
It's almost Trump-esque - "this model will change everything forever; we are doomed; we are saved; we will all be fired; we will all be rich", etc
They need the hype to pay off way more than we do. So many of us who still write code directly stand to lose nothing of our capabilities if the marketing claims cannot hold water.
I'm surprised you say that, because it is all over Hacker News. Every single post is co-opted into promoting AI. Try finding a submission with fifty points or more that doesn't have AI or LLMs mentioned somewhere in the comments.
That’s not really the point though. I have no doubt AI is useful, I just don’t want to have it shoved in my face every five minutes.