> way worse than the average Cloudflare blog

How long has it been since you took your average? Lately all Cloudflare output has been heavily AI'd.

reply
> They claim it’s a different kind of tool and then describe using it the same way you’d use any other model. This really felt way worse than the average Cloudflare blog and really just rehashed the Mythos announcement which had already called out the key parts being chaining and crafting examples.

Hah, I was trying to parse this too.

Charitably, perhaps they're being vague about exactly what's different because they're still under NDA.

reply
It sounds different because it’s a hidden advertisement, not a regular blog post.
reply
But why would Cloudflare advertise Anthropic? They are competing with Anthropic by hosting open-weights models.
reply
Can you elaborate? That just seems to be Cloudflare's announcement from May 2025 that they'd be supporting MCP servers.
reply
It's circular financing. It's a circlejerk. It's a circular financing jerk.
reply
They got privileged early access to an unreleased frontier model to harden their systems, with Anthropic engineer support, and likely were able to use it to make other product optimizations tangential to security too. A blog article afterwards is a cheap price for unlocking that access, regardless of how well it paid off.
reply
I think they're saying it has qualitatively different capabilities that make certain kinds of security work more worth pursuing with the model, not that the model of human-AI interaction has changed.

You're right that they're using a harness like everyone else. The general idea of giving the model a harness is not going to change. I mean even humans need harnesses to accomplish some things.

reply
Google Maps is my favorite human harness.
reply
The post says they wrote a custom harness that orchestrates work between multiple separate model invocations. That is different from running Claude Code (which is a specific existing harness around the Claude models).

The post takes a while to get around to saying that, and could have included more detail besides the workflow diagram and table (which they flag as only "an example of" such a harness), but it does answer the question. It's a different kind of tool because it's a model rather than a harness+model pair.

reply
> the model has its own emergent guardrails that sometimes cause it to push back on legitimate security research requests. But as we found, these organic refusals aren’t consistent - the same task, framed differently or presented in a different context, could produce completely different outcomes as illustrated in the examples below.

This was new. I'm surprised that a model specifically designed for security research and gated to professionals is refusing legitimate requests.

reply
There's pretty strong evidence that (mis)alignment in one area creates (mis)alignment in others. The "aligned behavior" vectors are not orthogonal across domains, from cybersecurity to bioweapons to prejudice, so alignment in some will likely bleed into others.
reply
'It's not X, it's Y' is also a common LLM trope.
reply
My guess is because it is a model trained specifically for security/hacking. So comparing it to Opus, trained for chat/code/etc., is apples-to-oranges.
reply
It is not; that's what surprised Anthropic employees too.
reply
I think what they might mean is:

Because of its capabilities, a new kind of harness can be built for it, so the entire system (model + harness) is a different kind of tool than, say, Claude Code.

reply
But did they build this different harness? And are they sure other models can't cope with it?
reply
Right, I expected the piece to transition into “and here’s how we built a whole new thing for it” but it never did.
reply
They kind of have a little diagram explaining the steps. I imagine every single step in it is basically its own Claude Code session.
reply