The actual result is TLO, and "only 6 more steps" in OP misreads how sequential attack chains work. These aren't independent puzzles. Each step gates the next. Averaging 22 vs 16 means Mythos is consistently punching through bottlenecks that completely stop Opus 4.6. More importantly: Mythos completed the full chain 3/10 times. Opus 4.6 completed it 0/10 times. That's not a narrow margin. In any security-relevant framing, "achieves full network takeover" vs "does not achieve full network takeover" is a binary threshold, and exactly one model crossed it. A year ago the best models struggled with beginner CTFs. Now one autonomously replicates what AISI estimates takes human professionals 20 hours. Calling that unimpressive because the margin over second place is single digits is measuring the wrong gap.
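The gating effect is easy to see with a toy calculation. The numbers below are hypothetical, purely for illustration (they are not AISI's per-step rates): when every step must succeed for the chain to continue, a modest per-step edge compounds multiplicatively over the chain length.

```python
# Toy model: a sequential attack chain where each step gates the next.
# Per-step success rates here are made up for illustration only.

def chain_completion_prob(p_step: float, n_steps: int) -> float:
    """Probability of clearing all n_steps when every step must succeed."""
    return p_step ** n_steps

# A model clearing 90% of individual steps vs one clearing 80%,
# over a 25-step chain:
strong = chain_completion_prob(0.90, 25)  # ~0.072
weak   = chain_completion_prob(0.80, 25)  # ~0.004

print(strong / weak)  # the full-chain gap is ~19x, not "10 percentage points"
```

The point of the sketch: averaging a few more steps per run is exactly what a large multiplicative advantage looks like from the outside, which is why "only 6 more steps" understates the difference.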
re: compute, "requires lots of compute" and "scaling is a dead end" are near-opposite claims. If performance is still climbing at 100M tokens with no visible plateau, that's evidence scaling works. Whether it's cheap today is a different question, and not one that ages well. Compute costs fall reliably, so what matters is the capability at a given price point in 18 months, not today.
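The price-point argument can be made concrete with an assumed decline rate (the 50%/year figure below is an illustrative assumption, not a measured number):

```python
# Toy projection: cost of a fixed workload under a steady annual
# cost decline. The decline rate is an assumption for illustration.

def future_cost(cost_today: float, annual_decline: float, months: float) -> float:
    """Cost of the same workload after `months`, at a constant annual decline."""
    return cost_today * (1 - annual_decline) ** (months / 12)

# A run costing $1,000 today, under an assumed 50%/year decline,
# costs ~$354 in 18 months:
print(future_cost(1000, 0.50, 18))  # ~353.55
```

Whatever the true rate turns out to be, the structure of the argument is the same: "expensive today" and "expensive at the relevant decision point" are different claims.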
The underlying point still stands, namely that "more compute" as the default answer is not sustainable.
Why?
Because even if we accept the unlikely dream that GPU prices will magically take a nose-dive, you still need somewhere to put all those servers stuffed with GPUs.
That means datacentres.
And "more datacentres" is absolutely not sustainable.
The cooling needs, the power needs, the land needs... none of it is remotely sustainable.
The premise that inference compute must scale linearly with capability isn't supported by what's actually happening. Distillation, quantization, and architectural efficiency gains routinely let you run yesterday's frontier capability at a fraction of the cost and hardware. GPT-3.5-level performance runs on a phone now. The 100M-token budget Mythos used here will not require 100M tokens' worth of 2026 hardware forever.
On datacenters specifically: yes, they require power, cooling, and land. So does every other piece of industrial infrastructure humanity has ever built and then incrementally made more efficient. The energy per FLOP has been dropping for decades and continues to. You can argue the rate of buildout is concerning, and that's a reasonable discussion. But "not sustainable" as a flat declaration requires you to believe efficiency gains will stall, energy production won't expand, and cooling technology will stay static, all simultaneously. That's a much stronger claim than it sounds.
None of which has anything to do with whether the AISI results are significant. They are.