This hasn't been true in a long time.
In fact I'm more and more inclined to run my own benchmarks from now on, because I seriously distrust those I see online.
Even if the benchmarks are valid, they just don't reflect my use cases or my ability to navigate my projects and their dependencies.
Of course, maybe CLAUDE.md and memory account for the difference.
As a matter of preference, however, I just like the way Claude Code works a lot better. Instructing it to work with parallel subagents in worktrees and so on just matches the way I think these things should work, I guess.
Have they announced this?
No, and in fact they have said they never do this at all.