Hard disagree. Opus reports to me like a student. Fable reported to me like a colleague (researcher). It genuinely seemed to pick up on nuance that the other models just don't, even when I tell them explicitly. It's been really frustrating that neither Codex nor Opus can make targeted edits to Fable's code without screwing something subtle up. For context, this is for computational geometry work, so your mileage may vary.
reply
Fable happened to be released after I had been experimenting with Claude Code for roughly two weeks. I had been trying to use Sonnet. My understanding of geometry was maybe not as good as it should've been, and I kept seeing Sonnet say things I knew were wrong, but I didn't know enough about 6DOF camera positioning to ask it to fix them. When I finally asked the right questions, it couldn't answer them at all, so I switched to Opus, and it was night and day. But! Opus still couldn't really keep 6DOF "in its head." When I left it to its own devices, it tended to come back having forgotten that it needed to keep all 6 degrees of freedom in mind, and it collapsed the problem down to 3DOF or just a single angle.

Fable just understood what I was talking about and never needed me to stop it and say "you forgot this thing we talked about." The difference in spatial reasoning capability between the three models is very very palpable. I am curious to get more time with it because ultimately I feel like I sandbagged it by giving it problems that would've been within Opus' abilities, but required a lot more handholding.

reply
> It's been really frustrating that neither Codex nor Opus can make targeted edits to Fable's code without screwing something subtle up.

Reminds me of the old adage: don't try to be too smart when writing code. Otherwise, dumber people - including your future self - will have trouble working with it.

reply
Some problems are very hard to solve with stupid code. This can easily be the case in computational geometry.
reply
For reference:

if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it

reply
Ah thanks - I couldn't remember the original version.

For reference: it's called Kernighan's Law, and can be found in the Second Edition of "The Elements of Programming Style", page 10 [1].

The original phrasing is:

> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?

[1] https://archive.org/details/the-elements-of-programming-styl...

reply
It seems I was not able to either, and I trusted the Google AI snippet. Thanks.
reply
Yes, in my project I made so much progress in 3 days with Fable that it's not comparable to how Opus works.
reply
To be fair, labs silently nerf models all the time.

Fable's probably objectively better at full power. I mean, I definitely felt the same difference in competency between Fable and current Opus. But Opus itself has definitely been nerfed, and Fable, even if it comes back to the public forever (probably won't), will get nerfed.

reply
I remember a time where a product didn't suddenly get worse while you were blinking.

That was a nice time. Let us get back to that time. Use open weights models. Own stuff.

reply
That was before SaaS became a thing. Products didn't degrade over time because they couldn't easily reach out to your machine and remotely overwrite bytes on the CD-ROM the product came on.
reply
Wait, so..

This is interesting. The "reported to me like a colleague" part.

Is it just that Anthropic gave Mythos even more of that Anthropic™ character, (incorrectly) radiating confidence?

Is that why people have been losing their minds over that thing? Is this just cheap social engineering?

I mean, I bet it is also slightly more capable than Opus, but that would all check out to me. Man.

Thanks for sharing I suppose.

reply
the primary difference i noticed is that fable didnt try to check in every minute

to an extent that might have done it, but i had been playing around ahead of time trying to reverse engineer my ray bans case so i can make my own plastic insert, and fable took opus' work from mostly broken to mostly done, and then when fable went away, opus broke it again

reply
No, it’s just a fundamentally much better model. Going back to Opus feels like the model has been lobotomized. It makes much more frequent errors, especially of the “I claimed I tested x y and z, but actually only kinda half heartedly tested x, and assumed I understood what was wrong” variety.
reply
Wait but that has been the exact word-for-word complaint when comparing sonnet to opus

Or opus to opus

Or really any new thing to old thing

reply
You hear the same canard every time Anthropic releases a new model or version. I'm not convinced they're objective anecdotes. I wonder if it's simply that the new model, while marginally better, has a different style and people find that new/refreshing. That is what makes it feel so much better than the previous release.
reply
When the agent is becoming more accurate and thorough what would you expect to be reported?
reply
Oh I am sure that it became somewhat more accurate, and with that, the labeling there is in fact technically correct. It just does not work as an explainer for the doomsday-ish hype that model has induced in a lot of people's brains.

The user here is right in what they said but wrong in why they said it, essentially.

reply
An analogy I keep coming back to with the current progress in LLMs is the progress in the 90s of 3D game engines.

Every upgrade made what came before it appear awful in comparison, to such an extent that every upgrade was called "photorealistic" and people kept forgetting that they'd been using that description for the previous engines that they were now dismissing.

https://archive.org/details/nextgen-issue-26

reply
That’s a rather bad faith framing, I think. Who are you to judge why I said something?
reply
A person with the exact kind of pattern matching brain disorder this tech has been modeled after.

I do make mistakes though. Please check results.

reply
Maybe I was getting downgraded to Opus 4.8 but I saw nothing even close to resembling this behavior when using Fable.
reply
It very much depends on the task. What were you trying it on?
reply
Funny, I find Codex to still be better at coding than Opus or Fable.

I A/B tested on a whole array of prompts between Codex and Fable, and Fable almost always found that Codex had produced a better plan and covered more edge cases than it did itself.

For every problem I gave the exact same prompt to both models, then I had each analyze the other's output. For roughly 80% of the prompts, Fable acknowledged that Codex's output was an improvement on its own, for 20% the converse situation occurred.

There was one egregious case where Fable suggested deploying code which would have resulted in a production bug, an edge case which Codex identified and proposed a fix for.

Note: this is all for optimized Rust code designed to be highly CPU and memory efficient.

I do prefer Anthropic's models for any tasks with front-end/design work needed. But I don't do much of that kind of work usually.

reply
I've used them back to back as well. Codex is good at specific tasks; it doesn't try to go big, it does what it's told provided the task is relatively procedural. If Codex can make progress on a task, why would I give it to Fable?

Fable fumbled the one simple task that I gave it too. I gave it multiple very hard open-ended tasks (effectively math tasks) involving research code and it crushed them. It's the first model I've seen that can do that. The current Codex will never produce the type of code Fable gave me no matter how many times I run the same problem at it, because it won't stop trying naive rubbish. And if I tell Codex to try to improve the code, it can't figure out why trying the same classical tricks isn't making it work better, regardless of what I tell it. Opus is marginally better because it can at least recognize some subtleties over time, but still disappointing because it has no idea how to deal with them.

Most programmers want precision instruments for their workflow. That's fine, use the right tool for the job. In my line of work, I need crazy solutions because the obvious stuff doesn't work. That's where Fable shined for me.

reply
I found Fable to be both more intelligent and much better at pursuing complex goals than any previous model. I was impressed enough that I wrote up my experience – it's a little unusual because it was on open source code, so I could post the full session transcript and commits, if people want to judge for themselves https://tossrock.substack.com/p/36-hours-with-fable
reply
You might have found a use case on which both have the same capabilities, but in general this is very much not true. I’ve had Fable autonomously fix concurrency bugs that other models couldn’t even diagnose from logs.

Perhaps it is a lot of small improvements all over the place, but the sum is a step change in capability.

reply
In LLMs, much like in humans, agency and misalignment are two sides of the same coin.
reply
> agency and misalignment are two sides of the same coin.

The free will coin?

reply
In my experience "free will", like "consciousness" and "common sense", is not so much a concept with a universally agreed definition as it is a cognitive stop sign or an applause light, meaning different things to everyone who uses the term.

Do I have free will, or am I bounded by the laws of physics?

Even if you think my soul is completely independent of my body, there are theologians who argue that God being omniscient means that who goes to heaven and hell is predetermined before birth and therefore no action you take will ever change the afterlife you go to, and that to think God isn't omniscient would be blasphemy; do they think I have free will?

And then there's Thelema with "Do what thou wilt shall be the whole of the Law", which can be understood in terms of (amongst other things) "Don't let peer pressure manipulate you into thinking you want other things than you really want", though this is of course a simplification much as the omniscient example above: https://en.wikipedia.org/wiki/True_Will

reply
Of all of the concepts like "consciousness" and "agency", "free will" is probably the least useful and most poorly defined.

It's a hand-me-down from Western beliefs about morality and individuality - including Thelema and Christianity.

So there's a lot of starting from the concept and working back to assumed conclusions.

Generally humans do not have free will, do have very limited political, economic, and psychological agency, usually selected from a small number of competing rule sets, and are also far more easily influenced than they suspect.

Culture is more like a cellular automaton or diffusion system. Occasionally a transformation ripples out from an individual cell, often for fairly random reasons, but the big patterns are emergent, and every so often the soup shakes itself up and settles into a new arrangement.

IMO LLMs are the most recent proto-version of that, running on a different substrate.

reply