Fable just understood what I was talking about and never needed me to stop it and say "you forgot this thing we talked about." The difference in spatial reasoning capability between the three models is very, very palpable. I am curious to get more time with it, because ultimately I feel like I sandbagged it by giving it problems that would've been within Opus' abilities but required a lot more handholding.
Reminds me of the old adage: don't try to be too smart when writing code. Otherwise, dumber people - including your future self - will have trouble working with it.
if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it
For reference: it's called Kernighan's Law, and can be found in the Second Edition of "The Elements of Programming Style", page 10 [1].
The original phrasing is:
> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
[1] https://archive.org/details/the-elements-of-programming-styl...
Fable's probably objectively better at full power. I mean, I definitely felt the same difference in competency between Fable and current Opus. But Opus itself has definitely been nerfed, and Fable, even if it comes back to the public for good (it probably won't), will get nerfed too.
That was a nice time. Let us get back to that time. Use open weights models. Own stuff.
This is interesting. The "reported to me like a colleague" part.
Is it just that anthropic gave Mythos even more of that Anthropic™ character, (incorrectly) radiating confidence?
Is that why people have been losing their minds over that thing? Is this just cheap social engineering?
I mean I bet it is also slightly more capable than opus, but that would all check out to me. Man.
Thanks for sharing I suppose.
to an extent that might have done it, but i had been playing around ahead of time trying to reverse engineer my Ray-Ban case so i can make my own plastic insert, and fable took opus' work from mostly broken to mostly done, and then when fable went away, opus broke it again
Or opus to opus
Or really any new thing to old thing
The user here is right in what they said but wrong in why they said it, essentially.
Every upgrade made what came before it appear awful in comparison, to such an extent that every upgrade was called "photorealistic" and people kept forgetting that they'd been using that description for the previous engines that they were now dismissing.
I do make mistakes though. Please check results.
I A/B tested on a whole array of prompts between Codex and Fable, and Fable almost always found that Codex had produced a better plan and covered more edge cases than it did itself.
For every problem I gave the exact same prompt to both models, then I had each analyze the other's output. For roughly 80% of the prompts, Fable acknowledged that Codex's output was an improvement on its own, for 20% the converse situation occurred.
There was one egregious case where Fable suggested deploying code that would have caused a production bug: an edge case that Codex identified and proposed a fix for.
Note: this is all for optimized Rust code designed to be highly CPU and memory efficient.
I do prefer Anthropic's models for any tasks with front-end/design work needed. But I don't do much of that kind of work usually.
Fable fumbled the one simple task that I gave it, too. I gave it multiple very hard open-ended tasks (effectively math tasks) involving research code, and it crushed them. It's the first model I've seen that can do that. The current Codex will never produce the type of code Fable gave me, no matter how many times I throw the same problem at it, because it won't stop trying naive rubbish. And if I tell Codex to try to improve the code, it can't figure out why the same classical tricks aren't making it work better, regardless of what I tell it. Opus is marginally better because it can at least recognize some subtleties over time, but it's still disappointing because it has no idea how to deal with them.
Most programmers want precision instruments for their workflow. That's fine, use the right tool for the job. In my line of work, I need crazy solutions because the obvious stuff doesn't work. That's where Fable shined for me.