by hodgehog11 14 hours ago

by lukeschlather 14 hours ago

Fable happened to be released after I had been experimenting with Claude Code for roughly two weeks. I had been trying to use Sonnet, and when I switched to Opus it was night and day. My understanding of geometry was maybe not as good as it should've been, and I kept seeing Sonnet say things I knew were wrong but didn't know enough about 6DOF camera positioning to ask it to fix. I finally asked the right questions, it couldn't answer them at all, I switched to Opus, it was night and day. But! Opus still couldn't really keep 6DOF "in its head." When I left it to its own devices it tended to come back having forgotten that it needed to keep 6 degrees of freedom in its head and collapsed the problem down to 3DOF or just a single angle.

Fable just understood what I was talking about and never needed me to stop it and say "you forgot this thing we talked about." The difference in spatial reasoning capability between the three models is very very palpable. I am curious to get more time with it because ultimately I feel like I sandbagged it by giving it problems that would've been within Opus' abilities, but required a lot more handholding.

by raphman 14 hours ago

> It's been really frustrating that neither Codex nor Opus can make targeted edits to Fable's code without screwing something subtle up.

Reminds me of the old adage: don't try to be too smart when writing code. Otherwise, dumber people - including your future self - will have trouble working with it.

by murkt 13 hours ago

Some problems are very hard to solve with stupid code. That can easily be the case here (computational geometry).

by mejutoco 12 hours ago

For reference:

if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it

by raphman 11 hours ago

Ah thanks - I couldn't remember the original version.

For reference: it's called Kernighan's Law, and can be found in the Second Edition of "The Elements of Programming Style", page 10 [1].

The original phrasing is:

> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?

[1] https://archive.org/details/the-elements-of-programming-styl...

by mejutoco 9 hours ago

It seems I was not able to either, and I trusted Google's AI snippet. Thanks.

by mohsen1 14 hours ago

Yes, in my project I made so much progress in 3 days with Fable that it is not comparable to how Opus works.

by sigbottle 14 hours ago

To be fair, labs silently nerf models all the time.

Fable's probably objectively better at full power. I mean, I definitely felt the same difference in competency between Fable and current Opus. But Opus itself has definitely been nerfed, and Fable, even if it comes back to the public forever (probably won't), will get nerfed.

by hypfer 14 hours ago

I remember a time when a product didn't suddenly get worse while you were blinking.

That was a nice time. Let us get back to that time. Use open weights models. Own stuff.

by TeMPOraL 11 hours ago

That was before SaaS became a thing. Products didn't degrade over time because they couldn't easily reach out to your machine and remotely overwrite bytes on the CD-ROM the product came on.

by hypfer 14 hours ago

Wait, so..

This is interesting. The "reported to me like a colleague" part.

Is it just that Anthropic gave Mythos even more of that Anthropic™ character, (incorrectly) radiating confidence?

Is that why people have been losing their minds over that thing? Is this just cheap social engineering?

I mean, I bet it is also slightly more capable than Opus, but that would all check out to me. Man.

Thanks for sharing I suppose.

by 8note 14 hours ago

the primary difference i noticed is that Fable didn't try to check in every minute

to an extent that alone might have done it, but i had been playing around ahead of time trying to reverse engineer my Ray-Ban case so i can make my own plastic insert, and Fable took Opus' work from mostly broken to mostly done, and then when Fable went away, Opus broke it again

by TylerE 14 hours ago

No, it’s just a fundamentally much better model. Going back to Opus feels like the model has been lobotomized. It makes much more frequent errors, especially of the “I claimed I tested x y and z, but actually only kinda half heartedly tested x, and assumed I understood what was wrong” variety.

by hypfer 14 hours ago

Wait but that has been the exact word-for-word complaint when comparing sonnet to opus

Or opus to opus

Or really any new thing to old thing

by cpburns2009 5 hours ago

You hear the same canard every time Anthropic releases a new model or version. I'm not convinced they're objective anecdotes. I wonder if it's simply that the new model, while marginally better, has a different style and people find that new and refreshing. That is what makes it feel so much better than the previous release.

by solumunus 14 hours ago

When the agent is becoming more accurate and thorough, what would you expect to be reported?

by hypfer 14 hours ago

Oh I am sure that it became somewhat more accurate, and with that, the labeling there is in fact technically correct. It just does not work as an explainer for the doomsday-ish hype that model has induced in a lot of people's brains.

The user here is right in what they said but wrong in why they said it, essentially.

by ben_w 13 hours ago

An analogy I keep coming back to with the current progress in LLMs is the progress in the 90s of 3D game engines.

Every upgrade made what came before it appear awful in comparison, to such an extent that every upgrade was called "photorealistic" and people kept forgetting that they'd been using that description for the previous engines that they were now dismissing.

https://archive.org/details/nextgen-issue-26

by TylerE 14 hours ago

That’s a rather bad faith framing, I think. Who are you to judge why I said something?

by hypfer 14 hours ago

A person with the exact kind of pattern matching brain disorder this tech has been modeled after.

I do make mistakes though. Please check results.

by dimgl 14 hours ago

Maybe I was getting downgraded to Opus 4.8 but I saw nothing even close to resembling this behavior when using Fable.

by hodgehog11 12 hours ago

It very much depends on the task. What were you trying it on?

by saberience 7 hours ago

Funny, I find Codex to still be better at coding than Opus or Fable.

I A/B tested on a whole array of prompts between Codex and Fable, and Fable almost always found that Codex had produced a better plan and covered more edge cases than it did itself.

For every problem I gave the exact same prompt to both models, then I had each analyze the other's output. For roughly 80% of the prompts, Fable acknowledged that Codex's output was an improvement on its own, for 20% the converse situation occurred.

There was one egregious case where Fable suggested deploying code which would have resulted in a production bug, an edge case which Codex identified and proposed a fix for.

Note: this is all for optimized Rust code designed to be highly CPU and memory efficient.

I do prefer Anthropic's models for any tasks with front-end/design work needed. But I don't do much of that kind of work usually.

by hodgehog11 2 hours ago

I've used them back to back as well. Codex is good at specific tasks; it doesn't try to go big, it does what it's told provided the task is relatively procedural. If Codex can make progress on a task, why would I give it to Fable?

Fable fumbled the one simple task that I gave it too. I gave it multiple very hard open-ended tasks (effectively math tasks) involving research code and it crushed them. It's the first model I've seen that can do that. The current Codex will never produce the type of code Fable gave me no matter how many times I run the same problem at it, because it won't stop trying naive rubbish. And if I tell Codex to try to improve the code, it can't figure out why trying the same classical tricks isn't making it work better, regardless of what I tell it. Opus is marginally better because it can at least recognize some subtleties over time, but still disappointing because it has no idea how to deal with them.

Most programmers want precision instruments for their workflow. That's fine, use the right tool for the job. In my line of work, I need crazy solutions because the obvious stuff doesn't work. That's where Fable shined for me.
