by mcintyre1994 9 hours ago

TBH this is the main thing that made me start trusting Claude enough to actually find it useful, and I'm surprised other models haven't caught up. I assumed they had and I just wasn't aware because I'm not using them in the same way.

by Supermancho 7 hours ago

> I found my interactions with Fable to be extremely impressive; it made other models, including GPT 5.5 and Opus 4.8, feel small and dumb.

> Anthropic models have consistently been top-scoring in BullshitBench[0]

eyeroll I find that Anthropic models feel big and dumber.

https://www.endorlabs.com/research/ai-code-security-benchmar... puts Fable 5th, which seems about right to me.

I'm interested in code utility and correctness, even if the majority of AI use is not focused on that.

by airstrike 5 hours ago

I think this just proves anyone can pick a benchmark that supports their point, so maybe we shouldn't treat them as evidence at all.