What degree of predictability is required? I imagine the bar is pretty low if you trust the previous models in the same contexts.
People used to wait in line all night to buy an iPhone. This isn’t that different.
Small sample size, but if Mythos/Fable were that much better, I feel like it should’ve given me an obviously better answer than Opus.
I, for one, tried using it several times today, but the guardrails kept switching the model back to Opus, so I have no clue if it's impressive or not.