Can we please make the bluey bench the gold standard for all models, always?
Can you compare it to Opus 4.6 with thinking disabled? It seems to have very impressive benchmark scores. Could also be pretty fast.
Added a thinking-disabled Opus 4.6 timing. It took 1m 4s – coincidentally the same as 5.3-codex-low.
I wonder why they named it so similarly to the normal codex model when it is much worse, though still cool of course.