SWE-bench Pro is ~20% higher than the previous .1 generation, which was released 2 months ago. On their SWE benchmark, token consumption at equal performance is down 2x from the model they released 2 months ago.

If this is a plateau, I struggle to imagine what you consider fast progress.

reply
Your comment doesn't make any sense. Opus 4.6 was released two months ago; what jump would you expect?
reply
Every night praying for tomorrow
reply
The generations are two months apart now though…
reply