Hacker News
by yanis_t 16 hours ago
solenoid0937 15 hours ago
https://marginlab.ai/trackers/claude-code-historical-perform...
taylorfinley 12 hours ago
Surely they are testing their optimizations against common benchmarks internally? I bet the "real world task" degradation is larger by some multiple than it appears when measured through a benchmark that is part of the target.