Hacker News
Submitted by mi_lk 7 hours ago
leerob, 2 hours ago:
(I work at Cursor) We score well on Terminal-Bench and SWE-bench Multilingual. DeepSWE, not so great yet, as it's more for very long-horizon tasks. We're planning to include more public benchmarks in our next model release.