I expect it's a model problem and not a harness problem, purely because some of the best harnesses (including OpenAI Codex itself) are open source and can be very easily tried against a new model.
And I'm saying that all the harnesses in the world aren't going to fix the model's myopia.

People who are dogfooding AI are absolutely looking through different rose-colored glasses than someone who can't get the same "acceptable" output.

I'm not defending Mark here; I'm just pointing out that you can be a pretty successful critic if you have a different idea of what a benchmark coding agent should do and the field fails that benchmark.

One of the problems with the current AI crop is that so many people are smelling their own farts and thinking they smell great.
