The pessimistic take is that their harness is no better than those already available, and he thinks they all suck together.

From a high level, these agents absolutely do not function like a rational human on even medium-scoped problems. Even when you try to add memory, you just multiply hallucinated context, which makes them fail on tasks in harder-to-detect ways.

He's likely trying to do mental gymnastics about the absolute cost and any definable ROI.

I expect it's a model problem and not a harness problem, purely because some of the best harnesses (including OpenAI Codex itself) are open source and can be very easily tried against a new model.
reply
And I'm saying all the harnesses in the world aren't going to solve the models' myopia.

People who are dogfooding AI absolutely wear different rose-colored glasses than someone who can't get the same "acceptable" output.

I'm not defending Mark here; I'm just pointing out that you can be a pretty successful critic if you have a different idea of what a benchmark coding agent looks like and the field fails that benchmark.

One of the problems with the AI crop is that so many people are smelling their own farts and thinking they smell great.
