undefined

points

[-]

I'm confused on the scenario here. There is human in the loop, it's the feedback part...there is business context, it is either seeded or maintained by the human and expanded by the agent. The agent can make inferences about the world, especially when embodiment + better multimodal interaction is rolled out [embodiment taking longer].

Benchmarks ==> it's absolutely not a given that humans can't be involved in the loop of performance measurement. Why would that be the case?