The word harness brings the truth of LLMs back down to Earth.
it really felt like between 2018 and 2022ish like LLMs had this magical aura, like the orchestration layer was intelligent, maybe even recursive, beyond what simple functions could do. It was assumed that this was a solved problem. The word "orchestration" denoted it, the words we used were full of optimism. When you lift the veil, it really is just regex, and cool tricks sure, but it's a harness it's a utility, there's no magic here, there's realism.
Maybe the labs even had a part to play in this as well; attempting to make themselves look magical. I mean just look at the choice of name for "Mythos", it's about bringing back that feeling of myth and magic after we saw under the veil.
The reality is that the labs have produced magical models yes, but are locking them into ecosystems that leave a lot to be desired, and are easily reproducible, and essentially are cron jobs, regex.. things we've seen in traditional cloud for decades. It feels like an attempt to create a moat where there is none.
Maybe I'm wrong but this has been my impression
"harness engineering" is the term claimed by that article to have originated in February. It does seem obvious in retrospect and I don't remember an origination point, but there's at least one hn comment predating that in December[2] and it doesn't treat it as novel.
I will admit that my bias is against any self congratulatory buzzword fads (I'm still not over "MCP is the USB of LLMs" or whatever and that's been a year now too). "Who coined the term harness engineering?" -> who cares? It was already widely being done.
[1] https://www.lesswrong.com/posts/7mqp8uRnnPdbBzJZE/is-gemini-...
The Pokémon article you linked is basically about benchmarking. In that context, the harness functions as part of the benchmark setup: the controlled environment around the model, the available inputs, tools, and assistance.
The current usage of “harness,” at least in the agent engineering discussion, seems closer to a lower-level runtime layer, almost like an OS around the agent.
So I see this as a transition: from “harness” as a narrower benchmark/control-variable layer to “harness” as the broader operating environment of the agent.
That does not mean I think your point is wrong. With topics like this, the interpretation depends on which part of the lineage one emphasizes. The first appearance of the idea may go back to 2022 or earlier, while the usage that looks closer to the current meaning may have emerged at a different point.
I am probably giving more weight to the SIG article, while you are giving more weight to a different point in the lineage. Both seem reasonable to me.