I suspect that more could be done in terms of translating semi-naive user requests into the steps that a senior developer would take to enact them, maybe including the tools needed to do so.
It's interesting that the author believes that the best open source models may already be good enough to complete with the best closed source ones with an optimized agent and maybe a bit of fine tuning. I guess the bar isn't really being able to match the SOTA model, but being close to competent human level - it's a fixed bar, not a moving one. Adding more developer expertise by having the agent translate/augment the users request/intent into execution steps would certainly seem to have potential to lower the bar of what the model needs to be capable of one-shotting from the raw prompt.
Do you have a source? Claude Code is the only genetic system that seems to really work well enough to be useful, and it’s equipped with an absolutely absurd amount of testing and redundancy to make it useful.
For a preview of what it'd be like, just tell your AI chat app that you'll run bash commands for it, and please change the app in your "current directory" to "sort the output before printing it", or some such request.
So, yes, it can work.
Context management, both within and across sessions, seems the bigger issue. Without the agent supporting this, you are at the mercy of the model compacting/purging the context as needed, in some generic fashion, as well as being smart enough to decide to create notes for itself tracking what it is doing, etc.
Apparently CC is 512K LOC, which seems massively bloated, but I do think that things like tools, skills, context management and subagents are all needed to effectively manage context and avoid the issues that might be anticipated by just telling the model it's got a bash tool, and go figure.
I just asked Claude, and apparently CC makes it's bash tool available on all platforms it runs on (Linux, macOS, Windows WSL, Git for Windows), and doesn't do platform-specifc filtering of bash commands, which would seem to make for some interesting incompatibilities - GNU utils (sed, grep, find) on Linux and Windows, but BSD variants on macOS.
Okay sure it’s technically more than just bash, but my own for-fun coding agent and pi-coding-agent work this way. The latter is quite useful. You can get surprisingly far with it.