> Moat seems to be shrinking fast.

It's been a moving target for years at this point.

Both open- and closed-source models have been getting better, but I'm not sure the open-source models have really been closing the gap since DeepSeek R1.

But yes: If the top closed source models were to stop getting better today, it wouldn't take long for open source to catch up.

reply
Can you talk more about how you leverage higher quality models for the stuff that counts? Anywhere I can read more on the philosophy of when to use each?
reply
Sure, happy to share. It's been trial and error, but I've learned that for agents to reliably ship a large feature or refactor, I need a good spec (functional acceptance criteria) and a good plan for sequencing the work.

The big expensive models are great at planning tasks and reviewing the implementation of a task. They're better at spotting potential gotchas, performance or security gaps, and subtle logic issues that cheaper models fail to notice.

The small cheap models are actually great (and fast) at generating decent code if they have the right direction up front.

So I do all the spec writing myself (with some LLM assistance), and I hand it to a Supervisor agent that coordinates the subagents. Plan -> implement -> review -> repeat until the planner says “all done”.

I switch up my models all the time (actively experimenting), but today I was using GPT 5.4 for review and planning, costing me about $0.40-$1 for a good-sized task, and Kimi for implementation. Sometimes my spec takes 4-5 review loops and the cost can add up over an 8-hour day. Still cheaper than Claude Max (for now, barely).

Each agent retains a fairly small context window which seems to keep costs down and improves output. Full context can be catastrophic for some models.

As for the spec writing, that's the fun part for me. I've been obsessing over this process, and over tracking acceptance criteria and keeping my agents aligned to it. I have a toolkit cooking; you can find it in my comment history (aiming to open source it this week).
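The loop above can be sketched roughly like this. Everything here is hypothetical: the model calls are stubbed as plain functions, and the role names (`plan`, `implement`, `review`) are my own labels for the expensive-planner / cheap-implementer / expensive-reviewer split described above, not any real API.

```python
# Sketch of the plan -> implement -> review loop, with model calls stubbed.
# In practice, plan() and review() would call an expensive model and
# implement() a cheap/fast one, each with its own small context.

def plan(spec, feedback=None):
    # Expensive model: turn the spec (plus reviewer feedback) into a task.
    return {"task": f"implement: {spec}", "feedback": feedback}

def implement(task):
    # Cheap model: generate code for one well-scoped task.
    return f"code for {task['task']}"

def review(spec, code):
    # Expensive model: check the code against the acceptance criteria.
    # Stubbed to approve on the first pass for demonstration.
    return {"approved": True, "feedback": None}

def supervise(spec, max_loops=5):
    feedback = None
    for i in range(1, max_loops + 1):
        task = plan(spec, feedback)
        code = implement(task)
        verdict = review(spec, code)
        if verdict["approved"]:
            return {"code": code, "loops": i}
        feedback = verdict["feedback"]  # reviewer notes feed the next plan
    raise RuntimeError("acceptance criteria not met within loop budget")

result = supervise("add CSV export to the reports page")
print(result["loops"])
```

The key design point is that only the small task description and the reviewer's feedback cross between agents, which is what keeps each agent's context window small.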

reply
How are you managing context?

I'm building a full stack web app, simple but with real API integrations with CC.

Moving so fast that I can barely keep hold of what I'm testing and building at the same time, just using Sonnet. It's not bad at all. A lot of the specs develop as I'm testing the features, either as an immediate fix or as a todo / GH issue.

How can you manage an agentic flow?

reply
The moat is having researchers who can produce frontier models. When OpenCode starts building frontier models, then I'd be worried; otherwise they're just another wrapper.
reply
Of course, my point is that these trailing models are close behind, and cost me a lot less, and work great with harnesses like OpenCode.
reply
"OpenCode Go" (a subscription) lets you use lots of hosted open-weights frontier AI models, such as GLM-5 (currently right up there in the frontier model leaderboards) for $10 per month.
reply