Otherwise you should look into running e.g. Qwen3.5-35B-A3B or Qwen3.5-27B on your own computer. They're not Opus-level but from what I've heard they're capable for smaller tasks. llama.cpp works well for inference; it works well on both CPU and GPUs and even split across both if you want.
otherwise check the list of providers on openrouter and you can see the pricing, quantisation, sign up directly rather than via a router. ensure to get caching prices, do not get input/output API prices.
GLM 5 is a frontier model, Kimi 2.5 is similar with vision support, Minimax M2.7 is a very capable model focused on tool calling.
If you need server side web search, you could use the Z AI API directly, again ZDR; or Friendli AI; or just install a search mcp.
For the harness opencode is the normal one, it has subagents and parallel tool calling; or just use claude code by pointing it at the anthropic APIs of various providers like fireworks.
And no, they're not as capable as SOTA models. Not by far.
However they can help reduce your token expenditure a lot by routing them the low-hanging fruit. Summaries, translations, stuff like that.