by nsoonhui 11 hours ago

by taspeotis 11 hours ago

The harness is super important: the available tools and the system prompts vary from harness to harness.

Anthropic seems to have a modest lead on their harness and models, so it’s a best-of-both-worlds scenario.

> I'm not sure what Microsoft is doing behind the scenes

It’s probably the exact same model, but the tools and the prompts around it are worse, so you get worse results.

by irthomasthomas 8 hours ago

Claude in Claude Code has been shown to perform consistently worse in evals than Claude + a minimal harness.

by kilburn 9 hours ago

The harness was absolutely not an issue in my case.

The new pricing model was: I got banned from using Opus entirely, and half a day of work (with weaker models) consumed the $10 plan.

I'm now using a Claude Max subscription and I can get close to the daily limits but I'm fairly happy with the overall plan consumption.

by Vinnl 11 hours ago

So if you use Claude via Copilot in Zed... You use Zed's harness, I think? What does Copilot do, at that point?

by acpdev 10 hours ago

I believe you are using https://github.com/github/copilot-cli or potentially https://github.com/github/copilot-language-server-release#ag... via the Agent Client Protocol https://github.com/agentclientprotocol/agent-client-protocol which means you are indeed using Copilot's harness.

ACP is just a standard that bridges harnesses easily into IDEs, text editors, or whatever consumes it (I wrote a TUI that consumes them).

The registry for all the agents (tool harnesses) is here https://github.com/agentclientprotocol/registry if you are ever curious what Zed or IntelliJ are really hooking into.

by Vinnl 7 hours ago

Ah OK, so the ACP connector ensures tool calls work with Zed, and communicates the available tools and their results to the harness, and then the harness mainly provides a system prompt and the API calls?

by pantulis 10 hours ago

It’s providing the inference of Anthropic models

by arikrahman 11 hours ago

I had a similar experience moving away from Copilot within Zed. I'm now using the Reasonix harness for DeepSeek, which makes cache hits almost free. And that's with unsubsidized American providers like Digital Ocean or Cloudflare.

by toyg 10 hours ago

I tried using Zed but with local models it constantly breaks on tool calls. I wanted to like it but the smell of vibing is just too much.

by arikrahman 1 hour ago

Likewise, and that's with state-of-the-art technology. I wish a true self-contained binary for Reasonix Desktop were released; for now I have to settle for providing a Flake.nix environment. It isn't nearly as fickle as Zed, but I wish they leveraged the power of the Go toolset more.

by arcanemachiner 10 hours ago

You using models released this year? I hear this complaint a lot, and it's often due to using an old model which is not as good at tool calling as newer models.

by spockz 8 hours ago

What I noticed is that when the conversation starts, the agent is perfectly able to read from and write to files. As the conversation continues (and maybe subagents are spawned), it forgets how to do this, complains, and tries to resort to running shell or Python code. Sometimes that works; sometimes it asks me to execute the code. If I refuse and point out that it worked before, then sometimes it remembers how to write, but mostly it doesn't and I need to start a new session.

When using Zed with the Copilot integration, I use Claude Opus and have never had this issue.

by toyg 8 hours ago

Qwen 3.6 and 3.5...

by sydneypan 7 hours ago

Yep, Reasonix is an absolute case study in caching. They literally built a byte-level cache into their design, and it is insane. I can one-shot many workflows and apps for under 0.05 cents.
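To illustrate why byte-level stability matters for cache hits, here is a minimal sketch. It assumes the provider's cache keys on an identical prompt prefix, which is how prefix caching generally works; nothing here is taken from Reasonix itself, and the function names and sample prompts are hypothetical.

```python
def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Count how many leading bytes two byte strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cache_hit_ratio(prev_prompt: str, next_prompt: str) -> float:
    """Fraction of the next prompt (in bytes) that repeats the previous prompt's prefix.

    Assumption: only a byte-identical prefix can be served from cache.
    """
    prev_b = prev_prompt.encode("utf-8")
    next_b = next_prompt.encode("utf-8")
    if not next_b:
        return 0.0
    return shared_prefix_len(prev_b, next_b) / len(next_b)


# Hypothetical prompts: a stable header followed by an append-only transcript.
STABLE = "SYSTEM: you are a coding agent\nTOOLS: read_file, write_file\n"
turn1 = STABLE + "USER: fix the bug\n"
turn2 = turn1 + "ASSISTANT: done\nUSER: add a test\n"

# Append-only history: everything sent in turn1 is a reusable prefix of turn2.
good = cache_hit_ratio(turn1, turn2)

# A changing value at the very front (here a timestamp) breaks the prefix early.
bad = cache_hit_ratio("TIME: 10:01\n" + turn1, "TIME: 10:02\n" + turn2)
```

Under that assumption, a harness that only ever appends to the prompt keeps `good` high, while anything volatile near the start of the prompt pushes the ratio toward zero, as `bad` shows.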

by k__ 11 hours ago

Nice.

I paid $6 yesterday for DeepSeek V4 Flash on OpenRouter. That's like $120 a month, and it's not even a good model.

by bel8 11 hours ago

For DS4 it's much cheaper and more reputable to use the OpenCode Go $10/mo subscription, or the DeepSeek API directly.

by arikrahman 1 hour ago

Sometimes $10 is more than I'd spend on API tokens. I prefer the top-up scheme for peace of mind, but the deal does sound generous. The only concern is sustainability, similar to how subsidized Copilot pricing had to change.

by k__ 10 hours ago

Thanks!

I'll try that.

by epolanski 11 hours ago

That's quite an achievement. I managed to spend only $2 on 16 different tasks with v4 pro.

by k__ 10 hours ago

Yeah, v4 flash is dirt cheap, but it's running in circles quite often.

Might very well be that a better model is cheaper if it gets things right the first try.

Maybe I should route to a better model when v4 flash hasn't solved the task after a specific number of tokens.
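A minimal sketch of that escalation idea, assuming each model can be wrapped in a function that reports whether it solved the task and how many tokens it used. The `Tier` type, the stub models, and the budgets are all hypothetical; no real provider API is used.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical result of one attempt: (solved, tokens_used).
Attempt = Tuple[bool, int]


@dataclass
class Tier:
    name: str                            # illustrative label only
    run: Callable[[str, int], Attempt]   # run(task, token_budget) -> (solved, tokens_used)
    token_budget: int                    # give up on this tier after this many tokens


def route(task: str, tiers: List[Tier]) -> Tuple[str, int]:
    """Try tiers in order, escalating when one spends its budget without solving.

    Returns (name of the tier that solved the task, or "unsolved", total tokens spent).
    """
    total = 0
    for tier in tiers:
        solved, used = tier.run(task, tier.token_budget)
        total += min(used, tier.token_budget)  # never count more than the budget
        if solved:
            return tier.name, total
    return "unsolved", total


# Stub models standing in for real calls (assumptions for the example).
def cheap_model(task: str, budget: int) -> Attempt:
    return (False, budget)   # runs in circles and burns its whole budget


def strong_model(task: str, budget: int) -> Attempt:
    return (True, 1200)      # gets it right on the first try


tiers = [Tier("cheap", cheap_model, 5000), Tier("strong", strong_model, 20000)]
winner, spent = route("fix the failing test", tiers)
```

Whether this saves money depends on the price ratio between the tiers and on how often the cheap tier succeeds within its budget; the token count alone does not capture that, so a real version would weight each tier's tokens by its price.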

by russelg 7 hours ago

I'm having great success with DS4 Pro as my main model, while using DS4 Flash for subagents.

by VortexLain 8 hours ago

What is the average monthly token price for daily reasonix use?

by arikrahman 1 hour ago

For me it's about $5 for work that I've otherwise paid about $200 for.

by happyweasel 10 hours ago

Same, I switched to Cursor. I told it how to invoke msbuild and it can edit away without needing a native Visual Studio plugin. No problems at all. Target language: C++.

by seanieb 10 hours ago

GitHub Copilot costs have ballooned in recent weeks: what once took $100 now requires $300. I like using Claude with VS Code through Copilot, and I feel it’s given me much better code whose quality I can control. It’s much more transparent than Claude Code. It’s open source, and the IDE interface gives you so many more features for context and control over what’s generated. The increase in cost isn’t purely due to their price increases; the Opus models’ agents also use more tokens. So I’ve moved to Claude Code, and I’m happily still using Opus 4.6. Fable and 4.7 seem to do much larger units of work, go off on tangents, and make assumptions that frequently result in slop.

by altmanaltman 11 hours ago

My Copilot quota finished in maybe 2-3 prompts with Claude 4.8 Opus. I was expecting it to suck, but not this bad. It was good while it lasted, though.