upvote
Christopher, would you be able to share the transcripts for that repo by running /bug? That would make the reports actionable for me to dig in and debug.
reply
I’m not sure being confrontational like this really helps your case. There are real people responding, and even if you’re frustrated it doesn’t pay off to take that frustration out on the people willing to help.
reply
Fair point on tone. It's a bit of a bind, isn't it? When you come with a well-researched issue, as OP did, you get this bland corporate nonsense: "don't believe your lyin' eyes, we didn't change anything major, you can fix it in settings."

How do you communicate in a way that actually gets heard when this is the default wall you hit?

The author is in this thread saying every suggested setting is already maxed. The response is "try these settings." What's the productive version of pointing out that the answer doesn't address the evidence? Genuine question. I linked my repo because it's the most concrete example I have.

reply
I read the entire performance degradation report in the OP, and Boris's response, and it seems that the overwhelming majority of the report's findings can indeed be explained by the `showThinkingSummaries` option recently being turned off by default.
reply
Just use a different tool or stop vibe coding, it’s not that hard. I really don’t understand the logic of filing bug reports against the black box of AI
reply
People file tickets against closed source "black box" systems all the time. You could just as well say: Stop using MS SQL, just use a different tool, it's not that hard.
reply
Is somebody saying "you're holding it wrong" one of the "people willing to help"?
reply
They are if you are, in fact, holding it wrong.

As has usually been the case for most of the few years LLMs have existed in this world.

Think not of iPhone antennas - think of a humble hammer. A hammer has three ends you can hold it by, and no amount of UI/UX and product-design thinking will make the end you like to hold a good choice when you want to drive a Torx screw.

reply
[flagged]
reply
The stated policy of HN is "don't be mean to the openclaw people", let's see if it generalizes.
reply
I guess one of the things I don't understand is how you expect a stochastic model, sold as proprietary SaaS with a proprietary (though briefly leaked) client, to be predictable in its behavior.

It seems like people are expecting LLM-based coding to work in a predictable and controllable way. And, well, no, that's not how it works, especially when you're using a proprietary SaaS model where you can't control the exact model used, the inference setup it's running on, the harness, the system prompts, etc. It's all just vibes; you're vibe coding and expecting consistency.

Now, if you were running a local weights model on your own inference setup, with an open source harness, you'd at least have some more control of the setup. Of course, it's still a stochastic model, trained on who knows what data scraped from the internet and generated from previous versions of the model; there will always be some non-determinism. But if you're running it yourself, you at least have some control and can potentially bisect configuration changes to find what caused particular behavior regressions.
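To make the "bisect" point concrete, here's a minimal sketch of what that would look like; the `passes_eval` function and the config history are hypothetical stand-ins, not any real tool:

```python
# Sketch: git-bisect-style binary search over a history of inference-config
# snapshots to find the first one where an eval starts failing.

def first_bad(configs, passes_eval):
    """configs[0] is known good, configs[-1] is known bad.
    Returns the index of the first config that fails the eval."""
    lo, hi = 0, len(configs) - 1   # known-good, known-bad bounds
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes_eval(configs[mid]):
            lo = mid               # still good; regression landed later
        else:
            hi = mid               # already bad; regression is here or earlier
    return hi

# Toy example: pretend the regression landed at snapshot 7 of 10.
history = list(range(10))
print(first_bad(history, lambda c: c < 7))  # -> 7
```

With a closed SaaS you can't do this, because the "config history" (model snapshot, harness, system prompt) isn't yours to step through.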

reply
The problem is degradation. It was working much better before. Many people (one example from a well-known person [0]), including my circle of friends and me, were working on projects around the Opus 4.6 rollout, and suddenly our workflows started to degrade like crazy. If I did not have many quality gates between an LLM session and production, I would have faced certain data loss and production outages, just like some famous company did. The fun part is that the same workflow that reliably passed the quality gates before suddenly failed on something trivial. I cannot pinpoint exactly what Claude changed, but the degradation is there for sure.

We are currently evaluating alternatives to have an escape hatch (Kimi, ChatGPT, Qwen, and Nemotron are so far the best candidates). Before the Claude leak, the only issue with alternatives was how well the agentic coding tool integrates with the model and its tool use, and there are several improvements happening already, like [1]. I am hoping the gap narrows and we can move off permanently. No more hoops, and no more "you are right, I should not have attempted to delete the production database" moments.

[0] https://x.com/theo/status/2041111862113444221

[1] https://x.com/_can1357/status/2021828033640911196

reply
Same as how I expect a coin to come up heads 50% of the time.
reply
If you get consistently nowhere near 50% then surely you know you're not throwing a fair coin? What would complaining to the coin provider achieve? Switch coins.
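"Consistently nowhere near 50%" is actually measurable. As an illustration (a stdlib-only sketch of an exact two-sided binomial test, not anything from the thread):

```python
# Sketch: exact two-sided binomial test for coin fairness.
from math import comb

def binom_pvalue(k, n, p=0.5):
    """P(observing a result at least as extreme as k heads in n flips)."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    # Two-sided: sum probabilities of every outcome no more likely than k.
    return sum(x for x in pmf if x <= pmf[k] + 1e-12)

# 70 heads in 100 flips is far too lopsided for a fair coin.
print(binom_pvalue(70, 100) < 0.001)  # -> True
```

The catch with the LLM analogy is that each user's "flips" are different prompts, so nobody's sample cleanly isolates the coin from the throw.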

*typo

reply
Well, I'm paying for the coin to be near 50%, and the coin's PM is listening to customers, so that's why.
reply
The coin's PM is spamming you with trivial gaslighting corporate slop, most of it barely edited.
reply
> how you expect a stochastic model [...] is supposed to be predictable in its behavior.

I've used it often enough to know that it will almost certainly nail tasks I deem simple enough.

reply
It also completely ignores the increase in behavioral tracking metrics. A 68% increase in swearing at the LLM for doing something wrong needs to be addressed and isn't just "you're holding it wrong".
reply
I think a great marketing line for local/self-hosted LLMs in the future would be: "You can swear at your LLM and nobody will care!"
reply
Please don't post this aggressively to Hacker News. You can make your substantive points without that.

https://news.ycombinator.com/newsguidelines.html

reply
[dead]
reply