> I'm either in a minority or a silent majority. Claude Code surpasses all my expectations.

I looked at some stats yesterday and was surprised to learn Cursor AI now writes 97% of my code at work. Mostly through cloud agents (watching it work is too distracting for me).

My approach is very simple: Just Talk To It

People way overthink this stuff. It works pretty well. Sharing .md files and hyperfocusing on the various orchestrations and prompt hacks of the week feels about as interesting as going deep on vim shortcuts and IDE skins.

Just ask for what you want, be clear, give good feedback. That’s it

reply
Right - I have a ton of coworkers who obsess over "skills" and different ways to run agents and whatnot, but I just... spend some time giving very thorough, detailed instructions and it just Does The Thing. I rarely fight with Claude Code these days.
reply
We probably need something like the WET principle for skills: if you need to explain the same thing to an agent more than twice, turn it into a skill (or add it to AGENTS.md, or CLAUDE.md, or your docs folder, or your guides folder, or whatever method you use). If you haven't needed to explain it more than twice, it's probably fine; the context pollution from the skill would likely be worse than not having it.

Of course exceptions apply. Some basic information that will reliably be discovered is still worth adding to your AGENTS.md to cut down on token use. But after a couple of obvious things you quickly get into the realm of premature optimization (unless you actually measure the effects).
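
For illustration, the kind of entry that earns its place (contents entirely made up, obviously):

    ## Gotchas
    - The arena allocator does NOT zero-initialize memory.
    - Integration tests need FAKE_S3=1 or they hit real buckets.
    - Run `make lint` before committing; CI is stricter than the local defaults.

Each of those is something you'd otherwise find yourself re-explaining every session.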

reply
Same here. For me, this means a spec doc split into features/UX, technical requirements, and language-specific requirements, iterated before the model touches code.
reply
The trick is to "just use it", BUT every few weeks grab the logs (you do keep them, right?) and have a session with the model to find out if there are any repeated patterns.

If you find any, consider making them into skills or /commands or maybe even add them to AGENTS.md.

reply
Which logs do you use for that?
reply
I would assume those in ~/.claude/projects/**/*.jsonl. They contain the full conversation history, including the tool calls that were made, how many tokens were consumed, etc.
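
Something like this is enough for a first pass - I'm guessing at the schema, so check a line of your own logs before trusting the field names:

    import glob
    import json
    import os
    from collections import Counter

    tools, tokens = Counter(), 0
    pattern = os.path.expanduser("~/.claude/projects/**/*.jsonl")
    for path in glob.glob(pattern, recursive=True):
        with open(path) as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue
                msg = entry.get("message")
                if not isinstance(msg, dict):
                    continue
                # Tool calls appear as tool_use blocks in the content array (assumed schema)
                content = msg.get("content")
                if isinstance(content, list):
                    for block in content:
                        if isinstance(block, dict) and block.get("type") == "tool_use":
                            tools[block.get("name", "?")] += 1
                # Per-message token counts (assumed schema)
                usage = msg.get("usage") or {}
                tokens += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)

    print(f"{tokens} tokens across all sessions")
    print(tools.most_common(10))

From there you can paste the odd-looking sessions back into the model and ask what you keep re-explaining.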
reply
Claude has a built-in /insights feature for this, but you can replicate it with any other tool that keeps the session logs on disk.
reply
I agree, it works nicely for me. From my experience it's not realistic to expect a one-shot each time. But asking it to build chunks and entering a review cycle with nudging works well. Once I changed my mindset from "it didn't do a one-shot so it's crap" and took it as an iterative tool that builds pieces that I assemble, it's been working nicely without external frameworks or anything. Plan, review, iterate, split, build, review, iterate.
reply
You're wasting a ton of tokens doing that, though. Right now you don't realize it because they're being heavily subsidized, but you will understand the point of having good orchestration and memory files when you have to pay the real cost of your use.
reply
> You're wasting a ton of tokens doing that though.

My time is worth more than tokens. I’m thinking of maybe creating some .md files to save me time in code review. If I do it right, it’s going to cost more in tokens because the robots will do more.

reply
Cost cannot go up, only down over time (with occasional short-term fluctuations). Competition, including open-weight models and consumer hardware (i.e. the upcoming M5 Ultra), keeps pushing down the ceiling of what you can charge.
reply
If the cost is subsidized by another cash source (e.g. VC money), prices can definitely go up when that source stops.
reply
Company pays for company’s tokens, so company’s problem, not mine. I am happy to skill up and avoid overusing tokens for my personal sub, but if it’s getting results then I couldn’t care less how much my employer has to pay for it. They’re begging me to use it in the first place anyway.
reply
My experience as well on non-trivial stuff for personal projects: just talk... It makes mistakes, but considering the code I see in professional settings, I'd rather deal with an agent than with third parties.
reply
I love the IDE skins analogy. Very true.
reply
Everyone knows that a red UI skin goes faster
reply
How do you collect these stats?

Is it by characters human typed vs AI generated, or by commit or something?

reply
> How do you collect these stats?

Cursor dashboard. I know they're incentivized to over-estimate but feels directionally accurate when I look at recent PRs.

reply
Are you mostly using the Composer model?
reply
> Are you mostly using the Composer model?

Don’t really think about it. I think when I talk to it through Slack, Cursor uses Codex; in my IDE it looks like it's whatever the highest Claude model is. In GitHub comments, who even knows.

reply
It's interesting how variable people's experiences seem to be.

Personally, I tend to get crap quality code out of Claude. Very branchy. Very un-DRY. Consistently fails to understand the conventions of my codebase (e.g. keeps hallucinating that my arena allocator zero initializes memory - it does not). And sometimes after a context compaction it goes haywire and starts creating new regressions everywhere. And while you can prompt to fix these things, it can take an entire afternoon of whack-a-mole prompting to fix the fallout of one bad initial run. I've also tried dumping lessons into a project specific skill file, which sometimes helps, but also sometimes hurts - the skill file can turn into a footgun if it gets out of sync with an evolving codebase.

In terms of limits, I usually find myself hitting the rate limit after two or three requests. On bad days, only one. This has made Claude borderline unusable over the past couple weeks, so I've started hand coding again and using Claude as a code search and debugging tool rather than a code generator.

reply
> Very branchy. Very un-DRY.

I've found this can be vastly reduced with AGENTS.md instructions, at least with codex/gpt-5.4.

reply
What sorts of instructions?
reply
Usually I just put something like "Prefer DRY code". I like to keep my AGENTS.md DRY too :)
reply
Also add "no hallucinations" and "make it work this time pretty please", and say Claude will go to jail if it doesn't do it right - should work all the time (so, like, 60%).
reply
There are of course limits to what prompting can do, but it does steer the models.

In TFA they found that prompting mitigates over-editing by up to about 10 percentage points.

reply
When I see people talking about Claude Code becoming "unusable" for them recently, I believe them, but I don't understand. It's a deeply flawed and buggy piece of software but it's very effective. One of the strangest things about AI to me is that everyone seems to have a radically different experience.
reply
> One of the strangest things about AI to me is that everyone seems to have a radically different experience.

Because it is that uneven. Some problems it nails on the first go, or with only cosmetic changes.

In others it decides on a solution, hallucinates parts that don't exist - API calls or config options - and gets the basics wrong.

Similarly, if you do something that's a fairly common pattern, it usually nails it. If you do something that subtly differs from the common pattern, it will just do the common pattern and you get something wrong.

reply
>One of the strangest things about AI to me is that everyone seems to have a radically different experience.

I've thought about this and I think the reason is as follows: we hold code written by ourselves to a much higher standard than code written by somebody else. If you think of AI code as your own code, then it probably won't seem very acceptable because it lacks the beauty (partly subjective as all beauty tends to be) that we put into our own code. If you think of it as a coworker's code, then it's usually alright i.e. you wouldn't be wildly impressed with that coworker but it would also not be bad enough to raise a stink.

It follows from this that it also depends on how you regard the codebase that you're working on. Do you think of it as a personal masterpiece or is it some mishmash camel by committee as the codebases at work tend to be?

reply
> everyone seems to have a radically different experience

What people have is radically different expectations.

I noticed engineers will review Claude's output and go "holy crap, that's junior-level code". Coders will just commit, because looking at the code is a waste of time. Move fast, break things, disrupt, drown yourself in tech debt: the investors won't care anyway.

And no, telling the agent to "be less shit" doesn't work. I have to painstakingly point out every single shit architectural decision so Claude can even see and fix it. "Git gud" didn't work for people and doesn't work for LLMs.

It's not that the code isn't DRY; it's DRY at the wrong points of abstraction, which is even worse than not being DRY. I manage to find better patterns in every single task I tell Claude or Copilot to work on autonomously, dropping tons of code in the process (DRY or not). You can't prompt Claude out of making these wrong decisions (at best out of very basic mistakes), since they are too granular to even extract a rule from.

This is what separates a senior from a junior.

If you think Claude writes good code either you're very lucky, I'm very bad at prompting, or your standards are too low.

Don't get me wrong. I love Claude Code, but it's just a tool in my belt, not an autonomous engineer. Seeing all these "Claude wrote 97% of my code" makes me shudder at the amount of crap I will have to maintain 5 years down the line.

reply
My workflow is to just use LLMs for small-context work. On anything that involves multiple files, it truly doesn't do better than what I'd expect from a competent dev.

It's bitten me several times at work, and I'd rather not waste any more of my limited time on the re-prompt -> modify-code-manually cycle. I'm capable of doing this myself.

It's great for the simple tasks though, and most feature work is simple tasks IMO. They were only "costly" in the sense that it previously took a while to read the code, find the appropriate changes, create tests for those changes, etc. LLMs reduce that cycle of work, but that type of work isn't the majority of my time at my job anyway.

I've worked at feature factories before, it's hell. I can't imagine how much more hell it has become since the introduction of these tools.

Feature factories treat devs as literal assembly-line machines; output is the only thing that matters, not quality. Having that mass-induced by these tools is just so shitty to workers.

I fully expect a backlash in the upcoming years.

---

My only Q to the OP of this thread is what kind of teacher they are, because teaching people anything about software while admitting that you no longer write code because it's not profitable (big LOL at caring about money over people) is just beyond pathetic.

reply
I use it through the desktop app, which has a lot of features I appreciate. Today it was implementing a feature. It came across a semi-related bug that wasn't a showstopper but should really be fixed before go-live. Instead of tackling it itself or mentioning it in the final summary (where it becomes easy to miss), it triggered a modal inside the Claude app with a description of the issue and two choices: fix in another session or fix in the current session. Really good way to preserve context integrity and save tokens!
reply
How do you get CC to connect to your dev container? I have the CC app, but it's kinda useless as I'm not going to have it barebacking my system, so I'm left with the CLI and the VS Code extension.
reply
I just run CC in a VM. It gets full control over the VM, and the VM doesn't have access to my internal networks. I share the code repos it works on over virtiofs, so it has access to the repos but not to my GitHub keys for pushing and pulling.

This means it can do anything in the VM - install dependencies, etc. So far it has managed to bork the VM once (unbootable); I could have spent a bit of time figuring out what happened, but I had a script to rebuild the VM so I didn't bother. To be entirely fair to Claude, the VM runs Arch Linux, which is definitely easier to break than other distros.
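
In case it helps anyone replicate this, with a libvirt-managed VM (an assumption - this is only one way to set it up) the share looks roughly like the snippets below; directory paths are illustrative:

    <!-- in the domain XML: virtiofs requires shared memory backing -->
    <memoryBacking>
      <source type='memfd'/>
      <access mode='shared'/>
    </memoryBacking>

    <!-- under <devices>: export the repos directory to the guest -->
    <filesystem type='mount' accessmode='passthrough'>
      <driver type='virtiofs'/>
      <source dir='/home/me/repos'/>
      <target dir='repos'/>
    </filesystem>

Then inside the guest: mount -t virtiofs repos /mnt/repos (or the fstab equivalent).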

reply
I think on HN, at least, people enamoured by Claude are the vocal majority.

The view of Claude on HN is extremely positive, and nearly every thread has a highly positive comment "that is not an ad".

I think people are seeing others who are just irked by the constant stream of what feels like ads, and reading that as Claude being somehow disliked.

reply
Same. It's surprisingly good as a labour saving device. It produces code that I would accept without reservations from a coworker. I still read every line and make tweaks, but they're the same tweaks I would ask for in a code review.

I don't measure my productivity, but I see it in the sort of tasks I finally tackle after years of waiting. It's especially good at tedious tasks: turning 100 markdown files into 5 JSON files and updating the code that reads them, for example.
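
For the curious, the shape of that task is roughly this - the layout and grouping key here are invented for illustration:

    import json
    import pathlib

    # Hypothetical layout: docs/<category>/<page>.md -> one <category>.json per category
    grouped = {}
    for md in sorted(pathlib.Path("docs").glob("*/*.md")):
        grouped.setdefault(md.parent.name, {})[md.stem] = md.read_text(encoding="utf-8")

    for category, pages in grouped.items():
        pathlib.Path(f"{category}.json").write_text(
            json.dumps(pages, indent=2), encoding="utf-8"
        )

The value was less the script itself than having Claude do the whole loop - including updating the readers - in one pass.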

reply
I am genuinely interested to know some details:

1. Is the product/software you develop novel? As in, does it do something useful and unique? Or is it a product that already exists in many varieties and yours is just "one of ..."?

2. What if one day LLMs get regulated, become terrible, or raise prices above your budget? Do you have plans for that?

reply
1. Fairly - I definitely don't see any training material about the stuff I do on the internet :D It's really far from your avg front-end app. And of course you can't let any of those make decisions automatically. Remember the IBM quote: "a computer can never be held accountable, therefore a computer must never make a management decision"... Even on completely greenfield and groundbreaking projects there's lots of throwaway code, scaffolding and so on. You contribute the value-add; you use the flanker to speed up the boring and grey parts.

2. Regulation? I'm sceptical that the cat can be put back in the bag; it's already out there. The more realistic problem is the business-model part - open-weight/local models provide a counterpoint to that.

reply
I'm in a similar situation.

1. Even really novel projects have large chunks of glue code and boring infrastructure that the novel bits depend on. Claude means I spend 10% of my time on the boring stuff and 90% on stuff I previously only had 10% of my day to work on. In my experience the software picked up our idioms fast; for context, we have a skill file explaining our code standards.

2. Codex and Gemini are comparable when paired with a good harness (pi.dev). If things ever get really bad, I'll drop 8k on a dedicated agent-coding server and run it locally. I tried that recently with my current system and it was subpar, but I was running a drastically simpler model.

reply
Are you writing code that gets reviewed by other people? Were code reviews hard in the past? Do your coworkers care about "code quality" (I mean this in scare quotes because that means different things to different people).

Are you working more on operational stuff or on "long-running product" stuff?

My personal headcanon: this tooling works well when it builds on simple patterns, and can then handle complex work. It has also not been great at coming up with new patterns, and if left unsupervised it will totally make up new patterns that are going to go south very quickly. With that lens, I find myself just rewriting what Claude gives me in a good number of cases.

I sometimes race the robot and beat it at making a change. I'm "cheating", I guess, cuz I know what I want already in many cases and it has to find things first, but... I think the futzing fraction[0] is underestimated for some people.

And like in the "perils of laziness lost"[1] essay... I think that sometimes the machine trying too hard just offends my sensibilities. Why are you doing 3 things instead of just doing the one thing!

One might say "but it fixes it after it's corrected"... but I already go through this annoying "no, don't do A, B, C, just do A - yes, just that, it's fine" flow when working with coworkers, and it's annoying there too!

"Claude writes thorough tests" is also its own micro-mess here, because while guided test creation works very well for me, giving it any leeway in creativity leads to so many "test that foo + bar == bar + foo" tests. Applying skepticism to utility of tests is important, because it's part of the feedback loop. And I'm finding lots of the test to be mainly useful as a way to get all the imports I need in.

If we have all these machines doing the work for us, in theory average code quality should go up - after all, we're more capable! I think a lot of people have been using it in a "well, most of the time it hits near the average" way, but depending on how you work, you might drag your average down.

[0]: https://blog.glyph.im/2025/08/futzing-fraction.html
[1]: https://bcantrill.dtrace.org/2026/04/12/the-peril-of-lazines...

reply
> My personal headcanon: this tooling works well when built on simple patterns, and can handle complex work. This tooling has also been not great at coming up with new patterns, and if left unsupervised will totally make up new patterns that are going to go south very quickly. With that lens, I find myself just rewriting what Claude gives me in a good number of cases.

I've been doing a greenfield project with Claude recently. The initial prototype worked but was very ugly (repeated boilerplate, a few methods doing the exact same thing, poor isolation between classes)... I was very much tempted to rewrite it on my own. This time I decided to try to get it to refactor toward the target architecture and fix those code-quality issues. It's possible, but it's very much like pulling teeth... I use plan mode, we have multiple rounds of review on a plan (which started from me explaining what I expect), then it implements 95% of it but doesn't realize that some parts were not implemented... It reminds me of my experience mentoring a junior employee, except that Claude Code is more eager (jumping into implementation before understanding the problem), much faster at doing things, and dumber.

That said, I've seen codebases created by humans that were as bad as or worse than what Claude produced while prototyping.

reply
You hinted at an aspect I probably haven't considered enough: The code I'm working on already has many well-established, clean patterns and nearly all of Claude's work builds on those patterns. I would probably have a very different experience otherwise.
reply
I legit think this is the biggest danger with velocity-focused usage of these tools. Good patterns are easy to use and (importantly!) work! So the 32nd usage of a good pattern will likely be smooth.

The first (and maybe even second) usage of a gnarly, badly thought out pattern might work fine. But you're only a couple steps away from if statement soup. And in the world where your agent's life is built around "getting the tests to pass", you can quickly find it doing _very_ gnarly things to "fix" issues.

reply
I’ve seen AI coding agents spin out and create 1_000-line changesets that I have to stop before they hit 10_000. Then I look at the problem and change one line instead.
reply
This is it right here. Claude loves to follow existing patterns, good or bad. Once you have a solid foundation, it really starts to shine.

I think you're likely in the silent majority. LLMs do some stupid things, but when they work it's amazing and it far outweighs the negatives IMHO, and they're getting better by leaps and bounds.

I respect some of the complaints against them (plagiarism, censorship, gatekeeping, truth/bias, data center arms race, crawler behavior, etc.), but I think LLMs are a leap forward for mankind (hopefully). A Young Lady's Illustrated Primer for everyone. An entirely new computing interface.

reply
We noticed this and spent a week or two going through and cleaning up tests, UI components, comments, and file layout to be a lot more consistent throughout the codebase. Codebase was not all AI written code - just many humans being messy and inconsistent over time as they onboard/offboard from the project.

Much like giving a codebase to a newbie developer, whatever patterns exist will proliferate and the lack of good patterns means that patterns will just be made up in an ad-hoc and messy way.

reply
You haven't answered the question, though. Is your code peer-reviewed? Is it part of a client-facing product? No offense - I like what you are doing - but I wouldn't risk delegating this much workload in my day job, even though there is a big push towards AI.
reply
I feel the same way. It doesn't make sense economically, or even in good faith, for me to use company-paid time writing code for line-of-business apps at all anymore - and I'm 28 years into this kind of work.
reply
To people stating these high commit numbers: what is your average changeset size? I have found that having the agent do large changes (a few hundred lines or more) creates a lot of friction for me; it feels like at some point I leave the happy path and, instead of moving quickly, get dragged down.
reply
> I intend for this post to be a question: what am I doing that makes Claude profoundly effective?

I'm fascinated by this question.

I think the first two sections of this article point towards an answer: https://aphyr.com/posts/412-the-future-of-everything-is-lies...

I've personally had radically different experiences working on different projects, different features within the same project, etc.

reply
I used Claude to help me with a function once and it added a memory leak. It wouldn't have been noticeable to most people, but I saw it. I still write my own code and find LLMs frustrating because they almost get it right; it's more efficient for me to write the code correctly than to have an LLM write something that's almost correct and fix it after the fact.

I can’t wait for all the future vibe-coded projects to be exploited by the black hats waiting in the shadows for things to reach a critical state. I don't believe in Anthropic because they love to lie.

reply
The article has a benchmark where Opus has the best score in two categories and the second-best in the other (there are only three categories). Opus is probably the best choice for producing readable code right now. GPT (for example) lags way behind.
reply
Anecdotally, it’s the exact opposite for me: gpt-5.4 is leagues ahead of Opus for the kind of backend work I do. Opus keeps making stupid mistakes while overengineering the irrelevant parts. However, when I have to work on the back-office UI, I still pick Opus.
reply
I think a lot of us have implemented our own ad hoc self-improvement checks into our agentic workflows. My observations are the same as yours.
reply
The silent majority of GenAI praise reaches the top of the thread again.

Edit: The lurkers and the commenters must be pretty different sets of people, I suppose.

reply
Are your claude.md, skills, or other settings that you have honed public?
reply
Sorry, no - and they're highly project-specific anyway. I just started with the "/init" skill a few weeks ago and gradually improved it from there.
reply
Which subscription tier are you using?
reply
I'm on the $200/month max plan.
reply
Makes sense, maybe it is worth it...
reply
How much does it cost though?

This is the problem.

I think there is a huge gap between people on salaries getting effectively more responsibility by being given spend that they otherwise would not have had and people hustling on projects on their own.

Yes, it is 100% what I use, but I am never happy with usage. It burns through my sub fast, and there is little feeling of control. Experiments like using lower-tier models are hard to evaluate in practice. Graphify might work or it might not. I have no idea.

reply
Wait till you try codex so you don’t have to keep saying ‘don’t be lazy’
reply