It's a little bit like being T2/T3 customer support [or support engineer], but internal. You're there to catch the dangerous spots, the weird edge cases, and to make sure that everything is set up correctly, rather than to solve 100% of the routine problems yourself.
There's also plenty of room for cross-cutting-concerns, of course
I use Claude Code (Opus 4.6 at max effort) all day long, and I genuinely don't understand how this is possible. Is that usage paying off?
This is very likely due to my lack of understanding, but... how?
Wait... when some Claude 5x/20x users say they are getting "$2000 of tokens for $100," does the 2k value include cached tokens, counted at the same $/token either way?
We cannot be this dumb as a community, can we? I must be wrong/misunderstanding..
One spent 200,000 tokens, to produce 10,000.
The other spent 1.9 million.
It could have been a single LLM call (10k tokens). lmao
(I note that the latter was designed by a company whose main source of revenue is token spend...)
I have been "agentic coding" since Sonnet 3.5 and after this paper came out, it became my bible:
https://github.com/adobe-research/NoLiMa
Last I checked, all models suck as you fill the context window. "Context engineering" is how you do this whole thing.
That said, they do make excellent tools to quickly try out new ideas and dive into them; they can even be great learning accelerators if you have a curious mind.
I very recently looked at the codebase of a vibe-coded app made by someone with domain expertise but no software dev experience.
It was very clear to me that he had described it from his POV to an AI, and the AI had implemented features in a manner that technically worked, but made future maintenance or expansion extremely tricky, which is why he was now looking for a dev.
For example, in his data schema, for every item on a menu, instead of simply having an array property like so for ingredients:
items["latte"]["ingredients"] = ["water", "milk", "sugar", ...]
He had individual flags for every item for every possible ingredient it could have or not have: items["latte"]["has_milk"] = true
items["latte"]["has_nutmeg"] = false
items["latte"]["has_cinnamon"] = false
items["latte"]["has_sugar"] = true
...
This technically worked and passed tests from his POV at an MVP level. But added a lot of complications when actually trying to build more features or when a new menu item had ingredients the founder hadn't thought to include in the schema beforehand.I totally get how he ended up where he did though. While describing it to the AI, he probably said something like "store info on each menu item's ingredients, they might have milk or coffee or sugar", and the AI created individual flags for them and he didn't think to question it, because he didn't know what's "right" or "wrong", but then as he kept building the AI stuck with keeping individual flags instead of swapping it out with an array mechanism, and he couldn't have known the correct way to implement it.
Only a dev with experience would know how to describe the system to an AI model to get an output that works well, and how to assess the quality of its output beyond what can be assessed through the basic UI. This wasn't a QA failure, it was a design failure.
I found AI to be pretty bad with like a bare bones code base without solid patterns in place already. It works but it's just monolithic files galore. use effects hooks everywhere. Nasty state situations with poor data practices. Security vulnerabilities up the wazoo.
It's weird to have this conversion with them. Like yeah your code works but it's so tangled up it's hard to reason about where to start to begin to unwind it all sometimes.
It can be done but cleaning up someone else's slop is the exact reason why I hate AI. It was hard enough to review great code and be critical, honest, and fair but we knew it was an essential part of the process, helped build shared understanding, and was a way to learn from one another.
Whereas throwing in jumbled garbage to review just feels like a waste of our brain cells we spent decades earning by embracing the craft.
Wrestling with a code generator also creates a sunk cost fallacy where progress grinds to a halt but you still try and use the tools to fix the problems the tools created. Or you go in and fix things yourself, in a codebase you don't truly understand. A single developer can recreate the contextual nightmare miasma of a large corporation all by themselves.
There's also an emerging market consideration: MVP are easy to build so time to market is no longer hard to achieve. It's not a differentiator.
X was built in 3 days but is slow and riddled with bugs and security errors. There are also A, B, C, D and E which are effectively the same thing built just as fast.
Z was built over six months and is rock solid and performant.
Who wins the market share?
Time and time again, the market proves worse is better, from the format wars of the 80's and 90's, to Microsoft Windows still being dominant (and oh yeah, Teams). Sometimes quality does win, but if being built in 3 days means they can make a profit charging 1/100th the price of Z, I wouldn't count the cheap ones out of the game just because Z is better.
Though the market so far has had a lower limit on "worse". We're finding out how low we can go before consumers start valuing quality again.
Probably similar to hand writing notes (while digesting + synthesizing and not just being a scribe) vs reading notes somebody else took.
It's similar to the 80/20 rule. When you're coding and designing from the hip, you'll do pretty well for awhile, but as you near completion, you can't quite tie up all the loose design ends. That's the part where it's probably better to just design fully to 100% first and then build, which is closer to what happens when the roles are separate. At least in my experience. I will say though that that part where you're designing in code (productively or wastefully) is pretty fun. At least until you hit the wall and get frustrated with how often you've deleted and rewrote the same thing ten times.
I've heard this story at least 3 times already:
- Domain expertise combined with outsource could replace expensive US SWE
- Domain expertise combined with SWE could replace QA
- Domain expertise combined with SWE could replace infra engineers
Why is everyone so preoccupied with replacing someone with someone instead of doing their fucking job?
The dunning kruger effect is in full swing as people think AI replaces the domain expert need.
Most of the value in the expert isnt the 80% but the tail 20% or 10% where AI fails. For a one of personal app or website, 80% is plenty but only that.