by azuanrb20 hours ago |

by jaggederest19 hours ago|

Where I am headed, I think, is to basically be a platform engineer. The job is to create the guardrails, validation, prompt library, and both agent and manual reviews that keep the domain experts safe when they start using coding agents.

It's a little bit like being T2/T3 customer support [or support engineer], but internal. You're there to catch the dangerous spots, the weird edge cases, and to make sure that everything is set up correctly, rather than to solve 100% of the routine problems yourself.

There's also plenty of room for cross-cutting concerns, of course.

by brandensilva16 hours ago|

Eventually, I suspect, well-developed DevOps harnesses will make infrastructure simpler to orchestrate without faults too. Even then, the risk and scale companies are willing to accept will still fall on humans for some time. I don't see most people vibe coding a million-user app that has deeper needs than the basics we see now.

by trojans129012 hours ago|

Can you elaborate more on this type of role? Stack? Etc.

by jaggederest29 minutes ago|

I honestly just go with whatever the company is using (these days often TypeScript), and I build tooling and systems that catch errors and review PRs produced partially or completely by the domain experts. Nothing fancy about it, just good old engineering: when an issue arises you create a test for it and make sure it can never recur, educate users, set up the correct infra, and lock down permissions (it's never been easier or more fun to set up an incredibly draconian role in e.g. AWS IAM).
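As a rough illustration of the kind of guardrail described above, here is a minimal Python sketch of a check that flags changed files in a PR that touch protected paths. Everything in it is an assumption made for illustration: the pattern list, the function name, and the file paths are hypothetical, and a real setup would get the changed-file list from the CI system.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for paths that domain experts should not change
# without an engineer's review. Note that "*" in fnmatch also matches "/".
PROTECTED_PATTERNS = ["infra/*", "*.tf", ".github/workflows/*"]


def protected_changes(changed_files, patterns=PROTECTED_PATTERNS):
    """Return the changed files that match any protected pattern."""
    return [
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    # Hypothetical list of files touched by a PR.
    changed = ["src/menu.py", "infra/prod/main.tf", "README.md"]
    flagged = protected_changes(changed)
    if flagged:
        print("Needs engineer review:", ", ".join(flagged))
```

The same shape works for the "create a test so it can never recur" habit: each incident becomes one more pattern or one more assertion in the check.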

by consumer45119 hours ago|

> I used billions of tokens last month alone.

I use Claude Code (Opus 4.6 at max effort) all day long, and I genuinely don't understand how this is possible. Is that usage paying off?

This is very likely due to my lack of understanding, but... how?

by letitgo1234519 hours ago|

Long codex sessions lead to a lot of cached token hits, especially when you resume them after a few hours.

by consumer45117 hours ago|

I personally don't count cached hits as $used... Neither in my harnesses, nor in the LLM-enabled apps I create. A cached token cannot be counted 1:1 against a non-cached token; that would be silly.

Wait... when some Claude 5x/20x users say they are getting "$2000 of tokens for $100," does the 2k value include cached tokens, counted at the same $/token either way?

We cannot be this dumb as a community, can we? I must be wrong/misunderstanding..

by SatvikBeri15 hours ago|

I'm a fairly moderate user and have never hit any kind of usage limit, but I used 44 million cache create tokens and 1.5 billion cache read tokens. ccusage estimates that would have cost $990, and it calculates the different categories separately.
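To make the difference between the two ways of counting concrete, here is a minimal Python sketch that prices each token category separately and compares the result with counting every token at the plain input rate. The rates are placeholder numbers chosen only to keep the arithmetic readable; they are not real pricing for any provider, this is not how ccusage works internally, and the result is not meant to reproduce the $990 figure. The token counts are the ones quoted in the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens. Placeholder values, NOT real pricing."""
    input: float
    output: float
    cache_write: float
    cache_read: float


def estimate_cost(usage, rates):
    """Price each token category at its own rate."""
    per_million = {
        "input": rates.input,
        "output": rates.output,
        "cache_write": rates.cache_write,
        "cache_read": rates.cache_read,
    }
    return sum(
        usage.get(category, 0) / 1_000_000 * price
        for category, price in per_million.items()
    )


def naive_cost(usage, rates):
    """Count every token, cached or not, at the plain input rate."""
    return sum(usage.values()) / 1_000_000 * rates.input


# Hypothetical rates; only the token counts come from the comment.
rates = Rates(input=10.0, output=50.0, cache_write=12.5, cache_read=1.0)
usage = {"cache_write": 44_000_000, "cache_read": 1_500_000_000}

print(estimate_cost(usage, rates))  # 2050.0 with these placeholder rates
print(naive_cost(usage, rates))     # 15440.0 with these placeholder rates
```

With cache reads dominating the volume, the two methods diverge by a large factor, which is why a headline token count says little about spend unless the categories are broken out.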

by andai19 hours ago|

Vibe coded a simple game (10,000 tokens of source code) with two popular coding agents. (Once each, to compare.)

One spent 200,000 tokens, to produce 10,000.

The other spent 1.9 million.

It could have been a single LLM call (10k tokens). lmao

(I note that the latter was designed by a company whose main source of revenue is token spend...)

by crab_galaxy18 hours ago|

What about the other 998 million tokens?

by andai3 hours ago|

Ya got me there. Maybe he's running OpenClaw?

by stronglikedan17 hours ago|

lots and lots of simple games

by skeptic_ai18 hours ago|

Don’t forget context. Basically I have 2 billion input and 1 million output. Every prompt you send resends the whole thing, again and again. Let’s say you have 500k of context used: 10 messages is 5 million tokens, 100 messages is 50 million, and 5 threads of that is 250 million.
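The arithmetic in this comment can be sketched in a few lines of Python. The function names are made up for illustration. The first function assumes the context size stays constant for every message, which matches the comment's numbers; the second is a slightly more realistic variant where the context grows by a fixed amount per message, with the growth figure being an assumption.

```python
def total_input_tokens(context_tokens, messages, threads=1):
    """Input tokens sent when every message resends a fixed-size context."""
    return context_tokens * messages * threads


def growing_context_input_tokens(start_tokens, growth_per_message, messages):
    """Input tokens sent when the context grows by a fixed amount each message."""
    return sum(start_tokens + i * growth_per_message for i in range(messages))


print(total_input_tokens(500_000, 10))       # 5000000
print(total_input_tokens(500_000, 100))      # 50000000
print(total_input_tokens(500_000, 100, 5))   # 250000000

# Assumed numbers: start at 10k tokens, add 5k per message, 4 messages.
print(growing_context_input_tokens(10_000, 5_000, 4))  # 70000
```

In the growing case the total scales roughly with the square of the message count, which is one reason starting fresh sessions early keeps input volume down.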

by consumer45118 hours ago|

But how is it even possible (bad harness?), or wise, to send 500k or 1M tokens per call? Regarding cache, how are you not hitting the 1hr cache? Also, start new chats early and often!

I have been "agentic coding" since Sonnet 3.5 and after this paper came out, it became my bible:

https://github.com/adobe-research/NoLiMa

Last I checked, all models suck as you fill the context window. "Context engineering" is how you do this whole thing.

by azuanrb9 hours ago|

[dead]

by jonkoops19 hours ago|

Honestly, this is my experience as well. LLMs make it easier to explore other domains, but they do not make you the master of one; you still need expert domain knowledge.

That said, they do make excellent tools to quickly try out new ideas and dive into them; they can even be great learning accelerators if you have a curious mind.

by xkcd-sucks19 hours ago|

Domain expertise combined with a QA mindset could replace SWE, but consistent QA mindset is rare

by notRobot19 hours ago|

I agree that a consistent QA mindset is rare, but I'm not sure that, even when present, it's enough to replace an SDE.

I very recently looked at the codebase of a vibe-coded app made by someone with domain expertise but no software dev experience.

It was very clear to me that he had described it from his POV to an AI, and the AI had implemented features in a manner that technically worked, but made future maintenance or expansion extremely tricky, which is why he was now looking for a dev.

For example, in his data schema, for every item on a menu, instead of simply having an array property like so for ingredients:

    items["latte"]["ingredients"] = ["water", "milk", "sugar", ...]

He had individual flags for every item for every possible ingredient it could have or not have:

    items["latte"]["has_milk"] = true
    items["latte"]["has_nutmeg"] = false
    items["latte"]["has_cinnamon"] = false
    items["latte"]["has_sugar"] = true
    ...

This technically worked and passed tests from his POV at an MVP level. But it added a lot of complications when actually trying to build more features, or when a new menu item had ingredients the founder hadn't thought to include in the schema beforehand.

I totally get how he ended up where he did, though. While describing it to the AI, he probably said something like "store info on each menu item's ingredients, they might have milk or coffee or sugar", and the AI created individual flags for them. He didn't think to question it, because he didn't know what's "right" or "wrong". Then, as he kept building, the AI stuck with individual flags instead of swapping them out for an array, and he couldn't have known the correct way to implement it.

Only a dev with experience would know how to describe the system to an AI model to get an output that works well, and how to assess the quality of its output beyond what can be assessed through the basic UI. This wasn't a QA failure, it was a design failure.
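To show why the flag-per-ingredient schema gets in the way, here is a minimal Python sketch comparing the two layouts. The data is a trimmed, illustrative version of the snippets above (the elided entries are not filled in), the "chai" item and its ingredients are invented for the example, and the helper names are hypothetical.

```python
# Flag-per-ingredient layout: every possible ingredient is its own field.
flag_items = {
    "latte": {
        "has_milk": True,
        "has_nutmeg": False,
        "has_cinnamon": False,
        "has_sugar": True,
    },
}

# List layout: each item simply lists what it contains.
list_items = {
    "latte": {"ingredients": ["water", "milk", "sugar"]},
}


def ingredients_from_flags(item):
    """Recover a sorted ingredient list from has_* flags that are set."""
    prefix = "has_"
    return sorted(
        key[len(prefix):]
        for key, present in item.items()
        if key.startswith(prefix) and present
    )


def migrate(items):
    """Convert a flag-based menu into the list-based layout."""
    return {
        name: {"ingredients": ingredients_from_flags(item)}
        for name, item in items.items()
    }


def items_containing(items, ingredient):
    """Names of list-layout items that contain the given ingredient."""
    return sorted(
        name for name, item in items.items()
        if ingredient in item["ingredients"]
    )


# A new item with a never-before-seen ingredient needs no schema change
# in the list layout; the flag layout would need a new has_cardamom field
# on every existing item.
list_items["chai"] = {"ingredients": ["water", "milk", "cardamom"]}

print(migrate(flag_items))                   # {'latte': {'ingredients': ['milk', 'sugar']}}
print(items_containing(list_items, "milk"))  # ['chai', 'latte']
```

Note that the migration can only recover ingredients that had a flag, so anything the flags never modelled (water, in this trimmed example) is lost.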

by brandensilva13 hours ago|

I have found this to be the case as well. As developers we are just really good stewards of the code, because we have the knowledge to make sure it is engineered in a way that lets it scale and grow without tech debt becoming unwieldy.

I found AI to be pretty bad with a bare-bones code base that doesn't have solid patterns in place already. It works, but it's just monolithic files galore. useEffect hooks everywhere. Nasty state situations with poor data practices. Security vulnerabilities up the wazoo.

It's weird to have this conversation with them. Like, yeah, your code works, but it's so tangled up that sometimes it's hard to reason about where to even begin unwinding it all.

It can be done but cleaning up someone else's slop is the exact reason why I hate AI. It was hard enough to review great code and be critical, honest, and fair but we knew it was an essential part of the process, helped build shared understanding, and was a way to learn from one another.

Whereas throwing in jumbled garbage to review just feels like a waste of our brain cells we spent decades earning by embracing the craft.

by michaelchisari19 hours ago|

I disagree. At some point of complexity, building it yourself is faster, better and (as we're finding out) cheaper. And more fun, although that varies person to person.

Wrestling with a code generator also creates a sunk cost fallacy where progress grinds to a halt but you still try and use the tools to fix the problems the tools created. Or you go in and fix things yourself, in a codebase you don't truly understand. A single developer can recreate the contextual nightmare miasma of a large corporation all by themselves.

There's also an emerging market consideration: MVPs are easy to build, so time to market is no longer hard to achieve. It's not a differentiator.

X was built in 3 days but is slow and riddled with bugs and security errors. There are also A, B, C, D and E which are effectively the same thing built just as fast.

Z was built over six months and is rock solid and performant.

Who wins the market share?

by fragmede17 hours ago|

Who's got better marketing? Is it even a product where customers care about rock solid and performant? Which one's cheaper and has the least friction to getting started? Which one's CEO golfs with your company's CEO?

Time and time again, the market proves worse is better, from the format wars of the 80's and 90's, to Microsoft Windows still being dominant (and oh yeah, Teams). Sometimes quality does win, but if being built in 3 days means they can make a profit charging 1/100th the price of Z, I wouldn't count the cheap ones out of the game just because Z is better.

by michaelchisari16 hours ago|

My comment was more "all things being equal."

Though the market so far has had a lower limit on "worse". We're finding out how low we can go before consumers start valuing quality again.

by eggplantemoji6919 hours ago|

Personally, my ability to understand atrophies (or is at least reduced) when I'm purely a reviewer, compared to writing the code ‘myself’.

Probably similar to hand writing notes (while digesting + synthesizing and not just being a scribe) vs reading notes somebody else took.

by cm1118 hours ago|

I'm guessing there's some science or research behind this, but I agree. Similarly, I've had projects where I did everything fairly solo—programmed, designed ux/ui, maybe validated with users, etc. It was significantly harder, particularly in the phase where you're working between the first two and the idea isn't perfectly set. It worked much better to design, then build in explicit steps, but it was so easy to start coding, have the design looking and feeling okay, then start iterating on the design—but iterating in code rather than Figma or wherever. It's fine for a little while, but you realize you've spent a day (maybe more) doing it in this less efficient way.

It's similar to the 80/20 rule. When you're coding and designing from the hip, you'll do pretty well for a while, but as you near completion, you can't quite tie up all the loose design ends. That's the part where it's probably better to just design fully to 100% first and then build, which is closer to what happens when the roles are separate. At least in my experience. I will say, though, that the part where you're designing in code (productively or wastefully) is pretty fun. At least until you hit the wall and get frustrated that you've deleted and rewritten the same thing ten times.

by wiseowise8 hours ago|

> Domain expertise combined with a QA mindset could replace SWE, but consistent QA mindset is rare

I've heard this story at least 3 times already:

- Domain expertise combined with outsource could replace expensive US SWE

- Domain expertise combined with SWE could replace QA

- Domain expertise combined with SWE could replace infra engineers

Why is everyone so preoccupied with replacing someone with someone instead of doing their fucking job?

by nradov18 hours ago|

You can't test quality into a product. Regardless of how much of a "QA mindset" you have, you can only ever find a fraction of defects and technical debt through external testing. This can be good enough for a throwaway app that will only be used by a limited customer base for a limited time. But that approach quickly bogs down if you try to scale it into a product that will be used indefinitely by a huge set of external customers. At some point velocity drops to near zero because the code base is such a mess that it becomes impossible to add new features without causing regression defects or breaking backward compatibility.

by goosejuice18 hours ago|

The engineering part of software engineering is the hard part for LLMs. How is that replaceable with these skills?

by rustystump19 hours ago|

I don’t think so. Most things are complicated enough to require multiple domain experts working together to achieve a goal.

The Dunning-Kruger effect is in full swing as people think AI replaces the need for domain experts.

Most of the value in the expert isn't the 80% but the tail 20% or 10% where AI fails. For a one-off personal app or website, 80% is plenty, but only for that.

by aaronbrethorst19 hours ago|

Totally agree.
