I personally don’t know any colleagues who were good engineers just because they wrote code faster. The best engineers I know were ones who drew on experience and careful consideration and shared critical insights with their team that steered the direction of the system positively.
> Claude, engineer a system for me, but do it good. Thanks!
I don't know if good engineers can necessarily continue to be good. There is a limit to how much careful consideration one can give if everything is on an accelerated timeline. Good or not, there is also a limit to how much influence you have on setting those timelines. The whole playing field is changing.
There's a cycle that is needed for good system design. Start with a problem and an approach, and write some code. As you write the code, you reify the design and flesh out the edge cases, learning where you got the details wrong. As you learn the details, you go back to the drawing board and shuffle the puzzle pieces, and try again.
Polished, effective systems don't just fall out of an engineer's head. They're learned as you shape them.
Good engineers won't continue to be good when vibe-coding, because the thing that made them good was the learning loop. They may be able to coast for a while, at best.
A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.
When there's a lot of complexity, it's often repetitive translation layers, and not something fundamental to the problem being solved.
We mocked these "architects" from experience. We knew that if you weren't feeling the friction yourself, you wouldn't learn enough to do good design.
Maybe you don't care about engineering great systems. Most companies don't. It's good for profit. This isn't new, though AI enables less care.
In my experience, in a lot of organizations, a lot of people either lacked the ability or the willingness to achieve any level of technical competence.
Many of these people played the management game, and even if they started out as devs (very mediocre ones at best), they quickly transitioned out from the trenches and started producing vague technical guidance that usually did nothing to address the problems at hand, but could be endlessly recycled to any scenario.
People who care about craft will care about the quality of what they produce whether they use AI or not.
The code I ship now is better tested and better thought through now than before I used AI because I can do a lot more. That extra time goes into additional experiments, jumping down more rabbit holes, and trying out ideas I previously couldn’t due to time constraints. It’s freeing to be able to spend more time to improve quality because the ROI on time spent experimenting has gone up dramatically.
I couldn't get exercises done where there were tricks/shortcuts which are learned by doing a lot of exercises, but for many, these are still the same tricks/shortcuts used in proofs.
This was indeed rare among students, but let's not discount that there are people who _can_ learn from well systemized material and then apply that in practice. Everyone does this to an extent or everyone would have to learn from the basics.
The problem with SW design is that it is not well systemized, and we still have at least two strong opposing currents (agile/iterative vs waterfall/pre-designed).
- I've taken a controversial new pill that accelerates my brain.
-- So you're smart now?
- I'm stupid faster!
That being said, being stupid faster can work if validation is cheap (and exists in the first place).
Turns out "eh close enough" for AGI is just stupidity in an "until done" loop. (Technically referred to as Ralphing.)
I've optimized my game's code and it finally runs at 1000 FPS.
--So your game is good now?
It's shit faster.
That has always been the case. That is why weeks or even months of programming and other project busywork can replace a couple of days spent getting properly fleshed-out requirements down.
Good engineers are also capable of managing expectations. They can effectively communicate with stakeholders what compromises must be made in order to meet accelerated timelines, just as they always have.
We’ve already had conversations with overeager product people about the ramifications of introducing their vibe-coded monstrosities:
- Have you considered X?
- Have you considered Y?
Their contributions are quickly shot down by other stakeholders as being too risky compared to the more measured contributions of proper engineers (still accelerated by AI, but not fully vibe-coded). If that’s not the situation where you work, then unfortunately it’s time to start playing politics or find a new place to work that knows how to properly assess risk.
I estimate that I'm now spending about 10 to 30 hours less time a week in the mechanical parts of writing and refactoring code, researching how to plumb components together, and doing "figure out how to do unfamiliar thing" research.
All of those hours are time that can now be spent doing "careful consideration" (or just being with my family or at the gym or reading a book, which is all cognitively valuable as well).
Now, I suppose I agree that if timelines accelerate ahead of that amount of regained time, then I'm net worse off, but that's not the current situation at the moment, in my experience.
What you said: "figure out how to do unfamiliar thing" -- is correct, and will get things done, but overall quality, maintainability, or understanding how individual pieces work... that's what you don't get. One can argue: who cares about all that, since AI will take care of it or already can? I don't think that's true today, at least.
What I find is actually necessary for me to have a mental model of the system is not typing out the definitions of the classes and such, but rather operating and debugging the system. I really do need to try to do things, and dig into logs, and figure out what's going on when something is off. And that pretty much always ends up requiring reading and understanding a bunch of the implementation. But whether I personally typed out that implementation, or one of my colleagues, or an AI, is less important.
I mean, I already had to be able to build a mental model of a system that I didn't fully implement myself! I essentially never work on anything that I have developed in its entirety on my own.
10 to 30 hours saved on not learning new things! Hurray!
What do you mean by "barely working"? I can now put more iterations into getting things working better, more quickly, with less effort. That seems good to me.
10 to 30 hours a week is 25% to 75% of my time working. Seems like a pretty good trade?
I do understand that the calculation is different for people who are new to this. And I worry a lot about how people will build their skills and expertise when there is no incentive to put in all the tedious legwork. But that just isn't the phase of my career that I'm in...
My time is spent more on editing code than writing new lines. Because code is so repetitive, I mostly copy and paste, use the completion and snippets engines, and reorganize code. If I need a new module, I just copy what’s most similar, remove everything and add the new parts. That means I only write 20 lines of that 200-line diff.
Also my editor (emacs) is my hub where I launch builds and tests, where I commit code, where I track todo and jot notes. Everything accessible with a short sequence of keys. Once you have a setup like this, it’s flow state for every task. Using LLM tools is painful, like being in a cubicle reading reports when you could be mentally skiing on code.
Or at least, the limit is increasing by the day.
Same, if anything, the opposite seems to be true, the ones that I'd call "good engineers" were slower, less panicked when production was down and could reason their way (slowly) through pretty much anything thrown at them.
Opposite experience, I've sat next to developers who are trying their fastest to restore production and then making more mistakes that make it even worse, or developers who rush through the first implementation idea they had for a feature, failing to consider so many things and so on.
Unfortunately, a lot of workplaces are ignoring this, believing their engineers are assembly line workers, and the ones who complete 10 widgets per minute are simply better than the ones who complete 5 widgets per minute.
Companies want workflows that work with mediocre programmers because they are more like interchangeable parts. This is the real secret to why AI programming will work in a lot of places. If you look at the externalities of employing talented people, shitty code actually looks better than great code.
This is the earworm the leaders of these companies have allowed into their minds. Like Agent Mulder, they Want To Believe in this so badly...
If you assume they are not idiots and analyze the FOMO incentives via a little game-theory, it becomes clear why.
Assuming the competition has adopted AI, leadership can ignore it, or pursue it. If they adopt it, then they are level with the competition whether AI actually succeeds or fails - they get to keep their executive job.
If leadership ignores AI, and it actually delivers the productivity gains to the competition, they will be fired. If they ignore AI and it's a bust, they gain nothing.
The company does better than the money-burning competition, but the executives personally gain nothing; there are no bonuses just because the competition took a misstep.
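The incentive argument above can be written down as a tiny payoff table. This is only a sketch: the numeric payoffs (0 for "keeps the job, no gain", -1 for "fired") and all names are illustrative assumptions, not figures from the comment.

```python
# Executive's *personal* payoff, assuming the competition has adopted AI.
# Values are illustrative assumptions: 0 = keeps job with no gain, -1 = fired.
PAYOFF = {
    ("adopt", "ai_delivers"): 0,    # level with the competition
    ("adopt", "ai_busts"): 0,       # level with the competition
    ("ignore", "ai_delivers"): -1,  # competition pulls ahead, executive is fired
    ("ignore", "ai_busts"): 0,      # company does better, executive gains nothing
}
OUTCOMES = ("ai_delivers", "ai_busts")


def weakly_dominates(a: str, b: str) -> bool:
    """True if choice `a` is never worse than `b` and strictly better at least once."""
    never_worse = all(PAYOFF[(a, o)] >= PAYOFF[(b, o)] for o in OUTCOMES)
    sometimes_better = any(PAYOFF[(a, o)] > PAYOFF[(b, o)] for o in OUTCOMES)
    return never_worse and sometimes_better


print(weakly_dominates("adopt", "ignore"))  # adopting is the safe personal choice
```

Under these assumed payoffs, adopting weakly dominates ignoring, which is the FOMO incentive being described: there is no personal upside to being right about a bust.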
To me, none of this feels like "going faster", it feels like "opening up possibilities to try more things, with a lot less tedious work".
For things that have visual elements like UI and UX, you can start with sketches (analog or digital) and eliminate the bad ideas, refine the good ones with higher quality rendering. Then choose one concept and implement it. By that time, the code is trivial. What I found with LLM usage is that people will settle on the first one, declaring it good enough, and not exploring further (because that is tedious for them).
The other types of problems mostly fall into three categories (mathematical, logical, or data/information/communication). For the first type you have to find the formula, prove it is correct, and translate it faithfully to code. But we rarely have that kind of problem today unless you’re in a research lab or dealing with floating-point issues.
The second type is more common, where you are enacting rules based on some axioms originating from the systems you depend on. That leads to the creation of constraints and invariants. Again I’m not seeing LLMs helping there as they lack internal consistency for this type of activity. (Learning Prolog helps in solving that kind of problem)
The third type is about modeling real-world elements as data structures and designing how they transform over time and how they interact with each other. To do it well, you need deep domain knowledge about the problem. If an LLM can help you there, that means two things: a) Your knowledge is lacking and you ought to talk to the people you’re building the system for; b) The problem is solved and you’d do well to learn from the solution. (Basically what the DDD books are all about)
Most problems are a combination of subproblems of those three categories (recursively). But from my (admittedly small amount of) interactions with pro LLM users, they don’t want to solve a problem, they want it to be solved for them. So it’s not about avoiding tediousness, it’s sidestepping the whole thing.
Unfortunately I have seen some really good software engineering peers regress into bad engineers through an increasing reliance on AI.
Conversely some very bad engineers (undeserving of the title) have been producing better outputs than I ever expected possible of them.
For someone with 3-4 kids who lives far from the city, WFH and time flexibility can be important motivators.
However, the best engineers I know are usually among the quickest to open an editor or debugger and use it fluently to try something out. It's precisely that speed that enables a process like "let's try X, hmm, how about Y, no... ok, Z is nice; ok team, here are the tradeoffs...". Then they remember their experience with X, Y, and Z, and use it to shape their thinking going forward.
Meanwhile, other engineers have gotten X to finally mostly work and are invested in shipping it because they just want to be done. In my experience, this is how a lot of coding agents seem to act.
It's not obvious to me how to apply the expert loop to agentic coding. Of course you can ask your agent to try several different things and pick the best, or ask it to recommend architectural improvements that would make a given change easier...
> Of course you can ask your agent to try several different things and pick the best, or ask it to recommend architectural improvements that would make a given change easier
The ideal solution increasingly seems to be encoding everything that differentiates a good engineer from a bad engineer into your prompt.
But at that point the LLM isn’t really the model as much as the medium. And I have some doubts that LLMs are the ideal medium for encoding expertise.
The way you apply the expert loop is to be the expert. "Can we try this...", "have you checked that...", "but what about...".
To some degree you can try to get agents to work like this themselves, but it's also totally fine (good, actually) to be nudging the work actively.
The Pragmatic Programmer book has whole chapters about this. Ultimately, you either solve the problem in analog (whiteboard, deep thinking on a sofa), or you get fast at trying out stuff AND keeping the good bits.
That's not my experience... mostly it's about first interrogating the actual problem with the customer and conditions under which it occurs. Maybe we even have appropriate logging in our production application? We usually do, because you know, we usually need to debug things that have already happened.
(If it's new/unreleased code, sure fine, let's find a debugger.)
Unfortunately thoughtful design and engineering doesn't get recognised.
The risk isn't that agents write bad code. It's that developers lose the sense that tells them where code is bad. Code review is perception. Writing code is proprioception. They're different senses and one doesn't substitute for the other.
The question for the agent era isn't "is the code good enough to ship" — it's "do I still have enough coupling to the codebase to know when it isn't?"
I figure if it can't code when it has all of the necessary context available and when obscure failures are easily detected, then why would I trust it when building features and fixing bugs?
It never did get good enough at refactoring.
Loss of discipline can be a result of panic or greed.
Perhaps believing that your own costs or your competitors' costs are suddenly becoming 10x lower could inspire one of those conditions?
(Also for greenfield projects specifically, it can plausibly be an experiment just to verify what happens. Some orgs are big enough that of course they can put a couple people on a couple-month project that'll quite likely fall flat.)
I do this too, but then I sit and observe how the agent gets very creative by going around all of these layers just to get to the finish line faster.
Say, for example, I needlessly pass a mutable reference and the linter screams at me. I know that either the linter is wrong in this case, or I should listen to it and change the signature. If I make the lazy choice, I will be dissatisfied with myself, I might even get scolded, or even fired if I keep making lazy choices.
An LLM doesn't get these feelings.
An LLM will almost always go for silencing it because it prevents it from reaching the 'reward'. If you put guardrails so that the LLM isn't allowed to silence anything, then you get things like 'ok, I'll just do foo.accessed = 1 to satisfy the linter'.
Same story with tests. Who decides when it's the test that should be changed/deleted or the implementation?
Claude is remarkably good at figuring this out. I asked it to look at a failing test in a large and messy Python codebase. It found the root cause and then asked whether the failure was either a regression or an insufficiently specified test, performed its own investigation, and found that the test harness was missing mocks that were exposed by the bug fix.
It has become amazingly good at investigating.
I can generate a lot of tests amounting to assert(true). Yeah, LLM generated tests aren't quite that simplistic, but are you checking that all the tests actually make sense and test anything useful? If no, those tests are useless. If yes, I don't actually believe you.
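One cheap way to check whether a test "tests anything useful" is to run it against a deliberately broken implementation and see if it fails. A minimal sketch of that idea follows; the function names and the discount example are hypothetical, chosen only for illustration.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Reference implementation: integer cents, so no float rounding."""
    return price_cents * (100 - percent) // 100


def broken_apply_discount(price_cents: int, percent: int) -> int:
    """Deliberately wrong: ignores the discount entirely."""
    return price_cents


def vacuous_test(fn) -> None:
    fn(100, 10)   # calls the code...
    assert True   # ...but asserts nothing about the result


def meaningful_test(fn) -> None:
    assert fn(100, 10) == 90


def catches_bug(test, broken_fn) -> bool:
    """A test is only useful if it fails when the implementation is wrong."""
    try:
        test(broken_fn)
    except AssertionError:
        return True
    return False


print(catches_bug(vacuous_test, broken_apply_discount))     # vacuous test misses the bug
print(catches_bug(meaningful_test, broken_apply_discount))  # meaningful test catches it
```

Mutation-testing tools automate this same check at scale, but even a hand-written broken variant is enough to expose an `assert(true)`-style test.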
It's the typical 10 line diff getting scrutinized to death, 1000 line diff: Instant LGTM.
Pay attention to YOUR OWN incentives.
As models get better, they seem to be biased to doing most of these things without needing to be told. Also, coding tools come with built in skills and system prompts that achieve similar things.
Two years ago I was copy-pasting together a working Python FastAPI server for a client from ChatGPT. This was pre-agentic tooling. It could sort of do small systems and work on a handful of files. I'm not a regular Python user (most of my experience is Kotlin based) but I understand how to structure a simple server product. Simple CRUD stuff. All we're talking here was some APIs, a DB, and a few other things. I made it use async IO and generate integration tests for all the endpoints. Took me about a day to get it to a working state. Python is simple enough that I can read it and understand what it's doing. But I never used any of the frameworks it picked.
That's 2 years ago. I could probably condense that in a simple prompt and achieve the same result in 15 minutes or so. And there would be no need for me to read any of that code. I would be able to do it in Rust, Go, Zig, or whatever as well. What used to be a few days of work gets condensed into a few minutes of prompt time. And that's excluding all the BS scrum meetings we'd have to have about this that and the other thing. The bloody meetings take longer than generating the code.
A few weeks ago I did a similar effort around banging together a Go server for processing location data. I've been working against a pretty detailed specification with a pretty large API surface and I wanted an OSS version of that. I have almost no experience with Go. I'd be fairly useless doing a detailed code review on a Go code base. So, how can I know the thing works? Very simple, I spent most of my time prompting for tests for edge cases, benchmarking, and iterating on internal architecture to improve the benchmark. The initial version worked alright but had very underwhelming performance. Once I got it doing things that looked right to me, I started working on that.
To fix performance, I iterated on trying to figure out what was on the critical path and why and asking it for improvements and pointed questions about workers, queues, etc. In short, I was leaning on my experience of having worked on high throughput JVM based systems. I got performance up to processing thousands of locations per second; up from tens/hundreds. This system is intended for processing high frequency UWB data. There probably is some more wiggle room there to get it up further. I'm not done yet. The benchmark I created works with real data and I added generated scripts to replay that data and play it back at an accelerated rate with lots of interpolated position data. As a stress test it works amazingly well.
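For readers wondering what "replay recorded data at an accelerated rate with interpolated positions" might look like, here is a rough sketch. It is not the commenter's code (theirs was generated Go); the sample layout `(timestamp_seconds, x, y)`, the function names, and the use of linear interpolation are all assumptions made for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (timestamp_seconds, x, y), assumed layout


def interpolate(samples: List[Sample], extra_per_gap: int) -> List[Sample]:
    """Insert `extra_per_gap` linearly interpolated points between each pair of samples."""
    if not samples:
        return []
    out: List[Sample] = []
    steps = extra_per_gap + 1
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        for i in range(steps):
            f = i / steps
            out.append((t0 + f * (t1 - t0), x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
    out.append(samples[-1])
    return out


def replay_schedule(samples: List[Sample], speedup: float) -> List[Sample]:
    """Turn recorded timestamps into send offsets, compressed by `speedup`."""
    if not samples:
        return []
    start = samples[0][0]
    return [((t - start) / speedup, x, y) for t, x, y in samples]


recorded = [(0.0, 0.0, 0.0), (2.0, 4.0, 8.0)]
dense = interpolate(recorded, extra_per_gap=1)
print(replay_schedule(dense, speedup=2.0))
```

A real stress test would then sleep until each offset and send the position to the server under test; that part is omitted here because it depends on the API being exercised.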
This is what agentic engineering looks like. I'm not writing or reviewing code. But I still put in about a week plus of time here and I'm leaning on experience. It's not that different from how I would poke at some external component that I bought or sourced to figure out if it works as specified. At some point you stop hitting new problems and confidence levels rise to a point where you can sign off on the thing without ever having seen the code. Having managed teams, it's not that different from tasking others to do stuff. You might glance at their work but ultimately they do the work, not you.
Lead engineer says something is not workable? PM overrides, saying that Claude Code could do it. Problems found months later at launch and now the engineers are on the hook.
New junior onboardee declares that their new vision is the best and gets management onto it cuz it’s trendy -> broken app.
It’s made collaboration nearly unbearable as you are beholden to the person with the lowest standards.
Exactly right.