Once this is done, the mechanical coding parts are mostly routine (for codex)
I'm sure there's an interesting study on how users 'leak' their preference unintentionally to the LLM; perhaps when users list their options, they often put their preferred option first. But not showing the cards in my hand has been very useful when thinking through a problem with LLMs.
Same experience. Claude rarely pushes back once you give a plausible/logical reason for your initial decision, even if it flagged concerns at first.
An example from earlier: Claude strongly suggested a migration that would run a full vacuum on Postgres. However, in production this would lock tables, which would grind the application to a halt. After I informed Claude that there were millions of rows in production, it accepted that and helped me get to the right thing.
Another example: I'm developing a TOTP authentication app because I'm dissatisfied with all those that I've tried. I want something strictly local, very easy to use when you have dozens or even a hundred or more accounts on there, and efficient when left open for long periods of time. Claude strongly suggested that we force users to encrypt their vault with a passphrase all the time. However, this makes the CLI extremely painful to use if you are using a strong passphrase. I told Claude about the user experience impacts and that I wanted to allow users to optionally use a vault with no passphrase encryption, and it accepted that and suggested, as a middle ground, that we have a checkbox for the user to explicitly acknowledge that they're creating an unencrypted vault on disk. This is the right thing IMHO.
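A minimal sketch of that kind of explicit opt-in, for illustration only. The names `plan_vault`, `VaultPlan`, and `acknowledge_unencrypted` are hypothetical and not taken from the actual app, and real encryption and file handling are deliberately left out; it only shows the guard that refuses to fall back to plaintext silently.

```python
from dataclasses import dataclass
from typing import Optional


class UnencryptedVaultNotAcknowledged(Exception):
    """Raised when an unencrypted vault is requested without explicit consent."""


@dataclass(frozen=True)
class VaultPlan:
    # Hypothetical result type: records only whether the vault will be encrypted.
    encrypted: bool


def plan_vault(passphrase: Optional[str], acknowledge_unencrypted: bool = False) -> VaultPlan:
    """Decide how to create the vault; never silently fall back to plaintext."""
    if passphrase:
        # A non-empty passphrase always means an encrypted vault.
        return VaultPlan(encrypted=True)
    if not acknowledge_unencrypted:
        # No passphrase and no explicit opt-in: refuse rather than guess.
        raise UnencryptedVaultNotAcknowledged(
            "No passphrase given; pass acknowledge_unencrypted=True "
            "to store the vault unencrypted."
        )
    return VaultPlan(encrypted=False)
```

In a CLI the acknowledgement could map to a flag, and in a GUI to the checkbox described above; either way the unencrypted path is something the user has to ask for.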
I have a linus-reviewer skill that focuses on architectural integrity, no bs, etc., modeled on Torvalds' code preferences.
And I have an enrico-reviewer one (I'm Enrico), that focuses on correct design, strict typing, simplification.
They have different prios, but they both push back on feedback, till you convince them.
No affiliation.
I find it useful to let it generate benchmarks comparing the approaches. Turns out AI is terrible at guessing what's faster or allocates less.
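A rough sketch of what such a generated benchmark might look like in Python, using only the standard library (`timeit` for wall-clock time, `tracemalloc` for peak allocation). The two functions being compared, `concat_loop` and `concat_join`, are made-up stand-ins for "approach A" and "approach B", and the `measure` helper is an assumption for illustration, not any particular tool's output.

```python
import timeit
import tracemalloc


def concat_loop(n: int) -> str:
    """Hypothetical approach A: build a string by repeated +=."""
    out = ""
    for i in range(n):
        out += str(i)
    return out


def concat_join(n: int) -> str:
    """Hypothetical approach B: build the same string with str.join."""
    return "".join(str(i) for i in range(n))


def measure(func, n: int, repeats: int = 5) -> dict:
    """Return the best wall-clock time and the peak traced memory for func(n)."""
    # Best of several single runs reduces noise from other processes.
    best = min(timeit.repeat(lambda: func(n), number=1, repeat=repeats))
    # Trace allocations in a separate run so tracing overhead does not skew timing.
    tracemalloc.start()
    func(n)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"seconds": best, "peak_bytes": peak}


if __name__ == "__main__":
    for func in (concat_loop, concat_join):
        print(func.__name__, measure(func, 100_000))
```

The point is less the specific numbers than having something deterministic to run, so the decision rests on measurements from your own machine and data sizes rather than on a guess.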
s/AI/a human being/ would work equally well, lol.
Jokes aside, I do like the approach of letting the AI build something deterministic and make decisions based on that.
The cynic in me would say that a good engineer should fully understand the code they write.
I'm not suggesting that AI is the problem here - you could vibe code with the AI and have it explain the reasoning and patterns - or else tell it to use 'simpler' patterns from the outset. For any one problem in software engineering, there are always multiple solutions; some slower, some faster, some more flexible etc. The code you produce should, imo, be at the level that you can understand it.
How can you reason about code you don't fully understand? How can you judge the future impact (technical debt and the cost of maintenance) of your projects?
AI makes it easier to get yourself into problems early on.
We all do, though. It takes months for a human to really get to know a project and, unless you’re working at a small startup, you’ll probably never know most of the code outside the corner you work in.
Finally feel like I have a good workflow where I can fully benefit from these things without sacrificing my understanding of what they're doing.
When it comes to the actual implementation I prefer to work through it in small steps, where the AI explains to me exactly what it's about to do and why (and I approve) along the way. This enables me to catch it if it's about to do something I disagree with beforehand. And reduces the time I need to spend reviewing in the end.
What I've tried to do is make the bot write detailed spec documents, slowly building it over time as I explain the full problem.
It works for the most part, but if you have some non-standard requirement, the agent seems to skip over that part of the spec document when it starts to code. Or it would have needless checks for situations that I said will never happen.
I would also recommend explaining the specs and doing a lot of your back and forth with a lower-end model, switching to a higher-end model only once the conversation history has all the context you feel the higher-end model needs.
You will outgrow it at some point.
Try and learn at every point.
[0] At least, in my experience, "micromanaging" the AI is what gives me the best results. Iterating on the initial design, then iterating on the plan, then reviewing the proposed code changes (including tests), then getting an independent code review from another LLM, etc. If you give an LLM too much latitude that's when the really shitty code and ill-considered breaking changes/obliteration of existing functionality starts to creep in.
You can wait and see, but that's what'll happen. If we stop it stops.
I was more annoyed than anything that I didn't hit this moment until my 40s.
Except it's not just reddit (I quit reddit 15 years ago). It's the whole internet.
Oh. I am aware. It is not that deep. But who you argue with still matters. There was a point where I had abandoned Reddit and HN. I came back to HN because people here also seem to have grown up. Reddit stays mostly the same.
I credit the moderation here for that, I mean allowing people to grow out of the echo chamber.
Getting past that is the problem we face now.
Yes, I thought the same as well because that was the same line of thought that made me write my comment.
>Except it's not just reddit (I quit reddit 15 years ago). It's the whole internet.
Yea, they are like a slingshot. You need to let go at some point or else it will drag you back.
AI is an excellent rubber duck and test writer. Maybe I sniff my farts too much but I like my code just the way I want it lol
This is what I tell people (including non-programmers interested in vibe coding), the results you get are product of... process. Formal process.
From this naturally emerges the other thing I tell people: domain expertise (or at least familiarity and/or capacity for learning) still determines the outcome.
I don't touch the code. But I do push back on expedience, laziness, inconsistency, and all the other recurring unsolved problems of generated code... and continue to play whack-a-mole in pursuit of process that whacks the moles.
I have my own skill: 5 rounds of research/planning/test-planning. Interactive with me in loop for all important decisions. Starts with high level shape, then details. Planning can take 2-3 days of my time, then the implementation agent can take many hours (Opus 4.7). It splits the implementation across many phases/commits, each with its own code-review fix loop. Deep code review at the end can take another hour or two. It opens a PR, Gemini reviews, it reads out and resolves those issues.
Projects still take days or weeks, but 5x faster than doing it all myself.
Edit: the skill - https://github.com/scosman/vibe-crafting
Because this version of AI is worth 10 trillion dollars.
While the pragmatic versions from realists you can find all over this thread are ultimately probably less of a speed boost than just having your CEO/local micromanager be conveniently on vacation during critical periods when the work actually gets done.
I wonder how much the real version of AI is worth. I've got a hunch we're going to find out pretty soon.
As a result I've abandoned the idea of having LLMs generate code except for very small, localized and tightly scoped things. They really can't produce much more than a function or a small module without shitting the bed (last time I vibecoded was with Opus 4.6, Composer 2 and GPT-5.4). I use it almost entirely as another signal in analysis, which naturally makes it fit in better because all the other signals (reading the code, stepping through the code, writing the code myself) are already there so when the LLM points things out the information it actually renders can be taken in much more easily (and seen through more easily when it's false or irrelevant).
I think it's neat that people find fun ways to develop, but I think dressing up vibecoding in a fancy dress and layering SpecLang, sometimes in multiple steps, on top of it, is an exercise in trying to use the tool more instead of trying to use it in its most useful capacity.
This has been my experience every time I've suggested that there are any sort of inherent ontological/conceptual or computational limits to the sophistication of LLM mimicry.
IMO if you are not shipping faster, then the faster-work gains are meaningless.
If you are shipping faster, you’re probably picking up more work and shipping everything too fast leading to burnout.
And if you are, it's bad for the employee.
Is what the above comment actually said.
1. Some bad idea gets embedded into the context that you just can't argue away
2. Some important idea gets lost in compression and the AI veers off into funland without recourse.
In both cases it is often better to start over or just do it yourself. I sometimes find myself asking for a summary, editing it and then using the edited one to seed a new session.
Edit: s/Finland/funland/
You say “all that time” babysitting AIs but in my experience it isn’t that much time, if anything the back and forth at the planning stages is more productive than when I’m doing it by myself because I’m being asked questions and having to think things through from different angles.
Define 'aware'. The volume of code for a feature/system that makes it worth using a more complex workflow such as this one is definitely larger than what a human can even briefly review and build a mental model of within a reasonable amount of time. Reasonable meaning not considerably delaying the process. When deadlines loom and management adds pressure, this 'awareness' is the first thing that goes out the window.
Maybe it’s just me, but I’ve never understood how one understands from reading code. Yes you can understand what that code does, but not why it was done that way instead of a different way. In the end I only understand it deeply if I end up writing it. Chatting through it is helpful to me, but having AI crank out code loses all of that context pretty quickly.
I’m not disagreeing. Just curious how you think about this, and if there are key parts of your process that help you stay contexted in.
Even code you write yourself, given enough time, you will forget the why unless you wrote comments. In a way comments are as much for you as they are for others.
Even before AI, understanding code you didn't write is essential to working on a team of other developers. If you can't understand the code from reading it, then that's part of the feedback loop - too complex, needs comments, etc..
On large teams you'll spend as much time reading code as you do writing it. And long term when it comes to writing maintainable code - the ability for others to read and understand it, including the why of it, is paramount. Your code could literally be around for decades.
Code is never missing contexts. If what your code is doing is not obvious to the reader, it is bad code that needs to be fixed. Things like cryptic low-level expressions should be extracted to helper functions with descriptive names or even extracted into a class, and classes need to comply with the single responsibility principle.
No, not really. You get spaghetti code by being unable to refactor your code to follow a consistent level of detail across calls. That's the textbook definition.
Once you start to follow basic code quality and software engineering principles, you'll notice right away that your code becomes both easier to understand and to test.
And if you already know the material explained by the book, yes i don't need to write it to understand it.
As a result there's a whole universe of code where the how of it, the elegance, is the main thing, and what it's doing is putting characters on the screen a bit slower than the next thing, but there are some amazing concepts that are supposed to make it all an axiomatic synthesis of how to think about code forever, replacing all previous concepts of thinking about code.
Now AI can think about code forever while doing nothing.
I do wonder how much of how people approach coding is shaped by the games they played when younger.
Keeping that many tasks in parallel, running all the time will kill you.
Your LLM has forgotten whatever shit it wrote when you opened a new tab, and that responsibility is now on you. And it wrote absolute dog shit
Either you follow everything it does, revise the plans, do the code review, manual adjustments, etc, or you run sessions in parallel, not being that attentive and constantly context-switch (also resulting in less attention I guess).
I fail to see the benefits honestly.
A calm attentive alternative of vibe coding: restful coding.
It's much easier to read and review code after a refreshing cat nap, especially with a real cat.
Too bad that's not usually acceptable to do that in the office. It should be! Slacking off by sword fighting all day is too exhausting.
Multitasking does not mean burnout. It just means you are not wasting time while idling. Multitasking was not invented for AI coding assistants. What do you think feature branches are used for?
Your feature branch is to put things aside and send them to CI, or wait and think on them. Not to have four of them running in parallel in your head frying you.
After you put together a plan, today's models can take well over a minute to execute it. Also, your work shifts to code review and executing acceptance tests, followed by either tweaking your current change or moving on to the next change.
This is really not about context changes. This is about not having to switch contexts because your focus stays on architecture+review instead of having to do deep dives to type code around.
> Your feature branch is to put things aside and send them to CI, or wait and think on them.
No, not really. Feature branches, as well as most types of branches, is to set aside work fronts that are in progress and run in parallel.
A full, whole, entire _minute_ ?! Sixty seconds ! Oh no, they must be optimized away, we do not deserve our free time like so, we should toil until we fall over because... Growth?
It's still context switching. Either what you're doing is surface enough that you don't give a shit, it doesn't matter and you don't review it anyways (so the only context is basically the prompt you wrote or the nth SELECT * FROM table CRUD piece of crap), or you're context switching and it's fucking you over. The context isn't about remembering how you write if err != nil, it's the expected behaviour of what you're working on.
You're not getting a promotion from doing this, you're getting burnout.
> Feature branches, as well as most types of branches, is to set aside work fronts that are in progress and run in parallel
They're not running in parallel, unless you use work trees. They were put to the side, because you can't continue or finish the work they're about. Even just three branches in parallel in a modestly active repo that happen to be long lived drift enough that just keeping them up to date with develop makes it a waste of time.
Focus on one or two things, and do them well.
That, or get checked for ADHD.
The scientific study of multitasking over the past few decades has revealed important principles about the operations, and processing limitations, of our minds and brains. One critical finding to emerge is that we inflate our perceived ability to multitask: there is little correlation with our actual ability. In fact, multitasking is almost always a misnomer, as the human mind and brain lack the architecture to perform two or more tasks simultaneously. By architecture, we mean the cognitive and neural building blocks and systems that give rise to mental functioning. We have a hard time multitasking because of the ways that our building blocks of attention and executive control inherently work. To this end, when we attempt to multitask, we are usually switching between one task and another. The human brain has evolved to single task.
If you honestly had any concern about losing focus and being forced to context switch, a 1 minute pause idling while waiting for something to happen would represent the root cause of your context switch problems.
Yak driven development.
The majority of jobs are still paid on a 40 hour per week basis. Disappearing for a day each week (20%) won't fly when you're full time.
Now if it’s my job then I can’t have a knowledge debt and if Claude is down I’ll continue working manually because I know and understand and can continue without having to understand a lot of logic before continuing
failing that, most of the APIs i use are open source, so i can read the code anyway.
then demand some lack-of-uptime compensation for a lack of uptime
A pretty significant number of their engineers flat out refused to work. Like publicly said so. "Increase our plan or I'm taking the week off."
Style can be as important as substance.
I still do a lot of back and forth about the plan - have it written to a file. Read through the file, make changes by hand and have claude read my changes and on and on. But starting with the basic architecture there's less ambiguity.
Ingest big project, comment on it gets expensive. I'm not sure how expensive.
This helps keep the other players honest: there's a limit to which they can raise prices when there are already alternatives today and when there's zero lock-in.
That those companies can make revenue but only at the cost of burning investors' money: that's not my problem.
My take on it is simple: "Give me something MUCH better than the best open-weight models at a price that's not crazy or you're not getting my money".
And it happens to be the take of many devs.
I'm still paying Anthropic, Google and OpenAI (OpenAI because I didn't manage to cancel my subscription and now their model is competitive vs Anthropic's models again) but eyeing a "Pi + open weights" solution.
Raise the prices too much and those companies selling access to private models aren't getting my money anymore.
Lately I’ve been experimenting with adding an explicit reward function so the models optimize for measurable output quality.
This creates a generate, critique, revise loop where candidate answers compete for a higher score. It feels promising because it reduces the amount of handholding for every task. It is also more fun because part of the review process is embedded in the scoring function, which simplifies the review effort.
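A toy sketch of such a loop, to make the shape concrete. Everything here is an assumption for illustration: `score` is a made-up reward (term coverage minus a length penalty), and `fake_generate` / `fake_revise` are deterministic stand-ins for what would be model calls in practice.

```python
from typing import Callable, List, Tuple


def score(candidate: str, required_terms: List[str], max_words: int) -> float:
    """Toy reward: fraction of required terms covered, minus a length penalty."""
    text = candidate.lower()
    coverage = sum(term in text for term in required_terms) / len(required_terms)
    overflow = max(0, len(candidate.split()) - max_words)
    return coverage - 0.01 * overflow


def refine(
    generate: Callable[[int], str],
    revise: Callable[[str, int], str],
    reward: Callable[[str], float],
    rounds: int = 3,
    candidates_per_round: int = 3,
) -> Tuple[str, float]:
    """Generate candidates, then repeatedly revise the best one and re-score."""
    pool = [generate(i) for i in range(candidates_per_round)]
    best = max(pool, key=reward)
    for _ in range(rounds):
        # Keep the current best in the pool so the score never gets worse.
        pool = [best] + [revise(best, i) for i in range(candidates_per_round)]
        best = max(pool, key=reward)
    return best, reward(best)


# Deterministic stand-ins for model calls (hypothetical, for demonstration only).
def fake_generate(i: int) -> str:
    return ["draft", "draft with tests", "draft"][i]


def fake_revise(text: str, i: int) -> str:
    return text + [" and docs", " and tests", " and benchmarks"][i]


def demo_reward(candidate: str) -> float:
    return score(candidate, ["tests", "docs", "benchmarks"], max_words=20)


if __name__ == "__main__":
    print(refine(fake_generate, fake_revise, demo_reward))
```

Keeping the previous best in each round's pool is a deliberate choice: a bad revision can never displace a better earlier candidate, so the loop only moves up or stays put. The quality of the result is still bounded by how well the scoring function captures what you actually want.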
It's actually happened a few times where I needed to back out entire features because AI went too far and I lost control/understanding of what the code is doing. Many people will give up at that point and let AI do everything - that is a mistake, at least right now, and it's how you end up with unmaintainable vibe spaghetti slop.
Often depending on how complex the feedback, I'll do it one at a time addressing each one individually. And after the feedback is addressed, I'll go back to the AI that generated the feedback and say like, "I handled 4/5 items you found, can you double check."
It's similar to handling PR feedback, where you do it, validate it, but then still have to submit it for peer review.
And maybe don't use tools that lock you into one model?
1. Have claude form the plan and converse with a simple "Note any concerns with this plan" type plan-critic agent.
2. Let it run.
3. After (with everything in context) have it make a future_recommendations.md.
4. Have it make a plan.md to implement those future recommendations, conversing with the plan critic.
5. Clear context. Repeat with 1. Do this loop a few times, with some feedback from actual review thrown in.
But, most importantly, because Claude will aggressively try to maintain code "as is", and happily build on its previous crap, while preferring to hand roll implementations of everything, add something like this to memories/directives:
* When evaluating designs, default to "pull in the library" over "hand-roll it." Hand-rolling is much worse than a dependency.
* "Precedent" / "matches house style" / "reuses existing pattern" / "consistent with what we already do" are not valid engineering arguments.
* This project is still in the development stage with no real deployments. Migration costs and existing precedent are not a concern.
With these, in the last week that I've started using them (after inspecting the insane justifications for leaving crap design decisions in the plans), Claude went from junior level slop that required more oversight than it was worth to something very reasonable, using standard libraries, requiring nudges for architecture rather than pure "wtf!?".
I think they've fine-tuned heavily towards "don't rewrite the codebase", which is completely rational from multiple perspectives, but also not appropriate for new code.
I do enjoy a considerable daily token allowance, so this may not apply to everyone.
This stuff works so much better when you just tell it what to do
So, people do know how to design a feature, but they also know it takes a lot of time and effort. They want AI to do that work for them.
- aimless AI wandering, leading to pretty, frankly, useless design docs
- using AI to "expand" upon a bullet pointed/shorthanded design doc. To which I feel like saying "the bullet points are already a good design doc!"
I understand that teams sometimes have specific formats that they have to make deliverables for, but having a nice 5 point bulletpoint list turn into 5 paragraphs... all for me to turn the 5 paragraphs back into 5 bullet points in my notes is depressing.
I do think you can get a lot of value in the mechanics, I just have had so much success leaving the thinking to me and the rote stuff to the AI. I'm going to have to think about the design eventually anyways right?
We may need some sort of paradigm shift - like more powerful frameworks or even higher level languages that allow us to review less, but more functional code blocks.
In my experience, even on a relatively trivial task, you can ask an LLM at least 20 times:
Is this actually done, or only partially implemented? Did you finish x, y, z?
And the LLM will say, no, I'm not done and keep working.
After that, I'll feed the branch to a different LLM, and ask if the implementation matched the design, where it's weak and needs improvements.
Same thing - that feedback will usually only be partially finished for several rounds.
When they all agree it's done - I'll finally look at the code, and there's still typically glaringly obvious problems - duplicate systems that reinvent the wheel, etc - that will take typically more than one prompt to get right...
Getting things right takes almost ~100x as long as getting things almost right with LLMs.
You can tell an LLM to "make me Rust, but easier. Make no mistakes," and it'll plan out a 100 commit process and get something that - somehow - sort of works... but isn't even close to complete.
Still, on a cost basis, you're still able to get features that would take yourself several times longer and cost orders of magnitude more money, and - if you're doing it right - they'll probably do a better job than you would've done (at least for me).
Before AI, myself and everyone else I knew was drowning in tech debt. And now with AI we are treading water.
Still better than dealing with people, but only just.
In my experience, software engineering is a matter of knowledge. Understanding it and then coming up with a solution. The latter is a flash of insight that comes mostly from experience. Then you gather more information to flesh it out, or brainstorm it with your colleagues.
What you're describing sounds more like a ritual of doing busy work than anything practical. Because tasks vary so much. A feature may be huge, but you take care of it in a day with copy pasting because you already have all the building blocks in other files. And something may be twenty lines of code, but you spent the whole week sweating on it (concurrency stuff maybe). Those ritualistic workflows sound more like someone imagining software development than actually doing it.
Lost you in the last paragraph - features are not "copy pasting because you already have all the building blocks" and "something may be twenty lines of code". Mid-sized features often mean tearing up many layers of code across the stack to add in some sort of new capability. Tearing up existing code means there are all sorts of add-on considerations in addition to the feature you are working on.
What? No, it shouldn't. I've worked on a lot of codebases and if you have to do this, something is very, very wrong.
There's such a thing as under engineering, and if you find yourself changing "all the layers" for a feature, your codebase is poorly designed.
Even with clean architecture, you only have 4 fundamental layers. And once you have v1, you’re mostly doing tweaking and copy pasting. Any huge refactoring is the business switching its main strategy.
Take an OS like OpenBSD. It has three main layers. The syscall layer, the kernel layer, and the machine dependent code. But an OS is more spread horizontally with various subsystems (process and memory, io and other device, ipc,…)
If you’ve captured your problem’s domain and adopted a pragmatic architecture, you will rarely have to change across all layers. That’s costly and happens mostly due to business reasons.
And then each of the service layers can itself be broken into layers, depending on the complexity of the business logic. So yea, a change in a worker can potentially bubble up through all the layers.
The mental model is still in my head, my brain is overloaded, but only from the amount of code reviews - like I said, I'm building v3 of a feature in the time it takes to build v1, but I am in a way doing 3x the code reviews going back and forth. That's the fall out of the iteration speed enabled by AI.
Between submitting PRs, getting feedback, iterating, re-submitting, repeat - there used to be breathing room. Now it's all compressed into an afternoon. Productivity is through the roof, but it can be draining.
If the feature isn't released, it's not a new version.
In the new world there is no time to put out v1 quality code and it is borderline reckless given how easily things are getting hacked now. You need to be putting out heavily reviewed code that covers all the corner cases on the first release.
There's no such thing as "v1 quality code", you just haven't finished it yet.
Maybe I'm too far gone down the AI rabbit hole, but that seems a really strange take to have. If you replaced 'back and forth with the AI' with 'pair programming' or 'brainstorming' this phrase would be really strange, after all these are all techniques to sharpen your ideas. Even 'rubber ducking' is widely accepted as an effective way to go through a problem, and you can definitely use AI as a rubber duck.
For me the idea of chatting with the AI about a problem/solution is just another tool to help us work. It's not the best solution because it has a lot of downsides you should be aware of while using it, but that is true for any technique including 'writing the code yourself'.
I will say, it does help me get over procrastination lol. I get annoyed by the robot doing dumb shit and finish it myself.
I’ve found it useful to write out a list of feedback / issues and have a bunch of sub agents work on them in worktrees with a loop bringing them all back together. That way it can work for a few hours while I can just review in bulk.
Also I never multitask with multiple agents doing other stuff. Meh I focus on just the one task.
If only things were so! If only code was discussed, reviewed, iterated on! If only the "manager" actually read the code, provided actionable feedback, and disseminated PRs to multiple people with diverse skill sets.
(If you can't tell, I'm a jaded consultant desperately trying to make the horse drink the water.)
So AI definitely changes the game. I feel like we almost need something higher level for reviewers to review changes faster. Today's code is starting to feel like assembler. Too much of it, too low level. We need even higher level constructs to be able to do more in less time. I'm just not sure what that is.
Don't get me wrong I used to enjoy writing code by hand, but I don't think I would anymore. I don't like writing code for the sake of writing code - I like building things, I like being productive.
hahahahaha
1. I write a list of things I want to have without AI support
2. I discuss the list with an LLM, which occasionally reveals obviously missing things I hadn't thought about or just things that would be smart to have. Or sometimes the LLM doesn't get it and wants to funnel me down a commonly walked path, which is a non-goal
3. From that list I draft an implementation plan containing things like how the code shall be structured, which language, libraries, build systems, etc to use. This may even contain some data models and considerations that are more detailed, like for example ideas about how a specific interaction shall be event sourced. I work on that, till I feel a satisfactory level of clarity has been reached
4. Actual writing of code as a back and forth between manual writing, letting an LLM write something and so on. LLMs suck at writing CSS that feels like good UX design to me, so usually templates, layout and CSS will be (re)written entirely by hand
5. Bug-hunting and guessing potential edge cases is one thing where LLMs really shine. Often if the work before that was quality the LLM has an okay time coming up with fixes that are no worse than what I would have done.
The In-Laws (1979): Getting off the plane in Tijuara: