I think this goes against what a lot of developers want AI to be (not me, to be clear).
With the right docs, I can lift every developer of every skill level up to a minimum "floor" and influence every line of code that gets committed to move it closer to "perfect".
I'm not writing every prompt so there is still some variation, but this approach has given us very high quality PRs with very minimal overhead by getting the initial generation passes as close to "perfect" as reasonably possible.
If they aren't willing to read what I put effort into, why should I be expected to read the ill-conceived and verbose response? I really don't want to get into a match of my AI arguing with your AI, but that's what they've told me I should be doing...
There's an asymmetry of effort in the above, and when combined with the power asymmetry - that's a really bad combo, and I don't think I'm alone.
I'm glad to see the appreciation of the enormous costs of complexity on this forum, but I don't think that has ascended to the managerial level.
> ...a manager who responds in the form of Claude guided PRs
I think the job of a dev in this coming era is to produce the systems by which non-engineers can build competently and not break prod or produce unmaintainable code. In my current role, I have shifted from lead IC to building the system that is used by other ICs and non-ICs.
From my perspective, if I can provide the right guardrails to the agent, then anyone using any agent will produce code that coalesces around a higher baseline of quality. Most of my IC work is now aligned with this direction.
That's the classic 2nd-system effect - "let's rewrite it from scratch, now that we know what we want". And you throw away all the hard-learned lessons.
> The general tendency is to over-design the second system, using all the ideas and frills that were cautiously sidetracked on the first one. The result, as Ovid says, is a "big pile". For example, consider the IBM 709 architecture, later embodied in the 7090. This is an upgrade, a second system for the very successful and clean 704. The operation set is so rich and profuse that only about half of it was regularly used. (p.55)
>
> The second-system effect has another manifestation somewhat different from pure functional embellishment. That is a tendency to refine techniques whose very existence has been made obsolete by changes in basic system assumptions. (p.56)
It's the exact opposite: by explicitly dictating what is correct, perfect, and standard in this codebase, we achieve very high consistency and quality with very little "embellishment" and excess, because the LLM is following a set of highly curated instructions rather than the whims of each developer on the team.

I suggest you re-read what Brooks meant by "second-system effect".
I don't think there's perfect code.
Code is automation - it automates human effort and humans themselves have error, hence not perfect.
So as long as code meets or exceeds the human output, it's "good enough" and meets expectations. That's what a typical customer cares about.
A customer will happily choose a tent made of tarp and plastic sticks that's within their budget and available right now, when it's raining outside, over an architectural marvel that will be available sometime in the future at some unknown price point.
Put another way, I don't think if you built CharlieGPT today, where the only differentiating factor over ChatGPT was that CharlieGPT was written using perfect code, you would have any meaningful edge.
I have yet to see any evidence that, everything else being equal, one company had an edge over another simply due to superior code.
In fact, I have overwhelming evidence of companies that had better code succumbing and vanishing against companies that had very little code, if any, because those dollars were instead invested in better customer discovery, segmentation, and analytics ("what should we build?", "if we did one thing that would give our customers an unfair advantage, what would that thing be?").
Software history is full of perfect OSes, editors, frameworks, and protocols that were lost over time because a provably inferior option won market share.
You are using a software-controlled SMPS to power your device right now. You have no idea what the quality of that code is. All you care about is whether that SMPS drains your battery prematurely or heats up your device unnecessarily. It's extremely unlikely that such an efficient, low-overhead control system was written using well-abstracted modules. It's more likely that the control system is full of gotos and repeated violations of DRY that would make a perfectionist shudder and cry.
> I don't think there's perfect code
Note I put "perfect" in quotes in my text. In this context, it means code that passes human PR review under our standard guidelines with minimal feedback/correction required.

> So as long as code meets or exceeds the human output, it's "good enough" and meets expectations. That's what a typical customer cares about.
Why settle for this when "perfect" is "free"? I understand this dichotomy when writing "perfect" code requires more expensive, more experienced human resources or more time, so you settle for "good enough"; but this is no longer the case, is it? The cost of "perfect" is only perhaps a few fractions of a cent higher than shitty.

You only need to accurately describe what "perfect" is to the LLM instead of allowing it to regress to the mean of its training set. There really is no cost difference between writing shitty code and "perfect" code now; it's just a matter of how good you are at describing "perfect" to the LLM.
For example, we very specifically want our agents to write code using C# tuple return types for private methods that return more than one value, instead of creating a class. The tuple return type is a stack-allocated value type and supports deconstruction by default. We also always want to use named tuple fields every time because it removes ambiguity for humans and increases efficiency for agents when re-reading the code.
We want the code to make use of pattern matching and switch expressions (not `switch-case`) because they help enforce exhaustive checks at compile time and make the code more terse.
If we simply tell the agent these rules ahead of time, we get "perfect", consistent code each time. Being able to do so requires "taste" and understanding why writing code one way or using a specific language construct or a specific design pattern is the "right" way.
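As a concrete sketch of the conventions described above (the method and type names here are hypothetical, invented only for illustration), this is roughly what code following those rules looks like: a named tuple return from a private method, and a switch expression instead of `switch-case`.

```csharp
using System;

enum OrderStatus { Pending, Shipped, Delivered }

static class OrderReport
{
    // Named tuple return type: a stack-allocated value type with no
    // one-off class, and field names that remove ambiguity at the
    // call site when the code is re-read.
    private static (int Count, decimal Total) Summarize(decimal[] amounts)
    {
        decimal total = 0;
        foreach (var a in amounts) total += a;
        return (Count: amounts.Length, Total: total);
    }

    // Switch expression (not switch-case): terser, and the compiler
    // warns on unhandled cases, which pushes toward exhaustiveness.
    private static string Describe(OrderStatus status) => status switch
    {
        OrderStatus.Pending   => "awaiting shipment",
        OrderStatus.Shipped   => "in transit",
        OrderStatus.Delivered => "complete",
        _ => throw new ArgumentOutOfRangeException(nameof(status)),
    };

    static void Main()
    {
        // Tuples deconstruct directly into locals.
        var (count, total) = Summarize(new[] { 19.99m, 5.00m });
        Console.WriteLine($"{count} orders totaling {total}, {Describe(OrderStatus.Shipped)}");
    }
}
```

The point of encoding rules like these is that any agent, prompted by any operator, emits this same shape rather than inventing a fresh DTO class per method.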
The consequent is at odds with the antecedent. It's a performative contradiction: if the output were truly "free", the skill of the operator would be a zero-value variable; yet by requiring skill, you acknowledge a cost, as I show below.
> The cost of "perfect" is only perhaps a few fractions of a cent higher than shitty.
Is your cost model accounting for the cost of specification, of review and additional cycles required if review fails or the specification itself needs to be adjusted?
> If we simply tell the agent these rules ahead of time, we get "perfect", consistent code each time
No, in the simplest case, your cost of perfection is simply moving up the chain of abstraction from implementation (coding) to design and specification. In reality it also splits and moves a part of that cost downstream to verification.
This isn't some special, magical insight I have, I'm reiterating Tesler's Law right back to you.
I also encourage you to read software history: for decades it has been trivial to spit out perfectly working CRUD from ER and UML diagrams, no LLM necessary. The insight is understanding why we continue to hire cheap human labor to spit out CRUD instead of using those tools.
The cost of software is, and always has been, in figuring out the intent, not in the generation of syntax.
I wish pg was more active on HN - I expect this is one of the reasons why he wanted founders to have and share the painpoints of their (potential) customers. Figuring out the intent is expensive. Mistake the intent and the best case scenario is a pivot.
> I wish pg was more active on HN - I expect this is one of the reasons why he wanted founders to have and share the painpoints of their (potential) customers. Figuring out the intent is expensive. Mistake the intent and the best case scenario is a pivot.
Your mistake is that you think the point is that only engineers participate in the production of code. In fact, the point is that the product team and the people closest to the customer can generate the code. And for that reason, the goal is to produce a framework on top of which "perfect" code can be produced with relative ease and consistency, regardless of whether the user is part of engineering or product.

> Is your cost model accounting for the cost of specification
This is the same cost no matter what. The LLM does not generate code on its own; some operator must provide instructions and a specification regardless, so you might as well give it good ones. But here, I would point out that there is a broad layer of general instructions that incur a one-time cost of specification ("Always write this code this way").
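To make the "one-time cost" concrete, a hypothetical repo-level rules file (the filename and wording are illustrative, not a specific product's format) that every agent session loads might look like this:

```markdown
# Coding standards (loaded by every agent session)

- Private methods returning more than one value use a named tuple
  return type, e.g. `(int Count, decimal Total)`, never a one-off class.
- Always name tuple fields; positional `Item1`/`Item2` access is banned.
- Prefer pattern matching and switch expressions over `switch`-`case`;
  cover every case so the compiler can check exhaustiveness.
- Match the existing naming and file layout; do not introduce new
  abstractions unless a rule above requires one.
```

Written once, a file like this is amortized across every subsequent generation pass, which is the sense in which the specification cost is "one-time" for the general rules even though per-task intent still has to be specified each time.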