“An experienced programmer told me he's now using AI to generate a thousand lines of code an hour.”
https://x.com/paulg/status/2026739899936944495
Like if you had told pg to his face in (pre AI) office hours “I’m producing a thousand lines of code an hour”, I’m pretty sure he’d have laughed and pointed out how pointless that metric was?
pg wrote a Lisp dialect, Arc, with Morris. The Morris from "the Morris worm". These people are at the very least hackers and they definitely know how to code.
I don't think a "not good programmer" can write a Lisp dialect. At least of all the "not good" programmers I met in my life, 0% of them could have written a Lisp dialect.
Just because Arc didn't reach the level of fame of Linux or Quake or Kubernetes or whatever doesn't mean pg is not a good programmer.
You can write a lisp in 145 lines of Python: https://norvig.com/lispy.html
Wasn't Arc just a collection of Scheme macros?
(Also, writing a Scheme dialect was a first-semester CS problem set - if you're in a 1980s academic CS environment it was more effort to not accidentally write a lisp interpreter into something, something in the water supply...)
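For a sense of scale, here is a rough sketch of how small a toy Lisp evaluator can be. It is not Norvig's lispy code and has nothing to do with Arc; the names (`tokenize`, `parse`, `evaluate`, `run`, `GLOBAL_ENV`) and the tiny set of built-ins are my own assumptions, and it skips most of what a real dialect needs (macros, strings, proper error handling).

```python
import operator as op

def tokenize(src):
    """Split source text into parentheses and atoms."""
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Build a nested list from a token list, consuming tokens in place."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        expr = []
        while tokens and tokens[0] != ")":
            expr.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing )")
        tokens.pop(0)  # drop the closing ")"
        return expr
    if tok == ")":
        raise SyntaxError("unexpected )")
    try:
        return int(tok)
    except ValueError:
        try:
            return float(tok)
        except ValueError:
            return tok  # anything else is a symbol

# Hypothetical minimal set of built-ins, just enough for the demo below.
GLOBAL_ENV = {"+": op.add, "-": op.sub, "*": op.mul, "<": op.lt, "=": op.eq}

def evaluate(x, env=GLOBAL_ENV):
    """Evaluate a parsed expression in an environment (a plain dict)."""
    if isinstance(x, str):          # symbol lookup
        return env[x]
    if not isinstance(x, list):     # number literal
        return x
    head = x[0]
    if head == "if":
        _, test, then, alt = x
        return evaluate(then if evaluate(test, env) else alt, env)
    if head == "define":
        _, name, expr = x
        env[name] = evaluate(expr, env)
        return None
    if head == "lambda":
        _, params, body = x
        # Each call extends the defining environment with the bound arguments.
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    fn = evaluate(head, env)
    args = [evaluate(arg, env) for arg in x[1:]]
    return fn(*args)

def run(src, env=GLOBAL_ENV):
    """Tokenize, parse, and evaluate one expression."""
    return evaluate(parse(tokenize(src)), env)
```

With this, `run("(define fact (lambda (n) (if (< n 2) 1 (* n (fact (- n 1))))))")` followed by `run("(fact 5)")` evaluates a recursive factorial. That is the "problem set" level of Lisp; a usable dialect is a different amount of work.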
When I see PG write something like that, it signals to me that he has embraced AI hype to the point that he is displaying poor taste and embracing a risky technical practice.
It's unsurprising he would believe LLM coding tools are a productivity boon, but using code quantity as a measure of software development progress is one of the most famously wrong ideas in the software world. Either he wrote carelessly, or he believes that LLM tools have changed that reality.
I'm inclined to think LLM tools haven't substantially changed that reality. LLMs perform better when more of the problem fits in context, so succinctness remains valuable.
The act of "typing" code was always mixed in with researching solutions, which means that code often took a different shape or design based on the outcome of that activity. However, this nuance has typically been dismissed as faff, with the outcome that management thinks that producing X lines of code can be done "quickly", and people who disagree with that are treated as heretics who should be burned at the stake.
This is why, in my personal opinion, AI makes me only 20% more productive: I often find myself disagreeing with the solution it came up with, and instead of steering it toward the outcome I want, I just end up rewriting the code myself. On the other hand, for prototypes where I don't care about understanding the code at all, it is a much bigger time saver.
I could not care about the code at all, and while that is acceptable to management, not being responsible for the code but being responsible for the outcomes seems to be the same shit as being given responsibilities without autonomy, which is not something I can agree with.
Even worse, a whole generation of devs is being trained to not care or learn about that last 20% because the AI does it """all""" for them. That last bit is an unknown unknown for the neo developer nee prompter.
Perhaps over half of engineering managers unconsciously or admittedly take the number of PRs and code additions as a rough but valid measure of productivity.
I recall, in an architecture role, a senior director asking me how come a principal engineer hadn't committed any code in 2 weeks, given that we pay principals a fortune.
I asked that brilliant mind whether we paid principal engineers to code or to make sure we deliver value.
Needless to say, the question went unanswered, and the so-called Principal was fired a few months later. The entire company was in fact sold for a bargain too, given it had thousands of clients globally.
The "LLMs can replace engineers" phenomenon converges from two simple facts: we haven't solved the misconception of engineering roles, and it's the perfect scapegoat to justify layoffs.
Leaders haven't all gone insane; they answer difficult questions with the narrative of least resistance.
Brilliantly said. I’d like to add - a distorted narrative actively, intentionally established and maintained by the entities profiting from the technology. Quite similar to the crypto scam hype cycle.
"Adding manpower to a late software project makes it later -- unless that manpower is AI, then you're golden!"
https://openai.com/index/harness-engineering/
> This translates to an average throughput of 3.5 PRs per engineer per day, and surprisingly the throughput has increased as the team has grown to now seven engineers.
We will see if this continues to scale up!
One pathological example: if you’re running a server-based product, quite often what stands between you and a new feature launch is literally a couple of thousand lines of Kubernetes YAML. Would adding someone who’s proficient in Kubernetes slow you down? Of course not.
One may say, hey, this is just the server-side Kubernetes-based development being insane, and I’ll say, the whole modern business of software development is like this.
Yes, they know how the feature they work on relates to other features, but actually implementing that feature very often mostly involves fighting with technology, wrangling the entire stack into the shape you need.
In Brooks’s times the stack was paper-thin, almost nonexistent. In modern times it’s not, and adding someone who knows the technology, but doesn’t have the domain knowledge related to your feature still helps you. It doesn’t slow you down.
One may argue that I’m again pointing to the difference between accidental and essential complexity, and my argument is essentially “accidental complexity takes over”, but accidental complexity actually does influence your feature too, by defining what’s possible and what’s not.
Some good thoughts (not mine) on the modern boundary between accidental and essential complexity: https://danluu.com/essential-complexity/
Including the author, who brags he doesn't read his own code. Indeed, it would be physically impossible for him to do so!
https://steipete.me/posts/2025/shipping-at-inference-speed
As mentioned elsewhere in the thread, there is very clearly an obsession with quantity over quality. Not a new phenomenon by any means: people were already complaining about this in the 19th century! But it has reached a new absurd height with this latest trend.
Just as an example, I should easily be able to give each program an allowlist of network endpoints they’re allowed to use for inbound and outgoing traffic and sandbox them to specific directories and control resource access EASILY. Docker at least gets some of those right, but most desktop OSes feel like the Wild West even when compared to the permissions model of iOS.
If you are careful and specific you can keep things reasonable, but even when I am careful and do consolidation / factoring passes, have rigid separation of concerns, etc., I find that the LLM code is bigger than mine, mainly for two reasons:
1) more extensive inline documentation 2) more complete expression of the APIs across concerns, as well as stricter separation.
2.5) often, also a bit of demonstrative structure that could be more concise but exists in a less compact form to demonstrate its purpose and function (high degree of cleverness avoidance)
All in all, if you don’t just let it run amok, you can end up with better code and increased productivity in the same stroke, but I find it comes at about a 15% plumpness penalty, offset by readability and obvious functionality.
Oh, forgot to mention, I always make it clean room most of the code it might want to pull in from libraries, except extremely core standard libraries, or for the really heavy stuff like Bluetooth / WiFi protocol stacks etc.
I find a lot of library-type code ends up withering away with successive cleanup passes, because it wasn’t really necessary, just cognitively easier for implementing a prototype. With refinement, the functionality ends up burrowing in, often becoming part of the data structure where it really belonged in the first place.
Also, AI is better at reading code than writing it, but the overhead to FIND code is real.
My personal anecdote: I used an LLM recently to basically vibe code a password manager.
Now, I’ve been a software engineer for 20 years. I’m very familiar with the process of code review and how to dive in to someone else’s code and get a feel for what’s happening, and how to spot issues. So when I say the LLM produced thousands of lines of working code in a very short time (probably at least 10 times faster than I would have done it), you could easily point at me and say “ha, look at ninkendo, he thinks more lines of code equals better!” And walk away feeling smug. Like, in your mind perhaps you think the result is an unmaintainable mess, and that the only thing I’m gushing about is the LOC count.
But here’s the thing: it actually did a good job. I was personally reviewing the code the whole time. And believe me when I say, the resulting product is actually good. The code is readable and obvious, it put clean separation of responsibilities into different crates (I’m using rust) and it wrote tons of tests, which actually validate behavior. It’s very near the quality level of what I would have been able to do. And I’m not half bad. (I’ve been coding in rust in particular, professionally for about 2 years now, on top of the ~20 years of other professional programming experience before that.)
My takeaway is that as a professional engineer, my job is going to be shifting from doing the actual code writing, to managing an LLM as if it’s my pair programming partner and it has the keyboard. I feel sad for the loss of the actual practice of coding, but it’s all over but the mourning at this point. This tech is here to stay.
(wow funny how these vibe code apps always are copies of something theres many open source versions of already)
https://github.com/ninkendo84/kenpass
I'm not saying it's perfect, there's some things I would've done differently in the code. It's also not even close to done/complete, but it has:
- A background agent that keeps the unsealed vault in-memory
- A CLI for basic CRUD
- Encryption for the on-disk layout that uses reasonably good standards (pbkdf2 with 600,000 iterations, etc)
- Sync with any server that supports webdav+etags+mTLS auth (I just take care of this out of band, I had the LLM whip up the nginx config though)
- A very basic firefox extension that will fill passwords (I only did 2 or 3 rounds of prompting for that one, I'm going to add more later)
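To make the key-derivation bullet above a bit more concrete, here is a minimal sketch of deriving a vault key with PBKDF2 at 600,000 iterations. This is not kenpass code (that project is in Rust); it uses Python's standard `hashlib.pbkdf2_hmac`, and the choice of HMAC-SHA256, the 16-byte salt, the 32-byte key length, and the names `derive_key` and `new_salt` are all assumptions for illustration.

```python
import hashlib
import os

ITERATIONS = 600_000  # the figure quoted in the list above
SALT_BYTES = 16       # assumption: salt size is not stated in the thread
KEY_BYTES = 32        # assumption: enough for a 256-bit symmetric key

def new_salt() -> bytes:
    """Generate a random per-vault salt; it is stored next to the ciphertext."""
    return os.urandom(SALT_BYTES)

def derive_key(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Stretch a master password into a fixed-length key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        salt,
        iterations,
        dklen=KEY_BYTES,
    )
```

The same password and salt always give the same key, so only the salt and iteration count need to be stored on disk; the high iteration count is there to make each offline guess expensive.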
Every commit that was vibe-coded contains the prompt I gave to Codex, so you can reproduce the entire development yourself if you want... A few of the prompts were actually constructed by ChatGPT 5.2. (It started out as a conversation with ChatGPT about what the sync protocol would look like for a password manager in a way that is conflict-free, and eventually I just said "ok give me a prompt I can give to codex to get a basic repo going" and then I just kept building from there.)
Also full disclosure, it had originally put all the code for each crate in a single lib.rs, so I had it split the crates into more modules for readability, before I published but after I made the initial comment in this thread.
I haven't decided if I want to take this all the way to something I actually use full time, yet. I just saw the 1password subscription increase and decided "wait what if I just vibe-coded my own?" (I also don't think it's even close to worthy of a "Show HN", because literally anybody could have done this.)
Did you investigate prior art before setting out on this endeavor? https://www.google.com/search?q=site%3Agithub.com+password+m...
I ask because engineers need to be clever and wise.
Clever means being capable of turning an idea into code, either by writing it or recently by having the vocabulary and eloquence to prompt an LLM.
Wisdom means knowing when and where to apply cleverness, and where not to. like being able to recognize existing sub-components.
Lol no, I had no idea there was any other password managers! Thanks for the google search link! I didn't know search engines existed either!
> Wisdom means knowing when and where to apply cleverness, and where not to. like being able to recognize existing sub-components.
It says literally in the README that part of this is an exercise in seeing what an LLM can do. I am in no way suggesting anyone use this (because there's a bazillion other password managers already) nor would I even have made this public if you hadn't baited me into doing it.
The fact that there's a literal sea of password managers out there is why I'm curious enough to think "maybe one that I get to design myself, written to exactly my tastes and my tastes alone could be feasible", and that's what this exercise is about. It literally took me less time to vibe-code what I have right now than to pore through the sea of options that already exist to decide which one I should try. And having it be mine at the end means that I can implement my pet features the way I want, without having to worry one bit about fighting with upstream maintainers. It's also just fun. I thoroughly enjoy the process of thinking about the design and iterating on it.
> it actually did a good job.
applies when there is a sea of "prior art" on the topic requested. And that request (prompt) is actually framed/worded properly to match that prior art.
Which may be perfect if the target is reducible to prior art. Re-use, mix-and-match, from open source or Stack Overflow, into my-own-flavour-hot-water, finally!
No, this is not sarcasm. I hate to catch myself, a month later, reinventing hot water. Let something else do it.
The question that stays with me is how to keep the brain-bits needed for that inventing / making new stuff alive and kicking, because they will definitely deteriorate towards zero or even negative. Should we reinvent each 10th thing? Just for the mental-gym-nastics?
What would you say is your multiplier, in terms of thoroughly reviewing code vs writing it from scratch?
The impressive thing isn't merely that it produces thousands of lines of code, it's that I've reviewed the code, it's pretty good, it works, and I'm getting use out of the resulting project.
> What would you say is your multiplier, in terms of thoroughly reviewing code vs writing it from scratch?
I'd say about 10x. More than that (and closer to 100x) if I'm only giving the code a cursory glance (sometimes I just look at the git diff, it looks pretty damned reasonable to me, and I commit it without diving that deep into the review. But I sometimes do something similar when reviewing coworkers' code!)
My impression is that, as someone else wrote, we do not have an actual metric for such things as productivity or quality or what have you, but some people do want to communicate that they feel (regardless of whether that matches reality) that using an LLM is better/faster/easier, and they latch onto the (wrong) assumption that more LoC == better/faster, which non-programmers have already believed for years (intentionally or not, they may also be deluding themselves), as that is an easy path to convince them that the new toys have value that applies to the non-programmers too (note that i explicitly ignore the perspective of the "toymakers" as those have further incentives to promote their products).
Personally i also have about 2 decades of professional experience (more if counting non-professional) and i've been toying with LLMs now and then. I do find them interesting and when i use them for coding tasks, i absolutely find useful cases for them, i like to have them (where possible) write all sorts of code that i could write myself but i just don't feel like doing so and i do find them useful for stuff i'm not particularly interested in exploring but want to have anyway (usually Python stuff) and i'm sure i'll find more uses for them in the future. Depending on the case and specifics i may even say that in very particular situations i can do things faster using LLMs (though it is not a given and personally that is not much of a requirement nor something i have anywhere high in my interest when it comes to using LLMs - i'd rather have them produce better code slower, than dummy/pointless/repetitive code faster).
However one thing i never thought about was how "great" it is that they generate a lot of lines of code per whatever time interval. If anything i'd prefer it if they generated fewer lines of code and i'd consider an LLM (or any other AI-ish system) "smarter" if it could figure out how to do that without needing hand holding from me. Because of this, i just can't see LoCs as anything but a very bad metric - which is the same as when the code is written by humans.
How can you say that when all these models are externally sourced by companies that actively make a loss per token? When they finally need to make a profit, how can we be sure these models as well as their owners will remain as reliable and not enshittified? Anthropic has been blacklisted in the last 24 hours so it's a turbulent industry to say the least
Even with supposedly expert human hand written software powering our products for the last decades, they frequently crash, have outages, and show all sorts of smaller bugs.
There are literally too many examples to count of video games being released with nigh-unplayable amounts of bugs and still selling millions and producing sequels.
Windows 95 and friends were famously buggy and crash prone yet produced one of the most valuable companies in the world.
For staff engineers it’s obviously complete nonsense: many don’t code and just ship architecture docs. Or you can ship a net negative refactor. Etc.
So this should tell you that LLMs are still in “savant JD” territory.
That said, being given permission to ship more lines of code under existing enterprise quality bars _is_ a meaningful signal.
I also use AI this way, periodically achieving a net negative refactor.
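If you want to check whether a change really was a net negative refactor, a rough sketch like the one below sums the output of `git diff --numstat`, which prints one tab-separated "added, deleted, path" row per file and uses `-` for binary files. The function name `net_lines` and the sample paths are made up for illustration, and this says nothing about quality, only about line counts.

```python
def net_lines(numstat: str) -> int:
    """Return lines added minus lines deleted from `git diff --numstat` output."""
    net = 0
    for line in numstat.splitlines():
        parts = line.split("\t", 2)
        if len(parts) < 3:
            continue  # skip blank or malformed rows
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        net += int(added) - int(deleted)
    return net

# Hypothetical sample output; the paths are invented for the example.
sample = "10\t40\tsrc/vault.rs\n-\t-\tdocs/logo.png\n5\t0\tREADME.md\n"
print(net_lines(sample))  # a negative result means the diff removed more than it added
```

In practice you would feed it the text captured from something like `git diff --numstat main...HEAD`.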