Here's a fun one: Firefox lists its current count at about 2.5M LOC, from roughly 1M commits over the years.

That works out to about 3 lines added per commit, which is not ridiculous when you consider that most commits are edits rather than pure additions.

Here, we have 1500 PRs and 1M LOC, which is about 650 added LOC per PR. Remember, that is not 650 lines total in the PR, but a net +650 after subtracting removals from additions.
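
A quick sanity check of that arithmetic, as a minimal sketch using only the rough figures quoted above (the variable and function names are mine, and the exact quotients differ slightly from the rounded numbers in the comment):

```python
# Figures quoted in the comment above; all are rough estimates.
firefox_loc = 2_500_000
firefox_commits = 1_000_000
project_loc = 1_000_000
project_prs = 1_500


def net_lines_per_change(total_loc: int, changes: int) -> float:
    """Average net lines (additions minus removals) per commit or PR."""
    return total_loc / changes


firefox_rate = net_lines_per_change(firefox_loc, firefox_commits)
project_rate = net_lines_per_change(project_loc, project_prs)

print(f"Firefox: {firefox_rate:.1f} net lines per commit")
print(f"Project: {project_rate:.0f} net lines per PR")
print(f"Ratio:   {project_rate / firefox_rate:.0f}x")
```

This only compares averages; it says nothing about how the changes are distributed across individual commits or PRs.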

Fun questions for attentive readers:

- What does a project growing at a rate of one full Firefox codebase's worth of LOC per year look like a decade down the line?

- What does the line count say about the verbosity of the tool, and what does it say about outcomes that the purpose of the project isn't clearly disclosed?

- Do we have reasons to care about LOC in a world where we don't write code manually? What happens to token usage numbers when the codebase is significantly larger?

- If it were confirmed that LLM usage blows up your line count, what's the implication for codebases that want to return to manual coding after months of usage (say, because the tool gets expensive)?

reply
> - Do we have reasons to care about LOC in a world where we don't write code manually? What happens to token usage numbers when the codebase is significantly larger?

Yes, at least to the extent that we care about context windows and tokens consumed by coding agents processing code that is ultimately irrelevant to their assigned task.

Anecdotally, I've found that keeping files small matters for agentic coding, not just for human readability but also for agent performance. Agents generally load entire files rather than reading only the part relevant to their current assignment, as a human might, so smaller files limit the incidental context they pull in while working a problem. That reduces input noise, so the LLM generates a tighter solution, which in turn reduces input noise for future solutions. Or at least this strategy avoids a death spiral into exploding context length.

I expect (but cannot currently prove) that keeping overall LOC down yields similar benefits even when file sizes are kept small because it spares the LLM from parsing potentially relevant files that prove irrelevant to its current task.
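
To put rough numbers on the whole-file loading effect described above, here is a back-of-the-envelope sketch. The 10 tokens per line figure and the file sizes are assumptions for illustration only, not measurements, and `context_cost` is a hypothetical helper:

```python
TOKENS_PER_LINE = 10  # assumed average; real values vary by language and tokenizer


def context_cost(file_lengths: list[int], relevant_lines: int) -> tuple[int, float]:
    """Return (tokens loaded, relevant share) when whole files are read.

    file_lengths: line counts of the files the agent opens (hypothetical).
    relevant_lines: lines that actually matter for the task (hypothetical).
    """
    loaded = sum(file_lengths) * TOKENS_PER_LINE
    relevant = relevant_lines * TOKENS_PER_LINE
    return loaded, relevant / loaded


# The same 120 relevant lines, spread over two large files vs. two small ones.
large_files = context_cost([2_000, 1_500], relevant_lines=120)
small_files = context_cost([200, 150], relevant_lines=120)

for label, (loaded, share) in [("large", large_files), ("small", small_files)]:
    print(f"{label} files: {loaded} tokens loaded, {share:.0%} relevant")
```

Under these assumptions the agent reads ten times fewer tokens for the same task, which is the "input noise" argument in numeric form; whether that translates into better output is the part that remains anecdotal.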

reply
Seconded on smaller files. I feel like I tend to get better responses faster.

A notable flaw here is that I’ve not tried large vs small files in a large codebase. Most of my experimentation there has been on personal projects where even a small file contains a significant part of the project. I could see degradation when it has to load 5 files to figure out how something works.

Total LOC (tokens, really; literal lines probably don’t matter) is interesting as a factor. That might go some way towards explaining why LLMs are weirdly good at Clojure.

E.g., last I checked, Anthropic’s one-shot performance on Clojure was about the same as on Python or Go, despite Clojure almost certainly being less represented in training data. The combination of density and simple primitives might be easier for an LLM to wrangle, ameliorating the impact of a less popular language.

reply
>E.g., last I checked, Anthropic’s one-shot performance on Clojure was about the same as on Python or Go, despite Clojure almost certainly being less represented in training data. The combination of density and simple primitives might be easier for an LLM to wrangle, ameliorating the impact of a less popular language.

There might be tons of confounding factors there. One that comes to mind is data quality: it may well be that the average Clojure snippet is higher quality because of user demographics. Very few people start writing code in Clojure, whether in college or at bootcamps.

reply
Does the Firefox LOC include ALL forms of text: infrastructure (which Firefox doesn’t have), documentation, developer scripts, tests, etc.? How is Firefox's test coverage?
reply
When I got to the 1M LOC, I involuntarily paused, feeling like this must be satire.
reply
We've known for decades that output metrics like LOC/day are very bad measures of real productivity in software. But they seem to be back in vogue in the age of AI, because AI is so good at maxing these useless metrics, and we need to show how impressive our AI is and how impressive our usage of AI is.
reply
They never specified what exactly the product was, without which it's impossible to judge the post.

For some reason, most uses of "agents" are to build yet more AI products; it's turtles all the way down. Maybe that says more about the field of harnesses than it does about the power of "agents".

reply
There is a sense in which it doesn’t matter at all; many of the limitations of agents in large codebases are just the context management challenges. So proving that you can cohere and progress at O(1m) is a useful scale observation. “Can I use agents in my 1m line codebase?”

There is of course another sense in which the output quality is the only thing that matters. “Can I use agents to build a 1m line codebase that I want to maintain going forward?”

I take this as being exclusively a tech demo of the former. Quality (feature velocity, bugs, scalability) is not demonstrated.

reply
Feels like the active discovery going on is trying to understand what is computer vs what is AI, for every product.

Agents help a ton with the discovery, but the act of building a product needs a deeper level of thought and validation to make it actually better than what came before. So IMO what you see is people still learning what needs to be understood and crafted first hand to make a product better (including economics)

We’ll get there if more of us try

reply
It feels like the update cadence has indeed sped up. But not necessarily quality.

Looking at MS Office, I notice a lot of small changes recently that are mostly annoying. Things like Word comments losing focus after you @-tag a colleague, needing to click the Outlook search field twice before you can enter text, and the Outlook mobile date picker losing its ability to show your and your attendees' availability.

So it looks like lots of throughput, but unfortunately breaking features that work. Or wasting time on things that don’t matter such as the status bar of OneDrive search circling around the input field.

reply
I’ve been vibe coding a lot over the past year or so, and I think I’m going to stop. In fact, I sort of want to challenge myself: can I go back to the fork in the road, the old Copilot autocomplete workflow, and really maximize that? Be in the driver's seat for most of the code being written, but find ways to use AI to really enhance the flow state and remove blockers. Tools only, with minimal actual code generation.
reply
One workflow I like is writing a comment for what I’m about to do, waiting a few seconds, and then tabbing through the auto-completions. Then I check what the agent came up with, make some edits, and move on to the next block. That works well: I feel in control but don’t have to type as much.

I do use Claude Code totally hands-off too, however, mostly for UI tasks like theming CSS, or data grids and CRUD with all the bells and whistles. I hate that stuff, and CC gets it done in minutes and mostly right. It’s also super nice to say things like “user profile in the upper right hand corner” without having to fight CSS.

/if it’s not clear, I hate dealing with CSS and related frameworks.

reply
I would be very impressed with someone who's been vibecoding "a lot" for about a year who could then go back to being fully in the loop for even 50%. I would even say I'd expect withdrawal symptoms at that point.

The dopamine hits are core to why people even do vibecoding (or vibecoding-in-a-dress/spec-driven development) and why they tend to overestimate its output so much. Hell, it's core to all forms of LLM-assisted development (because it feels like magic), but most of the other forms are more value, less delusion.

reply
The dopamine hit is real. I feel like that was identified early on by OpenAI and probably lit a fire to get ChatGPT into the hands of the public. B. F. Skinner (I think) is the one who homed in on variable-ratio reward systems to maximize operant conditioning. An LLM, with hallucinations and imperfections, is the perfect variable-ratio reward system. It’s no wonder they’re getting pushed so hard along with a consumption-based pricing model. Whether you’re a human, rat, plant, or bacterium, there’s no real defense against that kind of conditioning.

First hit on Google

https://www.simplypsychology.org/operant-conditioning.html

reply
I actually don’t find vibe coding satisfying, which is one of the many reasons I’m going back. I feel a little of what you’re talking about, but I’m a nerd. I like to code.

But I’m not dismissing your concern, because it is one of the reasons I’m making this decision. I’m a professional. I’m not just here to feel good; I’m here to do a good job over the course of a career. I think, all in, when you put together writing good, maintainable software, learning, staying mentally sharp, and speed, vibe coding could be less effective and maybe even, in the aggregate, “slower”.

reply
The average efficiency improvement is closer to something like 2-3x per Anthropic’s numbers and this is only the rate at which software can advance. Do you expect to notice if 12 months of software engineering on a project you’re following gets done in 6 months? I suspect not.

The root cause is that the acceleration is Pareto distributed, so the modern engineering team at the moment looks like one 10x engineer, one 5x engineer, and the rest are approximately 1.5x engineers.

reply
I have been building an entire operating system (not figuratively).

Prior to AI autocomplete I did 500 LOC a day, then with AI autocomplete I could do 2,500 a day, and now 50k is pretty normal. Walking around tech week with my phone yielded 150k this week.

reply
> ended up being a million lines of code

This almost reeks of "I've never cleaned up our code base because there is too much code, and didn't even bother having agents/LLMs clean it up".

You almost never need a million lines of code, and that includes your software, infra, testing, and operational tools. You didn't ship the Linux kernel in 3 weeks and you know it. The code is already spaghetti; it achieves the basic functions OK, but it will get harder and harder to simplify, untangle, and maintain.

reply
Even the Linux kernel doesn't need millions of lines of code; most of the actual LOC is device drivers, and you don't need all of them, you just need the ones for the devices you have.
reply
And Linux maintainers are actively pushing to radically cut down on the LOC by eliminating drivers etc.
reply
As a point of reference, 1MLOC is about the size of the entire Python standard library including tests, as well as stuff like IDLE. (Well, the Python part of the code. There's about half that much again of C in Modules/ .)
reply
Yeah I cannot see how "we shipped 1 million lines of code in three weeks" is... something to be proud of haha
reply
They directly address routine code cleanup and regularly paying down technical debt near the end of the article.
reply
I stand corrected, but the LOC being advertised still makes me doubt the efficacy of their process.
reply
> should expect maybe 5x faster cycle in major software apps

To what end, and what would that even look like though? Enshittifying everything at maximum speed? The apps/platforms I use regularly (GitHub, Spotify, Google Maps, just to name a few) have gotten noticeably shittier in recent times.

reply
Confirmation bias. The internet complained about software updates for decades before LLMs became ubiquitous. I made a career fixing human slop by domain experts.

We easily forget that the great majority of software engineering is fixing the mistakes of other highly capable software engineers.

It's just so easy to blame the machine instead of admitting that no one here is an expert on anything and that people count their hits and not their misses. If they counted both, we would find their probability of making a mistake to be higher than a frontier coding agent's.

It's a hard-headed crowd, and everyone, LLM-pilled or not, suffers from the Dunning-Kruger effect. All of us.

Just look at the comments. Everyone is perfect when they do things themselves.

reply
>GitHub, Spotify, Google Maps (just to name a few), have gotten noticeably shittier in recent times.

What if AI lets you create new versions of those tools, but without the enshittification?

I say that being in the "soaking" stage of using AI to rebuild a shitty software project in 70KLOC over about 2 weeks of spare time, so this may not be as theoretical as you might think.

reply
Oh I definitely agree that AI can and will help create great software.

It's just that creating great software isn't really the SV/VC/big tech business model or main goal.

reply
> What if AI lets you create new versions of those tools, but without the enshittification?

I'm not sure I fully understand what you're saying here. Isn't the value of these tools almost entirely independent of their actual software? That is, we have many good open source, self-hostable forges (Forgejo, sr.ht, etc.), lots of great music player software (Jellyfin, Symphonium, etc.), and decent maps software (OsmAnd and Organic Maps). People use GitHub, Spotify, and Google Maps -- perhaps even _put up_ with their often bad/glitchy software -- because of network effects (all three) and content/licensing partnerships (Spotify/GMaps). That proprietary data isn't something AI can help you with, right?

reply
It really depends on the use-case. For example, my most starred GitHub repo is a tool to convert Spotify playlists to YouTube Music (that was done pre-AI). GitHub depends on what issues you have with it, what your use case is, and whether you can leverage some of the network effects via API from the GitHub source. Maps, same story.
reply
AI coders are great for making scrapers, possibly because AI companies use their own tools to make an awful lot of scrapers.
reply
This is a lot tamer than what Claude Code's team claims tbf.
reply
It is likely better because AI agents make access to domain knowledge easier. However, I would wager that the problem is people don’t remember the code well. The problems are going to be long-term as the pace of change increases.

If you think about it, successful products rely on well-thought-out experience design and customer discovery (see all the Forward-Deployed Engineer job listings at OpenAI), so code velocity becomes somewhat irrelevant.

If you’re solving the right problem and you’ve got a good team then competitive advantage comes from somewhere OUTSIDE of code velocity.

The more important question, I think, is: does faster code yield more value long-term? At the moment, it’s like, “yeah, we do 3.5 pull requests per day.”

I’m thinking, great, good for you. You could also combine three pull requests into one and then you’re doing 1 per day. This is quantitative data that doesn’t really mean anything tangible.
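
To make the batching point concrete, a small sketch. The 20-day period is an assumption for illustration and `prs_per_day` is a hypothetical helper; only the 3.5 PRs per day figure comes from the comment above:

```python
def prs_per_day(total_prs: float, days: float, batch_size: int = 1) -> float:
    """PRs per day if every `batch_size` PRs were merged as a single PR."""
    return total_prs / batch_size / days


days = 20                # assumed period, for illustration only
total_prs = 3.5 * days   # PR count implied by the quoted rate

as_reported = prs_per_day(total_prs, days)
batched_by_three = prs_per_day(total_prs, days, batch_size=3)

# Same code shipped either way; only the way it is sliced into PRs changes.
print(f"as reported:      {as_reported:.2f} PRs/day")
print(f"batched by three: {batched_by_three:.2f} PRs/day")
```

The amount of work is identical in both cases, which is why a PR count on its own says little about value delivered.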

reply