I have worked with code where 1000s of lines are very straightforward and linear.
I’ve worked on code where 100 lines is crucial and very domain specific. It can be exceptionally clean and well-commented and it still takes days to unpack.
The skills and effort required to review and understand those situations are quite different.
One is like long-distance driving on a boring Midwest highway: don’t get drowsy, avoid veering into the indistinguishable cornfields, and you’ll get there. The other is like navigating a narrow mountain road in a thunderstorm: you’re 100% engaged and you might still tumble or get hit by lightning.
So I’m pretty skeptical that reviewing 2000 lines of code won’t take any more time than reviewing 200 lines of code.
Furthermore, how do you know the AI generated lines are the open highway lines of code and not the mountain road ones? There might be hallucinations that pattern match as perfectly reasonable but carry a hard-to-spot flaw.
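For example (a hand-picked classic, not something an AI actually produced for me), code like this pattern matches as perfectly reasonable, and the flaw is easy to miss in review:

    # The mutable default argument: reads like a harmless convenience
    # default, but the list is created once and shared across every call.
    def append_tag(tag, tags=[]):
        tags.append(tag)
        return tags

    print(append_tag("a"))  # ['a']
    print(append_tag("b"))  # ['a', 'b']  <- state leaks between calls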
It depends on the code. If you’re comparing code of the same complexity then, sure, 2000 lines will take longer than 200.
I was comparing straight linear code to far more complex code. The bug/line rate will be different and the time to review per line will be different.
> Furthermore how do you know the AI generated lines are the open highway lines of code and not the mountain road ones?
Again, it depends on the code. Which was my point.
Linear code lacks branches, loops, indirection, and recursion. That kind of code is easy to reason about and easy to review. The assumptions are inherently local. You still have to be alert and aware to avoid driving into the cornfields.
It’s a different beast than something like a doubly-nested state machine with callbacks, though. There you have to be alert and aware, and it’s inherently much harder to review per line of code.
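To make the contrast concrete, here's a rough Python sketch (the functions, states, and handlers are invented for illustration, not from any real codebase). The first function is linear and every assumption is local; the second is a tiny state machine with callbacks, where reviewing any single handler means holding the transition table and the other handlers in your head.

    # Linear: read top to bottom, every assumption is local.
    def total_with_tax(prices, tax_rate):
        subtotal = sum(prices)
        tax = subtotal * tax_rate
        return subtotal + tax

    # State machine with callbacks: each handler is only correct
    # relative to the transition table and the handlers around it.
    def run(events):
        state = "idle"
        collected = []

        def on_start(ev):
            collected.clear()
            return "collecting"

        def on_item(ev):
            collected.append(ev["value"])
            return "collecting"

        def on_stop(ev):
            return "idle"

        transitions = {
            ("idle", "start"): on_start,
            ("collecting", "item"): on_item,
            ("collecting", "stop"): on_stop,
        }

        for ev in events:
            handler = transitions.get((state, ev["type"]))
            if handler is None:
                raise ValueError(f"unexpected {ev['type']} in state {state}")
            state = handler(ev)
        return collected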
Very far from the truth in practice; not every line of code is as difficult (or easy) to review as every other line.
    {x{x,sum -2#x}/0 1}

or

    def f(n):
        if n <= 1:
            return n
        else:
            return f(n-1) + f(n-2)
They're both the same programObjectives change; timeliness matters. The speed at which you deliver value is incredibly important, which is why it matters to measure your process. Deceptively dense is what I’d call software engineers who can’t accept that the process is actually generalizable to a degree and that lines of code are one of the few tangible things that can be used as a metric. Can you deliver value without lines of code?
This assumes that shorter code is faster to write. To quote Blaise Pascal, "I would have written a shorter letter, but I did not have the time."
> Can you deliver value without lines of code?
No, but you can also depreciate value when you stuff a codebase full of bloated, bug-ridden code that no man or machine can hope to understand.
“All models are wrong, some are useful”. What’s not useful is constantly bitching about how there’s no way to measure your work outside of the binary “is it done” every time process efficiency is brought up.
It's still useful, however, because it is the only metric that is instantly and intuitively understandable and comparable across a wide variety of contexts, i.e. across companies, teams, languages, and applications.
As we know, within the same team working on the same product, a 1000 LoC diff could take less time than a 1-line bug fix that took days to debug. Hence we really cannot compare PRs or product features or story points across contexts. If the industry could come up with a standard measure of developer productivity, you can bet everyone would use it, but it's infeasible basically for this very reason.
So, when such comparisons are made (and in this case it was clearly a colloquial usage), it helps to assume the context remains the same. Like, a team A working on product P at company C using tech stack T with specific software quality processes Q produced N1 lines of code yesterday, but today with AI they're producing N2 lines of code. Over time the delta between N1 and N2 approximates the actual impact.
(As an aside, this is also what most of the rigorous studies in AI-assisted developer productivity have done: measure PRs across the same cohorts over time with and without AI, like an A/B test.)
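A back-of-the-envelope sketch of that kind of comparison in Python (the numbers are invented placeholders; assume you can export merged LOC per engineer-week from your VCS):

    # N1: average weekly LOC merged before AI assistance; N2: with it.
    before_ai = [180, 220, 150, 300, 240]
    with_ai = [260, 310, 280, 420, 330]

    n1 = sum(before_ai) / len(before_ai)
    n2 = sum(with_ai) / len(with_ai)
    print(f"N1={n1:.0f}, N2={n2:.0f}, delta={n2 - n1:+.0f} LOC/week")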
I rewrote the same program using my own brain, with ChatGPT just as Google and autocomplete (my normal workflow), and I produced the same thing in 1500 LOC.
The effort difference was not that significant either, tbh, although my hand-coded approach probably benefited from my having designed the vibe-coded one first, so I had already thought about what I wanted to build.
My experience was the same as yours when I started using agents for development about a year ago. Every time I noticed it did something less than optimal or just “not up to my standards”, I'd hash out exactly what that meant for me, add it to my reusable AGENTS.md, and the code the agent outputs today is fairly close to what I “naturally” write.
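Something along these lines, as a rough illustration (these rules are invented examples, not a copy of any real AGENTS.md):

    # AGENTS.md (illustrative excerpt)
    - Prefer small, pure functions; keep functions under ~40 lines.
    - Do not add new dependencies without asking first.
    - Match the existing error-handling style; do not introduce a second pattern.
    - Every public function gets a docstring and at least one test.
    - No speculative abstractions; solve only the problem in front of you.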
> It is so embarrassing that LOC is being used as a metric for engineering output.
In one of my previous orgs, LOC added in the previous year was a metric used to separate a good engineer from a PIP (bad) engineer. Also, LOC removed was treated as a negative metric for the same purpose. I hope they've changed this methodology for the LLM code-spitting era...
We should have gone the other way; generated a lot of code and demanded pay raises; look at the LOC I cranked out! Company is now in my debt!
If they weren't going to care enough as managers to learn, and "line go up" is all that matters to them, then making all lines go up = winning.
You all think there's more to this than performative barter for coin to spend on food/shelter.
Although this requires you to take pride in your profession and what you do.
Got it.
...ok fine; absent political action to put us all on the hook for your healthcare, it's your choice to gamble it on a paycheck. It's a choice to say your own existence is not owed the assurance of healthcare.
So I will honor your choice and not care you exist.
Good way of putting it.
Do you reject all stats that count the number of people involved (e.g. 2 million people protested X) as "embarrassing" ... because they lump incredibly varied people together and pretend they're equal?
AI helps eng ship more and faster, I think that’s the takeaway.
We're also assuming the LOC is vibe coded by competent engineers who should be able to tell when something is overengineered.