I personally don't view coding agents making software as "software gotten better". You are comparing a tool with the end result, and these are two different things. The agent you use going down and your product going down mean two different things to your customers. I will not deny that we have made incredible progress in coding and, hell, even design over the past 3.5 years; this technology is here to stay.

That being said, while I agree that measuring software quality is vague (part of the reason it is hard for models as well), there are universal things I believe every engineer will agree on. Reliability, uptime, customer feedback, legibility of your engineering, performance: these are things we have often optimized for. Google Maps is a bit of a strawman because neither of us (unless you work on it) knows how much agent code there is. I think it is likely that there is little, since it was working fine prior to 2023. I could bring up GitHub reliability as an example, given how much Copilot usage they promote at MS, but once again only folks there know for certain. I do, however, see scores of AI-powered SaaS products that look like they are in a perpetual MVP state. I think you are right that if agents give us "good enough" results, and we can swallow the failure rates and our ever-shrinking understanding of what we, or more so the model, created, then it is still progress overall. But it is progress not toward human-AI collaboration but toward AI-only engineering IMO, which is good or bad depending on how you view the future.

I'm a scientist and most of the code I currently write sits somewhere at the intersection of critical software and machine learning. Squaring these two is not easy, and I guess the way I was taught to reason about engineering informs my opinions on this. Maybe it's just a matter of time before Codex can help here in an unconstrained manner as well, but I am skeptical at the moment.

by YZF 50 minutes ago

My point was "look at what computers (software) can do today vs. 3 years ago" - for everyone. You are saying (I think) that the software an arborist can have ChatGPT write to help them draw the garden isn't the same quality as what a team of software engineers would write manually. GitHub is (mostly?) software for software developers. Most software in the world is software for random people. Nobody(tm) cares about the quality of GitHub. The "has software (computers) gotten better" question has to be measured from the perspective of the consumer, not the perspective of the software engineer, and nobody using AI is going to tell you "computers now are worse/can do less than they could 3 years ago". At least that's my thesis.

If AI today can make you more productive, that's already progress. If it can't, then maybe it makes other people more productive.

by Fargren 8 hours ago

> Lines of code has always been a terrible metric. But all else being equal it is a measure.

A terrible metric is _worse_ than no metric. A terrible metric can _only_ lead you in the wrong direction. "No metric" means saying we don't know, and that leads us to stop and reconsider. But we've taken "move fast and break things" as a mantra, and we'd rather run towards any direction than stay still.

Using LoC as a metric for quality of LLMs will promote LLMs that write more code. It's better to say we have no way to compare different LLMs than it is to say "let's use the LLMs that produced more LoC because at least we can measure that". We, as an industry, should be focusing on developing better metrics for quality, not on improving LLMs based on known-bad metrics. We should be turning to the computer scientists, not to the venture capitalists.

When a pundit talks about how many lines of code an LLM has created, we should lose all respect for them. It's as if someone talking about physics measured the phlogiston, or as if a doctor started measuring our skulls. We know these theories don't work, and anyone using them should be mocked.

by arcticbull 13 hours ago

I also can't help but notice they didn't mention how many tokens were burned, or how much that translates to in terms of cost over the 5 months at enterprise AI prices. I'm going to guess this wasn't a cheap demo.

by csomar 9 hours ago

> Is Google Maps suddenly taking you the wrong way more often?

Funny you mention that, because I had that issue in a cab just yesterday. Google decided to route us off the main road onto a series of small roads that happened to end in a dead end. My guess is that the AI decided this was a shorter road? Or a less busy one?

That being said, Google Maps has been gradually degrading. Most notably, its search function is quasi-broken now.