The current fever pitch mandates from above seem to want it applied liberally, and pushing back against that is so discouraging and often career-limiting as to wear the fabric of one's psyche threadbare. With all the obvious problems being pointed out to people, there are just as many workarounds; and these workarounds, as is often revealed shortly thereafter, have their own problems, which beget new solutions, ad infinitum.
At some point it genuinely seems like all this work is for the sake of the machine itself. I suppose that is true: The real goal has become obscured at so many firms today, that all that remains is the LLM. Are the people betting the farm and helping implement the visions of those who have done so guaranteed a soft exit to cushion them from the consequences, or is rationality really being discarded altogether?
Sure, sound engineering principles can help work around these problems, but what efficiency is truly gained, in terms of cognitive load, developer time, money, or finite resources? Or were those ever an earnest concern?
1. They're low stakes to get wrong.
2. The most common is MCPs or similar ai-tooling.
3. Making them look good takes time and effort still. It's a multiplier, not a replacement.
4. Quality and maintainability require investment. I had to restart an agentic project several times because it painted itself into a corner.
It’s an absolute game changer, and it can now multiply your productivity fivefold if it’s a solo greenfield project.
Maybe half a year ago it was as you said. You had to wait for the agent to finish, you had to review carefully, and often the result was not that great. You did not save a lot of time.
Now I can spin up 3+ parallel conversations in Codex, each in a git worktree. My work is mainly QA testing the features, refining the behavior, and sometimes making architectural decisions.
The results are now undeniable. In the past I could not have developed a product of that scope in my free time.
That is what is possible today. I suspect many engineers have not yet tried things that became feasible over the last months. Like parallel agents, resolving merge conflicts, separating out functionality from a large branch into proper PRs.
I have heard this statement every single day for 2 years and yet we still have no companies compressing 10 years into 1 year thus exploding past all the incumbents who don't "get it".
> if it’s a solo greenfield project
which is a pretty large caveat. Anecdotally, I've found my side projects (which are solo greenfield projects, and don't need to be supported to the same standards as enterprise software) have gained the boost the GP was talking about.
At work, it's different, since design, review, and maintenance is much more onerous.
The first line of code was written on November 25th. It achieved adoption in the "personal agents" space that far exceeded the other companies that had tried the same thing.
(Whether or not you trust the quality of the software you can't deny the impact it had in such a short time. It defined a new category of software.)
Like, look at e.g. YC minus the AI and AI ajacent companies. Are those startups meaningfully more impressive or feature-rich as compared to a couple years ago?
I expect we will start seeing the impact of the new coding agent enhanced development processes over the next few months.
If agents could really compress 10 years of development into 1 year, you'd see people making e.g. HFT platforms and becoming obscenely rich, not making a fun open-source project and getting hired by OpenAI as an employee.
https://tools.simonwillison.net/github-repo-stats?repo=OpenC...
I meant a month for the initial release, not current state.
Regardless, much like lines of code, number of commits is not a good metric, not even as a proxy, for how much "work" was actually done. Quickly browsing there are plenty[0] of[1] really[2] small[3] commits[4]. Agentic coding naturally optimizes for small commits because that's what the process is meant to do, but it doesn't mean that more work is being done, or that the work is effective. If anything, looking at the changelog[5] OpenClaw feels like a directionless dumpster fire right now. I would expect a lot more from a project if it had multiple people working on it for 5 years, pre-AI.
[0] https://github.com/openclaw/openclaw/commit/e43ae8e8cd1ffc07...
[1] https://github.com/openclaw/openclaw/commit/377c69773f0a1b8e...
[2] https://github.com/openclaw/openclaw/commit/ffafa9008da249a0...
[3] https://github.com/openclaw/openclaw/commit/506b0bbaad312454...
[4] https://github.com/openclaw/openclaw/commit/512f777099eb19df...
[5] https://github.com/openclaw/openclaw/blob/main/CHANGELOG.md
> (Whether or not you trust the quality of the software you can't deny the impact it had in such a short time. It defined a new category of software.)
I brought up OpenClaw here because the challenge was:
> we still have no companies compressing 10 years into 1 year thus exploding past all the incumbents who don't "get it".
I don't know anything about the code quality of OpenClaw, but telling me the number of commits tells me precisely nothing of use.
If that were true, all of these anti-AI greybeards who have been in the game for 30 years would all own their own jets.
Which is exactly why you can't use it as an example, there is no control. This is basic stuff.
https://www.reuters.com/technology/openclaw-enthusiasm-grips...
Cryptocurrencies? Barely any other use than money laundering, buying drugs and betting on the outcome of battles in war. And NFTs? No use at all other than money laundering and setting money ablaze.
It's like I never wrote them, because I didn't. I've got the gist of them, but it's the same way I get the gist of something like Numpy: I know how it works theoretically, but certainly not specifically enough to jump in and write some working Fortran that fixes bugs or adds features.
I now have a bunch of stalled projects I'm not very familiar with. I no longer do solo green field projects that way.
Why do I not see 5x as many interesting greenfield projects than before?
That's a big if. I don't have numbers but most professional engineers are not working on such projects
The degenerate side is clueless upper management and fad-driven engineering. We have talked extensively about this.
There is a more rational side to it that I've seen in my org: some engineers absolutely refuse to use AI and as a consequence they are now, clearly and objectively, much less productive than other engineers. The thing is, you still need to learn how to use the tool, so a nontrivial percentage of obstinate engineers need to be driven to use this in the same way that some developers have refused to use Docker or k8s or whatever.
Perhaps these “obstinate” engineers have good reason in their decision. And it should be their decision!
To be so confident in what is “the right way (TM)” and try to force it onto others is... revealing.
Sounds like a human? The ‘statistical’ part is arguable, I suppose.
I'm sure I will have no problem whatsoever remaining in the employ of a firm that trusts me to make products and tooling that still push the envelope of what's possible without having to resort to the sheer brute force of trillion parameter-scale models.
After 18 months the hard evidence is in place. And much like replacing bare-metal servers for many use cases where evidence shows that the burden of k8s or the substitution of shell scripts for Terraform, it's time to move on.
I don't really see a place for no AI usage in line-of-business software apps anymore.
Honest question: what about the counter-argument that humans make subtle mistakes all the time, so why do we treat AI any differently?
A difference to me is that when we manually write code, we reason about the code carefully with a purpose. Yes we do make mistakes, but the mistakes are grounded in a certain range. In contrast, AI generated code creates errors that do not follow common sense. That said, I don't feel this differentiation is strong enough, and I don't have data to back it up.
But another answer is that human autonomy is coupled to responsibility. For most line employees, if they mess up badly enough, it's first and foremost their problem. They're getting a bad performance review, getting fired, end up in court or even in prison. Because you bear responsibility for your actions, your boss doesn't have to watch what you're up to 24x7. Their career is typically not on the line unless they're deeply complicit in your misbehavior.
LLMs have no meaningful responsibility, so whoever is operating them is ultimately on the hook for what they do. It's a different dynamic. It's probably why most software engineers are not gonna get replaced by robots - your director or VP doesn't want to be liable for an agent that goes haywire - but it's also why the "oh, I have an army of 50 YOLO agents do the work while I'm browsing Reddit" is probably not a wise strategy for line employees.
Isn’t this just because you have seen a lot of PRs from inexperienced engineers? People learn LLM behavior over time, too.
Yes, as an engineer I make mistakes, but I could never make as many mistakes per day as an LLM can
We're investing in the human getting better rather than paying $100 to Anthropic and hoping that's enough that they don't make the product worse.
Their mental model doesn't map cleanly enough to yours, and so where for a human you'd have some way to follow their thought patterns and identify mistakes, here the alien makes mistakes that don't add up.
Like the alien has encyclopedic knowledge of op codes in some esoteric soviet MCU but sometimes forgets how to look for a function definition, says "It looks like the read tool failed, that's ok, I can just make a mock implementation and comment out the test for now."
People used to like them and they used to be legends (even if not everyone liked them)
Notch, Woz, Linus and Geohot come to mind
The Metasploit creator Dean McNamee worked for me and he was just like that and a total monster at engineering hard tech products
I have no strong idea why people can't accept that intelligence formed separately of a human brain can truly be alien: not in the hyperbolic sense of "that person is so unique it's like they're a different species", but "that thing does not have a brain, so it can have intelligence that is not human-like".
A human without a brain would die. An LLM doesn't have a brain and can do wonderous things.
It just does them in ways that require first accepting that there is no homo sapien thinks like an LLM.
We trained it on human language so often times it borrows our thought traces so to speak, but effective agentic systems form when you first erase your preconceived notions of how intelligence works and actually study this non-human intelligence and find new ways to apply it.
It's like the early days of agents when everyone thought if you just made an agent for each job role in a company and stuck them in a virtual office handing off work to each other it'd solve everything, but then Claude Code took off and showed that a simple brain dead loop could outperform that.
Now subagents almost always are task specific, not role specific.
I feel like we could leap ahead a decade if people could divorce "we use language, and it uses language so it is like us", but I think there's just something really challenging about that because it's never been true.
Nothing had this level of mastery over human language before that wasn't a human. And funnily enough, the first times we even came close (like Eliza) the same exact thing happened: so this seems like a persistent gap in how humans deal with non-humans using language.
Or maybe just maybe... the thing should be much better designed around the human.
That's how personal computers made their way into homes. People like yourself are comical and can't understand how widespread adoption takes place to obtain value from what the thing intrinsically possesses.
Firms literally exist to take care of the hassle so that the person can get the value from the thing closer to the present - like hello...?
We can't choose if the LLM is like us unless you want to go back 10-20 years in time and choose a new direction for AI/ML.
We stumbled upon an architecture with mostly superficial similarities to how we think and learn, and instead focused on being able to throw more compute and more data at our models.
You're talking about ergonomics that exist at a completely different layer: even if you want to make LLM based products for humans, around humans, you have to accept it's not a human and it won't make mistakes like a human (even if the mistakes look human) -
If anything you're going to make something that burns most people if you just blindly pretend it's human-like: a great example being products that give users a false impression of LLM memory to hide the nitty gritty details.
In the early days ChatGPT would silently truncate the context window at some point and bullshit its way through recalling earlier parts of the conversation.
With compaction it does better, but still degrades noticeably.
If they'd exposed the concept of a context window to the user through top level primitives (like being able to manage what's important for example), maybe it'd have been a bit less clean of a product interface... but way more laypeople today would have a much better understanding of an LLM's very un-human equivalent to memory.
Instead we still give users lossy incomplete pictures of this all with the backends silently deciding when to compact and what information to discard. Most people using the tools don't know this because they're not being given an active role in the process.
Despite what the headlines say, these systems aren’t inscrutable.
We know how these things work and can build around and within and change parameters and activation functions etc…and actually use experience and science and guidance.
However those are not technical problems those are organizational social and quite frankly resource allocation problems.
> but effective agentic systems form when you first erase your preconceived notions of how intelligence works and actually study this non-human intelligence and find new ways to apply it.
There's no reason you can't make good use of them and learn how to do it more reliably and predictably, it's just chasing those gains through a human intelligence-like model because they use human language leads to more false starts and local maxima than trying to understand stand them as their owb systems.
I don't think it should even be a particularly contentious point: we humans think differently based on the languages we learn and grew up with, what would you expect when you remove the entire common denominator of a human brain?
Software developers get paid big money because they can speak alien, the only thing that is changing is the dialect.
I'm an engineers engineer: I get the job isn't LOC but being able to communicate and translate meatspace into composable and robust sustems.
So when I mean an alien when I say an alien.
Not human.
Not in the cute "oh that guy just hears what everyone else hears and somehow interprets it entirely differently like he's from a different planet" alien way, but in the, "it is a different definition of intelligence derived from lacking wetware" alien way.
Intelligence is such multidimensional concept that all of humanity as varied as we are, can fit in a part of the space that has no overlap with an LLM.
-
Now none of that is saying it can't be incredibly useful, but 99% of the misuse and misunderstanding of LLMs stems from humans refusing to internalize that a form of intelligence can exist that uses their language but doesn't occupy the same "space" of thinking that we all operate in, no matter how weird or unqiue we think we are.
I swear I'm living through mass hysteria.
I’m not saying that it’s all hunky dory, but you use AI for straight up test driven development to catch edge cases and correct sloppy implementations before they even get coded by your giant chaos machine.
You instruct it to write the code you want to be written. You still have to know how to develop, it just makes you faster.
If I get pwned because my AI agent wrote code that had a security vulnerability, none of my users are going to accept the excuse that I used AI and it's a brave new world. I will get the blame, not Anthropic or OpenAI or Google but me.
The same goes for if my AI generated code leads to data loss, or downtime, or if uses too many resources, or it doesn't scale, or it gives out error messages like candy.
The buck stops with me and therefore I have to read the code, line-by-line, carefully.
It's not even a formality. I constantly find issues with AI generated code. These things are lazy and often just stub out code instead of making a sober determination of whether the functionality can be stubbed out or not.
You could say "just AI harder and get the AI to do the review", and I do this a lot, but reviewing is not a neutral activity. A review itself can be harmful if it flags spurious issues where the fix creates new problems. So I still have to go through the AI generated review issue-by-issue and weed out any harmful criticism.
First of all, building a system that constrains the output of the AI sufficiently, whether that's typing, testing, external validation, or manual human review in extremis. That gets you the best result out of whatever harness or orchestration you're using.
Secondly, there's the level at which you're intervening, something along the hierarchy of "validate only usage from the customer perspective" to "review, edit, and validate every jot and tiddle of the codebase and environment". I think for relatively low importance things reviewing at the feature level (all code, but not interim diffs) is fine, but if you're doing network protocol you better at least validate everything carefully with fuzzing and prop testing or something like that.
And then you've got how you structure your feedback to the LLM itself - is it an in-the-loop chat process, an edit-and-retry spec loop, go-nogo on a feature branch, or what? How does the process improve itself, basically?
I agree with you entirely that the responsibility rests on the human, but there are a variety of ways to use these things that can increase or decrease the quality of code to time spent reviewing, and obviously different tasks have different levels of review scrutiny, as well.
My nonexistent backend isn’t going to be pwned if there is a bug in the thumbnail generation.
After the QA testing on my device, a quick scroll through of the code is enough.
Maybe prompt „are errors during thumbnail generation caught to prevent app crashes?“ if we‘re feeling extra cautious today.
And just like that it saved a day of work.
Hmm. Historically image editing was one of the easier to exploit security holes in many systems. How do you feel about having unknown entities having shell inside your datacenter or vpc?
- webview fallback with canvas capture for codecs not supported in the default player
- detecting blank frames and diff between thumbnails to maximize variety
- UI integration to visualize progress and pending thumbnails, batched updates to the gallery
- versioning scheme and backfill for missing/outdated thumbnail formats
Honestly, a day seems rather optimistic to me. Maybe if I was an expert for this platform and would have implemented a similar feature before, then I could hope to do it in a day.
If I had to handwrite it and estimate it for Scrum at work, I‘d budget a week.
Video thumbnails are a different beast altogether. And you might want to double check your assumptions about security considerations. If any of your ffmpeg, opencv, pyscenedetect code is running on your server, it might well be exploitable.
Ironically, already another user in this comment section was concerned about the security of my nonexistent backend.
But it’s good to know, I was not previously aware that video processing on the backend is a common source of vulnerabilities.