by matt3210, 55 days ago

by MontyCarloHall, 55 days ago

Good question. Why hasn't there been a profusion of new game-changing software, fixes to long-standing issues in open-source software, any nontrivial shipped product at all? Heck, why isn't there a cornucopia of new apps, even trivial ones? Where is all the shovelware [0]? Previous HN discussion here [1].

Don't get me wrong, AI is at least as game-changing for programming as StackOverflow and Google were back in the day. I use it every day, and it's saved me hours of work for certain specific tasks [2]. But it's simply not a massive 10x force multiplier that some might lead you to believe.

I'll start believing when maintainers of complex, actively developed, and widely used open-source projects (e.g. ffmpeg, curl, openssh, sqlite) start raving about a massive uptick in positive contributions, pointing to a concrete influx of high-quality AI-assisted commits.

[0] https://mikelovesrobots.substack.com/p/wheres-the-shovelware...

[1] https://news.ycombinator.com/item?id=45120517

[2] https://news.ycombinator.com/item?id=45511128

by jerf, 55 days ago

"Heck, why isn't there a cornucopia of new apps, even trivial ones?"

There is. We had to basically create a new category for them on /r/golang because there was a quite distinct step change near the beginning of this year where suddenly over half the posts to the subreddit were "I asked my AI to put something together, here's a repo with 4 commits, 3000 lines of code, and an AI-generated README.md. It compiles and I may have even used it once or twice." It toned down a bit but it's still half-a-dozen posts a day like that on average.

Some of them are at least useful in principle. Some of them are the same sorts of things you'd see twice a month, only now we can see them twice a week if not twice a day. The problem wasn't necessarily the utility or the lack thereof, it was simply the flood of them. It completely disturbed the balance of the subreddit.

To the extent that you haven't heard about these, I'd observe that the world already had more apps than you could possibly have ever heard about and the bottleneck was already marketing rather than production. AIs have presumably not successfully done much about helping people market their creations.

by hansmayer, 55 days ago

Well, the LLM industry is not completely without results. We do have an ever-increasing frequency of outages in major Internet services... Somehow that correlates with the AI mandates major tech corps seem to be pushing internally now.

by agumonkey, 55 days ago

Disclaimer: I am not promoting LLMs.

There was a GitHub PR on the OCaml project where someone crafted a long feature (Mac silicon debugging support). The PR was rejected because it was too long and nobody wanted to read it. Seems to me that society is not ready for the volume of output generated this way, which may explain the lack of big visible change so far. But I already see people deploying tiny apps made by Claude in a day.

It's gonna be weird...

by cageface, 55 days ago

As another example, the MacApps Reddit has been flooded with new apps recently.

by amirhirsch, 55 days ago

The effect of these tools is people losing their software jobs (down 35% since 2020). Unemployed devs aren’t clamoring to go use AI on OSS.

by majewsky, 55 days ago

Wasn't most of that caused by that one change in 2022 to how R&D expenses are depreciated, thus making R&D expenses (like retaining dev staff) less financially attractive?

Context: This news story https://news.ycombinator.com/item?id=44180533

by Bombthecat, 54 days ago

Yes! Even though it's only a tax rule for the USA, it somehow applied to the whole world! That's how mighty the US is!

Or could it be that, after the growth and build phase, we are in maintenance mode and need fewer people?

Just food for thought

by mattmanser, 53 days ago

Yes, because US big tech have regional offices in loads of other countries too, fired loads of those developers at the same time and so the US job market collapse affected everyone.

And since then there's been a constant doom and gloom narrative even before AI started.

by amirhirsch, 55 days ago

Probably also end of ZIRP and some “AI washing” to give the illusion of progress

by KetoManx64, 55 days ago

Same thing happened to farmers during the industrial revolution, same thing happened to horse-drawn carriage drivers, same thing happened to accountants when Excel came along, to mathematicians, and on and on the list goes. Just part of human progress.

by agumonkey, 55 days ago

I keep asking ChatGPT when LLMs will reach 95% automation of software creation; the answer is ten years.

by Bombthecat, 54 days ago

I don't think that long, but yeah, I give it five years.

In two years, 3/4 will not be needed anymore.

by jhayward, 49 days ago

I don't know, I go back and forth a bit. The thing that makes me skeptical is this: where is the training data that contains the experiences and thought processes that senior developers, architects, and engineering managers go through to gain the insight they hold?

by agumonkey, 54 days ago

I don't have all the variables (the financials of OpenAI's debt, etc.), but a few articles mention that they delegate part of their work to {claude,gemini,chatgpt} code agents internally with good results. It's a first step in a singularity-like ramp-up.

People think they'll have jobs maintaining AI output, but I don't see how maintaining is that much harder than creating for an LLM able to digest requirements and a codebase and iterate until working code runs.

by Bombthecat, 54 days ago

I don't think so either. People forget that agents are also developing.

Back then, we put all the source code into the AI to create things, then we manually put files into context; now it looks for the files it needs on its own. I think we can do even better by letting the AI create file and API documentation and only read a file when really needed, selecting just the API and documentation it needs. And I bet there is more possible, including skills and MCP on top.

So not only are LLMs getting better, but so is the software using them.

by spreiti, 55 days ago

I use GitHub Copilot in IntelliJ with Claude Sonnet and plan mode to implement complete features without having to code anything myself.

I see it as a competent software developer but one that doesn't know the code base.

I will break down the tasks to the same size as if I was implementing it. But instead of doing it myself, I roughly describe the task on a technical level (and add relevant classes to the context) and it will ask me clarifying questions. After 2-3 rounds the plan usually looks good and I let it implement the task.

This method works exceptionally well and usually I don't have to change anything.

For me this method allows me to focus on the architecture and overall structure and delegate the plumbing to Copilot.

It is usually faster than if I had to implement it and the code is of good quality.

The game changer for me was plan mode. Before it, with agent mode it was hit or miss because it forced me to one shot the prompt or get inaccurate results.

by gcbirzan, 54 days ago

> I see it as a competent software developer but one that doesn't know the code base.

I know what you mean, but the thing I find windsurf (which we moved to from copilot) most useful for (apart from writing OpenAPI spec files) is asking it questions about the codebase. Just random minutiae that I could find by grepping or following the code, but that would take me more than the 30s-1m it takes it. For reference, this is a monorepo of a bit over 1M LoC (and 800k YAML files, because, did I mention I hate API specs?), so not really a small code base either.

> I will break down the tasks to the same size as if I was implementing it. But instead of doing it myself, I roughly describe the task on a technical level (and add relevant classes to the context) and it will ask me clarifying questions. After 2-3 rounds the plan usually looks good and I let it implement the task.

Here I disagree, sort of. I almost never ask it to do complex tasks: the most time-consuming and hardest part is not actually typing out the code, and describing a task to an AI takes me almost as much time as implementing it for most things. One thing I did find very useful is the supertab feature of windsurf, which, at a high level, looks at the changes you started making and starts suggesting the next change. And it's not limited to repetitive things (like . in vi): if you start adding a parameter to a function, it starts adding it to the docs and to the functions you need below, and starts implementing it.

> For me this method allows me to focus on the architecture and overall structure and delegate the plumbing to Copilot.

Yeah, a coworker said this best, I give it the boring work, I keep the fun stuff for myself.

by madcocomo, 55 days ago

My experience is that GitHub Copilot works much better in VS Code than in IntelliJ. Now I have to open both together to work on a single project.

by hansmayer, 55 days ago

Yeah, but what did you produce with it in the end? Show us the end result please.

by spreiti, 55 days ago

I cannot show it because the code belongs to my employer.

by hansmayer, 55 days ago

Ah yes, of course. But no one really asked for the code. Just show us the app. Or is it some kinda super-duper secret military stuff you are not even supposed to discuss, let alone show?

by spreiti, 55 days ago

It is neither of these. It's an application that processes data and is not accessible outside of the company's network. Not everything is an app.

I described my workflow that has been a game changer for me, hoping it might be useful to another person because I have struggled to use LLMs for more than a Google replacement.

As an example, one task of the feature was to add metrics for observability when the new action was executed. Another when it failed.

My prompt: Create a new metric "foo.bar" in MyMetrics when MyService.action was successful and "foo.bar.failed" when it failed.

I review the plan and let it implement it.

As you can see it's a small task and after it is done I review the changes and commit them. Rinse and repeat.
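To give a sense of scale, a change of the size that prompt describes might look roughly like the TypeScript sketch below. Only the names MyMetrics, MyService, action, "foo.bar" and "foo.bar.failed" come from the prompt; the in-memory counter, the increment and count methods, and the injected work function are assumptions made for illustration, not the actual code or language used.

```typescript
// Hypothetical in-memory registry; a real MyMetrics would wrap a metrics library.
class MyMetrics {
  private counters = new Map<string, number>();

  // Add one to the named counter, creating it on first use.
  increment(name: string): void {
    this.counters.set(name, (this.counters.get(name) ?? 0) + 1);
  }

  // Read the current value of a counter (0 if it was never incremented).
  count(name: string): number {
    return this.counters.get(name) ?? 0;
  }
}

class MyService {
  private metrics: MyMetrics;
  // Stand-in for whatever the real action does (assumption).
  private work: () => void;

  constructor(metrics: MyMetrics, work: () => void) {
    this.metrics = metrics;
    this.work = work;
  }

  action(): void {
    try {
      this.work();
      this.metrics.increment("foo.bar");
    } catch (err) {
      this.metrics.increment("foo.bar.failed");
      throw err; // record the failure but keep it visible to the caller
    }
  }
}
```

A diff this small is quick to review line by line, which is what makes the review-and-commit loop practical.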

I think the biggest issue is that people try to one shot big features or applications. But it is much more efficient to me to treat Copilot as a smart pair programming partner. There you also think about and implement one task after the other.

by williamcotton, 55 days ago

I've been writing an experimental pipeline-based web app DSL with Claude Code for the last little while in my spare time. Sort of bash-like with middleware for lua, jq, graphql, handlebars, postgres, etc.

Here's an already out of date and unfinished blog post about it: https://williamcotton.com/articles/introducing-web-pipe

Here's a simple todo app: https://github.com/williamcotton/webpipe/blob/webpipe-2.0/to...

Check out the BDD tests in there, I'm quite proud of the grammar.

Here's my blog: https://github.com/williamcotton/williamcotton.com/blob/mast...

It's got an LSP as well with various validators, jump to definitions, code lens and of course syntax highlighting.

I've yet to take screenshots, make animated GIFs of the LSP in action or update the docs, sorry about that!

A good portion of the code has racked up some tech debt, but hey, it's an experiment. I just wanted to write my own DSL for my own blog.

by danenania, 55 days ago

I know of many experienced and capable engineers working on complex stuff who are driving basically all their development through agents. This includes production level work. This is the norm now in the SV startup world at least.

You don't just YOLO it. You do extensive planning when features are complex, and you review output carefully.

The thing is, if the agent isn't getting it to the point where you feel like you might need to drop down and edit manually, agents are now good enough to do those same "manual edits" with nearly 100% reliability if you are specific enough about what you want to do. Instead of "build me x, y, z", you can tell it to rename variables, restructure functions, write specific tests, move files around, and so on.

So the question isn't so much whether to use an agent or edit code manually—it's what level of detail you work at with the agent. There are still times where it's easier to do things manually, but you never really need to.

by matt3210, 55 days ago

Can you show some examples? I feel like there would be streams or YouTube let's plays of this if it was working well.

by sixtyj, 55 days ago

I would like to see it as well. It seems to me that everybody only sells shovels, but nobody has seen gold yet. :)

by shsush, 55 days ago

The real secret to agent productivity is letting go of your understanding of the code and trusting the AI to generate the proper thing. Very pro agent devs like ghuntley will all say this.

And it makes sense. For most coding problems the challenge isn’t writing code. Once you know what to write typing the code is a drop in the bucket. AI is still very useful, but if you really wanna go fast you have to give up on your understanding. I’ve yet to see this work well outside of blog posts, tweets, board room discussions etc.

by submain, 55 days ago

> The real secret to agent productivity is letting go of your understanding of the code and trusting the AI to generate the proper thing

The few times I've done that, the agent eventually faced a problem/bug it couldn't solve and I had to go and read the entire codebase myself.

Then I found several subtle bugs (like writing private keys to disk even though there was an explicit instruction not to). I eventually ended up refactoring most of it.

It does have value for coming up with boilerplate code that I then tweak.

by maplethorpe, 55 days ago

You made the mistake of looking at the code, though. If you didn't look at the code, you wouldn't have known those bugs existed.

by PunchyHamster, 55 days ago

Fixing code now is orders of magnitude cheaper than fixing it in a month or two when it hits production.

Skipping that might be fine if you're doing a proof of concept or low-risk code, but it can also bite you hard when there is a bug actively bleeding money and not a single person or AI agent in the house knows how anything works.

by urig, 55 days ago

That's just irresponsible advice. There is so little actual evidence of this technology being able to produce high quality maintainable code that asking us to trust it blindly is borderline snake-oil peddling.

by hansmayer, 55 days ago

Not borderline - it is just straight snake-oil peddling.

by _zoltan_, 55 days ago

yet it works? where have you been for the last 2 years?

calling this snake oil is like when the horse carriage riders were against cars.

by hansmayer, 55 days ago

I am an early adopter since 2021 buddy. "It works" for trivial use-cases, for anything more complex it is utter crap.

by Kubuxu, 55 days ago

I don’t see how I would feel comfortable pushing the current output of LLMs into high-stakes production (think SLAs, SRE).

Understanding the code in these situations is more important than the code or feature existing.

by danenania, 55 days ago

You can use an agent while still understanding the code it generates in detail. In high stakes areas, I go through it line by line and symbol by symbol. And I rarely accept the first attempt. It’s not very different from continually refining your own code until it meets the bar for robustness.

Agents make mistakes which need to be corrected, but they also point out edge cases you haven’t thought of.

by Kubuxu, 55 days ago

Definitely agreed, that is what I do as well. At that point you have a good understanding of the code, which is in contrast to what the post I responded to suggests.

by shsush, 55 days ago

I agree and am the same. Using them to enhance my knowledge, as well as for autocomplete on steroids, is the sweet spot. It's much easier to review code if I'm “writing” it line by line.

I think the reality is a lot of code out there doesn’t need to be good, so many people benefit from agents etc.

by heavyset_go, 55 days ago

> The real secret to agent productivity is letting go of your understanding of the code

This is negligence, it's your job to understand the system you're building.

by hansmayer, 55 days ago

Not to burst your bubble, but I've seen agents expose Stripe credentials by hardcoding them as text into a React frontend app. So no, kids, do not "let go" of code understanding, unless you want to appear in the next story along the lines of "AI dropped my production database".

by yonaguska, 55 days ago

This is sarcasm, right?

by PunchyHamster, 55 days ago

I wish. That's dev brain on AI, sadly.

We've been unfucking architecture done like that for a month, after the dev who had a hallucination session with their AI left.

by shsush, 55 days ago

[dead]

by danenania, 55 days ago

A lot of that would be people working on proprietary code I guess. And most of the people I know who are doing this are building stuff, not streaming or making videos. But I'm sure there must be content out there—none of this is a secret. There are probably engineers working on open source stuff with these techniques who are sharing it somewhere.

by matt3210, 55 days ago

That’s understandable, I also wouldn’t stream my next idea for everyone to see

by dmurvihill, 55 days ago

Let’s see it then

by _zoltan_, 55 days ago

go on reddit and you can see a million of these vibe coded codebases. is that not good enough?

by hansmayer, 55 days ago

+1 here. Let's see those productivity gains!

by cachvico, 55 days ago

Here's one - https://apps.apple.com/us/app/pistepal/id6754510927

The app is definitely still a bit rough around the edges, but it was developed at breakneck speed over the last few months. I've probably seen an overall 5x acceleration over pre-agentic development speed.

by PaulHoule, 55 days ago

I use Junie to get tasks done all the time. For instance I had two navigation bars in an application which had different styling and I told it make the second one look like the first and... it made a really nice patch. Also if I don't understand how to use some open source dependency I check the project out and ask Junie questions about it like "How do I do X?" or "How does setting prop Y have the effect of Z?" and frequently I get the right answer right away. Sometimes I describe a bug in my code and ask if it can figure it out and often it does, ask for a fix and often get great results.

I have a React application where the testing situation is FUBAR, we are stuck on an old version of React where tests like enzyme that really run react are unworkable because the test framework can never know that React is done rendering -- working with Junie I developed a style of true unit tests for class components (still got 'em) that tests tricky methods in isolation. I have a test file which is well documented explaining the situation around tests and ask "Can we make some tests for A like the tests in B.test.js, how would you do that?" and if I like the plan I say "make it so!" and it does... frankly I would not be writing tests if I didn't have that help. It would also be possible to mock useState() and company and might do that someday... It doesn't bother me so much that the tests are too tightly coupled because I can tell Junie to fix or replace the tests if I run into trouble.
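One possible reading of "tests tricky methods in isolation" is calling a component method against a hand-built `this`, so nothing is ever mounted or rendered. The TypeScript sketch below illustrates that idea only; the Pager class, its fields, and the callPageCount helper are hypothetical, and a plain class stands in for one that would extend React.Component so the example runs without React installed.

```typescript
// Stand-in for a React class component (hypothetical name and fields).
// In a real codebase this would extend React.Component and also define render().
class Pager {
  props: { pageSize: number };
  state: { itemCount: number; page: number };

  constructor(props: { pageSize: number }) {
    this.props = props;
    this.state = { itemCount: 0, page: 0 };
  }

  // The "tricky method": pure logic that reads only props and state.
  pageCount(): number {
    return Math.max(1, Math.ceil(this.state.itemCount / this.props.pageSize));
  }
}

// Invoke the method with a fake `this` holding just the props and state it reads.
// No component instance is constructed and no rendering happens.
function callPageCount(pageSize: number, itemCount: number): number {
  const fake = { props: { pageSize }, state: { itemCount, page: 0 } };
  return Pager.prototype.pageCount.call(fake);
}
```

The trade-off matches the coupling concern mentioned above: tests like this know which props and state fields a method touches, so they may need rewriting when the component's internals change.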

For me the key things are: (1) understanding from a project management perspective how to cut out little tasks and questions, (2) understanding enough coding to know if it is on the right track (my non-technical boss has tried vibe coding and gets nowhere), (3) accepting that it works sometimes and sometimes it doesn't, and (4) recognizing context poisoning -- sometimes you ask it to do something and it gets it 95% right and you can tell it to fix the last bit and it is golden, other times it argues or goes in circles or introduces bugs faster than it fixes them and as quickly as you can you recognize that is going on and start a new session and mix up your approach.

by matt3210, 55 days ago

Manually styling two similar things the same way is a code smell. Ask the AI to make common components and use them for both instead of brute-forcing them to look similar.

by PaulHoule, 55 days ago

Yeah, I thought about this in that case. I tend to think the way you do to the extent that it is sometimes a source of conflict with other people I work with.

These navbars are similar but not the same, both have a pager but they have other things, like one has some drop downs and the other has a text input. Styled "the same" means the line around the search box looks the same as the lines around the numbers in the pager, and Junie got that immediately.

In the end the patch touched css classes in three lines of one file and added a css rule -- it had the caveat that one of the css classes involved will probably go away when the board finally agrees to make a visual change we've been talking about for most of a year but I left a comment in the first navbar warning about that.

There are plenty of times I ask Junie to try to consolidate multiple components or classes into one and it does that too as directed.

by commanderkeen08, 55 days ago

https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/... That’s what slots are for

by matt3210, 55 days ago

These are a lot of good reasons not to use it yet, IMO.
