Don't get me wrong, AI is at least as game-changing for programming as Stack Overflow and Google were back in the day. I use it every day, and it's saved me hours of work on certain specific tasks [2]. But it's simply not the massive 10x force multiplier that some would have you believe.
I'll start believing when maintainers of complex, actively developed, and widely used open-source projects (e.g. ffmpeg, curl, openssh, sqlite) start raving about a massive uptick in positive contributions, pointing to a concrete influx of high-quality AI-assisted commits.
[0] https://mikelovesrobots.substack.com/p/wheres-the-shovelware...
There is. We basically had to create a new category for them on /r/golang, because there was a quite distinct step change near the beginning of this year where suddenly over half the posts to the subreddit were "I asked my AI to put something together, here's a repo with 4 commits, 3000 lines of code, and an AI-generated README.md. It compiles and I may have even used it once or twice." It has toned down a bit, but it's still half a dozen posts a day like that on average.
Some of them are at least useful in principle. Some of them are the same sorts of things you'd see twice a month, only now we can see them twice a week if not twice a day. The problem wasn't necessarily the utility or the lack thereof, it was simply the flood of them. It completely disturbed the balance of the subreddit.
To the extent that you haven't heard about these, I'd observe that the world already had more apps than you could possibly have heard about, and the bottleneck was already marketing rather than production. AI has presumably not done much to help people market their creations.
There was a GitHub PR on the OCaml project where someone crafted a long feature (Apple silicon debugging support). The PR was rejected because it was too long and nobody wanted to read it. It seems to me that society is not ready for the volume of output generated this way, which may explain the lack of big visible change so far. But I already see people deploying tiny apps made by Claude in a day.
It's gonna be weird...
Context: This news story https://news.ycombinator.com/item?id=44180533
Or could it be that, after the growth and build-out, we are in maintenance mode and need fewer people?
Just food for thought
And since then there's been a constant doom-and-gloom narrative, even before AI arrived.
Two years, and 3/4 of us won't be needed anymore.
People think they'll have jobs maintaining AI output, but I don't see how maintaining is that much harder than creating for an LLM that can digest the requirements and the codebase and iterate until it has working source that runs.
At first, we put all the source code into the AI to create things; then we manually put files into the context; now it looks for the files it needs on its own. I think we can do even better by letting the AI write documentation for each file and its API, and only read the file itself when really needed. It could then select the APIs and documentation it needs, and I bet there is more possible, including skills and MCP on top.
So it's not only the LLMs that are getting better, but also the software that uses them.
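To make the "read the docs first, open the file only when needed" idea concrete, here is a minimal Python sketch under loud assumptions: each file's module docstring stands in for the AI-written documentation, and plain keyword overlap stands in for whatever relevance ranking a real agent would use. The names `summarize`, `build_index`, and `select_files` are hypothetical and not taken from any existing tool.

```python
import ast
import re


def summarize(source: str) -> str:
    """Hypothetical stand-in for AI-written file docs: the module docstring, or ''."""
    return ast.get_docstring(ast.parse(source)) or ""


def build_index(files: dict[str, str]) -> dict[str, str]:
    """Map each path to its summary; the agent scans this index instead of every file."""
    return {path: summarize(src) for path, src in files.items()}


def _words(text: str) -> set[str]:
    # Lowercased word tokens; a crude assumption in place of real semantic search.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def select_files(index: dict[str, str], task: str, limit: int = 3) -> list[str]:
    """Rank files by summary/task keyword overlap; only these would be read in full."""
    task_words = _words(task)
    scored = [(len(task_words & _words(summary)), path) for path, summary in index.items()]
    scored = [(score, path) for score, path in scored if score > 0]
    # Highest overlap first, ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored[:limit]]


files = {
    "metrics.py": '"""Metrics registry: counters for observability."""\n',
    "service.py": '"""Service actions and their error handling."""\n',
    "billing.py": '"""Invoice generation and billing exports."""\n',
}
index = build_index(files)
print(select_files(index, "add a failure counter to the service metrics"))
# prints ['metrics.py', 'service.py']
```

The point of the design is that the cheap index is always in context while full file contents are loaded lazily; a real tool would generate far richer summaries than a docstring.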
I see it as a competent software developer but one that doesn't know the code base.
I will break down the tasks to the same size as if I was implementing it. But instead of doing it myself, I roughly describe the task on a technical level (and add relevant classes to the context) and it will ask me clarifying questions. After 2-3 rounds the plan usually looks good and I let it implement the task.
This method works exceptionally well and usually I don't have to change anything.
For me this method allows me to focus on the architecture and overall structure and delegate the plumbing to Copilot.
It is usually faster than if I had to implement it and the code is of good quality.
The game changer for me was plan mode. Before that, agent mode was hit or miss, because it forced me to either one-shot the prompt or get inaccurate results.
I know what you mean, but the thing I find Windsurf (which we moved to from Copilot) most useful for (apart from writing OpenAPI spec files) is asking it questions about the codebase: random minutiae that I could find by grepping or following the code, but that would take me longer than the 30s to 1m it takes. For reference, this is a monorepo of a bit over 1M LoC (and 800k YAML files, because, did I mention I hate API specs?), so not exactly a small codebase either.
> I will break down the tasks to the same size as if I was implementing it. But instead of doing it myself, I roughly describe the task on a technical level (and add relevant classes to the context) and it will ask me clarifying questions. After 2-3 rounds the plan usually looks good and I let it implement the task.
Here I disagree, sort of. I almost never ask it to do complex tasks. The most time-consuming and hardest part is not actually typing out the code, and for most things, describing it to an AI takes me almost as long as implementing it. One thing I did find very useful is the supertab feature of Windsurf, which, at a high level, looks at the changes you've started making and suggests the next change. And it's not limited to repetitive things (like . in vi): if you start adding a parameter to a function, it starts adding it to the docs and to the functions below that need it, and starts implementing it.
> For me this method allows me to focus on the architecture and overall structure and delegate the plumbing to Copilot.
Yeah, a coworker put it best: I give it the boring work and keep the fun stuff for myself.
I described the workflow that has been a game changer for me in the hope that it might help someone else, because I had struggled to use LLMs for anything more than a Google replacement.
As an example, one task of the feature was to add a metric for observability when the new action was executed, and another when it failed.
My prompt: Create a new metric "foo.bar" in MyMetrics when MyService.action was successful and "foo.bar.failed" when it failed.
I review the plan and let it implement it.
As you can see it's a small task and after it is done I review the changes and commit them. Rinse and repeat.
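For illustration only, here is a minimal Python sketch of roughly what a change of that size amounts to. `MyMetrics`, `MyService.action`, and the metric names `foo.bar` / `foo.bar.failed` come from the prompt; everything else (the in-memory counter, `increment`, `_do_action`, the payload shape) is a hypothetical stand-in, since the real codebase and metrics library aren't shown.

```python
from collections import Counter


class MyMetrics:
    """Hypothetical in-memory registry; a real service would wrap its metrics library."""

    def __init__(self) -> None:
        self.counters: Counter = Counter()

    def increment(self, name: str) -> None:
        self.counters[name] += 1


class MyService:
    """Hypothetical service; only the metric wiring around `action` matters here."""

    def __init__(self, metrics: MyMetrics) -> None:
        self.metrics = metrics

    def _do_action(self, payload: dict) -> str:
        # Stand-in for the real business logic: fails when "id" is missing.
        if "id" not in payload:
            raise ValueError("payload is missing 'id'")
        return f"processed {payload['id']}"

    def action(self, payload: dict) -> str:
        try:
            result = self._do_action(payload)
        except Exception:
            # Record the failure, then re-raise so callers still see the error.
            self.metrics.increment("foo.bar.failed")
            raise
        # Record only successful executions under "foo.bar".
        self.metrics.increment("foo.bar")
        return result


metrics = MyMetrics()
service = MyService(metrics)
service.action({"id": 1})
try:
    service.action({})
except ValueError:
    pass
print(dict(metrics.counters))  # prints {'foo.bar': 1, 'foo.bar.failed': 1}
```

Re-raising after counting keeps the metric purely observational: the service's error behavior is unchanged, which is what makes a task like this small enough to review in one sitting.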
I think the biggest issue is that people try to one-shot big features or applications. For me, it is much more efficient to treat Copilot as a smart pair programming partner. When pairing, you also think through and implement one task after another.
Here's an already out-of-date and unfinished blog post about it: https://williamcotton.com/articles/introducing-web-pipe
Here's a simple todo app: https://github.com/williamcotton/webpipe/blob/webpipe-2.0/to...
Check out the BDD tests in there, I'm quite proud of the grammar.
Here's my blog: https://github.com/williamcotton/williamcotton.com/blob/mast...
It has an LSP as well, with various validators, jump-to-definition, code lens, and of course syntax highlighting.
I've yet to take screenshots, make animated GIFs of the LSP in action or update the docs, sorry about that!
A good portion of the code has racked up some tech debt, but hey, it's an experiment. I just wanted to write my own DSL for my own blog.
You don't just YOLO it. You do extensive planning when features are complex, and you review output carefully.
The thing is, when the agent isn't getting it right and you feel like you might need to drop down and edit manually, agents are now good enough to make those same "manual edits" with nearly 100% reliability if you are specific enough about what you want. Instead of "build me x, y, z", you can tell it to rename variables, restructure functions, write specific tests, move files around, and so on.
So the question isn't so much whether to use an agent or edit code manually—it's what level of detail you work at with the agent. There are still times where it's easier to do things manually, but you never really need to.
And it makes sense. For most coding problems the challenge isn’t writing code. Once you know what to write, typing the code is a drop in the bucket. AI is still very useful, but if you really wanna go fast you have to give up on your understanding. I’ve yet to see this work well outside of blog posts, tweets, boardroom discussions, etc.
The few times I've done that, the agent eventually faced a problem/bug it couldn't solve and I had to go and read the entire codebase myself.
Then I found several subtle bugs (like writing private keys to disk even though there was an explicit instruction not to). I eventually ended up refactoring most of it.
It does have value for coming up with boilerplate code that I then tweak.
which might be fine if you're doing a proof of concept or low-risk code, but it can also bite you hard when there is a bug actively bleeding money and not a single person or AI agent in the house knows how anything works.
Calling this snake oil is like horse-carriage drivers opposing cars.
Understanding the code in these situations is more important than the code/feature existing.
I think the reality is a lot of code out there doesn’t need to be good, so many people benefit from agents etc.
Agents make mistakes which need to be corrected, but they also point out edge cases you haven’t thought of.
This is negligence, it's your job to understand the system you're building.
We've been unfucking architecture done like that for a month after the dev who had a hallucination session with their AI left.
The app is definitely still a bit rough around the edges, but it was developed at breakneck speed over the last few months. I've probably seen an overall 5x acceleration over pre-agentic development speed.
I have a React application where the testing situation is FUBAR. We are stuck on an old version of React where tests that really run React (like Enzyme) are unworkable, because the test framework can never know when React is done rendering. Working with Junie, I developed a style of true unit tests for class components (we've still got 'em) that test tricky methods in isolation. I have a well-documented test file explaining the situation around tests, and I ask "Can we make some tests for A like the tests in B.test.js, how would you do that?" If I like the plan, I say "make it so!" and it does. Frankly, I would not be writing tests if I didn't have that help. It would also be possible to mock useState() and company, and I might do that someday. It doesn't bother me much that the tests are too tightly coupled, because I can tell Junie to fix or replace them if I run into trouble.
For me the key things are: (1) understanding, from a project management perspective, how to cut out little tasks and questions; (2) understanding enough coding to know whether it is on the right track (my non-technical boss has tried vibe coding and gets nowhere); (3) accepting that sometimes it works and sometimes it doesn't; and (4) recognizing context poisoning. Sometimes you ask it to do something, it gets it 95% right, you tell it to fix the last bit, and it is golden; other times it argues, goes in circles, or introduces bugs faster than it fixes them, and you need to recognize that as quickly as you can, start a new session, and mix up your approach.
These navbars are similar but not the same: both have a pager, but they also have other things; one has some dropdowns and the other has a text input. Styled "the same" means the line around the search box looks the same as the lines around the numbers in the pager, and Junie got that immediately.
In the end, the patch touched CSS classes on three lines of one file and added a CSS rule. One caveat: one of the CSS classes involved will probably go away when the board finally agrees to make a visual change we've been talking about for most of a year, so I left a comment in the first navbar warning about that.
There are plenty of times I ask Junie to try to consolidate multiple components or classes into one and it does that too as directed.