I am using them in projects with >100kloc; this is not my experience.
At the moment I am babysitting them at any kloc count, but I am sure they will get better and better.
> Somehow 90% of these posts don't actually link to the amazing projects that their author is supposedly building with AI.
You are in the 90%.
I am sure there are ways to get around this sort of wall, but I do think it's currently a thing.
I built a skribbl.io clone to use at work. We like to play at EOD on Friday as a happy hour, and when we played skribbl.io we would try to get screencaps of the stupid images we were drawing, but sometimes we would forget. So I said I'd use Claude to build our own skribbl.io that would save the images.
I was definitely surprised that Claude threaded the needle on the task pretty easily, pretty much single-shot. Then I continued adding features until I had near parity. Then I added the replay feature. After all that I looked at the codebase... pretty much a single big file. It worked though, so we played it for the time being.
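For what it's worth, the image-saving part of a clone like this is tiny. A minimal sketch, assuming the drawing lives on an HTML canvas and there's some endpoint to persist it (the element ID and the `/api/rounds/...` route are made up for illustration, not from my actual project):

```ts
// Hypothetical sketch of the "save the images" feature.
function saveRoundDrawing(roundId: number): void {
  const canvas = document.getElementById("drawing-board") as HTMLCanvasElement | null;
  if (!canvas) return;

  // Serialize the current drawing to a PNG data URL so it can be stored
  // and shown later in a gallery/replay view.
  const dataUrl = canvas.toDataURL("image/png");

  // Persist it alongside the round (endpoint is illustrative).
  void fetch(`/api/rounds/${roundId}/drawing`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ image: dataUrl }),
  });
}
```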
I wanted to fix some bugs and add more features, so I checked out a branch and had an agent refactor first. I'd have a couple of contexts/sessions open: one would just review, another would refactor, and sometimes I'd throw in a third context/session that would just write and run tests.
The LLM will build things poorly if you let it, but it's easy to prompt it another way, and even if you fail at that and back yourself into a corner, it's easy to get the agents to refactor.
It's just like writing tests: the LLMs are great at writing shitty, useless tests, but you can be specific with your prompt, and on top of that use another agent/context/session to review the suite, flag the shitty tests and explain why they're shitty, or look for missing tests. Basically keep running that review and feeding it back into the agent writing the tests.
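To make "shitty useless tests" concrete, here's the kind of distinction I ask the review agent to flag, sketched in Vitest style. The `scoreGuess` function and its module are invented for illustration, not from my codebase:

```ts
import { describe, it, expect } from "vitest";
import { scoreGuess } from "./scoring"; // hypothetical module

describe("scoreGuess", () => {
  // Useless: only restates the implementation's type, catches nothing.
  it("returns a number", () => {
    expect(typeof scoreGuess("apple", "apple", 5)).toBe("number");
  });

  // Useful: pins down behavior a refactor could silently break.
  it("awards more points for faster correct guesses", () => {
    const fast = scoreGuess("apple", "apple", 5);  // guessed after 5s
    const slow = scoreGuess("apple", "apple", 60); // guessed after 60s
    expect(fast).toBeGreaterThan(slow);
  });

  it("awards zero points for a wrong guess", () => {
    expect(scoreGuess("apple", "banana", 5)).toBe(0);
  });
});
```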
That said, I do catch it doing some of the stuff the OP mentioned, particularly leaving “backwards compatibility” stuff in place. But really, I've experienced all of the stuff he mentions when I've given it an overly broad mandate.
You also need a reasonably modular architecture that isn't incredibly interdependent, because heavy interdependence is hard to reason about, even for humans.
You also need lots and lots (and LOTS) of unit tests to prevent regressions.
Surely it depends on the design. If you have ten 10kloc modules with good abstractions, and then a 10kloc shell gluing them together, you could build much bigger things, no?
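Roughly what that "thin shell over well-abstracted modules" shape could look like, sketched in TypeScript (the interfaces and names are invented for illustration): each module hides its 10kloc behind a small interface, and the shell only wires them together, so a human or an agent can reason about one module at a time.

```ts
// Each module exposes a small surface and hides its internals.
interface GameEngine {
  startRound(players: string[]): string; // returns a round id
  submitGuess(roundId: string, player: string, guess: string): boolean;
}

interface DrawingStore {
  save(roundId: string, imagePng: Uint8Array): Promise<void>;
  load(roundId: string): Promise<Uint8Array | null>;
}

interface Notifier {
  broadcast(event: string, payload: unknown): void;
}

// The glue shell: no business logic, just wiring between modules.
class GameServer {
  constructor(
    private engine: GameEngine,
    private store: DrawingStore,
    private notifier: Notifier,
  ) {}

  async endRound(roundId: string, finalImage: Uint8Array): Promise<void> {
    await this.store.save(roundId, finalImage);
    this.notifier.broadcast("round-ended", { roundId });
  }
}
```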
Then again, the problem is that the public has learned nothing from the Theranoses and WeWorks, and an even bigger problem is that the VC funding works out for most of these hype trains even if they never develop a real business.
The incentives are fucked up. I'd not blame tech enthusiasts for being too enthusiastic.
Might as well talk about how AI will invent sentient lizards which will replace our computers with chocolate cake.
Thinking usually happens inside your head.
What is your point?
If you’re trying to say that they should have kept their opinion to themselves, why don’t you do the same?
Edit: tone down the snark
Holy Spiderman, what is your point? That if someone says something dumb I can never challenge them or ask them to substantiate/commit?
> tone down the snark
It's amazing to me that the neutral observation "thinking happens in your head" is snarky. Have you ever heard the phrase "tone police"?
If you spend a couple of years with an LLM really watching and understanding what it’s doing and learning from mistakes, then you can get up the ladder very quickly.
A "basic" understanding in critical domains is extremely dangerous and an LLM will often give you a false sense of security that things are going fine while overlooking potential massive security issues.
All I could think was, "good luck" and I certainly hope their app never processes anything important...
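A concrete, very common example of the kind of thing that slips past a "basic" understanding: string-built SQL that looks fine in every demo. A sketch using node-postgres (`pg`); the table and functions are invented for illustration:

```ts
import { Pool } from "pg";

const pool = new Pool(); // connection settings taken from environment variables

// Works in every demo, and is wide open to SQL injection: a name like
// "'; DROP TABLE users; --" becomes part of the query text itself.
async function findUserUnsafe(name: string) {
  return pool.query(`SELECT * FROM users WHERE name = '${name}'`);
}

// The fix is one line of discipline: a parameterized query keeps user
// input out of the SQL text entirely.
async function findUserSafe(name: string) {
  return pool.query("SELECT * FROM users WHERE name = $1", [name]);
}
```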
I don't feel like most providers keep a model around for more than two years. GPT-4o was deprecated after about 1.5 years. Are we expecting coding models to stay stable over longer time horizons?
If the person who is liable for the system's behavior cannot read or write code (since "all coders have been replaced"), do Anthropic et al. become responsible for damages to end users from the systems their tools/models build? I assume not.
How do you reconcile this? We have tools that help engineers design and build bridges, but I still wouldn't want to drive over a bridge labeled "autonomously generated, may contain errors, use at your own risk" because all human structural engineering experts have been replaced.
After asking this question many times in similar threads, I've received no substantial response beyond "something" will probably resolve this, or that maybe AI will figure it out.