Meanwhile, in the grandparent comment:

> Somehow 90% of these posts don't actually link to the amazing projects that their author is supposedly building with AI.

You are in the 90%.

reply
It's fine at adding features on a non-vibecoded 100kloc codebase that you somewhat understand. It's when you're vibecoding from scratch that things tend to spin out at a certain point.

I am sure there are ways to get around this sort of wall, but I do think it's currently a thing.

reply
You just have another agent/session/context refactor as you go.

I built a skribbl.io clone to use at work. We like to play at end of day on Friday as a happy hour, and when we played skribbl.io we'd try to get screencaps of the stupid images we were drawing, but sometimes we'd forget. So I said I'd use Claude to build our own skribbl.io that would save the images.
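
The image-saving part is only a few lines. This is a sketch, not the actual code; it assumes a socket.io setup, and the event names and save path are made up:

    // Server (Node + socket.io): decode the data URL, write a PNG.
    // Event names and the save path are invented; assumes ./saves exists.
    import { writeFile } from "node:fs/promises";
    import { Server } from "socket.io";

    const io = new Server(3000);
    io.on("connection", (socket) => {
      socket.on("drawing:save", async (dataUrl: string) => {
        const base64 = dataUrl.replace(/^data:image\/png;base64,/, "");
        await writeFile(`saves/round-${Date.now()}.png`, Buffer.from(base64, "base64"));
      });
    });

    // Client: snapshot the canvas at round end and send it up
    // (`socket` here is the socket.io-client connection).
    const canvas = document.getElementById("board") as HTMLCanvasElement;
    socket.on("round:end", () => {
      socket.emit("drawing:save", canvas.toDataURL("image/png"));
    });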

I was definitely surprised that Claude threaded the needle on the task pretty easily, pretty much single-shot. Then I continued adding features until I had near parity. Then I added the replay feature. After all that I looked at the codebase... pretty much a single big file. It worked, though, so we played it for the time being.
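
For reference, a replay feature in this kind of game boils down to "record timestamped draw events, then re-dispatch them on the same schedule." A sketch, with types and brush logic invented for illustration:

    // Record every draw event with its offset from round start, then
    // replay by re-dispatching each one on the original schedule.
    // (Real strokes would record line segments and interpolate.)
    type DrawEvent = { x: number; y: number; color: string; t: number };

    const events: DrawEvent[] = [];
    const roundStart = Date.now();

    function record(x: number, y: number, color: string) {
      events.push({ x, y, color, t: Date.now() - roundStart });
    }

    function replay(ctx: CanvasRenderingContext2D, recorded: DrawEvent[]) {
      for (const e of recorded) {
        setTimeout(() => {
          ctx.fillStyle = e.color;
          ctx.fillRect(e.x, e.y, 4, 4); // crude brush stub
        }, e.t);
      }
    }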

I wanted to fix some bugs and add more features, so I checked out a branch and had an agent refactor first. I'd have a couple of contexts/sessions open: one would just review, another would refactor, and sometimes I'd throw in a third context/session that would just write and run tests.

The LLM will build things poorly if you let it, but it's easy to prompt it another way, and even if you fail at that and back yourself into a corner, it's easy to get the agents to refactor.

It's just like writing tests: LLMs are great at writing shitty, useless tests, but you can be specific with your prompt, and on top of that use another agent/context/session to review the suite, find the shitty tests and explain why they're shitty, and look for missing ones. Basically you keep running that review, then feed it back into the agent writing the tests.
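
Concretely, the difference between a shitty test and a useful one looks something like this (vitest syntax; scoreGuess is a made-up function for illustration):

    import { describe, expect, it, vi } from "vitest";
    import { scoreGuess } from "./game"; // hypothetical module

    // The kind of test LLMs love to write: it asserts that a mock was
    // called, exercises nothing real, and can never catch an actual bug.
    it("calls the scorer", () => {
      const scorer = vi.fn();
      scorer("apple", "apple");
      expect(scorer).toHaveBeenCalled();
    });

    // What you actually want: behavior asserted at the boundary.
    describe("scoreGuess", () => {
      it("awards more points for faster correct guesses", () => {
        expect(scoreGuess("apple", "apple", 50)).toBeGreaterThan(
          scoreGuess("apple", "apple", 5),
        );
      });

      it("awards nothing for a wrong guess", () => {
        expect(scoreGuess("apple", "grape", 50)).toBe(0);
      });
    });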

reply
I’m using it successfully in a >200kloc codebase, too. I think a key is working in a properly modular codebase so it can focus on the relevant changes and ignore unrelated stuff.

That said, I do catch it doing some of the stuff the OP mentioned, particularly leaving “backwards compatibility” shims in place. But really, everything he mentions I’ve experienced when I’ve given it an overly broad mandate.

reply
Yes, this is my experience as well. I've found the key is having the AI create and maintain clear documentation from the beginning. It helps me understand what it's building, and it helps the model maintain context when it comes time to add or change something.

You also need a reasonably modular architecture that isn't incredibly interdependent, because heavy interdependence is hard to reason about, even for humans.
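
Concretely, that means keeping seams small enough that the model can work behind an interface without reading the rest of the app. A sketch, with all names invented for illustration:

    // A narrow seam: storage can be reimplemented without touching game
    // logic, and game logic never sees a concrete store.
    export interface DrawEvent { x: number; y: number; t: number }

    export interface ReplayStore {
      save(roundId: string, events: DrawEvent[]): Promise<void>;
      load(roundId: string): Promise<DrawEvent[]>;
    }

    // Game code depends only on the interface.
    export async function endRound(store: ReplayStore, id: string, events: DrawEvent[]) {
      await store.save(id, events);
    }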

You also need lots and lots (and LOTS) of unit tests to prevent regressions.

reply