> You look at the code again and there is so much code spaghetti is an understatement it’s the Chinese wall.

I don't understand this. A large codebase should be a collection of small codebases, just like a large city is a collection of small cities. There is a map and you zoom into your local area and work within that scope. You don't need to know every detail of NYC to get a cup of coffee.

It's your responsibility to build a sane architecture that is maintainable. AI doesn't prevent you from doing that, and in fact it can help you do so if you hold the tool correctly.

reply
I don't think that's a useful mental model for software in general.

There is software that works like this (e.g. a website's unrelated pages and their logic), but in general composing simple functions can result in vastly non-proportional complexity. (The usual example is a simple loop plus a simple conditional, which is already enough to encode Goldbach or Collatz.)
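
To make the point concrete, here is a minimal sketch in Python: a plain while loop and one conditional, yet whether it halts for every starting value is exactly the open Collatz conjecture.

    def collatz_steps(n: int) -> int:
        """Count iterations of the 3n+1 rule until n reaches 1.

        One loop and one conditional, but whether this terminates for
        every positive n is an open problem (the Collatz conjecture).
        """
        steps = 0
        while n != 1:
            if n % 2 == 0:
                n //= 2
            else:
                n = 3 * n + 1
            steps += 1
        return steps

    print(collatz_steps(27))  # a tiny input already takes 111 steps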

E.g. you write a runtime with a garbage collector and a JIT compiler. What is your map? You can't really zoom in on the district for the GC, because on every other street there you have a portal opening onto another street in the JIT district, which in turn has portals to the ISS, where you don't even have gravity.

And if you think this might be a contrived example and not everyone is writing JIT-ted runtimes, something like a banking app with special logging requirements (cross-cutting concerns) sits somewhere between these two extremes.
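
To illustrate the cross-cutting part, here's a rough sketch in Python (the module and function names are invented): even with clean module boundaries, a single audit-logging requirement has to thread through all of them, so no one "district" contains it.

    import functools
    import logging

    audit_log = logging.getLogger("audit")

    def audited(action: str):
        """Cross-cutting concern: the same audit rule wraps functions in
        otherwise unrelated modules (payments, accounts, reporting, ...)."""
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                audit_log.info("start %s", action)
                try:
                    result = fn(*args, **kwargs)
                    audit_log.info("ok %s", action)
                    return result
                except Exception:
                    audit_log.exception("failed %s", action)
                    raise
            return wrapper
        return decorator

    # payments.py
    @audited("transfer")
    def transfer(src: str, dst: str, amount: float) -> None: ...

    # accounts.py
    @audited("close_account")
    def close_account(account_id: str) -> None: ...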

reply
No, but the speed-up from AI comes from giving up control, and then you notice these issues too late.
reply
I find it interesting that this outcome is a surprise. I don't want this to sound smug, I'm genuinely curious what the initial expectations are and where they come from.

They seem to be different for LLMs, because would anyone be surprised if they handed summary feature descriptions to some random "developer" they'd only ever met online, and got back an absolute dung pile of half-broken implementation?

For some reason, people seem to expect miracles from some machine that they would not expect of other humans, especially not ones with a proven penchant for rambling hallucinations every once in a while.

I'd like to know, ideally from people who've been there, why they think that is. Where does the trust come from?

reply
I've never relied on an LLM to build a large section of code but I can see why people might think it is worth a try. It is incredible for finding issues in the code that I write, arguably its best use-case. When I let it write a function on its own, it is often perfect and maybe even more concise and idiomatic than I would have been able to produce. It is natural to extrapolate and believe that whatever intelligence drives those results would also be able to handle much more.

It is surprising how bad it is at taking the lead given how effective it is with a much more limited prompt, particularly if you buy into all the hype that it can take the place of human intelligence. It is capable of applying an incredible amount of knowledge while having virtually no real understanding of the problem.

reply
LLMs do deliver "miracles" in certain cases. If you've experienced that and been blown away by the output (a one-shot functional app from a well-crafted prompt, a new feature added flawlessly to a complicated existing codebase, etc.), it can be tempting to readjust your expectations and think this will work consistently and at a much larger scale.

They can assimilate hundreds of thousands of tokens of context in a few seconds or minutes and do exceptional pattern matching beyond what any human can do; that's a main factor in why it looks like "miracles" to us. When a model actually solves a long-standing issue that was never addressed due to a lack of funding/time/knowledge, it does feel miraculous, and when you are exposed to this a couple of times it's easy to give them more trust, just like you would trust someone who lent you a helping hand a couple of times more than a total stranger.

reply
Thanks, that makes sense.

I suppose it's difficult to account for the inconsistency of something able to perform up to standard (and fast!) at one time, but then lose the plot in subtle or not-so-subtle ways the next.

We're wired to see and treat this machine as a human and therefore are tempted to trust it as if it were a human who demonstrated proficiency. Then we're surprised when the machine fails to behave like one.

I have to say, I'm still flabbergasted by the willingness to check out completely and not even keep on top of what gets produced, or maintain a mental model of it. But the mind is easily tempted into laziness, I presume, especially when the fun part of thinking gets outsourced and only the less fun work of checking is left. At least that's what I'd extrapolate from the difference I see in myself between coding and reviewing.

reply
Probably same reason people expected outsourcing to the cheapest firm in India would work: wishful thinking. People wanted it to work and therefore deluded themselves.

Or really the same reason people fall for get rich quick schemes.

reply
I think this is true, but I imagine there's a workflow solution to this which isn't to drop AI.

E.g., treating AI-generated code as immediately legacy, with tight encapsulation boundaries, well-defined interfaces, etc., and integrating it into a more manual workflow.

There's a range from single-shot prompts to inline code generation, and which makes more sense depends on the problem and where in the codebase it is.

Single-shot stuff is going to make more sense for a prototyping phase with extensive spec iteration. Once that prototype is in place, you then probably want to drop down into per-module/per-file generation, and be more systematic -- always maintaining a reasonably good mental model at this layer.
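
As a rough sketch of what "immediately legacy, behind a boundary" might look like in Python (all names here are invented for illustration): the rest of the codebase depends only on a small hand-written interface, and the generated implementation sits behind it where it can be regenerated or replaced wholesale.

    import time
    from typing import Protocol

    class RateLimiter(Protocol):
        """Narrow, hand-written interface that the rest of the codebase sees."""
        def allow(self, key: str) -> bool: ...

    class SlidingWindowLimiter:
        """Generated implementation: reviewed once, covered by tests,
        never imported directly by callers."""
        def __init__(self, limit: int, window_s: float) -> None:
            self.limit = limit
            self.window_s = window_s
            self._hits: dict[str, list[float]] = {}

        def allow(self, key: str) -> bool:
            now = time.monotonic()
            hits = [t for t in self._hits.get(key, []) if now - t < self.window_s]
            hits.append(now)
            self._hits[key] = hits
            return len(hits) <= self.limit

    def make_rate_limiter() -> RateLimiter:
        # Callers only ever get the interface, so the generated internals can
        # be rewritten or regenerated without touching the rest of the code.
        return SlidingWindowLimiter(limit=100, window_s=60.0)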

reply
That workflow just sounds exhausting to me. Would I always need to consider how much of a blast radius my AI-generated code might have? Sounds like there’s so much extra management going into these micro decisions that it ultimately defeats the purpose of generating code altogether.

I could see value in using it during the prototyping phase, but wouldn’t like to work like you described for a serious project for end users.

reply
And you have discovered the job of managers! There has always been a lot of hate for managers. Wonder if the robots hate us just as much? (I often feel a weird guilt when I tell an agent to do something I know I am going to throw away but will serve as an interesting exploration...I know if I did that to a human they would be pissed...)
reply
IMO the hate has always been for clueless managers, especially clueless yet demanding managers. Managing an LLM for coding is different, try being clueless and demanding and see how far you get.
reply
> I know if I did that to a human they would be pissed

You call it a hackathon. You tell the human to stay up the whole night. In exchange for the extra hours worked you provide some pizza.

reply
I just don't like to type code anymore. If I can accomplish the same by describing the code, and get the same results as if I typed it myself, I'll opt for not typing so damn much. I've done so much typing in my career that typing ~80% less to get the same results makes a pretty big difference in how likely I am to set out to accomplish something.

I care more about code quality now, because typing no longer limits whether I feel it's worth refactoring something or not.

reply
> treating AI-generated code as immediately legacy, with tight encapsulation boundaries, well-defined interfaces, etc.

This is good advice regardless of whether you're using AI or not, yet in real life "let's have well-defined boundaries and interfaces" always loses against "let's keep having meetings for years and then duct-tape whatever works once the situation gets urgent".

reply
Were you auto-committing everything without reading the generated code? And if you read it but didn't understand it, why not just ask for detailed comments on each output? Knowing that a larger codebase causes it to struggle means the output needs to be increasingly scrutinized as it becomes more complex.
reply
I don’t think it’s about what the code does. I think it’s more about how the code fits into its whole context. How useful it is in solving the overarching problem (of the whole software). How well it follows the paradigm of the platform and the codebase.

You can have very good diffs and then find that the whole codebase is a collection of slightly disjointed parts.

reply
Yep. I’m approaching the same problem from a different angle: writing code fast means you aren’t being thoughtful about the features you’re building. I started realizing that after I had kids and spent more time thinking about code than writing it and it really improved the quality of my work: https://bower.sh/thinking-slow-writing-fast
reply
I haven't had the chance to work on large codebases, but isn't it possible to adapt the workflow of Working Effectively with Legacy Code: building islands of higher-quality code, using the AI to help reconstruct developer intention and business rules, and building seams and unit tests for the target modules?

AI doesn't necessarily have to increase your throughput; it can also serve as a flexible exploration and refactoring tool that supports either later hand-crafted code or an agentic implementation.
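
For what it's worth, here is a minimal sketch of the kind of seam that book describes, with hypothetical names: the hard-wired dependency is lifted into a parameter so a characterization test can pin down current behavior before any refactoring, whether by hand or by an agent.

    from typing import Callable

    # Before: the rate lookup was a network call buried inside post_interest,
    # so the function could not be exercised in a test at all.
    def post_interest(account_id: str, balance: float,
                      lookup_rate: Callable[[str], float]) -> float:
        """Compute the new balance; the rate lookup is now a seam."""
        rate = lookup_rate(account_id)
        return round(balance * (1 + rate), 2)

    # Characterization test: pin down today's behavior before touching anything.
    def test_post_interest_matches_current_behavior():
        assert post_interest("acct-1", 1000.0, lambda _id: 0.05) == 1050.0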

reply
I don’t think you truly captured the worst part:

There comes a realization, to many an engineer's horror, that AI won't be able to save them and they will have to manually comprehend and possibly write a ton of code by hand to fix major issues, all while upper management is breathing down their necks, furious as to why the product has become a piece of shit and customers are leaving for competitors.

The engineers who sink further into denial thrash around with AI, hoping they are a few prompts or orchestrations away from everything being fixed again.

But the solution doesn’t come. They realize there is nothing they can do. It’s over.

reply