Hard disagree. When you've had to chase through a change in untold and actually unknown numbers of duplications of code in different permutations and fix them because they are all on fire simultaneously, you'd disagree too. A bad abstraction would at least have had one fire in one place.
The other end of this spectrum is dealing with the architecture astronaut's up-front abstraction. Totally overengineered for solving the initial requirements, but then constantly needing new hacks to make it cope with new requirements as they come up in the normal course of work.
That's why there's a balance in there, it's somewhere between "always duplicate code even when you know a lot about the problem" and "always write abstractions even when you know very little about the problem."
Wouldn't most large codebases with poor abstractions just have engineers engineer around them with their own solutions? In a large enough codebase you'd have both the bad abstractions and all the not-quite-duplicate implementations ignoring the bad abstraction?
I'm using bad here loosely, it could be buggy, incorrect, incomplete, insufficient and more; while being owned by someone or some team that's a challenge to work with for various reasons (overloaded, under-resourced, overbearing, etc., etc.).
Obviously, yes. But it is my experience that this happens more slowly and that API invocations that break when the abstraction is changed are much easier to identify than broader duplicated patterns of code that span many lines and subtly diverge.
And even then those divergences are better because each wrapper around the abstraction is documenting the problem with it. But the abstraction can generally be replaced by one with the same API surface.
(Even if you take into account the fact that any API behaviour ultimately gets relied upon even if undocumented. Which is true.)
To be fair my experience is that of a freelancer and contractor who arrives trying to fix things that have been through many such hands. And I think if these developers had it drummed into their head that any attempt at abstraction would be better than copy and paste, these situations would be more knowable.
When that happens there's a major engineering leadership failure currently in progress, even if engineering leadership isn't aware of it.
EDIT: LLM or not, this is still true. If you have LLMs pumping out tons of duplicate code you're wasting tokens, and probably more importantly wasting engineer hours reviewing duplicate code.
In some cases it might be a fair trade, in moderation. In general it's certainly wrong.
That's true only for "good" abstractions. Bad abstractions will often require you to change code in all the places using it, requiring you to understand how all of them work and what are their requirements, _all at the same time_.
A uses the abstraction, but finds the API doesn't work. Fixes that.
That causes B to have to make a tracking change which induces a bug. B realizes that the API isn't quite right. Fixes it.
That causes A and C to make tracking changes. These induce more bugs. C fixes the abstraction to avoid these cases.
This breaks A and B so they decline to update.
And so on. This is what a bad abstraction looks like. API "fixes" bouncing around the code as they reflect off of the bad abstraction.
The security bugs were all in features I never wanted.
A bit of simple duplication would have been golden.
On the contrary: that's precisely what a bad abstraction would not offer.
Instead it would spread its assumptions to different parts of the system, as every caller, sub-service, etc. would have to change shape to fit in that abstraction's box, however unnatural it is (and we know it would be unnatural, because we already said it's a bad abstraction).
Abstraction is not the same as encapsulation.
But so does duplication, in practice, and it diverges more as it does.
But any abstraction ends up with a signature and a name that can quickly be found in code.
The risk of a long-lived duplication losing its shape and being hard to find is much greater. Especially if the code is going through multiple hands.
I once had to pick up a project — a working, fully functional website. I could see, pretty clearly, the work of several people. All but one of them terrible.
The one was a diligent developer who was fully wrong in their abstraction (in fact significantly) but was consistent in how they used it.
The rest had simply worked around that code, copied and re-copied their own modified duplications and let things lose any shape. The result was error-prone stuff.
Clearly either the budget (or the client's capriciousness — a separate issue and arguably the bigger one) scared away the one guy, who I actually wanted to talk to but could not track down. He possibly had the origin story, and I wanted to know why his particular abstraction, which was at odds with the framework, was there. It was good code in the wrong shape, and it clearly used to do more, and that is interesting.
All the expedient people who had decided to avoid his code and just patch in duplicated pieces around it were the problem. There was no form to their solution at all. And that had clearly happened over some time (because you could see several different code styles)
Abstractions are a form of coupling, and coupling can be good, if the components are truly interdependent, and have a well defined domain. The problem with most abstractions, and I’ve seen this time and time again, is that they become brittle, are over used, and the cost of maintaining them grows exponentially with the size of the code base. With the reason for the cost ballooning being the system has disparate components that look interrelated but are absolutely not. Once you give someone a hammer they tend to assume everything is a nail.
The biggest problem, IMHO, is that abstractions are often used where a pattern would be more effective, easier to maintain, and easier to iterate on. And the primary difference between a pattern and an abstraction really comes down to coupling. Patterns remain decoupled, abstractions are tightly coupled.
And to be clear, I will and do use abstractions, when and where they make sense. But only after clear patterns emerge, and it’s been proven that components are truly coupled.
I will gladly die on the hill, that abstractions are measurably worse than duplication an overwhelming amount of the time. They’re often nothing more than a form of premature optimization.
It all depends on the amount of duplication and the complexity of the abstraction. Like you said, no generic advice is possible that clearly separates it into "abstract here" and "duplicatehere".
In your example it sounds like we aren't talking about 2-3 places where duplicate code existed that just needed to be refactored into separate units. It sounds more like a complete disregard for abstraction to move on quickly.
If you see duplicate code and have a good understanding how to solve that then it's totally a good thing. The real problem comes in if you add abstractions without knowing wether they will hold up. And this is where the blogpost comes in. In my opinion 2 duplicates are fine, at 3 you should start thinking or implementing an abstraction if you have a good understanding of the code and usecases.
Exactly. The abstraction purists are not working in the messy, dead line driven real world.
Write everything twice quickly becomes write everything 4 times once a new change appears, just as quickly as it becomes write everything 8 times, and so on.
I'm afraid there's no sensible soundbite developers can follow blindly.
That's a good problem to have. Getting to 4 or 8 or 12, and then pruning it to 1 or maybe 2 or 3 clearly different cases, is better than shoehorning multiple cases into the wrong abstraction, having everything that speaks with them coupled to that and dancing around their assumptions, and then having to untangle that.
Duplicated code is by definition LESS coupled.
Having a lot of if/else in your code is definitely a cost. My weakness isn’t so much the libraries and APIs, but the actual binary - once I have a service that does A very well, and I run into needing A’ I mostly just add in a config line “op_mode = A|A’” and have the else/if chains in the server driving code. Moreso for CLIs that I use myself than production services, but I have added tunables for consistency and replication to datastores to allow new use cases and expand my footprint in the data center.