The goal is to have code that corresponds to a coherent conceptual model for whatever you are doing, and the resulting codebase should clearly reflect the design of the system. Once I started thinking about code in these terms, I realized that questions like "DRY vs YAGNI" were not meaningful.
It's not about copying identical code twice, it's about refactoring similar code into a shared function once you have enough examples to be able to see what the shared core is.
I too often see junior engineers (and senior data scientists…) write code procedurally, with giant functions and many, many if statements, presumably because in their brain they’re thinking about “1st I do this if this, 2nd I do that if that, etc”.
Yet again, understanding when to follow a rule of thumb or not is another thing that separates the junior from the senior.
Exactly.
Instead, I tend to ask: if I change this code here, will I always also need to change it over there?
Copy-paste is good as long as I'm just repeating patterns. A for loop is a pattern. I use for loops in many places. That doesn't mean I need to somehow abstract out for loops because I'm repeating myself.
But if I have logic that says that button_b.x = button_a.x + button_a.w + padding, then I should make sure that I only write that information down once, so that it stays consistent throughout the program.
Your example is a pretty good one. In most practical applications, you do not want to be setting button x coordinates manually. You want to use a layout manager, like CSS Flexbox or Jetpack Compose's Row or Java Swing's FlowLayout, which takes in a padding and a direction for a collection of elements and automatically figures out where they should be placed. But if you only have one button, this is overkill. If you only have two buttons, this is overkill. If you have 3 buttons, you should start to realize this is the pattern and reach for the right abstraction. If you get to 10 buttons, you'll realize that you need to arrange them in 2D as well and handle how they grow & shrink as you resize the window, and there's a good chance you need a more powerful abstraction.
IMO, this is the exact (and arguably only) question to ask.
If you have two copies of some piece of code, and you can reasonably say that if you ever want to update one copy then you will almost certainly want to update the other copy as well, then it's probably a good idea to try to merge them and keep that logic in some centralized place.
On the other hand, if you have three copies of the same piece of code, but they kind of just "happen to" be identical and it's completely plausible that any one of the copies will be modified in the future for reasons which won't affect the other copies, maybe keeping them separate is a good idea.
And of course, it's sometimes worth it to keep two or more different copies which do share the same "reason to change". This is especially clear when you have the copies in different repositories, where making the code "DRY" would mean introducing dependencies between repositories which has its own costs.
An expensive consultant suggested creating pristine implementation and then writing a rule layer that would modify things as needed and deploying the whole thing as a pile of lamdba functions.
I copy pasted the protocol consumer file per producer and made all the necessary changes with proper documentation and mocks. Got it working quickly and we could add new ones without affecting.
If I'd try to keep it DRY, i think it would be a leaky mess.
Well, turns out that 3 of the APIs changed the way they return the data, so instead of separating the logic, someone kept adding a bunch of if statements into a single function in order to avoid repeating the code in multiple places. It was a nightmare to maintain and I ended up completely refactoring it, and even tho some of the code was repeated, it was much easier to maintain and accommodate to the API changes.
Having identical logic in multiple places (even only 2) is a big contributor to technical debt, since if you're searching for something and you find it and fix it /once/ we often thing of the job as done. Then the "there is still a bug and I already fixed that" confusion is avoided by staying DRY.
Mostly at the massive switch statements and 1000 line's of flow control logic that end up embedded someplace where they really dont belong in the worst cases.
Sometimes four or five doesn’t seem too bad, sometimes two is too many
If two pieces of code use the same functionality by coincidence but could possibly evolve differently then don't refactor. Don't even refactor if this happens three, four, or five times. Because even if the code may be identical today the features are not actually identical.
But if you have two uses of code that actually semantically identical and will assuredly evolve together then go ahead and refactor to remove duplication.
Extract a method or object if it's something that feels conceptually a "thing" even if it has only one use. Most tools to DRY your code also help by providing a bit of encapsulation that do a great job of tidying things up to force you to think about "should I be letting this out of domain stuff leak in here?"
DRY is one step removed from that goal and people use it to make very unmaintainable code because they confuse any repeated code with unmaintainability. (or their theory that some day we might want to repeat this code so we might as well pre-DRY it)
The result is often a horrendous complex mess. Imagine a cookbook with a cookie recipe that resided on 47 different pages (40 of which were pointers on where to find other pointers on where to find other pointers on where to find a step) in attempts to never write the same step twice in the whole book or your planned sequels in a 20 volume set.
The problem is zealots. Zealotry doesn't work for indeterminate things that require judgement like "code quality" or "maintainability", but a simple rule like "don't repeat yourself" is easy for a zeal. They take a rule and shut down any argument with "because the rule!"
If you're arguing about code quality and maintainability without one sentence rules then you actually have to make arguments. If the rule is your argument there's no discussion only dogma.
As a result? Easy to distill rules spread fast, breed zealots, and result in bad code.