Do what you gotta do, but most of my job for the past decade has been replacing data pipelines that randomly duplicate data with pipelines that prevent duplication at the source, and my users strongly prefer the latter.
Of course, a lot of one-off data analysis has no rules except to get a quick answer that no one will complain about!