Just like that place that's so crowded nobody goes there anymore.
I've just tried this, and in my tests the most-touched files are also the most irrelevant or boring ones (auto-generated files, the service's entry point, etc.).
Yeah, the same thing happens with lockfiles and CI configs. You end up filtering out half the list before it tells you anything useful.
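A rough sketch of that filtering step, assuming the churn list comes from `git log --format=format: --name-only` (one path per line, blank lines between commits). The ignore patterns and the helper names (`git_log`, `is_ignored`, `churn`) are illustrative assumptions, not a standard list or tool.

```python
import subprocess
from collections import Counter
from fnmatch import fnmatchcase

# Illustrative ignore list (an assumption); tune it for your own repo.
IGNORE_PATTERNS = [
    "package-lock.json", "yarn.lock", "Cargo.lock", "poetry.lock",
    ".github/*", "*.yml", "*.yaml", "*.md",
]


def git_log(*args: str) -> str:
    """Run `git log` with the given arguments in the current repo and return stdout."""
    return subprocess.run(
        ["git", "log", *args], capture_output=True, text=True, check=True
    ).stdout


def is_ignored(path: str, patterns=IGNORE_PATTERNS) -> bool:
    """True if the full path or its file name matches any ignore pattern."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(path, p) or fnmatchcase(name, p) for p in patterns)


def churn(name_only_log: str, patterns=IGNORE_PATTERNS) -> Counter:
    """Count how many commits touched each path that is not ignored.

    Expects `git log --format=format: --name-only` output:
    one path per line, with blank lines between commits.
    """
    counts = Counter()
    for line in name_only_log.splitlines():
        path = line.strip()
        if path and not is_ignored(path, patterns):
            counts[path] += 1
    return counts
```

Inside a repo, something like `churn(git_log("--since=1 year ago", "--format=format:", "--name-only")).most_common(20)` would give the top 20 remaining files. Which patterns belong in the list depends entirely on the project.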
I just tried it too, and it basically just flagged a handful of 1,500+ line files, which probably ought to be broken up eventually but aren't causing any serious problems.
If they're dependency-management, localization, or config files (as in my case), breaking them up will likely just cause more issues. Make sure it's an actual improvement before breaking things up.
This command needs a warning. Using it and drawing too many conclusions from it, especially if you're new, will make you look stupid in front of your teammates.

I ran this on the repo I have open, and after I filtered out the non-code files it can really only tell me which features we worked on in the last year. It says more about how we decided to split the features into increments than anything to do with bugs and “churn”.

Good thing that the article contains that warning, then.
Not really strong enough in a post about what to do in a codebase you’re not familiar with. In that situation you’re probably new to the team and organisation and likely to get off on the wrong foot with people if you assume their code “hurts”.
The post is “here’s what I do”, not “here’s what you should do and then confront the team about the results.” It’s just showing you a quick way to get some insights. It’s not even guaranteeing it’s accurate, just showing you some things you might be able to draw some quick conclusions on.

I’m not sure why HN attracts this need to poke holes in interesting observations to “prove” they aren’t actually interesting.

It’s a bit reductive to call it poking holes. The author shared his valuable knowledge and I shared mine.
I found it interesting that Git itself has a built-in notion of similarity: when it packs objects, it groups files by path and size, then runs delta compression to find which ones are close.

Very different from just counting commits - https://vectree.io/c/delta-compression-heuristics-and-packfi...
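A toy model of that packing heuristic, not Git's actual code: `git pack-objects` roughly sorts objects by type, a hash of the path name, and size, then only tries deltas against neighbours inside a sliding window whose size is set by `--window` (default 10). In this sketch the sort key is simplified to file name plus size, `difflib.SequenceMatcher` similarity stands in for a real delta encoder, and the 0.5 similarity threshold and the `Blob` type are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, NamedTuple, Optional


class Blob(NamedTuple):
    oid: str      # hypothetical object id, e.g. "a1"
    path: str     # path the blob was found at
    data: bytes   # file contents


def sort_key(blob: Blob):
    # Simplification of Git's ordering: group by file name, larger blobs first.
    name = blob.path.rsplit("/", 1)[-1]
    return (name, -len(blob.data))


def pick_delta_bases(
    blobs: List[Blob], window: int = 10, min_ratio: float = 0.5
) -> Dict[str, Optional[str]]:
    """For each blob, pick the most similar earlier blob inside the window.

    Returns {oid: base_oid}, where None means "store the blob whole".
    SequenceMatcher.ratio() stands in for the size of a real delta.
    """
    ordered = sorted(blobs, key=sort_key)
    bases: Dict[str, Optional[str]] = {}
    for i, blob in enumerate(ordered):
        best, best_ratio = None, min_ratio
        for cand in ordered[max(0, i - window):i]:
            ratio = SequenceMatcher(None, cand.data, blob.data).ratio()
            if ratio > best_ratio:
                best, best_ratio = cand.oid, ratio
        bases[blob.oid] = best
    return bases
```

Because versions of the same file sort next to each other, the window mostly compares a file against its own history, which is what makes this cheap compared with trying every pair.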

These commands are just about what files to start looking at to understand new codebase.
Better for people to know they're just blindly copying tools and parroting their output as if it's automatically meaningful. Any warning against that should be built into the individual, for their own sake.
Right? Some of these comments read like “you gave me commands to run, so I should be able to turn my brain off when interpreting the output”. These aren’t newbie commands, so the assumption would be that you kinda know what you’re doing, at least a little bit. If not, then don’t run them… similar to how you should approach all commands/things from the internet.
Plotting Churn against Complexity is far more useful than merely churn.

It shows problematic places much better. High churn, low complexity: fine. It's recognized that this gets worked on a lot, and it's optimized for that (e.g. some mapping file, a DSL, business rules, etc.). Low churn, high complexity: fine too. It's a mess, but no one has to go there. But both? That's probably where most bugs originate, where PRs block, where test coverage is poor, and where everyone knows time is needed to refactor.
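A minimal sketch of that churn-versus-complexity split, assuming you already have a commit count and some complexity score per file. Using the median of each metric as the high/low cut-off is an arbitrary assumption, and the quadrant labels are just paraphrases of the cases above.

```python
from statistics import median
from typing import Dict


def classify(churn: float, complexity: float, churn_cut: float, cx_cut: float) -> str:
    """Place one file in a churn/complexity quadrant."""
    high_churn = churn >= churn_cut
    high_cx = complexity >= cx_cut
    if high_churn and high_cx:
        return "hotspot"            # both: where bugs and blocked PRs tend to be
    if high_churn:
        return "busy but simple"    # e.g. a mapping file or business rules
    if high_cx:
        return "complex but quiet"  # a mess, but rarely touched
    return "calm"


def quadrants(churn: Dict[str, int], complexity: Dict[str, float]) -> Dict[str, str]:
    """Classify every file that has both metrics, using medians as cut-offs."""
    files = sorted(set(churn) & set(complexity))
    if not files:
        return {}
    churn_cut = median(churn[f] for f in files)
    cx_cut = median(complexity[f] for f in files)
    return {f: classify(churn[f], complexity[f], churn_cut, cx_cut) for f in files}
```

Plotting the two numbers as a scatter chart shows the same thing visually; the classification is only a shortcut for listing the top-right corner.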

In fact, I quite often found that a team's call "to rewrite the app from scratch" was really about those few high-churn, high-complexity modules, files, or classes.

Complexity is a deep topic, but even simple checks, like how deeply nested something is or how many statements it contains, can do the job.
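For Python sources, such simple checks could be sketched with the standard `ast` module. Treating `if`/`for`/`while`/`with`/`try` as the only nesting constructs is an assumption, and `simple_complexity` is a hypothetical helper name. Note that an `elif` chain counts as deeper nesting here, because `ast` represents `elif` as an `If` inside the previous `orelse`.

```python
import ast

# Assumed set of constructs that add one level of nesting.
BLOCK_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.With, ast.AsyncWith, ast.Try)


def max_nesting(node: ast.AST, depth: int = 0) -> int:
    """Deepest level of nested block statements below `node`."""
    best = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, BLOCK_NODES) else depth
        best = max(best, max_nesting(child, child_depth))
    return best


def statement_count(tree: ast.AST) -> int:
    """Number of statements anywhere in the tree."""
    return sum(isinstance(n, ast.stmt) for n in ast.walk(tree))


def simple_complexity(source: str) -> dict:
    """Both crude metrics for one source file."""
    tree = ast.parse(source)
    return {"max_nesting": max_nesting(tree), "statements": statement_count(tree)}
```

Either number could serve as the complexity axis for the churn-versus-complexity comparison; for other languages, indentation depth is a cruder but language-agnostic stand-in.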

Maybe it's a starting point for finding conflict-prone regions?

Otherwise, you're right: it could be a long, linear list of appends that people are happy to contribute to.
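One way to separate an append-only list from code that actually gets rewritten is to compare added and deleted lines per file. This sketch assumes the output of `git log --format=format: --numstat` (tab-separated added count, deleted count, and path; binary files show `-`). Renamed paths, which numstat prints with `=>`, are not normalised here, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def add_delete_totals(numstat_log: str) -> Dict[str, List[int]]:
    """Sum [added, deleted] lines per path from `git log --numstat` output."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for line in numstat_log.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separator lines between commits
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary file: no line counts
        totals[path][0] += int(added)
        totals[path][1] += int(deleted)
    return dict(totals)


def rewrite_ratio(added: int, deleted: int) -> float:
    """Share of changed lines that were deletions; near 0 means mostly appends."""
    changed = added + deleted
    return deleted / changed if changed else 0.0
```

A high-churn file with a ratio near 0 looks more like the happy append-only case; a higher ratio suggests existing lines keep being edited, which is where conflicts become more plausible. Any threshold between the two is a judgment call.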

Yes, because the fear is buttressed by necessity. You have to edit the file, and so does everyone else, and that is a recipe for a lot of mess. I can think back over years of files like this: usually kilolines of impossible-to-reason-about, do-everything code.
Definitely not in my experience. The most-changed files are the changelogs, files with version numbers, and READMEs. I don't think anyone is afraid of keeping those up to date.
pom.xml and package.json came up on a couple of separate projects I ran the commands on, which makes sense because the versions get bumped rather frequently. I guess context matters, as usual.
Yeah, the truth is going to be a lot more subtle than this.
The LLM that wrote the copy is an idiot.
This is such obvious LLM slop.
It could also be that a frequently edited file simply had the most opportunities to get broken, and it was edited by the most random crowd.
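If "the most random crowd" is the worry, counting distinct authors per file is one rough proxy. This sketch assumes the log was produced with `git log --format=format:%x00%ae --name-only`, so each commit starts with a NUL byte followed by the author's email, then lists the touched paths; the NUL marker is a choice made here because it cannot appear in a path, and `authors_per_file` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Optional, Set

# Commit-header marker produced by %x00 in the format string.
# (Using %aE instead of %ae would apply .mailmap to the emails.)
MARK = "\x00"


def authors_per_file(log: str) -> Dict[str, int]:
    """Count distinct author emails per path."""
    authors: Dict[str, Set[str]] = defaultdict(set)
    current: Optional[str] = None
    for line in log.splitlines():
        if line.startswith(MARK):
            current = line[len(MARK):].strip()
        elif line.strip() and current is not None:
            authors[line.strip()].add(current)
    return {path: len(emails) for path, emails in authors.items()}
```

Dividing a file's commit count by its author count then hints at whether it is churned by a few owners or by everyone passing through.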
In my case, it's .github/CODEOWNERS.

Nobody is afraid of changing it.

Why does CODEOWNERS need to change so frequently? Do the members of your team change that often?