Integrating with each linter is complex but it pays dividends - it's so handy to be able to write a new lint rule or introduce an off-the-shelf rule without needing to fix all existing violations.
We maintain allowed error counts on a file-by-file basis, which makes it easier for developers to understand where they added a new violation.
blog post: https://www.notion.com/blog/how-we-evolved-our-code-notions-...
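A rough sketch of how a per-file allowance check could work. The `lint-allowances.json` name and the `eslint --format unix` invocation are just illustrative - any linter that prints one `path:line: message` entry per violation would do:

```python
import json
import subprocess
import sys
from collections import Counter

# Hypothetical baseline of allowed violation counts per file, e.g.
# {"src/legacy.ts": 12, "src/api.ts": 3} - checked in alongside the config.
ALLOWANCES_FILE = "lint-allowances.json"

def current_violations() -> Counter:
    """Count violations per file from the linter's output.

    The eslint invocation is illustrative; any linter that prints one
    "path:line:col: message" entry per violation works the same way.
    """
    out = subprocess.run(
        ["eslint", "--format", "unix", "."],
        capture_output=True, text=True,
    ).stdout
    counts: Counter = Counter()
    for line in out.splitlines():
        if ":" in line:  # skips the trailing "N problems" summary line
            counts[line.split(":", 1)[0]] += 1
    return counts

def main() -> int:
    allowed = json.load(open(ALLOWANCES_FILE))
    failed = False
    for path, count in sorted(current_violations().items()):
        budget = allowed.get(path, 0)
        if count > budget:
            print(f"{path}: {count} violations (allowed {budget})")
            failed = True
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(main())
```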
Your errors-over-time chart feels pretty accurate to me. The yellow warnings line really does sneak up on you.
Ratchet is such a good word for it.
Most mature systems that can issue warnings about source code (linters, static analyzers, doc style enforcers, anything like that) have this feature somewhere, because they all immediately hit the same problem: any new assertion about source code, applied to a code base even just two or three person-months large, will flag vast swathes of it, and the tool would destroy its own market by being too scary to ever turn on. So it's a common problem with a fairly common solution. Just not always documented well.
[1]: Let me just grumble that, in general, coloration schemes should not try to "deprioritize" comments visually, but it is a particularly poor choice when the comments are the documentation in the most literal sense. I like my comment colors distinct, certainly, but not hidden.
From there on you can only go one direction.
OP's ratchet would simply be a new custom cop that matches a string, and the whole ignore-existing-violations mechanism would Just Work (and actually work "better", since one wouldn't be able to move the pattern around).
You can usually achieve this by adding ignore pragmas to your legacy warnings (although you do need to touch that code). But at least that way the daily workflow will surface errors, and you can find the legacy errors later by disabling the pragma.
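A toy version of the pragma approach, with a made-up `# legacy-lint: ignore` marker and an `--all` flag to re-surface the grandfathered violations:

```python
import re
import sys

# Made-up pragma marking grandfathered violations in legacy code.
PRAGMA = "# legacy-lint: ignore"
BAD_PATTERN = re.compile(r"\bforbidden_function\(")  # the rule being ratcheted

def check(path: str, include_ignored: bool = False) -> int:
    """Report violations, skipping pragma'd lines unless told otherwise."""
    errors = 0
    for lineno, line in enumerate(open(path), start=1):
        if BAD_PATTERN.search(line):
            if PRAGMA in line and not include_ignored:
                continue  # grandfathered: visible only with --all
            print(f"{path}:{lineno}: forbidden_function() is banned")
            errors += 1
    return errors

if __name__ == "__main__":
    # Pass --all to re-surface the legacy violations hidden behind pragmas.
    include_ignored = "--all" in sys.argv
    paths = [a for a in sys.argv[1:] if a != "--all"]
    sys.exit(1 if sum(check(p, include_ignored) for p in paths) else 0)
```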
It's similar to how code coverage can be handled: overall coverage may be low, say 40%, but you can require 80% coverage on new lines, and over time total coverage goes up.
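A sketch of how the new-lines rule could be checked, assuming a coverage.py JSON report and `git diff` against a base branch (tools like diff-cover do this properly; everything here is illustrative):

```python
import json
import subprocess

# Numbers from the comment above: the legacy total may sit at 40%,
# but newly added lines must hit 80%.
NEW_LINE_THRESHOLD = 0.80

def changed_lines(path: str, base: str = "origin/main") -> set[int]:
    """Line numbers added to `path` since the base, parsed from -U0
    hunk headers like "@@ -10,3 +12,4 @@"."""
    diff = subprocess.run(
        ["git", "diff", "-U0", base, "--", path],
        capture_output=True, text=True,
    ).stdout
    added: set[int] = set()
    for line in diff.splitlines():
        if line.startswith("@@"):
            new_range = line.split("+")[1].split(" ")[0]  # "12,4" or "12"
            start, _, count = new_range.partition(",")
            added.update(range(int(start), int(start) + int(count or 1)))
    return added

def new_line_coverage(report_path: str = "coverage.json") -> float:
    """Coverage over changed lines, from a coverage.py JSON report
    (`coverage json`); assumes the report's paths are repo-relative."""
    report = json.load(open(report_path))
    covered = total = 0
    for path, data in report["files"].items():
        new = changed_lines(path)
        executed = set(data["executed_lines"])
        missing = set(data["missing_lines"])
        total += len(new & (executed | missing))
        covered += len(new & executed)
    return covered / total if total else 1.0

if __name__ == "__main__":
    cov = new_line_coverage()
    print(f"coverage on new lines: {cov:.0%}")
    raise SystemExit(0 if cov >= NEW_LINE_THRESHOLD else 1)
```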
I wonder if there has ever been a sneaky situation where someone wanted to use forbiddenFunction() really badly, so they removed a call elsewhere and tidied that up, just so they could start using it themselves.
And ditto for test coverage quality gates. I've seen that pattern used to get a frontend codebase from 5% coverage to >80%. It was just a cycle of Refactor -> Raise minimum coverage requirement -> Refactor again -> Ratchet again, with the coverage gate used to stop new work from bringing down the average.
For more control, and to close that loophole, you could put annotations/comments in the code (`/* ignore this line */`) the same way eslint does. Or have a config that lists how many uses are allowed in each file, instead of one count per project. There are always refinements, but I'm sure that for many projects the simplicity of one counter is more than enough, unless you have devious developers.
The way it works is: the underlying linter tool flags all the warnings, and the plugin keeps track of when any particular issue was introduced. You can add a quality gate that fails the build if any new issue was added in a merge request.
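Something similar can be approximated without a plugin by diffing issue sets between a base revision and the current tree. A sketch, where the linter invocation and the fingerprint format are both assumptions:

```python
import os
import subprocess
import tempfile

def lint(tree: str) -> set[str]:
    """Fingerprint every linter issue in a checkout.

    The eslint call is illustrative; the fingerprint deliberately drops
    line/col so unrelated edits don't make old issues look new.
    """
    out = subprocess.run(
        ["eslint", "--format", "unix", "."],
        cwd=tree, capture_output=True, text=True,
    ).stdout
    issues = set()
    for line in out.splitlines():
        parts = line.split(":", 3)
        if len(parts) == 4:
            path, _line, _col, message = parts
            issues.add(f"{path}:{message.strip()}")
    return issues

def new_issues(base: str = "origin/main") -> set[str]:
    """Issues present now but not at the base revision."""
    with tempfile.TemporaryDirectory() as tmp:
        wt = os.path.join(tmp, "base")
        subprocess.run(["git", "worktree", "add", "--detach", wt, base],
                       check=True)
        try:
            baseline = lint(wt)
        finally:
            subprocess.run(["git", "worktree", "remove", "--force", wt])
    return lint(".") - baseline

if __name__ == "__main__":
    added = new_issues()
    for issue in sorted(added):
        print("new issue:", issue)
    raise SystemExit(1 if added else 0)
```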
But it's weird to me to call this a "ratchet", and not just a custom lint rule. Since it sounds exactly like a lint rule.
The hard-coded count also sounds like something I would find annoying to maintain in the long run, and it might be hard to get a feeling for whether or not the needle is moving in the right direction - especially when the count goes down in one place and up in another so the number stays the same. You end up in a situation where you're not entirely sure which way things are trending.
A different approach is to have your ratchet/lint script that detects these "bad functions" write the file locations and/or counts to a "ratchets" file, and keep that file in version control.
In CI, if the ratchet file has changes, you can't merge because the tree is dirty; you'd have to run the tool yourself and commit the result locally, and the codeowner of the ratchet file would have to approve.
At least that would be a slightly nicer approach than maintaining some hard-coded opaque count.
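For example, a minimal version of that writer plus the CI dirty-tree check could look like the following (the `ratchets.json` name, the pattern, and the `src/` layout are all made up):

```python
import json
import pathlib
import re
import subprocess
import sys

# All names here are placeholders: the ratchet file, the pattern, the layout.
RATCHET_FILE = "ratchets.json"
BAD_PATTERN = re.compile(r"\bforbidden_function\(")

def write_ratchets() -> None:
    """Regenerate per-file violation counts into the checked-in ratchet file."""
    counts = {}
    for path in sorted(pathlib.Path("src").rglob("*.py")):
        n = len(BAD_PATTERN.findall(path.read_text()))
        if n:
            counts[str(path)] = n
    pathlib.Path(RATCHET_FILE).write_text(json.dumps(counts, indent=2) + "\n")

if __name__ == "__main__":
    write_ratchets()
    # In CI: if regenerating the file changed it, the tree is dirty and the
    # merge is blocked until someone commits it (and the codeowner approves).
    sys.exit(subprocess.run(
        ["git", "diff", "--exit-code", "--", RATCHET_FILE]
    ).returncode)
```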
Right now we have one huge ratchet.json.tsv file with all violations, but it's getting pretty ungainly now that it's over 1 MB.
There are moments when we don't bother with optional things like linting, formatting, warnings, etc.
So it's important that there is a moment when these things aren't optional.
I haven't found anything more effective than making sure it happens fast enough that other devs don't have time to think about disabling it. They might make their changes locally, relying on an IDE without running the full build, which pushes the exceptions to the build agent. Developers may not have privileges to modify those builds directly, but complaints and emergencies slowly erode the impediments to deploying.
This is obviously obnoxious when it comes to stuff like warnings and deprecations. But it's also annoying when doing migrations of any kind, or when working to raise test coverage - anything that can be determined by checking the source code.
> If it counts too few, it also raises an error, this time congratulating you and prompting you to lower the expected number.
This is a pain and I hate that part. It's the kind of thing that isn't a big deal, but it's regularly annoying. It makes leaving things in simpler than removing them - the good act gets punished.
One way to make this better is to compare the count against the last merge base with the main branch - then there's no need to commit the expected number at all. Alternatively you can cache the counts for each commit externally, but that requires infra.
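A sketch of the merge-base variant, using `git grep -c` so neither revision needs a checkout (the pattern and branch names are placeholders):

```python
import subprocess
import sys

PATTERN = "forbidden_function("  # placeholder for whatever is being ratcheted

def count_at(rev: str) -> int:
    """Total occurrences of PATTERN in the tree at `rev`.

    `git grep -F -c <pattern> <rev>` prints one "rev:path:count" line per
    matching file, so we can count at any revision without checking it out.
    """
    out = subprocess.run(
        ["git", "grep", "-F", "-c", PATTERN, rev],
        capture_output=True, text=True,
    ).stdout
    return sum(int(line.rsplit(":", 1)[1]) for line in out.splitlines())

if __name__ == "__main__":
    base = subprocess.run(
        ["git", "merge-base", "HEAD", "origin/main"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    before, after = count_at(base), count_at("HEAD")
    print(f"{PATTERN!r}: {before} at merge base, {after} on this branch")
    sys.exit(1 if after > before else 0)
```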
I have no qualms about adding patterns like 'interior mutability' in Rust to a ratchet, and forbidding front-line coding agents from incrementing the counter. Then when a feature truly requires it, they can request their parent coordinator agent to bump the count, which gives it a chance to approve or deny the request.
This also gives us the ability to run clean-up agents on the codebase in the background. They are tasked with finding unreasonable instances of the failing ratchets (our ratchet tool spits out file and line numbers), and attempting to fix them.
In an early iteration I was mostly amused (and slightly frustrated) to see a cleanup agent stuck in a loop as it tried to clean up `expect()` calls by converting them into `unwrap()`s, which were also forbidden. Then we would see the `unwrap()`s and attempt to fix them by converting them back into `expect()`s.
The more traditional technique for this would be to mark the offending line with `# noqa` or `# ignore: foo`. Another way is to have a `.fooignore` file, but those are usually for paths or path globs to ignore.
I like the author’s idea[1] of having the “ignore” mechanism live next to the linter codebase itself, rather than mixed in with the production codebase. Adding the files and line numbers of known offenders to that code could be a useful alternative to a simple sum.
Perhaps more robustly, some kind of XPath-like AST syntax could indicate which parts of the codebase have the known problem? But that feels just as fragile and could quickly get overcomplicated.
At the end of the day an inline comment has always done it for me. With Python, Meta’s libcst is an excellent and fast way to get an AST that includes comments. It’s the most robust tool I’ve found, but you can also get by with the built-in ast module and ad-hoc file:line parsing (rough sketch below).
https://github.com/Instagram/LibCST
[1] Sorry to be a fanboi but Antimemetics is amazing!
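Here is the built-in-`ast` route mentioned above, with a hypothetical `# ratchet: ignore` inline marker handled by peeking at the raw source line (plain `ast` throws comments away - that's the gap libcst fills):

```python
import ast
import sys

IGNORE_MARKER = "# ratchet: ignore"  # hypothetical inline marker

class ForbiddenCalls(ast.NodeVisitor):
    """Collect line numbers of calls to a forbidden function."""
    def __init__(self, name: str):
        self.name = name
        self.hits: list[int] = []

    def visit_Call(self, node: ast.Call) -> None:
        if isinstance(node.func, ast.Name) and node.func.id == self.name:
            self.hits.append(node.lineno)
        self.generic_visit(node)

def violations(path: str, name: str = "forbidden_function") -> list[int]:
    src = open(path).read()
    visitor = ForbiddenCalls(name)
    visitor.visit(ast.parse(src, filename=path))
    # The AST has no comments, so the inline marker is honored by checking
    # the raw source line the call starts on.
    lines = src.splitlines()
    return [n for n in visitor.hits if IGNORE_MARKER not in lines[n - 1]]

if __name__ == "__main__":
    total = 0
    for path in sys.argv[1:]:
        for n in violations(path):
            print(f"{path}:{n}: forbidden_function() call")
            total += 1
    sys.exit(1 if total else 0)
```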
I ended up using something similar to `// @ts-ignore`, which would encode a truncated hash of the error message on the line, as well as a truncated hash of the line's AST, plus the original error message and the justification.
These were long lines, so they were a PITA, but they immediately had this 'ratchet' effect.
I tried several times to move to a central file referencing the issues, but the complexity of maintaining the references through common refactors was a blocker.
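A sketch of that hash-in-the-comment scheme (the `lint-ignore:` marker format here is invented, not the commenter's actual one):

```python
import hashlib
import re

# Invented marker format:
#   forbiddenFunction()  // lint-ignore:3f2a9c "migrating, see ticket"
# where 3f2a9c is a truncated hash of the error message, so the ignore
# stops matching (and the error resurfaces) if the error ever changes.
MARKER = re.compile(r"//\s*lint-ignore:([0-9a-f]{6})")

def fingerprint(message: str) -> str:
    return hashlib.sha256(message.encode()).hexdigest()[:6]

def is_ignored(line: str, error_message: str) -> bool:
    """True only if the line's marker matches this exact error."""
    m = MARKER.search(line)
    return bool(m) and m.group(1) == fingerprint(error_message)

if __name__ == "__main__":
    msg = "forbiddenFunction() is banned"
    line = f'forbiddenFunction()  // lint-ignore:{fingerprint(msg)} "grandfathered"'
    print(is_ignored(line, msg))                  # True
    print(is_ignored(line, "a different error"))  # False -> resurfaces
```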