Here's a slightly more recent one focused more on comprehension/learning than productivity: https://www.anthropic.com/research/AI-assistance-coding-skil...
METR attempted to redo that first one to get trends over time, but couldn't recruit enough developers to get reliable results.
For a while, this is not a problem: I can work with my current mental model. But every generated PR erodes my expertise a little bit. Eventually my mental model won’t fit the code anymore.
So how much of that model maintenance should I count toward my productivity metric? Does that even matter, or will the next model be able to reason well enough that my mental model doesn’t matter?