I measure the tooling's success with a suite of small prompt tests that perform repeatable tasks, and I track their success rate over time. I also educate the broader team and share my own field-tested skills, which have produced similar results for other teams. We've seen a large increase in velocity and a lower bug rate. Both are easy to measure, and we've tracked them for a long time. Those results are what put me in my current position, and it is not one I came to eagerly. You're welcome to look through my long history on this forum: you'll see I've been a complete skeptic on this topic and wouldn't be here unless I had to.
Everyone is still figuring this out. There is no authority; I'm my own authority on what I've seen work and what hasn't. Take from that what you will. I just wanted to offer a counterpoint to your initial claim. For obvious reasons, I'm not going to detail exactly what has and hasn't worked for my org.
Have a good day!