Well, software presumably has a goal of accomplishing something for some end-user, so the progress should be trivial to measure: are features/changes being completed?
The marketing ploys of OpenAI/Anthropic where agents build something that nobody uses might be hard to track given that there are zero users. But what about everyone using agents for real software? It's trivial to prove that agents make progress.