Anthropic's stuff been useful for the last two years I'd say, especially in the beginning of Claude Code, but as soon as the Codex TUI was available, I was daily-driving both of them, literally executing the same prompts for each of them and comparing the final results, and Codex simply writes better code in 9/10 cases (but still not always).
We've been experimenting with "agent harnesses" way before that though, I'm sure the first time I tried building that sort of thing was in 2023 sometime with GPT3, and I'm like 80% confident I tried the same sort of TUI experience as CC from some random user before Claude Code even became public.