What’s interesting here is that with previous technologies you could define a standard and then demonstrate that an implementation complied with it, in both its interfaces and its behavior.
But with LLMs, how do you know switching from one to another won’t change some behavior your system was implicitly relying on?
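
One partial answer is to write the implicit assumptions down as checks and run them against both models before switching. Here is a rough sketch of that idea, not a real harness: `model_a` and `model_b` are hypothetical stand-ins for actual model calls, and the two checks are made-up examples of things a system might silently depend on.

```python
import json
from typing import Callable, Dict, List

Model = Callable[[str], str]

# Hypothetical stand-ins for real model calls (assumptions for illustration).
def model_a(prompt: str) -> str:
    # Current model: returns bare JSON.
    return '{"sentiment": "positive"}'

def model_b(prompt: str) -> str:
    # Candidate replacement: same answer, but wrapped in prose.
    return 'Sure! Here is the result: {"sentiment": "positive"}'

def parses_as_json(output: str) -> bool:
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def has_sentiment_key(output: str) -> bool:
    try:
        obj = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and "sentiment" in obj

# Each entry names one behavior the downstream code relies on.
CHECKS: Dict[str, Callable[[str], bool]] = {
    "parses_as_json": parses_as_json,
    "has_sentiment_key": has_sentiment_key,
}

PROMPTS: List[str] = ["Classify the sentiment: I love this."]

def failed_checks(model: Model, prompts: List[str]) -> List[str]:
    """Return one entry per (check, prompt) pair that the model fails."""
    failures = []
    for prompt in prompts:
        output = model(prompt)
        for name, check in CHECKS.items():
            if not check(output):
                failures.append(f"{name}: {prompt!r}")
    return failures

if __name__ == "__main__":
    print("model_a failures:", failed_checks(model_a, PROMPTS))
    print("model_b failures:", failed_checks(model_b, PROMPTS))
```

The obvious limit is that this only catches assumptions you already know you are making, and a real model is not deterministic, so a single passing run proves less than it would for a conventional interface.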