They could be converting on products the user never set out to buy in the first place, which would be impossible to compare against their existing flows.
Walmart's goal is to sell more things. Using AI did not help achieve that goal. Is this a failure? No, we need to define a new metric based on what AI can do.
Same thing with software.
Are we shipping better software to happier customers? No? Better measure token usage, number of lines changed, and "developer velocity" instead.
The kind of microcosm parodied by The Truman Show is becoming plausible, at least digitally.