I think you completely missed the point - they built a product purely using agents and deployed it to production for others to use. Read what the product actually does first.
What evidence? There is zero evidence. It's deployed to production, but that doesn't mean it works correctly or is free of bugs, which is exactly my point and why you use algorithms for these types of things. They're testable, repeatable, and scalable.
With LLM slop it's just that - slop.