It is impossible to prove or disprove because if everything DOES NOT work fine you can always say that the prompts were bad, the agent was not configured correctly, the model was old, etc. And if it DOES work, then all of the previous was done correctly, but without any decent definition of what correct means.
If a program works, it means it's correct. If we know it's correct, it means we have a definition of what correct means otherwise how can we classify anything as "correct" or "incorrect". Then we can look at the prompts and see what was done in those prompts and those would be a "correct" way of prompting the LLM.