upvote
> but the engineers did not add a sufficiently thorough e2e test case to test the tool call against unrelated email verification attempts being provided to the tool call.

I'd go out on a limb to say the tests were likely AI generated. It's easy to miss a case like this one given that models like to generate a ton of test code that 'look' good at a glance but have subtle logic bugs that could potentially defeat the purpose of the test itself.

My own anecdata here, Claude generated a JUnit test with all the right setup, but missed a crucial assertion (there were very many other minor assertions) which made the test useless mostly.

reply
Seems like the most plausible explanation. OTOH it feels like this is the sort of thing that might have been discovered/mitigated more quickly had there been a human in the loop.
reply
OTOH one could previously pay an Instagram support contractor to do an account swap, so having a human in the loop allows for other avenues of exploit:

https://www.wsj.com/articles/meta-employees-security-guards-...

reply
This still happens. Meta doesn't do much to protect against this, they just fire more people and hire new agents when they find out one was bribed.
reply