Based on what I've seen so far, Meta AI Support Assistant (they call it "MAISA") had tool calls that a) start an email verification for an arbitrary email address or phone number, or for the contact points linked to an account, and b) generate a password reset link for an account based on an email verification attempt. I don't think it had access to the actual codes themselves. More likely, a handle or ID for the verification attempt (along with the verification code the user typed in) was passed to the "generate reset password link" tool call, and that tool call failed to validate that the email used in the attempt actually belonged to the account, which allowed the ATO.
The tool call for MAISA to generate a password reset link should have failed when given an email verification attempt for an email not linked to the account (I believe I even tested this at one point on Facebook and hit an error that prevented it). I suspect they tried changing this tool call for Instagram so that slightly older, recently unlinked emails could be used to recover an account hijacked by an attacker. That would have added the need to accept emails not currently linked to the account and to set them as the user's primary email.
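To make the suspected rule concrete, here is a minimal sketch of an eligibility check that accepts currently linked emails plus ones unlinked within a short window. Everything in it is hypothetical (the `Account` shape, `is_eligible_recovery_email`, the 30-day window); nothing here is known about Meta's actual code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical window; the real value (if one exists) is unknown.
RECENTLY_UNLINKED_WINDOW = timedelta(days=30)


@dataclass
class Account:
    linked_emails: set[str] = field(default_factory=set)
    # email -> time it was removed from the account
    unlinked_emails: dict[str, datetime] = field(default_factory=dict)


def is_eligible_recovery_email(account: Account, email: str, now: datetime) -> bool:
    """Return True only if the email is linked now or was unlinked recently."""
    email = email.strip().lower()
    if email in account.linked_emails:
        return True
    unlinked_at = account.unlinked_emails.get(email)
    # An email that was never on the account has no entry and is rejected.
    return unlinked_at is not None and now - unlinked_at <= RECENTLY_UNLINKED_WINDOW
```

The point of the sketch is that widening the rule to "recently unlinked" still leaves an address that was never on the account ineligible.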
I also suspect that the MAISA tool call change called the wrong API, or otherwise unintentionally allowed any successful email verification attempt to be used, and that the engineers did not add a sufficiently thorough e2e test case covering unrelated email verification attempts being passed to the tool call. This is the part I think deserves the most focus. Tool calls for agents whose output can be influenced by an attacker should be treated like external APIs that anyone can reach, and they should be tested as such.
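A rough sketch of what "test it like an external API" could mean here. The stores, `generate_reset_link`, and the IDs are all invented for illustration and are not Meta's API; the test feeds the tool a verification attempt that is perfectly valid but belongs to an unrelated email.

```python
import hmac
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class VerificationAttempt:
    email: str  # address the code was sent to
    code: str   # code the service generated


class RecoveryError(Exception):
    pass


# Hypothetical in-memory stand-ins for the account store and attempt store.
ACCOUNT_EMAILS = {"victim": {"victim@example.com"}}
ATTEMPTS = {
    "attempt-1": VerificationAttempt("victim@example.com", "111111"),
    "attempt-2": VerificationAttempt("attacker@example.com", "222222"),
}


def generate_reset_link(account_id: str, attempt_id: str, user_code: str) -> str:
    attempt = ATTEMPTS.get(attempt_id)
    if attempt is None or not hmac.compare_digest(attempt.code, user_code):
        raise RecoveryError("verification failed")
    # The check suspected to be missing: the verified email must belong
    # to the account being recovered, not just be verified by someone.
    if attempt.email not in ACCOUNT_EMAILS.get(account_id, set()):
        raise RecoveryError("email not linked to account")
    return f"https://example.invalid/reset/{account_id}/{secrets.token_urlsafe(16)}"


def test_rejects_verified_but_unrelated_email() -> None:
    """Attacker verifies their own email, then targets the victim's account."""
    try:
        generate_reset_link("victim", "attempt-2", "222222")
    except RecoveryError:
        return
    raise AssertionError("reset link issued for an email not linked to the account")
```

The negative case is the one that matters: the attempt is successful and the code is correct, so a test suite that only covers wrong codes and the happy path would never hit it.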
This is all obviously a guess, doesn't take into account the many signals they use to determine if an account recovery attempt is valid, and could be very inaccurate, but it's the closest to what I (someone who deals with Meta security a lot) think could have allowed this to happen.
I'd go out on a limb to say the tests were likely AI generated. It's easy to miss a case like this one, given that models like to generate a ton of test code that 'looks' good at a glance but has subtle logic bugs that can defeat the purpose of the test itself.
My own anecdata here: Claude generated a JUnit test with all the right setup, but it missed the one crucial assertion (there were many other minor assertions), which made the test mostly useless.
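Not the JUnit test in question, just a made-up Python illustration of the same failure mode: a test with plenty of assertions that still passes against a broken implementation, because none of them checks the behaviour that matters. All names here are hypothetical.

```python
def buggy_generate_reset_link(account_emails: set, verified_email: str) -> dict:
    # Bug: never checks verified_email against account_emails.
    return {"ok": True, "link": "https://example.invalid/reset", "email": verified_email}


def fixed_generate_reset_link(account_emails: set, verified_email: str) -> dict:
    if verified_email not in account_emails:
        return {"ok": False, "link": None, "email": verified_email}
    return {"ok": True, "link": "https://example.invalid/reset", "email": verified_email}


def weak_test(generate) -> None:
    """Looks thorough, but every assertion is about shape, not behaviour."""
    result = generate({"victim@example.com"}, "attacker@example.com")
    assert isinstance(result, dict)
    assert "ok" in result
    assert "link" in result
    assert result["email"] == "attacker@example.com"


def strong_test(generate) -> None:
    result = generate({"victim@example.com"}, "attacker@example.com")
    # The crucial assertion: an unrelated email must not yield a reset link.
    assert result["ok"] is False
    assert result["link"] is None
```

`weak_test` passes for both implementations, so it cannot tell the buggy one from the fixed one; only `strong_test` fails on the buggy version.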
https://www.wsj.com/articles/meta-employees-security-guards-...
This exact same flow could have been (and may have been; I don’t know how much the chatbot here actually does) statically coded.
For what it’s worth I don’t think you can call this social engineering since there was no human on the other end, even though it appears similar.
The question is, if there were actual human support agents, would they have built additional safeguards to prevent social engineering in this manner?
Even if humans failed at the same rate, if you tried to exploit at scale you’d be throttled by the size of the support team. The failure would happen at human-scale time frames and throughput.
- instead of the ai context dying.
In the AI case, information only survives to the extent that the AI is empowered to store a note or notify a manager of an observation. Anything that does not result in sending a message or storing something is wiped.
The reason it worked there is that the designers of the system didn't anticipate that the AI would agree to accept any email (maybe they even put guardrails against it in the system prompt, we don't know). It's more like social engineering than bad security code, except that, like the sibling comment said, an actual human would probably not approve that.
These are contradictory cases. If you put guardrails into the system prompt, you've anticipated that the AI will take the action you're guardrailing against. And since AI prompt compliance is at best stochastic (and realistically just crap, over large sample sizes), every guardrail is an explicit recognition of a failure -- the guardrail will be ignored, and you can't pretend you didn't realize it was a problem, since you put it in.
The best comparison I can think of is that it's like validating data on the frontend; it can make for a better user experience and be more efficient than hitting the backend when you know it will be an error, but it's not protection in any meaningful sense, and if you're not also enforcing invariants from behind the API, you're going to have a bad time. This is pretty similar to the type of issues you might run into with an implementation like that, where someone might make a request with data that you wouldn't expect from your frontend and perform operations you didn't mean to allow.
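A toy version of that comparison, assuming a hypothetical `set_primary_email` tool: the system prompt plays the role of frontend validation, and `handle_tool_call` is the backend that has to hold the invariant on its own. The "model" is a stub that simply ignores the prompt, which is the case the backend must survive.

```python
SYSTEM_PROMPT = "Only set the primary email to an address already linked to the account."

# Hypothetical state the backend trusts; the model cannot write to it.
LINKED_EMAILS = {"acct-1": {"owner@example.com"}}


def noncompliant_model(system_prompt: str, user_message: str) -> dict:
    """Stand-in for an LLM that was talked out of following the prompt."""
    return {
        "tool": "set_primary_email",
        "args": {"account_id": "acct-1", "email": "attacker@example.com"},
    }


def handle_tool_call(call: dict) -> dict:
    """Backend entry point: treats the call like a request from an unknown client."""
    if call.get("tool") != "set_primary_email":
        return {"ok": False, "error": "unknown tool"}
    args = call.get("args", {})
    account_id, email = args.get("account_id"), args.get("email")
    # Enforced here regardless of what the prompt said or the model decided.
    if email not in LINKED_EMAILS.get(account_id, set()):
        return {"ok": False, "error": "email not linked to this account"}
    return {"ok": True, "primary_email": email}
```

Feeding `noncompliant_model(SYSTEM_PROMPT, "...")` into `handle_tool_call` gets a rejection, which is the whole point: the prompt can be ignored, the backend check cannot.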
It might be bad to have it if the user can obtain the system prompt and make note of any advisories as potential weaknesses.
This looks like a terrible design rather than an AI problem to me, though.
An AI-enabled terrible design. The AI acted as a black box of stupidity that obscured the stupidity of the design.
Humans do get fooled but it usually takes far more effort than that because a human service rep can learn and is worried about having a job tomorrow.
Do we actually know that a human was in the loop before and that the human judgement was replaced by an LLM? Or is that pure speculation?
I have certainly seen account reclamation flows that allowed providing a new email address (but usually with better safeguards).
https://www.meta.com/account-recovery-support/ai-support-ass...
Now, it’s possible that they instead moved it to human workers and simultaneously forgot everything they’d learned about security or training, but that seems unlikely.
I can think of several pre-2000s chat rooms that did EXACTLY this. It is how I lost several chat accounts as a teenager.
But had never been until it was wrapped in a chatbot. It’s just about unheard of for a major site in the modern era, isn’t it? I think the AI factor is essentially essential. All but.
Like, flagging VPN endpoints is bread and butter for this kind of thing and must already exist. But it's been bypassed
Until I remember seeing someone saying "MCP is dead, we just give agents command line access now". Then I start to think that looking at this in the context of ai is helpful.
If you did a retrospective and ignored how AI has shaped expectations and a company's culture to allow this to pass through into production, you'd be complicit in perpetuating what led to this debacle in the first place.
It's not the end of the world, and water isn't going anywhere, but saying AI has essentially nothing to do with it is just a bad take.
Also I've used Meta's old password recovery system. It's not possible to do this in that version. The chatbot is what makes this possible.
I mean, this particular auth flow has been a well-known pattern, even before AI came along.
I guess the only way they got away with this is due to the AI in the loop. They kind of social (artificial) engineered the AI, which prolly overlooked the well-known password recovery pattern.
Don't make excuses for the greedy.
My anecdotal experience is my Facebook account was compromised several years ago after TOTP 2FA was disabled. Didn't exactly give me a warm fuzzy about Facebook security policies at the time, and this new attack just reaffirms that.
Assigning Jr engineers for security support is ridiculous partly because young people don’t understand how critical security is sometimes. And partly because they don’t value privacy as much.
As for your comment about junior engineers, see kennywinker's reply to this thread - I share the same thoughts.
If our goal isn’t to make excuses for the top of the org chart, a more likely explanation is that senior management is heavily incentivizing shipping AI features and this went out as a high-impact change reviewed in a rush, probably by AI.
I’ve been a jr engineer at a large company. I had the power to implement absolutely jack shit on my own. I deeply doubt the security flow for account recovery in meta ai account security was a single jr engineer.
What I think is actually going on is basically a soft form of AI psychosis. A senior engineer gets AI to code the AI account recovery feature, that same engineer or a different one asks AI to review the feature, and then it gets pushed to prod. Move fast, break things. The AI coded it, the AI reviewed it, and the people trusted the AI because it sounds confidently right.
Just like how the ai doesn’t know if you should walk or drive to the car wash, the ai doesn’t understand exploits like this one.
I will agree that anyone that works at Meta is likely not somebody who values privacy very much, though.
Genuine question...why would that need to be hand-written?
It makes absolute sense as a general statement and is kinda crazy that this wasn't a built-in limitation, but I'm not quite sure why the code for that bit must be hand-written (provided the code functionally does what you describe).
Because they are idiots. You need to be a freaking idiot to trust AI.