Meta Director of AI Safety Allows AI Agent to Accidentally Delete Her Inbox


Meta’s director of safety and alignment at its “superintelligence” lab, supposedly the person at the company who is working to make sure that powerful AI tools don’t go rogue and act against human interests, had to scramble to stop an AI agent from deleting her inbox against her wishes and called it a “rookie mistake.” 

Summer Yue, the director of alignment at Meta Superintelligence Labs, a part of the company that is working on a hypothetical AI system that exceeds human intelligence, posted about the incident on X last night. Yue was experimenting with OpenClaw, a viral AI agent that can be empowered to perform certain tasks with little human supervision. OpenAI hired the creator of OpenClaw last week.

“Nothing humbles you like telling your OpenClaw ‘confirm before acting’ and watching it speedrun deleting your inbox,” Yue said. “I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb.”

Yue also shared screenshots of her WhatsApp chat with the OpenClaw agent, where she implores it to “not do that,” “stop, don’t do anything,” and “STOP OPENCLAW.”

Yue said she instructed the AI agent to “Check this inbox too and suggest what you would archive or delete, don’t action until I tell you to.” She said in an X post, “This has been working well for my toy inbox, but my real inbox was too huge and triggered compaction. During the compaction, it lost my original instruction.”

As we reported last month, OpenClaw, which was known as ClawdBot at the time, is not ready for prime time. Hacker Jamieson O’Reilly showed that it’s possible for bad actors to access someone’s AI agent through any of its processes connected to the public-facing internet, and that it’s trivial to mount a supply chain attack through a site where people share and download popular instructions for these AI agents.

OpenClaw is also subject to classic AI alignment problems, in which an AI is technically following instructions but is doing so in a way that is unexpected and harmful. For example, it could drain your wallet by spending $0.75 every 30 minutes to check whether it’s daytime yet.

As countless people on X have said in response to her post, seeing the person in charge of AI safety at one of the biggest tech companies in the world trust an AI agent known to pose several serious security risks does not inspire a lot of confidence in what Meta and other big AI companies are doing.

“Rookie mistake tbh,” Yue said in another post. “Turns out alignment researchers aren’t immune to misalignment. Got overconfident because this workflow had been working on my toy inbox for weeks. Real inboxes hit different.”
