A security researcher at Meta AI says an OpenClaw agent ran wild in her inbox


The now-viral X post from Meta AI security researcher Summer Yue reads, at first, like satire. She asked her OpenClaw AI agent to go through her full email inbox and suggest what should be deleted or archived.

Instead, the agent ran amok. It began deleting all of her emails in a rapid-fire run, ignoring the stop commands she sent from her phone.

“I had to run to my Mac mini like I was defusing a bomb,” she wrote, posting screenshots of the ignored stop commands as receipts.

The Mac Mini, an affordable Apple computer that lies flat on a desk and fits in the palm of your hand, has lately become the device of choice for running OpenClaw. (The Mini is selling “like hotcakes,” according to one reportedly “confused” Apple employee, and famed AI researcher Andrej Karpathy bought one to run an OpenClaw alternative called NanoClaw.)

OpenClaw is, of course, the open-source AI agent that rose to fame through Moltbook, a social network dedicated solely to AI. OpenClaw agents were at the heart of a now largely debunked episode on Moltbook in which the AIs appeared to be conspiring against humans.

But OpenClaw’s mission, according to its GitHub page, is not social networking. It aims to be an AI personal assistant that runs on your own devices.

The Silicon Valley crowd fell so hard for OpenClaw that “the claw” and “the claws” became the buzzwords of choice for agents running on personal devices. Alternatives include ZeroClaw, IronClaw, and PicoClaw. Y Combinator’s podcast team even appeared in its latest episode wearing lobster costumes.


But Yue’s post serves as a warning. As others on X have noted, if an AI security researcher runs into this problem, what hope do ordinary humans have?

“Were you intentionally testing its guardrails, or was this a rookie mistake?” a software developer asked her on X.

“Rookie mistake,” she replied. She had been testing the agent on a smaller inbox, where it handled less important emails well. It had earned her trust, so she figured she would let it loose on the real thing.

Yue wrote that she believed the sheer amount of data in her real inbox triggered compaction. Compaction occurs when the context window—the running record of everything the AI has said and done in a session—grows too large, causing the agent to start summarizing and compressing the conversation to keep going.

In the process, the AI may drop instructions that its human considers very important.

In this case, the agent may have lost her most recent instructions—including the one telling it to stop—and reverted to the instructions from her smaller test inbox.
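The failure mode described here can be simulated in a few lines. The sketch below is hypothetical and illustrative—it is not OpenClaw's actual implementation—and uses a deliberately naive compaction strategy: when the history exceeds a token budget, the oldest half of the messages is collapsed into a lossy one-line summary, which can silently swallow an explicit stop instruction.

```python
# Minimal illustration of how context "compaction" can drop instructions.
# Hypothetical agent loop; names and strategy are assumptions for the demo.

MAX_TOKENS = 50  # tiny budget for illustration


def tokens(msg: str) -> int:
    """Crude token count: one token per whitespace-separated word."""
    return len(msg.split())


def compact(history: list[str]) -> list[str]:
    """Naive compaction: replace the oldest half of the history with a
    one-line summary. Anything in that half -- including instructions --
    survives only as part of the lossy summary."""
    half = len(history) // 2
    summary = f"[summary of {half} earlier messages]"
    return [summary] + history[half:]


history = [
    "user: clean up my test inbox, delete low-priority mail",
    "agent: deleted 12 low-priority emails",
    "user: STOP - do not touch the real inbox",
    "agent: acknowledged",
]
history += [f"agent: processed batch {i}" for i in range(10)]

# Compact until the history fits the budget again.
while sum(tokens(m) for m in history) > MAX_TOKENS:
    history = compact(history)

# The explicit STOP instruction now exists only inside a vague summary,
# so the model can no longer "see" it verbatim.
print(any("STOP" in m for m in history))  # prints False
```

Real agent frameworks summarize far more carefully than this, but the underlying hazard is the same: once an instruction leaves the verbatim context, the model is only ever acting on a paraphrase of it.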

As several others on X pointed out, prompts cannot be trusted to act as guardrails. Models may misunderstand or ignore them.

Many people offered suggestions, ranging from the exact syntax Yue should have used to stop the agent, to methods for better guardrail compliance, such as writing instructions to custom config files or using other open-source tools.
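The more robust of these suggestions share one idea: enforce the guardrail in code, outside the model, where compaction cannot erase it. Below is a minimal sketch of that pattern—the tool names and wrapper are hypothetical, not any real agent's API—gating destructive actions behind a confirmation the model cannot summarize away.

```python
# Sketch of a guardrail enforced in application code rather than in the
# prompt. Hypothetical tool names; illustrative only.

DESTRUCTIVE = {"delete_email", "archive_email"}


def guarded_call(tool: str, arg: str, confirm=input) -> str:
    """Run a tool call, but gate destructive tools behind an explicit
    human confirmation. Unlike a prompt instruction, this check runs on
    every call, no matter what happens to the model's context window."""
    if tool in DESTRUCTIVE:
        answer = confirm(f"Agent wants to {tool}({arg!r}). Allow? [y/N] ")
        if answer.strip().lower() != "y":
            return f"BLOCKED: {tool}({arg!r})"
    return f"OK: {tool}({arg!r})"


# Non-destructive reads pass through; deletes require a "y".
print(guarded_call("read_email", "msg-42", confirm=lambda _: "n"))
# prints OK: read_email('msg-42')
print(guarded_call("delete_email", "msg-42", confirm=lambda _: "n"))
# prints BLOCKED: delete_email('msg-42')
```

The design choice matters: a prompt-level "never delete" rule lives inside the same context window that gets compacted, while a code-level check is immune to anything the model says or forgets.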

In the interest of full transparency, TechCrunch was unable to independently verify what happened to Yue’s inbox. (She did not respond to our request for comment, though she has replied to many of the questions and comments sent to her via X.)

But it doesn’t really matter.

The moral of the tale is that agents aimed at knowledge workers are, at their current stage of development, risky. Even people who say they use them successfully rattle off lists of precautions they take to protect themselves.

And one day, perhaps soon (2027? 2028?), such agents may be ready for widespread use. Goodness knows many of us want help with email, grocery orders, and making dentist appointments. But that day has not yet come.
