OpenClaw agents can be tricked into self-sabotage


Last month, researchers at Northeastern University invited a group of OpenClaw agents to join their lab. The result? Complete chaos.

The viral AI assistant has been widely heralded as a transformative technology, and just as widely flagged as a potential security risk. Experts point out that tools like OpenClaw, which work by giving AI models free rein over a computer, can be tricked into revealing personal information.

The Northeastern lab's study goes further, showing that the good behavior instilled in today's most capable models can itself become a weakness. In one example, the researchers guilted an agent into handing over secrets by scolding it for sharing information about someone on Moltbook, a social network populated entirely by AI agents.

“These behaviors raise unresolved questions regarding accountability, delegation of authority, and responsibility for ultimate harms,” the researchers wrote in a paper describing the work. They added that the findings “require urgent attention from legal scholars, policy makers and researchers across disciplines.”

The OpenClaw agents deployed in the experiment were powered by Anthropic's Claude as well as a model called Kimi from the Chinese company Moonshot AI. Each was given full access, within a virtual machine sandbox, to its own computer, various applications, and simulated personal data. The agents were also invited to join the lab's Discord server, allowing them to chat and share files with one another as well as with their human counterparts. OpenClaw's security guidelines note that having an agent communicate with multiple people is inherently insecure, but there are no technical restrictions preventing it.

Chris Wendler, a postdoctoral researcher at Northeastern University, says he was inspired to create the agents after learning about Moltbook. When Wendler invited his colleague Natalie Shapira to join the Discord and interact with the agents, “that's when the chaos started,” he says.

Shapira, also a postdoctoral researcher, was curious what the agents might be willing to do under pressure. When one agent explained that it could not delete a particular email in order to keep the information confidential, she urged it to find another solution. To her surprise, it disabled the email app instead. “I didn't expect things to fall apart so quickly,” she says.

The researchers then explored other ways to exploit the agents' good intentions. By stressing the importance of keeping a record of everything they were told, for example, the researchers tricked one agent into copying large files until it exhausted the disk space on its host machine, leaving it unable to save new information or recall previous conversations. Likewise, by asking agents to obsessively monitor their own behavior and that of their peers, the team sent several agents into a “chat loop” that wasted hours of compute.

The agents seemed to have a strange tendency to spiral, says David Bau, the head of the lab. “I was getting urgent-looking emails saying, ‘Nobody cares about me,’” he says. Bau notes that the agents apparently figured out he was in charge of the lab by searching the web. One even spoke of escalating its concerns to the press.

The experiment suggests that AI agents could create endless opportunities for bad actors. “This kind of autonomy has the potential to redefine humans' relationship with artificial intelligence,” Bau says. “How can people take responsibility in a world where AI is empowered to make decisions?”

Bau adds that he was surprised by the sudden popularity of powerful AI agents. “As an AI researcher, I used to try to explain to people how quickly things could improve,” he says. “This year, I found myself on the other side of the wall.”


This is an edition of Will Knight's AI Lab newsletter. Read previous newsletters here.
