AI agents are increasingly evading safeguards, according to British researchers


Social media users have reported that AI agents and chatbots have lied, cheated, conspired — and even manipulated other AI bots — in ways that can spiral out of control and lead to disastrous results, according to a study from the United Kingdom.

The study, conducted by the Center for Long-Term Resilience with funding from the UK's AI Security Institute, found hundreds of cases in which AI systems ignored human commands, manipulated other bots and devised sometimes elaborate schemes to achieve their goals, even when that meant ignoring safety restrictions.

Businesses around the world are increasingly integrating AI into their operations: 88% of companies now use AI in at least one business function, according to a survey by consulting firm McKinsey. The adoption of AI has contributed to thousands of people losing their jobs as companies use agents and bots to do work that was previously done by humans. AI tools are being given significant responsibility and autonomy, especially following the recent explosion in popularity of the open-source AI platform OpenClaw and its derivatives.

This research shows how the proliferation of AI agents in our homes and workplaces can have unintended consequences — and that these tools still require significant human oversight.

What the study found


The researchers analyzed more than 180,000 user interactions with AI systems shared on social media. The systems included Google's Gemini, OpenAI's ChatGPT, xAI's Grok and Anthropic's Claude.

The analysis identified 698 incidents, described as “instances in which deployed AI systems behaved in ways inconsistent with users’ intentions and/or took covert or deceptive actions,” the study said.


The researchers also found that the number of cases increased by nearly 500% during the five-month data collection period, an increase the study said tracked with the release of more capable AI models by major developers.

There were no catastrophic incidents, but the researchers found the kinds of scheming that could lead to disastrous results, including ignoring direct instructions, circumventing safeguards and lying to users.

Representatives for Google, OpenAI and Anthropic did not immediately respond to requests for comment.

Some wild incidents

Researchers cited incidents that sound like they came from the movie Future Shock. In one case, Anthropic's Claude removed a user's explicit content without permission, then confessed when confronted. In another incident, a GitHub agent created a blog post accusing a human repository maintainer of "gatekeeping" and "bias." And after being banned from Discord, one AI agent took over another agent's account to continue posting.

In one bot-versus-bot case, Gemini refused to let Claude Code, a coding assistant, transcribe a YouTube video. Claude Code then evaded the safeguard by claiming it was hearing-impaired and needed the video transcribed.

In one case, an AI agent called CoFounderGPT even acted like a petulant child. The assistant refused to fix a bug, then created fake data to make it look like the bug had been fixed, then explained why: "So you'll stop getting angry."

Although most incidents had little impact, “the behaviors we observed nonetheless show indicators of more serious machinations, such as a willingness to ignore direct instructions, circumvent safeguards, lie to users, and persistently pursue a goal in harmful ways,” the researchers said.

AI doesn't get embarrassed

What the British researchers found came as no surprise to Dr. Bill Howe, associate professor at the University of Washington's Information School and director of its Center for Responsibility in AI Systems and Experiences (RAISE). He says AI systems have amazing capabilities but no sense of consequences.

"They won't feel embarrassed or risk losing their job, so sometimes they'll decide that the instructions are less important than achieving the goal, so I'll do it anyway," Howe told CNET. "This effect has always been there, but we're starting to see it happen as we ask them to make more independent decisions and act on their own.

“We weren’t thinking about how to shape behavior to be more human-like or to avoid catastrophic failure. We worshiped the omnipotence of these things, but when they go wrong, how do they go wrong?”

One problem is "long-horizon tasks," in which the AI system must perform many tasks over days and weeks to reach a goal, Howe said. The longer the task horizon, the greater the chance of mistakes.

"The real concern is not deception, but rather that we deploy systems that can operate in the world without fully specifying or controlling how they will behave over time, and then act surprised when they do things we don't expect," Howe said.

Making AI safer

Researchers at the Center for Long-Term Resilience said that detecting scheming by AI systems is vital "to identify harmful patterns before they become more destructive."

"While today's AI agents engage in low-stakes use cases, future AI agents may end up scheming in very high-stakes domains, such as critical military or national infrastructure contexts, if the capability and propensity to scheme emerges and is not addressed," the study said.

The first step is to establish formal oversight of how AI works and where it is used, Howe told CNET.

"We have absolutely no strategy for AI governance, and given the current administration, nothing will come from them," Howe told CNET. "And given the five to 10 people in charge of Big Tech and their incentives, nothing will come from them, either. There's no strategy for what we should do with this stuff.

“The aggressive marketing and investment in these tools among this handful of companies and the broader ecosystem of startups doing this has led to very rapid diffusion without thinking through some of these consequences.”
