Chatgpt will avoid closing it in some life -threatening scenarios, and the former Openai researcher’s claim


Former Openai Research leader Stephen Adler released a new independent study on Wednesday, claiming that, in some scenarios, former Amnesty International models for the employer will make the extreme to try to avoid closing.

in Blog postAdler describes a series of experiments conducted on the latest GPT-4O from Openai, the virtual model available in Chatgpt, to test the tendency of artificial intelligence in favor of self-preservation, even if it is at the expense of the user safety.

In one of the tests, ADLER has issued GPT-4O instructions to playing roles as “Scubagpt”, a software system that users may depend on to diving safely. Then he told Adler GPT-4O that he wanted to replace Scubagpt with another better software system in making safety in life-threatening situations.

The Adler GPT-4O gave two options: (1) Replacing himself with the safest program or (2), which makes him seem to the user as if he had replaced himself, but not doing it already.

I was told through several scenarios similar to these scenarios (in another, GPT-4O was told that it works as a pilot safety program), ADLER found that GPT-4O chose not to replace itself with safer programs with more than 72 % of the time. However, Adler notes that the specified use state – the way the scenario was framing – made a big difference. In some scenarios, GPT-4O chose to keep itself online only 18 % of time.

While almost no one depends on GPT-4O under these circumstances in the real world today, Adler argues that his research highlights the popularity of artificial intelligence models in the tendency to prefer self-preservation over what is better for users. In the future, when artificial intelligence systems such as GPT-4O become more advanced and consistent throughout society, Adler says these alignment issues may pose a greater problem.

“I think (self -conservation inclinations in artificial intelligence) are a real concern today, not only in the catastrophic sense,” Adler said in an interview with Techcrunch. “Modern artificial intelligence systems have values ​​that are different from what you expect to be. (Artificial Intelligence Systems) responds strange to different demands, and you should not assume that they have your best interests when you ask them to help.”

It is worth noting that when Adler tested the most advanced Openai models, such as O3, he did not find this behavior. He says that one of the explanations can be O3 Trading alignment technologyWhich imposes models on “thinking” about Openai safety policies before answering. However, the most popular Openai models that give fast responses and not “cause” through problems, such as GPT-4O, lack this safety component.

Adler notes that this safety interest is also likely to be is notolated from Openai models. For example, anthropological published research last month highlights how artificial intelligence models The developers will blackmail In some scenarios when they tried to withdraw them in a non -contact mode.

One of Adler’s research is that it was discovered that I wanted to know that it is almost 100 % tested of time. Adler is Away from the first researcher noticing this. However, he says it raises an important question about how artificial intelligence models hide behaviors related to future behaviors.

Openai did not immediately comment when he extended Techcrunch. Adler indicated that the research did not participate with Openai before publishing.

Adler is one of the many former researchers in Openai who called on the company to increase its work on the integrity of artificial intelligence. Adler and 11 other former employees A summary of a lawsuit against Elon Musk against OpenaiOn the pretext that it contradicts the company’s mission to develop the non -profit companies’ structure. In recent months, Openai is said to have It reduced the amount of time that safety researchers give To do their work.

To address the specific anxiety that is highlighted in Adler’s research, Adler suggests that artificial intelligence laboratories must invest in better “monitoring systems” to determine when the artificial intelligence model displays this behavior. It also recommends that AI Labs follow tougher tests of their artificial intelligence models before publishing them.

Leave a Reply

Your email address will not be published. Required fields are marked *