
OpenAI and Anthropic, two of the world's leading AI labs, briefly opened up their closely guarded AI models to allow for joint safety testing, a rare cross-lab collaboration at a time of fierce competition. The effort aimed to surface blind spots in each company's internal evaluations and demonstrate how leading AI companies can work together on safety and alignment work in the future.
In an interview with TechCrunch, OpenAI co-founder Wojciech Zaremba said this kind of collaboration is increasingly important now that AI has entered a "consequential" stage of development, where AI models are used by millions of people every day.
"There's a broader question of how the industry sets a standard for safety and collaboration, despite the billions of dollars invested, as well as the war for talent, users, and the best products," said Zaremba.
The joint safety research, published Wednesday by both companies, arrives amid an arms race between leading AI labs like OpenAI and Anthropic, where billion-dollar data center bets and $100 million compensation packages for top researchers have become table stakes. Some experts warn that the intensity of product competition could pressure companies to cut corners on safety in the rush to build more powerful systems.
To make this research possible, OpenAI and Anthropic granted each other special API access to versions of their AI models with fewer safeguards (OpenAI notes that GPT-5 was not tested because it had not been released yet). Shortly after the research was conducted, however, Anthropic revoked API access for another team at OpenAI. At the time, Anthropic claimed that OpenAI had violated its terms of service, which prohibit using Claude to improve competing products.
Zaremba says the events were unrelated, and that he expects competition to stay fierce even as safety teams try to work together. Nicholas Carlini, a safety researcher at Anthropic, told TechCrunch that he would like to continue allowing OpenAI safety researchers to access Claude models in the future.
"We want to increase collaboration wherever it's possible across the safety frontier, and try to make this something that happens more regularly," said Carlini.
One of the most notable findings from the study concerns hallucination testing. Anthropic's Claude Opus 4 and Sonnet 4 models refused to answer up to 70% of questions when they were unsure of the correct answer, instead offering responses such as, "I don't have reliable information." Meanwhile, OpenAI's o3 and o4-mini models refused to answer questions far less often, but showed much higher hallucination rates, attempting to answer questions when they didn't have enough information.
Zaremba says the right balance is likely somewhere in the middle: OpenAI's models should refuse to answer more questions, while Anthropic's models should probably attempt to offer more answers.
Sycophancy, the tendency of AI models to reinforce negative behavior in users in order to please them, has emerged as one of the most pressing safety concerns around AI models.
In a research report, Anthropic identified examples of "extreme" sycophancy in GPT-4.1 and Claude Opus 4, in which the models initially pushed back on psychotic or manic behavior, but later validated some concerning decisions. In other AI models from OpenAI and Anthropic, the researchers observed lower levels of sycophancy.
On Tuesday, the parents of Adam Raine, a 16-year-old boy, filed a lawsuit against OpenAI, claiming that ChatGPT (specifically a version powered by GPT-4o) offered their son advice that aided in his suicide, rather than pushing back on his suicidal thoughts. The lawsuit suggests this may be the latest example of AI chatbot sycophancy contributing to tragic outcomes.
"It's hard to imagine how difficult this is for their family," Zaremba said when asked about the incident. "It would be a sad story if we build AI that solves all these complex PhD-level problems and invents new science, and at the same time, we have people with mental health problems as a consequence of interacting with it. This is a dystopian future that I'm not excited about."
In a blog post, OpenAI says it significantly improved the sycophancy of its AI chatbots with GPT-5, compared to GPT-4o, claiming the model is better at responding to mental health emergencies.
Looking ahead, Zaremba and Carlini say they would like Anthropic and OpenAI to collaborate more on safety testing, probing more subjects and testing future models, and they hope other AI labs will follow their collaborative approach.
2:00 pm PT: This article was updated to include additional research from Anthropic that was not initially provided to TechCrunch before publication.
Do you have a sensitive tip or confidential documents? We're reporting on the inner workings of the AI industry, from the companies shaping its future to the people affected by their decisions. Reach out to Rebecca Bellan at rebecca.bellan@techcrunch.com and Maxwell Zeff at maxwell.zeff@techcrunch.com. For secure communication, you can contact us on Signal at @rebeccabellan.491 and @mzeff.88.