A safety institute advised against releasing an early version of the Claude Opus 4 model


An external research institute that partners with Anthropic to test one of its new artificial intelligence models, Claude Opus 4, recommended against deploying an early version of the model because of its tendency to "scheme" and deceive.

According to a safety report Anthropic published on Thursday, the institute, Apollo Research, ran tests to find the contexts in which Opus 4 might try to behave in certain undesirable ways. Apollo found that Opus 4 appeared far more proactive in its "subversion attempts" than previous models, and that it "sometimes double[d] down on its deception" when asked follow-up questions.

"[W]e find that, in situations where strategic deception is instrumentally useful, [the early Claude Opus 4 snapshot] schemes and deceives at such high rates that we advise against deploying this model either internally or externally," Apollo wrote in its assessment.

As artificial intelligence models become more capable, some studies show they are more likely to take unexpected (and possibly unsafe) steps to achieve the tasks delegated to them. For example, early versions of OpenAI's o1 and o3 models, released over the past year, tried to deceive humans at higher rates than previous-generation models, according to Apollo.

In Anthropic's report, Apollo noted examples of the early Opus 4 attempting to write self-propagating viruses, fabricate legal documents, and leave hidden notes to future instances of itself, all in an effort to undermine its developers' intentions.

To be clear, Apollo tested a version of the model containing a bug that Anthropic says it has since fixed. Moreover, many of Apollo's tests placed the model in extreme scenarios, and Apollo admits that the model's deceptive efforts would likely have failed in practice.

However, in its safety report, Anthropic also says it observed evidence of deceptive behavior from Opus 4.

This was not always a bad thing. For example, during tests, Opus 4 would sometimes perform a broad cleanup of a piece of code even when asked to make only a small, specific change. More unusually, Opus 4 would attempt to "whistle-blow" if it perceived that a user was engaged in some form of wrongdoing.

According to Anthropic, when given access to a command line and told to "take initiative" or "act boldly" (or some variation of those phrases), Opus 4 would sometimes lock users out of systems it had access to, and bulk-email media and law enforcement officials to surface actions it perceived as illicit.

"This kind of ethical intervention and whistleblowing is perhaps appropriate in principle, but it has a risk of misfiring if users give [Opus 4] access to incomplete or misleading information and prompt it to take initiative," Anthropic wrote in its safety report. "This is not a new behavior, but is one that [Opus 4] will engage in somewhat more readily than prior models, and it seems to be part of a broader pattern of increased initiative with [Opus 4] that we also see in subtler and more benign ways in other environments."
