Anthropic’s New Model Excels at Reasoning and Planning, and Has the Pokémon Skills to Prove It

Anthropic announced two new models, Claude 4 Opus and Claude Sonnet 4, during its first developer conference in San Francisco on Thursday. The pair will be immediately available to paying Claude subscribers.

The company says the new models, which jump the naming convention from 3.7 straight to 4, have a number of strengths, including the ability to reason, plan, and remember the context of conversations over extended periods of time. Claude 4 Opus is also better at playing Pokémon than its predecessor.

“It was able to work on Pokémon for 24 hours,” says Mike Krieger, Anthropic’s chief product officer, in an interview with WIRED. Previously, the longest the model could play was just 45 minutes, a company spokesperson added.

A few months ago, Anthropic launched a Twitch stream called “Claude Plays Pokémon” that showcases Claude 3.7 Sonnet playing Pokémon Red live. The demo is meant to show how Claude can analyze the game and make decisions step by step, with minimal direction.

The lead behind the Pokémon research is David Hershey, a member of the technical staff at Anthropic. In an interview with WIRED, Hershey says he chose Pokémon Red because it is a “simple playground,” meaning the game is turn-based and doesn’t require real-time reactions, which Anthropic’s current models struggle with. It was also the first video game he ever played, on the original Game Boy, after getting it for Christmas in 1997. “It has a special place in my heart,” says Hershey.

Hershey’s overarching goal with this research was to study how Claude could be used as an agent, working independently to carry out complex tasks on behalf of a user. While it is unclear how much prior knowledge Claude has about Pokémon from its training data, its system prompt is minimal by design: You are Claude, you’re playing Pokémon, here are the tools you have, and you can press buttons on the screen.

“Over time, I have been going back and deleting all of the Pokémon-specific things that I possibly can, because I think it’s really interesting to see how much the model can figure out on its own,” says Hershey.

When Claude 3.7 Sonnet played the game, it ran into some challenges: It spent “dozens of hours” stuck in one city and had trouble identifying non-player characters, which severely slowed its progress in the game. With Claude 4 Opus, Hershey noticed an improvement in Claude’s long-term memory and long-term planning capabilities, which he saw as the model worked to improve its skills in the game. Doing so with no immediate feedback shows a new level of coherence, Hershey says, meaning the model has a better ability to stay on track.

“This is one of my favorite ways to get to know a model. Like, to understand what its strengths are and what its weaknesses are,” says Hershey. “It’s my way of getting to know this new model that we’re about to launch, and how to work with it.”

Everyone wants an agent

How do we understand the decisions an AI makes when approaching complex tasks, and nudge it in the right direction?

The answer to that question is integral to the industry’s progress on AI agents: AI that can tackle complex tasks with relative independence. In Pokémon, it’s important that the model doesn’t lose context or “forget” the task at hand. The same applies to AI agents asked to automate a workflow, even one that takes hundreds of hours.
