The math on artificial intelligence agents doesn’t add up

The big AI companies promised us that 2025 would be “the year of AI agents.” It turned out to be the year of talking about AI agents, and of pushing that transformative moment off to 2026 or perhaps later. But what if the answer to the question “When will our lives be fully automated by generative AI robots that do our tasks for us and essentially run the world?” is, like that New Yorker cartoon, “How about never?”

That was essentially the message of a research paper published without much fanfare a few months ago, in the middle of an overhyped year for “agentic AI.” Titled “Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models,” it aims to show mathematically that “LLMs are unable to carry out computational and agentic tasks beyond a certain complexity.” Although the science is beyond me, the authors have punctured the vision of an agentic paradise with mathematical certainty. They are a former SAP CTO, who studied artificial intelligence under John McCarthy, one of the field’s founding minds, and his teenage prodigy son. Even reasoning models that go beyond the pure word-prediction process of LLMs won’t solve the problem, they say.

“There is no way they can be reliable,” Vishal Sikka, the father, tells me. His career has included SAP, a stint as CEO of Infosys, and a seat on Oracle’s board, and he now heads an AI services startup called Vianai. “So we should forget about AI agents running nuclear power plants?” I ask. “Exactly,” he says. Maybe you can have an agent file some paperwork to save time, but you may have to resign yourself to some mistakes.

The AI industry begs to differ. For one thing, agentic AI has already scored a big success in coding, which took off last year. Just this week in Davos, Google’s Nobel Prize-winning head of artificial intelligence, Demis Hassabis, reported breakthroughs in minimizing hallucinations. Meanwhile, hyperscalers and startups alike are pushing the agent narrative. Now they have some backup. A startup called Harmonic is reporting a breakthrough in AI coding that also hinges on mathematics, and that tops benchmarks for reliability.

Harmonic, cofounded by Robinhood CEO Vlad Tenev and Tudor Achim, a Stanford-trained mathematician, claims that this latest improvement to its product, called Aristotle (no hubris there!), shows there are ways to guarantee the trustworthiness of AI systems. “Are we doomed to be in a world where AI just generates slop and humans can’t really check it? That would be a crazy world,” Achim says. Harmonic’s solution is to use formal methods of mathematical reasoning to verify an LLM’s output. Specifically, it encodes outputs in the Lean programming language, which is known for its ability to verify code. To be sure, Harmonic’s focus to date has been narrow. Its main mission is the pursuit of “mathematical superintelligence,” and coding is a fairly organic extension. Things like history essays, which cannot be verified mathematically, are beyond its boundaries. For now.
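To give a rough sense of what “verifying in Lean” means, here is a toy Lean 4 example. It is not Harmonic’s pipeline or anything Aristotle produced. The function `double` and the theorem name are hypothetical, made up for illustration. Lean accepts the file only if the proof really establishes the stated property. A false claim, such as `double n = n`, would be rejected by the checker rather than passed along.

```lean
-- Hypothetical toy function: adds a natural number to itself.
def double (n : Nat) : Nat := n + n

-- A machine-checked claim about it: for every n, double n equals 2 * n.
-- `unfold` exposes the definition; `omega` (built into Lean 4) closes
-- the resulting linear-arithmetic goal n + n = 2 * n.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

The appeal of this approach is that the checker, not a human reviewer, certifies the result. The catch, as Harmonic concedes, is that it only works for claims that can be stated formally.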

Nonetheless, Achim doesn’t seem to think reliable agentic behavior is as much of a problem as some critics believe. “I would say that most models at this point have the level of pure intelligence required to reason through booking a travel itinerary,” he says.

Both sides are right, or perhaps even on the same side. On the one hand, everyone agrees that hallucinations will remain a vexing reality. In a paper published last September, OpenAI scientists wrote, “Despite significant progress, hallucinations continue to plague the field, and persist in state-of-the-art models.” They illustrated that unhappy claim by asking three models, including ChatGPT, for the title of the lead author’s dissertation. All three made up titles, and all three misreported the year of publication. In a blog post about the research, OpenAI stated glumly that in AI models, “accuracy will never reach 100 percent.”
