When the release of an AI model instantly spawns memes and think pieces declaring the rest of the industry cooked, you know you have something worth analyzing.
Google's Gemini 3 was released on Tuesday to widespread fanfare. The company billed the model as "a new age of intelligence," and for the first time integrated it into Google Search on day one. It has outperformed OpenAI's and other competitors' products across a range of benchmarks, topping the charts on LMArena, a crowdsourced AI evaluation platform that is essentially a Billboard Hot 100 for AI models. Within 24 hours of launch, more than 1 million users had tried Gemini 3 in Google AI Studio and the Gemini API, according to Google. "From a day-one adoption standpoint, [it's] the best we've seen from any of our model releases," Logan Kilpatrick, a product lead for Google AI Studio and the Gemini API at Google DeepMind, told The Verge.
Even OpenAI CEO Sam Altman and xAI CEO Elon Musk publicly congratulated the Gemini team on a job well done, and Salesforce CEO Marc Benioff posted that after using ChatGPT every day for three years, spending two hours with Gemini 3 changed everything: "Oh my God… I'm not going back. The jump is crazy: the thinking, the speed, the images, the video… everything is clearer and faster. It's as if the world just changed, again."
"This is more than just a change to the leaderboard," said Wei-Lin Chiang, co-founder and CTO of LMArena. Chiang told The Verge that Gemini 3 Pro holds a "clear lead" in expert categories including coding, math, and creative writing, and that its powerful coding abilities "now outperform, in many cases, the best coding models, such as Claude 4.5 and GPT-5.1." It also placed first in vision comprehension and was the first model to surpass a score of 1,500 on the platform's text leaderboard.
The new model’s performance “demonstrates that the AI arms race is being shaped by models that can reason more abstractly, generalize more consistently, and deliver reliable results across an increasingly diverse set of real-world assessments,” Chiang said.
Alex Conway, principal software engineer at DataRobot, told The Verge that one of the most notable developments in Gemini 3 involved a specific reasoning benchmark called ARC-AGI-2. Gemini scored nearly twice as high as OpenAI's GPT-5 Pro while running at one-tenth the cost per task, which "really challenges the idea that these models are plateauing," he said. And on the SimpleQA benchmark, which features straightforward question-answering across a wide range of topics demanding deep specialized knowledge, Gemini 3 Pro scored more than double what OpenAI's GPT-5.1 did, Conway noted. "Use-case-wise, it would be great for a lot of specialized topics and delving into the latest research and scientific areas," he said.
But leaderboards aren't everything. It is possible, and in the high-pressure world of AI tempting, to tune a model for narrow benchmarks rather than general-purpose performance. So to know how well a system really performs, you have to look at real-world testing, anecdotal experience, and complex real-world use cases.
The Verge spoke with professionals in various fields who use AI every day at work. The consensus: Gemini 3 looks impressive and does a great job at a wide range of tasks, but when it comes to high-stakes use cases and specialized corners of some industries, many pros won't be replacing their current models with it any time soon.
The majority of people The Verge spoke with plan to continue using Anthropic's Claude for their coding needs, despite the progress Gemini 3 has made in that area. Some also said Gemini 3 falls short on instruction following. Although it's a "great model," it's a bit sloppy when it comes to user experience, meaning it "doesn't follow instructions precisely," said Tim Dettmers, an assistant professor at Carnegie Mellon University and a research scientist at Ai2.
Tulsi Doshi, senior director of product management for Gemini and generative media at Google DeepMind, told The Verge that the company prioritized bringing Gemini 3 to a variety of Google products "in a very real way." Asked about the instruction-following concerns, she said it's helpful to know "where people are hitting some sticking points."
She also said that since the Pro model is only the first release in the Gemini 3 lineup, subsequent models will help "overcome this concern."
Joel Hron, chief technology officer at Thomson Reuters, said the company has developed its own internal benchmarks to evaluate both its in-house models and public models in the areas most relevant to its work, such as comparing two documents up to several hundred pages long, interpreting a lengthy document, understanding legal contracts, and reasoning in legal and tax domains. So far, Gemini 3 has performed strongly across all of these evaluations and is "a big jump from where Gemini 2.5 was," he said. It also currently outperforms many Anthropic and OpenAI models in some of these areas.
In terms of "pure numbers," Gemini 3 is "very exciting," said Louis Blankemeier, co-founder and CEO of Cognita, a radiology AI startup. But, he said, "we still need some time to see how useful this model is in the real world." For more general fields, Gemini 3 is a star, Blankemeier said, but when he tried it on radiology, it struggled to correctly identify tiny rib fractures on chest X-rays, as well as uncommon or rare conditions. He likens radiology to self-driving cars in many ways, full of edge cases, so a newer, more powerful model may not be as effective as an older one that has been refined and trained on custom data over time. "The real world is much more difficult," he said.
Likewise, Matt Hoffman, head of AI at Longeye, a company that provides AI tools for law enforcement investigations, sees promise in the Gemini 3 Pro-powered Nano Banana Pro image generator. Image generators let Longeye create convincing synthetic datasets for testing while keeping real, sensitive investigation data secure. But while the benchmarks are impressive, they may not match the company's actual use cases. "I'm not confident that Longeye can replace the model we use in production with Gemini 3 and see immediate improvements," he said.
Other companies also say they're excited about Gemini 3, but not necessarily as a replacement for everything else. Build, a construction-lending startup, currently uses a combination of foundation models from Google, Anthropic, OpenAI, and others to analyze construction draw requests: packages of documents, such as invoices and proof of completed work, sent to a construction lender to request payment of funds. This requires multimodal analysis of text and images, as well as a large context window for the orchestrator agent that delegates tasks to the others, vice president of engineering Thomas Schlegel told The Verge. That's part of what Google promised with Gemini 3, so the company is currently exploring swapping it in for 2.5.
“In the past, we’ve found Geminis to be best at multi-purpose tasks, and 3 seems to be a big step forward in the same vein,” Schlegel said. “It’s everything we love about Gemini on steroids.” But he doesn’t yet think it will replace all other models, including Claude for programming tasks and OpenAI products for business reasoning.
For Tanmai Gopal, co-founder and CEO of AI agent platform PromptQL, the Gemini 3 hype is valid, but it's "certainly not the end of anything" for Google's competitors. AI models keep getting better and cheaper, and with release cycles this rapid, "one is always ahead of the pack for a while." (Case in point: the day after Gemini 3 was released, OpenAI put out GPT-5.1-Codex-Max, an update to a week-old model, seemingly to challenge Gemini 3 on some coding benchmarks.)
Gopal said PromptQL is still running internal evaluations to determine whether to change its default model lineup, if at all, but "the initial results don't necessarily show something significantly better" than the current stack. His current preference is Claude for code generation, ChatGPT for web search, and GPT-5 Pro for "deep brainstorming," but he may adopt Gemini 3 as a default model, because it is "probably best in class for consumer tasks across creative, text, [and] image."
And like almost every model, Gemini 3 had moments of what I'll call "robot hand syndrome": when an AI system handles something complex with flying colors but is stumped by the simplest query, much like the robotic hands of yesteryear struggling to grip a soda can. Renowned researcher Andrej Karpathy, a founding member of OpenAI and former director of AI at Tesla, posted that Gemini 3 refused to believe him when he told it the year was 2025; he later said the model had neglected to run a Google search. (Granted, in early testing he may have been given a version with an outdated system prompt.)
In The Verge's own testing of Gemini 3, we found that it "delivers reasonably good performance, with a few caveats." It probably won't stay on top forever, but it represents a clear step forward for the company.
"You're kind of in this game of jumping from model to model, month to month, when a new model drops," Hron said. "But what stuck out to me about Google's release is that it delivers substantial improvements across many dimensions of the model. So it's not like they've only gotten better at coding or only gotten better at reasoning… they've really gotten a little bit better across the board."