Amazon’s bet that AI benchmarks don’t matter

This is an excerpt from Sources by Alex Heath, a newsletter about artificial intelligence and the technology industry, distributed only to The Verge subscribers once a week.

Amazon’s head of AI has a message for model benchmark nerds: Stop looking at leaderboards.

“I want real-world utility. None of these benchmarks are real,” Rohit Prasad, senior vice president of AGI at Amazon, told me before today’s announcements at AWS re:Invent in Las Vegas. “The only way to get real measurement is for everyone to commit to the same training data and for the evaluations to be completely held out. This is not what happens. Evaluations have frankly become noisy, and they do not show the true power of these models.”

It’s a contrarian stance when every other AI lab is quick to brag about how its new models are climbing the leaderboards. It’s also convenient for Amazon, since the previous version of Nova, its flagship model, was sitting at #79 on LMArena when Prasad and I spoke last week. However, rejecting benchmarks only works if Amazon can present a different story about what progress looks like.

The main focus of today’s re:Invent announcements is Nova Forge, a service that Amazon claims lets companies train custom AI models in ways that were previously impossible without spending billions of dollars. The problem Forge addresses is real. Most companies trying to customize AI models face three bad choices: fine-tune a closed model (but only at the edges), train on an open-weight model (but without the original training data, and at the risk of capability regression, where the model becomes expert at the new data but forgets its broader original skills), or build a model from scratch at enormous cost.

Forge offers something else: access to Amazon’s Nova model checkpoints at the pre-training, mid-training, and post-training stages. Companies can inject their own data early in the process, when the model’s “learning ability” is at its highest, Prasad says, rather than just modifying the model’s behavior at the end.

“What we’ve done is democratize AI and develop frontier models for your use cases at fractions of the [previous] cost,” Prasad said. Forge was created because Amazon’s internal teams wanted a tool to inject their domain expertise into a base model without having to build from scratch.

“We built Forge because our internal teams wanted Forge,” he said. It’s a familiar Amazon pattern. AWS itself started out as infrastructure designed for Amazon’s retail operations before becoming the company’s profit engine.

Reddit is using Forge to build custom safety models trained on 23 years of community moderation data. “I haven’t seen anything like this yet,” Chris Slowe, Reddit’s CTO and first employee, told me. “We had an outstanding engineer who was like a kid in a candy store.”

Reddit ran a continued pre-training job last week that “looks really promising,” Slowe said. The goal: to replace multiple ad hoc safety models with a single Reddit expert model that understands the nuances of community moderation, including the famously subjective rule that pops up across subreddits everywhere: “Don’t be a jerk.”

“Having an expert model, the community will understand,” Slowe said. “It’ll have a good idea of what a jerk means.”

Slowe explained that Forge lets Reddit control its own models, avoid surprises from API changes, retain ownership of its weights, and avoid sending sensitive data to third-party model providers. Reddit is already exploring the same approach for Reddit Answers and other products, he said.

When I asked Slowe whether it mattered that Nova was not a top-tier model by benchmark standards, he was blunt: “In this context, what matters is the Reddit experience of the model.” This is the theme Amazon wants developers to embrace: not raw IQ, but control and specialization.

With Forge, Amazon is making a calculated bet that the model race has been commoditized and that it can win by being the place where companies build specialized AI for specific business problems. It’s an AWS-shaped view of the world: infrastructure instead of intelligence, and customization instead of raw capability. The strategy also lets Amazon avoid direct comparisons with OpenAI and Anthropic until it can hope to compete at the model layer.

Whether Forge is truly groundbreaking or just clever positioning depends, of course, on developer adoption. Amazon insists that the model race, as widely understood, does not matter. If that turns out to be true, the scoreboard will shift to something quieter and harder to game: whether AI models actually provide benefit in the real world.

Alex Heath