Anthropic’s safety warnings may have backfired, with the government shutting down its most powerful AI software

The US government on Friday ordered Anthropic to immediately close access to two of its most powerful AI models – Claude Fable 5 and Claude Mythos 5 – due to national security concerns. Anthropic It was announced in X They complied, but made clear that they believed the government I got this error.

The directive, which Anthropic said it received on Friday at 5:21 p.m. ET, forces the company to disable both models for all users worldwide — not just the foreign nationals the government’s export control order was nominally targeting. Access to other Anthropic models is not affected.

Why does any of this matter? Mythos is Anthropic’s most capable AI model, and one of the company’s It was previewed in early April And keep it Tightly bound Since then due to what Anthropic describes as its exceptional ability to discover vulnerabilities in software. According to Anthropic, Mythos identified flaws in every major operating system and web browser it tested, so instead of releasing it broadly, the company launched a controlled program called Project Glasswing, and shared it with nearly 50 vetted organizations, including Amazon, Apple, Google, Microsoft, and CrowdStrike, for use in defensive cybersecurity work.

Fable 5, released just three days ago, was Anthropic’s answer to apparent commercial pressure: a version of Mythos fitted with a guardrail that prevented responses in high-risk areas like cybersecurity and biology, making it safe enough for general release, the company claimed. It was immediately the most capable AI model available to the public, according to benchmark tests conducted by Vals AI, a company that tracks the performance of AI technology.

The government’s guidance was put in place as an export control measure, restricting foreign nationals’ access to the forms. But in A Long blog postAnthropic says its understanding is that the primary concern is an alleged jailbreak for Fable 5. The company says the government has so far provided only verbal evidence of a “potentially narrow, non-universal jailbreak” — which, as Anthropic describes, amounts to prompting the model to read a specific code base and identify software flaws. Incidentally, the company adds, it’s a “level of capability” that’s already widely available in other publicly accessible models, including OpenAI’s GPT-5.5. Anthropic notes that it is also routinely used by cybersecurity professionals for defensive purposes.

Anthropic’s broader argument is that its strongest safeguards operate through independent rating systems that operate separately from the model itself, meaning that even if someone convinces Fable to continue talking after a rejection, basic protection against riskier outputs remains in place. The company also notes in its post that a recent usage review found no evidence that those safeguards were successfully bypassed to produce authentic and harmful content.

Clearly, none of this was enough to stop the government from acting, and Anthropic makes no secret of her frustration. “We disagree that finding a potentially limited jailbreak should be a reason to remind a widespread business model for hundreds of millions of people,” the company wrote. “If this standard were implemented across the industry, we believe it would essentially halt all new model deployments for all frontier model providers.”

Anthropic is widely expected to seek an IPO this year, and has staked much of its public identity on being the safety-conscious alternative to its competitors. The irony is not lost on observers that the extreme caution Anthropic showed in restricting Mithos — which it promoted as a model too dangerous to be published publicly — now appears to have attracted exactly the kind of government scrutiny that could most disrupt its work.

OpenAI’s Sam Altman must be at least enjoying this. In April, he told podcaster Ashley Vance that Anthropic’s handling of Mythos was “aFear-based marketing“It’s obviously incredible marketing to say: ‘We’ve made a bomb.’ We were about to drop it on your head. “We’ll sell you a bomb shelter for $100 million,” Altman said. Altman, whose company is widely expected to seek an IPO as soon as possible, did not anticipate a government shutdown, but he identified something that has come back to bite Anthropic at the moment, which is that when you spend months telling the world that your AI is uniquely dangerous, the world — including the U.S. government — tends to listen.

When you make a purchase through the links in our articles, We may earn a small commission. This does not affect our editorial independence.

Leave a ReplyCancel Reply

Trending now