Anthropic apologizes for Claude Fable’s invisible guardrails


Anthropic has apologized for quietly throttling its new AI model, Claude Fable 5, with hidden guardrails that undermine researchers and competitors who use it to develop competing systems. The company says it is reversing course and will be more transparent about when restrictions kick in, even if that means Fable refuses more queries.

Fable is the first widely available model in Anthropic’s Mythos category of AI systems, a group the company has spent months warning is too dangerous for public release. Anthropic says it has addressed some of these risks by launching Fable with safeguards that prevent it from responding to certain “high-risk” queries. One of the areas in which Anthropic said it would limit Fable’s responses is distillation, a technique for training smaller AI models using the outputs of larger models.

In Fable’s system card, a public document that AI developers release to explain how a system works, Anthropic said it would handle queries it believes are attempts at distillation by directly changing and degrading the model’s answers. Users would not be notified that they had triggered the safeguard or told that responses had been altered.

Anthropic said it is now changing its approach to distillation: such queries will instead fall back to Claude Opus 4.8, Anthropic’s previous flagship model. Anthropic will also visibly notify users when this occurs. “You’ll see this every time it happens,” the company said in a post on X.

This is similar to how Fable handles queries in other high-risk areas. When safeguards are triggered in domains like biology, chemistry, and cybersecurity, queries are routed to Opus 4.8 unless they are blocked outright by the company’s broader safety rules, such as those covering drugs, weapons, or other prohibited content. In some cases, particularly biology, the safeguards have been calibrated so broadly that Fable is practically unusable even for basic queries, something Anthropic acknowledged in a comment to The Verge.

“Visible safeguards are verifiable, so they must be robust, which takes a long time to get right,” Anthropic wrote. “Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We chose invisible safeguards for this reason – and that was the wrong trade-off. You should have a clear view of what safeguards we have, and why. We’re sorry we didn’t strike the right balance.”

The change follows intense backlash from the AI research community over Anthropic’s decision to silently limit users suspected of trying to distill Fable into competing models, a protection that critics warned could also affect third parties trying to evaluate the frontier model. In the system card, Anthropic said the ability of newer models to accelerate AI development justifies targeting those requests, noting that “using Claude to develop competing models already violates our terms of service.” Anthropic has previously accused Chinese competitors, such as DeepSeek, of unfairly distilling its models on an “industrial” scale.
