Cybersecurity researchers aren’t happy about Anthropic’s Fable guardrails


Anthropic released its latest model, Fable, on Tuesday, describing it as a general, limited version of its powerful and controversial Mythos cybersecurity model.

But not everyone is happy with the restrictions, and a number of cybersecurity researchers and professionals have aired their complaints online.

“[Fable] casually rejects any potentially relevant online request. Even tasks as innocuous as reading a blog post,” said Valentina “Chompi” Palmiotti, a well-known security researcher working at IBM X-Force.

When a message triggers its guardrails, Fable pauses the chat and says its “safety procedures have limited this message to cybersecurity or biology topics.”

The guardrails are in place to reduce the risk of Fable being used to develop malware or hacking tools, a long-standing concern within Anthropic. The biology restrictions stem from a similar concern about the development of biological weapons.

When the AI giant launched Mythos in April, it limited the model to a small number of companies and organizations through what it called Project Glasswing, an attempt to deploy the model to secure critical software and infrastructure. Last week, Anthropic expanded access to Mythos to hundreds of organizations in 15 countries.

But despite the good intentions, many cybersecurity experts remain troubled by how arbitrary the restrictions seem. “If you ask it to write secure code, it assumes this is cybersecurity work rather than software engineering best practices, and you get downgraded,” Matt Suiche, a veteran cybersecurity expert, told TechCrunch. Fable is programmed to revert to Claude Opus 4.8 if it hits a guardrail. “It seems to be keyword-based, so anything in the lexical field of ‘cybersecurity’ triggers guardrails.”

Contact us

Do you have more information about how hackers use AI? Or how cybersecurity companies use artificial intelligence? We would love to hear from you. From a non-work device and network, you can contact Lorenzo Franceschi-Bicchierai securely on Signal at +1 917 257 1382, via Telegram and Keybase @lorenzofb, or by email.

“But it’s understandable because we’re still in the early days and they’re still adapting their guardrails,” said Suiche, a member of the technical staff at Tolmo, an AI cybersecurity startup. “I’m sure they will evolve over time as Anthropic and other model companies collaborate more with the new generation of cybersecurity companies. It is better to catch too much than not catch enough when you do a release like this, and relax the guardrails over time.”

Another researcher said on X that “even requesting a code review” triggers Fable’s guardrails.

Anthropic did not immediately respond to a request for comment.

Separately from the guardrails within its models, Anthropic requires cybersecurity professionals to apply to its Cyber Verification Program. If approved, applicants face fewer restrictions on using Claude for cybersecurity work. OpenAI has a similar program called Trusted Access for Cyber.

