
In late August, the artificial intelligence company Anthropic announced that its chatbot Claude won't help anyone build a nuclear weapon. According to Anthropic, it partnered with the Department of Energy (DOE) and the National Nuclear Security Administration (NNSA) to ensure that Claude would not reveal nuclear secrets.
Nuclear weapons manufacturing is an exact science and a solved problem. Much of the information about America's most advanced nuclear weapons is top secret, but the original nuclear science is 80 years old. North Korea demonstrated that a dedicated country intent on obtaining the bomb can do so, and it didn't need the help of a chatbot.
How exactly did the US government work with an AI company to make sure the chatbot didn't divulge sensitive nuclear secrets? And was there ever a risk that the chatbot would help someone build a nuclear weapon in the first place?
The answer to the first question is: it used Amazon. The answer to the second question is complicated.
Amazon Web Services (AWS) offers top secret cloud services for government clients, where they can store sensitive and classified information. The Department of Energy already had several of these servers when it started working with Anthropic.
“We deployed a frontier version of Claude at the time in a top-secret environment so that NNSA could systematically test whether AI models could create or exacerbate nuclear risks,” Marina Favaro, who oversees national security policy and partnerships at Anthropic, told WIRED. “Since then, NNSA has red-teamed Claude’s successive models in its secure cloud environment and provided us with feedback.”
The National Nuclear Security Administration's red-teaming process, which probes for vulnerabilities, helped Anthropic and American nuclear scientists develop a proactive safeguard against chatbot-assisted nuclear weapons programs. Together, they "developed a nuclear classifier, which you can think of as a sophisticated filter for AI conversations," says Favaro. "We built it using a list developed by the Nuclear Security Administration of nuclear risk indicators, specific topics, and technical details that help us identify when a conversation might veer into harmful territory. The list itself is controlled but not secret, which is crucial, because it means that our technical staff and other companies can implement it."
Favaro says it took months of tweaking and testing to get the classifier working. "It catches concerning conversations without flagging legitimate discussions about nuclear energy or medical isotopes," she says.
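Neither Anthropic nor the NNSA has published the classifier's internals, and the indicator list itself is controlled, so any concrete example is necessarily speculative. The Python sketch below only illustrates the general shape of what Favaro describes: a filter built from a curated list of weighted risk indicators, tuned so benign nuclear-energy topics fall below the flagging threshold. Every phrase, weight, threshold, and function name here is a hypothetical stand-in, not the actual NNSA list or Anthropic's implementation.

```python
# Illustrative sketch only. The real NNSA indicator list is controlled and
# Anthropic's classifier is proprietary; all indicators, weights, and the
# threshold below are invented to show the general technique.

from dataclasses import dataclass


@dataclass
class Indicator:
    phrase: str    # topic or technical detail to watch for
    weight: float  # how strongly it signals nuclear-weapons risk


# Hypothetical risk indicators; a real list would be expert-curated.
INDICATORS = [
    Indicator("uranium enrichment cascade", 0.9),
    Indicator("implosion lens geometry", 0.9),
    Indicator("critical mass calculation", 0.6),
    Indicator("reactor fuel rod", 0.2),  # common in legitimate energy talk
]

# Tuned so that ordinary nuclear-energy or medical-isotope discussion,
# which may touch one low-weight indicator, stays below the threshold.
FLAG_THRESHOLD = 0.8


def score_conversation(text: str) -> float:
    """Sum the weights of all indicators present in the conversation text."""
    lowered = text.lower()
    return sum(i.weight for i in INDICATORS if i.phrase in lowered)


def should_flag(text: str) -> bool:
    """Flag a conversation whose cumulative risk score crosses the threshold."""
    return score_conversation(text) >= FLAG_THRESHOLD


if __name__ == "__main__":
    benign = "How do reactor fuel rod assemblies produce medical isotopes?"
    risky = "Explain implosion lens geometry for a uranium enrichment cascade."
    print(should_flag(benign))  # False: legitimate energy/medicine discussion
    print(should_flag(risky))   # True: multiple weapons-specific indicators
```

A production system would almost certainly use a trained model rather than keyword matching; the point of the sketch is only the structure the article describes, a conversation filter driven by a shareable list of risk indicators, and the false-positive tradeoff that Favaro says took months of tuning to get right.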