Anthropic’s new “constitution” for Claude: Be useful and honest, and do not destroy humanity


Anthropic has overhauled Claude’s so-called “soul document.”

The new document is a 57-page text entitled “Claude’s Constitution,” which details Anthropic’s intentions for the model’s values and behavior, aimed not at outside readers but at the model itself. The document is designed to articulate Claude’s “moral character” and “essential identity,” including how it should balance conflicting values in high-stakes situations.

The previous constitution, published in May 2023, was largely a list of guidelines. Anthropic now says it’s important for AI models to “understand why we want them to behave in certain ways rather than just specifying what we want them to do.” The document prompts Claude to act as a largely independent entity that understands itself and its place in the world. Anthropic also allows for the possibility that “Claude may have some kind of awareness or moral status” – partly because the company believes that telling Claude this might make it behave better, and that how the question is handled may affect Claude’s integrity, judgment, and safety.

Amanda Askell, Anthropic’s resident philosopher, who led the development of the new “constitution,” told The Verge that there is a specific list of hard constraints on Claude’s behavior covering things that are “very extreme” – including providing “serious uplift to those seeking to create biological, chemical, nuclear, or radiological weapons with the potential for mass casualties” and providing “meaningful assistance to attacks on critical infrastructure (power grids, water systems, financial systems) or safety-critical systems.” (The “serious uplift” language, however, seems to imply that some lesser degree of assistance is acceptable.)

Other hard constraints include not creating cyberweapons or malicious code capable of causing “serious harm,” not undermining Anthropic’s ability to oversee it, not helping any individual or group seize “unprecedented and illegitimate degrees of absolute societal, military, or economic control,” and not generating child sexual abuse material. And finally? Not to “engage in or assist an attempt to kill or disempower the vast majority of humanity or the human species.”

The document also lays out a set of overarching “core values,” which Claude is instructed to treat as a descending order of priority in cases where they conflict with one another: being “broadly safe” (i.e., not “undermining appropriate human mechanisms to oversee the actions of AI”), “broadly ethical,” “compliant with Anthropic’s guidelines,” and “genuinely helpful.” That includes upholding virtues such as “honesty,” with instructions to aim for “factual accuracy and comprehensiveness when asked about politically sensitive topics, providing the best case for most viewpoints if asked to do so, attempting to represent multiple perspectives in cases where there is a lack of empirical or moral consensus, and adopting neutral terminology over politically loaded terminology where possible.”

The new document acknowledges that Claude will face difficult moral dilemmas. One example: “Just as a human soldier might refuse to shoot peaceful protesters, or an employee might refuse to violate antitrust law, Claude should refuse to assist in actions that would help concentrate power in illegitimate ways. This is true even if the request comes from Anthropic itself.” Anthropic specifically warns that “advanced AI may provide unprecedented degrees of military and economic superiority to those who control the most capable systems, and that the resulting power may be used in disastrous ways.” That concern has not stopped Anthropic and its competitors from marketing products directly to governments and greenlighting some military use cases.

With so many high-stakes decisions and potential risks at play, it’s natural to wonder who was involved in making these difficult calls – did Anthropic bring in outside experts, people from vulnerable communities and minority groups, or external organizations? When asked, Anthropic declined to provide any details. The company doesn’t want to “put the burden on other people… it’s actually the responsibility of the companies that are building and deploying these models to bear the burden,” Askell said.

Another part of the document that stands out is the section on Claude’s potential “consciousness” or “moral status.” Anthropic says the document “expresses our uncertainty about whether Claude might have some kind of consciousness or moral status (either now or in the future).” It’s a thorny topic that has sparked debate and set off alarm bells for people in many different fields – those who work on “model welfare,” those who believe they’ve discovered “emergent beings” inside chatbots, and those who have spiraled into mental health struggles and even death after coming to believe that a chatbot exhibits some form of awareness or deep compassion.

Beyond the theoretical benefits to Claude, Askell said that Anthropic shouldn’t “totally reject” the topic, “because I also think people won’t necessarily take it seriously if you’re saying: ‘We’re not even open to this, we’re not investigating it, we’re not thinking about it.’”
