This AI agent is designed not to go rogue

AI agents like OpenClaw have boomed in popularity recently precisely because they can take over your digital life. Whether you want a personalized summary of the morning news, an agent that can battle with your cable company’s customer service, or a to-do list checker that will do some tasks for you and nudge you through the rest, agentic assistants are designed to access your digital accounts and execute your commands. This is helpful, but it has also caused a lot of chaos. The bots are out there mass-deleting emails they were instructed to keep, writing hit pieces over perceived slights, and launching phishing attacks against their owners.

After seeing the chaos of recent weeks, security engineer and researcher Niels Provos decided to try something new. Today, he is launching an open source, secure AI assistant called IronCurtain, designed to add a critical layer of control. Instead of the agent interacting directly with the user’s systems and accounts, it runs in an isolated virtual machine. Its ability to take any action is mediated by a policy (you could even think of it as a constitution) written by the owner to govern the system. Importantly, IronCurtain is also designed to receive these overarching policies in plain English and then run them through a multi-step process that uses a large language model (LLM) to convert the natural language into an enforceable security policy.

“Services like OpenClaw are at the peak of hype right now, but I hope there’s an opportunity to say, ‘Okay, maybe this isn’t the way we want to do it,’” Provos says. “Instead, let’s develop something that will still give you a very high benefit, but won’t go down these completely unknown, and sometimes destructive, paths.”

IronCurtain’s ability to take intuitive, straightforward statements and turn them into enforceable, deterministic (that is, predictable) red lines is vital, says Provos, because LLMs are notoriously “random” and probabilistic. In other words, they don’t always generate the same content or provide the same information in response to the same prompt. This creates challenges for AI guardrails, because AI systems can, over time, change how they interpret a control or constraint mechanism, which could lead to rogue activity.

An IronCurtain policy can be as simple as: “The agent may read all of my emails. It may email people in my contacts without asking. For anyone else, ask me first. Don’t permanently delete anything,” Provos says.

IronCurtain takes these instructions, turns them into an enforceable policy, and then mediates between the assistant agent in the virtual machine and what is known as a model context protocol (MCP) server, which gives LLMs access to data and other digital services to execute tasks. The ability to restrict an agent in this way adds an important element of access control that web platforms like email providers do not currently offer, because they were not designed for a scenario in which both a human owner and AI agent bots use a single account.
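
As a rough illustration of what a deterministic policy could look like once compiled, the Python sketch below hard-codes the four example rules Provos describes. Every name in it (`ToolCall`, `Decision`, `evaluate`, and the action strings) is hypothetical and chosen for illustration; none of it is taken from IronCurtain, whose actual policy format is not described here.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK_OWNER = "ask_owner"
    DENY = "deny"


@dataclass(frozen=True)
class ToolCall:
    """A request from the agent to an email tool (hypothetical shape)."""
    action: str              # e.g. "read_email", "send_email", "delete_email"
    recipients: tuple = ()   # only meaningful for "send_email"
    permanent: bool = False  # only meaningful for "delete_email"


def evaluate(call: ToolCall, contacts: frozenset) -> Decision:
    """Deterministic check: the same call always gets the same decision."""
    if call.action == "read_email":
        return Decision.ALLOW
    if call.action == "send_email":
        # Known contacts need no confirmation; anyone else goes to the owner.
        if call.recipients and all(r in contacts for r in call.recipients):
            return Decision.ALLOW
        return Decision.ASK_OWNER
    if call.action == "delete_email" and call.permanent:
        return Decision.DENY
    # Anything the policy does not cover is escalated rather than allowed.
    return Decision.ASK_OWNER


contacts = frozenset({"alice@example.com"})
to_contact = evaluate(ToolCall("send_email", ("alice@example.com",)), contacts)
to_stranger = evaluate(ToolCall("send_email", ("bob@example.org",)), contacts)
purge = evaluate(ToolCall("delete_email", permanent=True), contacts)
```

The point of the sketch is that no model is consulted at decision time: an LLM might help write rules like these, but checking a request against them is ordinary, repeatable code.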

Provos points out that IronCurtain is designed to refine each user’s “constitution” over time, as the system encounters edge cases and asks for human input on how to proceed. The system, which is model-independent and can be used with any LLM, is also designed to maintain an audit log of all policy decisions over time.
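
A record of policy decisions like the one Provos describes could be as simple as an append-only file with one entry per decision. The Python sketch below shows one hypothetical way to do that, writing one JSON object per line; the `AuditLog` class and its record fields are assumptions made for illustration, not IronCurtain’s actual format.

```python
import io
import json
from datetime import datetime, timezone


class AuditLog:
    """Append-only log: one JSON record per policy decision (hypothetical format)."""

    def __init__(self, stream):
        self._stream = stream  # any writable text stream, e.g. an open file

    def record(self, action: str, decision: str, reason: str) -> dict:
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "decision": decision,
            "reason": reason,
        }
        self._stream.write(json.dumps(entry) + "\n")
        return entry


def read_entries(text: str) -> list:
    """Parse the log text back into a list of dicts for review."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# In-memory stream for the example; a real log would be a file opened in append mode.
buffer = io.StringIO()
log = AuditLog(buffer)
log.record("send_email", "ask_owner", "recipient not in contacts")
log.record("delete_email", "deny", "permanent deletion is forbidden")
entries = read_entries(buffer.getvalue())
```

Keeping the reason alongside each decision is what would let an owner review edge cases later and adjust the policy wording.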

IronCurtain is a research prototype, not a consumer product, and Provos hopes people will explore the project, contribute to it, and help it develop. Dino Dai Zovi, a well-known cybersecurity researcher who has been testing early versions of IronCurtain, says the project’s conceptual approach aligns with his own intuition about how to constrain agentic AI.
