Why does artificial intelligence fail?


However, models are improving much faster than efforts to understand them. The Anthropic team acknowledges that as AI agents proliferate, the hypothetical misdeeds it studies in the lab edge ever closer to reality. If we don’t break open the black box, it may break us.

“Most of my life has been focused on trying to do the things I think are important. When I was 18, I left university to support a friend accused of terrorism, because I think it’s very important to stand by people when others don’t. After he was proven innocent, I saw that deep learning would impact society, and I dedicated myself to figuring out how humans could understand neural networks. I’ve spent the last decade working on this because I believe it could be one of the keys to making AI safe.”

So begins Chris Olah’s “Date Me Doc,” which he posted on Twitter in 2022. He’s no longer single; the document remains on his GitHub site “because it was an important document for me,” he writes.

Olah’s description leaves out a few things, including that although he has no college degree, he is a cofounder of Anthropic. A less significant omission is that he was awarded a Thiel Fellowship, which grants $100,000 to talented dropouts. “It gave me a lot of flexibility to focus on whatever I thought was important,” he told me in a 2024 interview. Inspired in part by articles he read in WIRED, he tried building 3D printers. “At 19, one doesn’t necessarily have the best taste,” he admitted. Then, in 2013, he attended a series of seminars on deep learning and got excited. He left the sessions with a question no one seemed to be asking: What is happening inside these systems?

Olah had difficulty getting others interested in the question. When he joined Google Brain as an intern in 2014, he worked on a curious project called Deep Dream, an early experiment in AI image generation. The neural network produced strange, psychedelic patterns, as if the program were on drugs. “We didn’t understand the results,” Olah says. “But one thing they showed is that there is a lot of structure within neural networks.” He concluded that at least some elements of those networks could be understood.

He set out to find such elements. He cofounded a scientific journal called Distill to bring “more transparency” to machine learning. In 2018, he and a few Google colleagues published a paper in Distill called “The Building Blocks of Interpretability.” They determined, for example, that specific neurons encoded the concept of floppy ears. From there, Olah and his colleagues could figure out how the system knew the difference between, say, a Labrador retriever and a tiger cat. They acknowledged in the paper that this was only the beginning of deciphering neural networks: “We need to make them human-scale, rather than just massive repositories of information.”

The paper was Olah’s swan song at Google. “There was actually a feeling at Google Brain that you weren’t very serious if you talked about AI safety,” he says. In 2018, when OpenAI offered him the chance to form a permanent interpretability team, he jumped. Three years later, he joined a group of OpenAI colleagues in leaving to cofound Anthropic.
