ElevenLabs CEO: Voice is the next frontier for AI


Voice has become the next major interface for AI, the way people will increasingly interact with machines as models move beyond text and screens, says ElevenLabs co-founder and CEO Mati Staniszewski.

Speaking at Web Summit in Doha, Staniszewski told TechCrunch that voice models like those developed by ElevenLabs have recently moved beyond simply mimicking human speech, including emotion and intonation, to working alongside the reasoning capabilities of large language models. The result, he said, is a shift in how people interact with technology.

“In the coming years, we hope that all our phones will return to our pockets, and we will be able to immerse ourselves in the real world around us, with voice as the mechanism that controls the technology,” he said.

This vision helped fuel ElevenLabs’ $500 million raise this week, which valued the company at $11 billion, and it is increasingly shared across the AI industry. OpenAI and Google have both made audio a central focus of their next-generation models, while Apple appears to be quietly building always-on audio technology through acquisitions such as Q.ai. As AI spreads into wearables, cars and other new devices, control is shifting from tapping on screens to speaking, making voice a key battleground for the next stage of AI development.

Seth Pierrepont, general partner at Iconiq Capital, echoed this sentiment on stage at Web Summit, arguing that while displays will continue to be important for gaming and entertainment, traditional input methods like keyboards are starting to feel “old-fashioned.”

As AI systems become more effective, the interaction itself will also change, Pierrepont said, as models gain the guardrails, integration and context needed to respond to less explicit prompts from users.

Staniszewski pointed to this shift toward less explicit prompting as one of the biggest changes underway. Instead of spelling out every instruction, he said, future voice systems will increasingly rely on persistent memory and context accumulated over time, making interactions feel more natural while requiring less effort from users.

He added that this development will affect how audio models are deployed. While high-quality audio models today largely live in the cloud, Staniszewski said ElevenLabs is working on a hybrid approach that blends cloud and on-device processing, a move aimed at supporting new devices, including headphones and other wearables, where audio becomes a constant companion rather than a feature users decide when to engage with.

ElevenLabs is already collaborating with Meta to bring its audio technology to products including Instagram and Horizon Worlds, the company’s virtual reality platform. Staniszewski said he would also be open to working with Meta on Ray-Ban smart glasses as audio interfaces expand into new form factors.

But as voice becomes more persistent and more deeply integrated into everyday devices, it raises serious concerns about privacy, surveillance, and the amount of personal data that voice-based systems will store as they move closer to users’ daily lives, territory where companies like Google have already faced accusations of misuse.
