ElevenLabs CEO says voice will become the next major interface for AI.
Speaking at Web Summit in Doha, ElevenLabs co-founder and CEO Mati Staniszewski said that voice models, including those his company builds, have moved beyond basic speech imitation to capture tone and emotion, and are now being paired with the reasoning abilities of large language models. That advance, he argued, is driving a fundamental change in how users interact with technology.
That vision helped drive ElevenLabs’ recent $500 million funding round, which valued the company at $11 billion, and it reflects a broader shift across the AI industry. OpenAI and Google have both made voice a core component of their next-generation models, while Apple is reportedly building toward always-on voice technology through acquisitions such as Q.ai. As AI expands into wearables, vehicles, and other hardware, interaction is moving away from touchscreens and toward spoken commands, positioning voice as a central arena in the next stage of AI innovation.
Sharing a similar view onstage at Web Summit, Iconiq Capital general partner Seth Pierrepont noted that while screens will remain important for gaming and entertainment, conventional input devices like keyboards are starting to feel outdated.
Pierrepont added that as AI systems become more agentic, the nature of interaction will change as well: equipped with guardrails, integrations, and contextual awareness, models will be able to act on far less explicit instruction from users.
Staniszewski pointed to this shift toward agentic AI as one of the most significant changes underway. Future voice-based systems, he explained, will lean on long-term memory and accumulated context rather than detailed, step-by-step commands, making interactions more intuitive and effortless for users.
He added that this progression will shape how voice models are deployed. While advanced audio models have traditionally relied on cloud-based processing, Staniszewski said ElevenLabs is moving toward a hybrid approach that combines cloud infrastructure with on-device computing. The goal is to support emerging hardware such as headphones and other wearables, where voice acts as an always-available interface rather than something users activate only when needed.
ElevenLabs is already collaborating with Meta to integrate its voice technology into products like Instagram and Horizon Worlds, Meta’s virtual reality platform. Staniszewski also said he would be open to partnering with Meta on its Ray-Ban smart glasses as voice-first interfaces continue to expand into new device categories.
As voice technology becomes more continuous and more deeply embedded in everyday devices, however, it raises significant concerns about privacy, surveillance, and the volume of personal data these systems may collect as they move closer to users’ daily routines. Companies such as Google have faced criticism over similar issues in the past.