DeepL, known for translating text, now wants to translate your voice

DeepL, a translation company known for its text tools, released a voice-to-voice translation suite today covering use cases such as meetings, mobile and web conversations, and group chats for frontline workers through dedicated apps. The company is also releasing an application programming interface (API) that allows third-party developers and companies to leverage DeepL technology for custom use cases, such as contact centers.

“After spending many years translating text, audio was a natural step for us,” DeepL CEO Jarek Kutylowski said in an interview with TechCrunch. “We’ve come a long way when it comes to text translation and document translation. But we thought there wasn’t a great product for real-time voice translation.”

The main challenge in building an interpretation product is balancing low latency (the delay between a person speaking and the translated audio playing) against accuracy, Kutylowski said.

DeepL is launching extensions for platforms like Zoom and Microsoft Teams, where listeners can either hear a real-time translation while others speak in their native languages or follow translated text on screen in real time. This program is currently in early access, and the company is inviting organizations to join the waitlist. The company also has a product for mobile and web conversations that can be held in person or remotely.

DeepL also supports group conversations in settings such as training sessions or workshops, where participants can join by scanning a QR code.

DeepL said its voice-to-voice technology can also learn and adapt to custom vocabulary, such as industry-specific terms, company names and personal names.

AI is reshaping what customer service will look like in the coming years, Kutylowski said. He noted that the translation layer helps companies provide support in languages for which qualified staff are scarce and expensive to hire.

The company said it controls the entire voice-to-voice stack. However, the current system converts speech into text, translates that text, and then converts the result back into speech. DeepL believes that because it has been translating text for years, it has an advantage in translation quality. Going forward, the company wants to develop an end-to-end voice translation model that skips the text step entirely.

DeepL faces competition from several well-funded startups operating in adjacent corners of the space. Sanas, which raised $65 million last year from Quadrille Capital and Teleperformance, uses AI to modify a speaker’s accent in real time, a tool aimed primarily at call center agents.

Dubai-based Camb.AI focuses on speech synthesis and translation for media and entertainment companies such as Amazon Web Services, helping them dub and localize video content at scale.

Palabra, backed by Reddit co-founder Alexis Ohanian’s Seven Seven Six, is building a real-time speech translation engine designed to preserve both the meaning and the original voice of the speaker, which puts it in more direct competition with what DeepL is building now.
