Mistral launches a new open source speech generation model

French AI company Mistral released a new open source text-to-speech model on Thursday that can be used by voice AI assistants or in enterprise use cases such as customer support. This model, which allows organizations to build voice agents for sales and customer engagement, puts Mistral in direct competition with the likes of ElevenLabs, Deepgram, and OpenAI.

The new model, called Voxtral TTS, supports nine languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic.

“Our customers asked for a speech model,” Pierre Stock, vice president of scientific operations at Mistral AI, told TechCrunch during a phone interview. “So we built a compact speech model that can fit into a smartwatch, smartphone, laptop, or other peripherals. The cost of this model is a fraction of anything else on the market, but it delivers cutting-edge performance.”

Mistral said the new model can adapt to a custom voice from a sample of less than five seconds, picking up characteristics such as subtle accents, inflections, intonation, and irregularities in speech flow. The model, which is based on Ministral 3B, can switch between languages without losing those voice characteristics, which is useful for applications such as dubbing or simultaneous translation. Stock said the company wanted the model to sound human, not robotic.

The model was built for real-time performance, according to the company. It has a time-to-first-audio (TTFA) of 90 milliseconds for a 10-second, 500-character sample; TTFA measures how quickly the model begins to "speak" after receiving input. The model also has a real-time factor (RTF) of 6x, which means it can render a 10-second clip in roughly 1.7 seconds.
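Both latency figures come down to simple arithmetic and timing. The Python sketch below is illustrative only. It assumes Mistral's "6x" means the model generates audio six times faster than playback, so a 10-second clip takes 10 ÷ 6 ≈ 1.67 seconds to render. Under the more common convention, where RTF is processing time divided by audio duration, the same speed would be reported as an RTF of about 0.17. The TTFA helper times a stream of audio chunks. The names `start_stream` and `fake_stream` are hypothetical placeholders, not part of any Mistral SDK, and the article does not describe Mistral's actual benchmark setup.

```python
import time
from typing import Callable, Iterable


def render_time_seconds(audio_seconds: float, speedup: float) -> float:
    """Wall-clock time to synthesize `audio_seconds` of audio.

    Assumption: a "6x" real-time factor is read as a speed-up, i.e. the
    model generates audio six times faster than it plays back.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def conventional_rtf(render_seconds: float, audio_seconds: float) -> float:
    """RTF as processing time divided by audio duration (below 1.0 = faster than real time)."""
    return render_seconds / audio_seconds


def time_to_first_audio(start_stream: Callable[[], Iterable[bytes]]) -> float:
    """Seconds from issuing a request until the first non-empty audio chunk arrives.

    `start_stream` is a hypothetical stand-in for a streaming TTS call that
    returns an iterable of raw audio chunks; no real Mistral API is implied.
    """
    start = time.perf_counter()
    for chunk in start_stream():
        if chunk:
            return time.perf_counter() - start
    raise RuntimeError("stream produced no audio")


if __name__ == "__main__":
    render = render_time_seconds(10.0, 6.0)
    print(round(render, 2))                           # 1.67
    print(round(conventional_rtf(render, 10.0), 3))   # 0.167

    def fake_stream():
        # Simulated stream: the first chunk arrives after roughly 90 ms.
        time.sleep(0.09)
        yield b"\x00" * 320
        yield b"\x00" * 320

    print(f"TTFA: {time_to_first_audio(fake_stream) * 1000:.0f} ms")
```

The TTFA helper takes a callable rather than an already-open stream so the timer starts before the request is issued. This way, any setup cost in producing the first chunk counts toward the measured latency.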

Earlier this year, Mistral launched a pair of transcription models, one for large batch processing and the other for low-latency, real-time use cases. With the new speech model, the company appears to be aiming to offer a full suite of enterprise voice products.

“We plan to have a comprehensive platform that can handle multimodal input streams, including audio, text, and images, as well as output. The main benefit is that you get more information through a comprehensive agentic system that supports audio as input or output,” Stock said.


Mistral is betting that its open source approach and customization options will persuade companies to adopt its audio models over competitors’, since they can configure the models however they want.
