
A pair of university students with no extensive background in artificial intelligence say they have created a publicly available AI model that can generate podcast-style clips similar to those from Google's NotebookLM.
The market for synthetic speech tools is vast and growing. ElevenLabs is one of the largest players, but there is no shortage of competitors (see PlayAI, Sesame, and others). Investors believe these tools have huge potential: per PitchBook, startups developing voice AI tech raised more than $398 million in VC funding last year.
Toby Kim, one of the co-founders of Nari Labs, the group behind the newly released model, said that he and his co-founder began learning about speech AI three months ago. Inspired by NotebookLM, they wanted to create a model that offers more control over generated voices and "freedom in the script."
Kim says they used Google's TPU Research Cloud, a program that gives researchers free access to the company's TPU AI chips, to train Nari's model, Dia. Weighing in at 1.6 billion parameters, Dia can generate dialogue from a script, letting users customize speakers' tones and insert disfluencies, coughs, laughs, and other nonverbal cues.
Parameters are the internal variables a model uses to make predictions. Generally, models with more parameters perform better.
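To make the notion of a parameter count concrete, here is a minimal sketch that tallies the trainable parameters of a toy fully connected network. The layer sizes are arbitrary illustrative values, not anything from Dia; the point is only that every weight and bias in the model counts toward figures like "1.6 billion."

```python
# Toy illustration of what a "parameter" is: for a dense layer mapping
# n inputs to m outputs, the parameters are an m-by-n weight matrix
# plus m bias terms.
def linear_layer_params(n_in: int, n_out: int) -> int:
    """Number of trainable parameters in one dense layer."""
    return n_in * n_out + n_out  # weights + biases

def mlp_params(layer_sizes: list[int]) -> int:
    """Total parameters of a simple multilayer perceptron."""
    return sum(
        linear_layer_params(a, b)
        for a, b in zip(layer_sizes, layer_sizes[1:])
    )

# A tiny three-layer network: 784 -> 256 -> 64 -> 10
print(mlp_params([784, 256, 64, 10]))  # 218058
```

A model like Dia reaches 1.6 billion by stacking many much wider layers of this kind.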
Available via the AI dev platform Hugging Face and on GitHub, Dia can run on most modern PCs with at least 10GB of VRAM. It generates a random voice unless given a description of the intended style, but it can also clone a person's voice.
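The scripts Dia consumes mark speaker turns with bracketed tags and embed nonverbal cues in parentheses, conventions that appear in Nari Labs' published examples. The helper below is purely illustrative and not part of Dia's API; it just shows how a multi-speaker script with a nonverbal cue might be assembled.

```python
# Hypothetical helper for assembling a Dia-style dialogue script.
# The [S1]/[S2] speaker tags and parenthesized cues such as "(laughs)"
# mirror conventions from Nari Labs' examples; the function itself is
# an illustrative sketch, not part of Dia's API.
def make_script(turns: list[tuple[int, str]]) -> str:
    """Join (speaker_number, text) pairs into one tagged script string."""
    return " ".join(f"[S{speaker}] {text}" for speaker, text in turns)

script = make_script([
    (1, "Did you see the new open voice model?"),
    (2, "I did! (laughs) The cloning demo is wild."),
])
print(script)
# [S1] Did you see the new open voice model? [S2] I did! (laughs) The cloning demo is wild.
```

A string like this would then be passed to the model, which voices each tagged speaker distinctly and renders the parenthesized cues as nonverbal audio.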
In TechCrunch's brief test of Dia through its web demo, Dia worked well, readily generating two-way chats on any topic. The quality of the voices seems competitive with other tools out there, and its voice cloning feature is among the easiest to use that this reporter has tried.
This is a sample:
Like many voice generators, however, Dia offers little in the way of safeguards. It would be easy to use it to craft disinformation or scam recordings. On Dia's project pages, Nari discourages misuse of the model to impersonate, deceive, or engage in illegal campaigns, but the group says it is "not responsible" for misuse.
Nari has not revealed the data it used to train Dia. It is possible Dia was developed using copyrighted content: a commenter on Hacker News notes that one sample sounds like a host of NPR's "Planet Money." Training models on copyrighted content is a widespread but legally contested practice. Some AI companies claim fair use shields them from liability, while rights holders assert that fair use does not apply to training.
Regardless, Kim says Nari's plan is to build a synthesis platform with a "social aspect" on top of Dia and larger future models. Nari also plans to release a technical report for Dia and to expand the model's support to languages beyond English.