Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124

Unsurprisingly, a lot of the news and product updates Adobe dropped this week were centered on generative artificial intelligence. But while most of this year has seen huge leaps in image and video generation, Adobe is focused on raising the bar on its AI offerings in another area: AI audio.
The two new features, Soundtrack Creation and Speech Creation, do exactly what their names suggest. You can create background music and spoken narration for your videos. But each comes with practical controls that make AI audio less of a gamble and more of a useful tool for creators of all skill levels. Both are available in beta now.
Adobe is also releasing a beta of its latest, fifth-generation Firefly image model. It promises to be better at producing realistic images, and you can now use instant editing. There's also a new experimental video editor for Firefly that comes with a multi-track timeline meant to help you piece together AI-generated clips. Adobe is also expanding its partnerships with two new AI companies, ElevenLabs and Topaz Labs. For more artificial intelligence news, you can learn about the AI assistants coming to Photoshop and Express.
Below is an example of how you might be prompted to write your AI music description.
Licensing music is complicated, especially for commercial use. So let me start with the most important part: any music created using Firefly's soundtrack tool comes with a worldwide license, meaning you can use it for any purpose, indefinitely. Adobe trains its AI tools on content (in this case, audio) that it has permission to use. So, in theory, you shouldn't have your Firefly AI audio taken down from YouTube or other platforms, or get hit with a dreaded copyright strike.
“This is a unique time in the world where music licensing is at the forefront of everyone’s mind and creators are either frustrated because they’re trying to do the best thing for their content, or they’re overwhelmed,” Jay LeBoeuf, head of audio AI at Adobe, said in an interview. “So we just hope to clear up the confusion.”
In the demo, Firefly rejected a prompt containing an artist's name because it violated the user guidelines due to copyright concerns. Since the model isn't trained on Taylor Swift's music, for example, it can't create music similar to hers.
Now, the fun stuff: Soundtrack Creation is Adobe's first AI-powered music tool, designed to take the guesswork out of describing what you want. You upload your video, and the AI analyzes it. Based on its evaluation, Firefly will write a prompt that it thinks might work well for your video. It's a Mad Libs-style prompt, and you can switch up the descriptions as you see fit. The prompt consists of three parts: a description of the overall vibe, style (think genre), and purpose (commercial, experimental, etc.). You can also adjust the tempo and energy level.
Once you're satisfied with your prompt, click Create and, less than two minutes later, you'll have four different instrumental variations ready to play. Your audio will be the same length as your video, but you can adjust it as needed. You can upload videos up to five minutes long.
You can try AI music creation for your videos now. Soundtrack Creation and Speech Creation are both available through Firefly and are in beta. Check to see if your Adobe plan includes access to Firefly, and if not, you can get a plan starting at $10 per month.
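To make the three-part structure concrete, here is a small Python sketch of how a Mad Libs-style prompt like this could be assembled. It is purely illustrative: the `SoundtrackPrompt` class, its field names and the sentence template are assumptions made up for this example, not Firefly's actual prompt format or any Adobe API.

```python
from dataclasses import dataclass


@dataclass
class SoundtrackPrompt:
    """Hypothetical container for the three prompt parts (not an Adobe type)."""
    description: str  # overall feel of the track, e.g. "laid-back and sunny"
    style: str        # musical style, e.g. "indie pop"
    purpose: str      # intended use, e.g. "commercial"

    def render(self) -> str:
        # Fill the blanks of an assumed Mad Libs-style sentence template.
        return f"A {self.description} {self.style} track for a {self.purpose} video"


prompt = SoundtrackPrompt(
    description="laid-back and sunny",
    style="indie pop",
    purpose="commercial",
)
print(prompt.render())
```

Swapping any one field changes only that blank in the sentence, which is the appeal of the fill-in-the-blanks approach.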
Once you have a track you like, you can download the full video (or just the soundtrack) to your computer.
This is an example of four music tracks Firefly made for an AI video it made of some people partying on the beach.
Creating speech in Firefly is straightforward, and it includes a lot of features that should make it useful for almost any project. You get a simple window where you can type in the words you want the AI voice to read. You can also upload text of up to 7,500 characters, roughly enough for a video 15 to 20 minutes long. Once your text is uploaded, you can choose from 50 voices, each tagged with an approximate age and gender, including non-binary options. You can generate speech in 20 different languages. But the fun part is what you can do to adjust your text.
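If your script runs long, the 7,500-character limit is easy to check before you upload. The sketch below is a rough Python helper, not anything Adobe provides: the function names are hypothetical, and it assumes the limit counts every character, including spaces and line breaks, which the announcement does not confirm.

```python
MAX_CHARS = 7_500  # reported upload limit; how Firefly counts characters is an assumption


def fits_limit(script: str, limit: int = MAX_CHARS) -> bool:
    """Return True if the script is within the character limit."""
    return len(script) <= limit


def split_script(script: str, limit: int = MAX_CHARS) -> list[str]:
    """Split a script into chunks of at most `limit` characters.

    Chunks break on blank lines (paragraph boundaries) where possible;
    a single paragraph longer than the limit is cut into hard slices.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in script.split("\n\n"):
        pieces = [paragraph[i:i + limit] for i in range(0, len(paragraph), limit)] or [""]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= limit:
                current = candidate
            else:
                # Adding this piece would overflow, so close the current chunk.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be pasted or uploaded separately, at the cost of generating the narration in more than one pass.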
Speaking is more than just reading words on a page. When we read long paragraphs or talk to others, we naturally add emphasis, emotion, and rhythm to our speech. With the new software, you can do the same, adding pauses where you want the AI to take a break and highlighting sections where the tone should change.
If you're like me and people rarely pronounce your name correctly on the first try, you can use the Fix Pronunciation tool to ensure there aren't any mistakes. Select a noun or proper name, add phonetic detail, and the AI will use that to guide its pronunciation.
These tools, along with your hands-on ability to fine-tune specific sections, are meant to give you more control, something other text-to-speech software doesn’t always provide.
“It’s a way for us to provide a vibrant discourse to creatives, small business owners, educators, everyone who has a story to tell, and maybe they don’t feel as comfortable as us pulling out the mic and speaking,” LeBoeuf said.
Firefly's voice model is a completely new AI model, but it is not your only option. Adobe has been steadily adding to its list of third-party AI models this year, for both AI-powered video and images. It is expanding those choices once again by including ElevenLabs' Multilingual v2 model as a speech generation option.
For more, see how the Adobe Project Indigo camera app works, now with iPhone 17 support.