Meta has launched a new AI tool that “promises” to let small business owners easily add a soundtrack to their latest video.
“That’s the promise of AudioCraft — our latest AI tool that generates high-quality, realistic audio and music from text.”
Essentially, a marketer could type “advert scene with a man walking on a gravel drive” into the system, and AudioCraft would generate an audio sample matching that description. The marketer could then use the audio in any context.
Faster, more efficient production
Having access to an unlimited number of samples that can be created instantly and on the fly has the potential to save businesses a lot of time and money, as marketers can cut the cost of hiring musicians or paying for licenses for adverts and videos. Meta also says that AudioCraft can help keep the creative process flowing.
“We see the AudioCraft family of models as tools for musicians and sound designers to provide inspiration, help people quickly brainstorm and iterate on their compositions in new ways.”
A three-part system
AudioCraft consists of three parts: MusicGen, which generates music from text prompts; AudioGen, which generates audio from text prompts; and EnCodec, which allows for high-quality music generation with fewer artefacts.
🎵 Today we’re sharing details about AudioCraft, a family of generative AI models that lets you easily generate high-quality audio and music from text. https://t.co/04XAq4rlap pic.twitter.com/JreMIBGbTF
— Meta Newsroom (@MetaNewsroom) August 2, 2023
MusicGen was trained on Meta-owned and licensed music, while AudioGen was trained on public sound effects.
“We’re also releasing our pre-trained AudioGen models, which let you generate environmental sounds and sound effects like a dog barking, cars honking, or footsteps on a wooden floor. And lastly, we’re sharing all of the AudioCraft model weights and code.”
A development of MusicGen
AudioCraft builds on the work already done on MusicGen, which creates brand-new music based on user prompts, such as text input or even humming a tune. The model has been trained on more than 20,000 hours of music and can provide a range of AI-generated output.
The full version hasn’t been released yet, although users can play around with a demo that will generate 12 seconds of audio based on the description (or audio file) provided.
A late bloomer
While we’ve seen huge strides taken in generative AI for video, images and text, audio has been left behind. Meta blames that on the “highly complicated” nature of generating audio, which requires precise and complex programming.
“Generating high-fidelity audio of any kind requires modelling complex signals and patterns at varying scales. Music is arguably the most challenging type of audio to generate as it’s composed of local and long-range patterns, from a suite of notes to a global musical structure with multiple instruments.”
While there are benefits to be gained from using generative AI for audio, there could be risks as well. Google was recently sued for allegedly using “stolen” data to train its AI products, CNN reported.
If any copyright complications arise because of the way Meta has trained AudioCraft’s AI, it could be extremely problematic for brands that use it to create music or sound effects for their videos.
However, Meta appears to be conscious of this problem – which could be why it took pains to state MusicGen “was trained on Meta-owned and specifically licensed music” and AudioGen was trained “on public sound effects”.
Based on this information, it appears likely that any sound AudioCraft generates will be free from copyright violations, although it would be wise to watch this space in case any issues emerge.
Copyright aside, relying solely on AI to generate sounds could result in marketers publishing videos with ‘samey’ audio, so marketers may wish to proceed cautiously.