Nvidia Unveils AI Model Capable of Modifying Voices and Creating Unique Sounds

Nvidia introduces Fugatto, an AI model for music and audio generation. The technology can modify voices and create unique sounds from text prompts. Nvidia remains cautious about public release due to potential risks and misuse concerns.

The technology, named Fugatto, short for Foundational Generative Audio Transformer Opus 1, is targeted at professionals in the music, film, and video game industries.

Nvidia, the leading global provider of chips and software for AI systems, said it has no immediate plans to make the technology publicly available. Fugatto joins a range of similar technologies demonstrated by startups such as Runway and major players such as Meta Platforms, which can generate audio or video content from text prompts.

The model from Santa Clara, California-based Nvidia can create sound effects and music from text descriptions, including unconventional sounds such as a trumpet that turns into a dog's bark. What sets it apart from other AI technologies is its ability to manipulate existing audio, for example by converting a piano melody into a human voice or by changing the accent and emotion of a spoken recording.

Bryan Catanzaro, Nvidia's Vice President of Applied Deep Learning Research, highlighted the transformative potential of generative AI in creative fields. "I think that generative AI is going to bring new capabilities to music, to video games and to ordinary folks that want to create things," he said.

Companies such as OpenAI are in discussions with Hollywood studios about the use of AI in the entertainment industry, but tensions have arisen since Hollywood actress Scarlett Johansson accused OpenAI of mimicking her voice.

Nvidia trained the model on open-source data and is still deliberating whether to release it publicly. Catanzaro emphasised the need for caution because of the risks associated with generative technology, including misuse and copyright infringement.

Creators of generative AI models are grappling with challenges in preventing abuse of the technology, such as the dissemination of misinformation or unauthorised use of copyrighted material. OpenAI and Meta have not disclosed their plans for releasing their audio or video-generating models to the public.

Source: REUTERS