Nvidia unveils the ‘Swiss Army Knife’ of AI audio tools: Fugatto

High-performance computer chip maker Nvidia on Monday unveiled a new AI model developed by its researchers that can create or transform any mix of music, voices and sounds described with prompts using any combination of text and audio files.

The new AI model, called Fugatto (for Foundational Generative Audio Transformer Opus), can create a music snippet based on a text prompt, remove or add instruments from an existing song, change the accent or emotion in a voice, and even create sounds that have never been heard before.

According to Nvidia, Fugatto supports numerous audio generation and transformation tasks and is the first foundational generative AI model to exhibit emergent properties (capabilities that arise from the interaction of its various trained abilities) and the ability to combine free-form instructions.

“We wanted to create a model that understands and produces sound like a human,” Rafael Valle, Nvidia’s manager of applied audio research, said in a statement.

“Fugatto is our first step toward a future where multitask learning in audio synthesis and transformation emerges from data and models at scale,” he added.

Nvidia noted that the model can handle tasks it was not trained on, as well as generate sounds that change over time, such as the Doppler effect of thunder as a rainstorm passes through an area.

The company added that unlike most models, which can only recreate the training data they’ve been exposed to, Fugatto allows users to create never-before-heard soundscapes, such as early morning thunder accompanied by birdsong.

Breakthrough AI model for audio transformation

“Nvidia’s introduction of Fugatto marks a significant advance in AI-powered audio technology,” observed Kaveh Vahdat, founder and president of RiseOpp, a national CMO services firm based in San Francisco.

“Unlike existing models that specialize in specific tasks, such as music composition, voice synthesis or sound effect generation, Fugatto offers a unified framework capable of handling a variety of audio-related functions,” he told TechNewsWorld. “This versatility positions it as a comprehensive tool for audio synthesis and conversion.”

Vahdat explained that Fugatto distinguishes itself through its ability to generate and convert audio based on both text instructions and optional audio input. “This dual-input approach enables users to create complex audio outputs that seamlessly blend different elements, such as combining the melody of a saxophone with the timbre of a meowing cat,” he said.

Additionally, he continued, Fugatto’s ability to embed instructions in voice synthesis allows for fine-grained control over features such as articulation and emotion, providing a level of customization not typically found in current AI audio tools.

“Fugatto is a remarkable step towards AI that can handle multiple processes simultaneously,” added Benjamin Lee, a professor of engineering at the University of Pennsylvania.

“Using both text and audio input together can produce much more efficient or effective models than using text alone,” he told TechNewsWorld. “The technology is interesting because, looking beyond just text, it expands the amount of training data and the capabilities of generative AI models.”

Nvidia is at its best

Mark N. Vena, president and chief analyst at SmartTech Research in Las Vegas, asserted that Fugatto represents Nvidia at its best.

“The technology introduces advanced capabilities in AI audio processing by enabling the conversion of existing audio into entirely new formats,” he told TechNewsWorld. “These include converting a piano melody into a human vocal line or altering the accent and emotional tone of spoken words, providing unprecedented flexibility in audio manipulation.”

“Unlike existing AI audio tools, Fugatto can generate novel sounds from text descriptions, such as making a trumpet sound like a barking dog,” he said. “These features provide creators of music, movies and gaming with innovative tools for sound design and audio editing.”

Fugatto deals with audio holistically, covering a wide range of sound effects, music, voice and virtually any other type of audio, including sounds that have never been heard before, and does so with precision, added Ross Rubin, principal analyst at Reticle Research, a consumer technology consulting firm in New York City.

He cited the example of Suno, a service that uses AI to create music. “They’ve just released a new version that has improvements to how human voices and other things are produced, but it doesn’t allow for the kind of precise, creative changes that Fugatto allows, like adding new instruments to a mix, changing moods from happy to sad, or transposing a song from a minor key to a major key,” he told TechNewsWorld.

“Its understanding of the world of audio and the flexibility it offers go beyond the task-specific engines we’ve seen for things like generating a human voice or creating a song,” he said.

Opens doors for creatives

Vahdat pointed out that Fugatto can be effective in both advertising and language teaching. Agencies can create customized audio content that aligns with brand identity, including voiceovers with specific accents or emotional tones, he noted.

At the same time, in language learning, educational platforms will be able to create personalized audio materials, such as dialogues with different pronunciations or emotional contexts, to support language acquisition.

“Fugatto technology opens the door to a wide range of applications in the creative industries,” maintained Vena. “Filmmakers and game developers can use it to create unique soundscapes, such as turning everyday sounds into spectacular or immersive effects,” he said. “It also holds the potential for personalized audio experiences in virtual reality, assistive technology and education, tailoring sounds to specific emotional tones or user preferences.”

“In music production,” he adds, “it can transform instrumental or vocal styles to explore innovative compositions.”

However, further development may be required to obtain better musical results. “All of these results are trivial, and some have been around for longer, and better,” observed Dennis Báthory-Kitsz, a musician and composer from Northfield Falls, Vt.

“The voice isolation was clumsy and unmusical,” he told TechNewsWorld. “Additional instruments were also trivial, and most of the transitions were colorless. The only advantage is that it requires no special learning, so the development of music for the AI user will be minimal.”

“It might open up some new uses — real musicians are already amazingly inventive — but the results will be dire if developers don’t have better musical chops to start with,” he said. “They will be musical slop to join visual and verbal slop from AI.”

AGI stand-in

With artificial general intelligence (AGI) looming large in the future, Fugatto could be a model for simulating AGI, which ultimately aims to replicate or surpass human cognitive abilities in a wide range of tasks.

“Fugatto is part of a solution that uses generative AI in a collaborative bundle with other AI tools to create an AGI-like solution,” explained Rob Enderle, president and principal analyst at the Enderle Group, an advisory services firm in Bend, Ore.

“Until we get AGI working, this approach will be the main way to create more complete AI projects with higher quality and interest,” he told TechNewsWorld.
