
Practical Guide to Azure Custom Neural Voice: Essential Tips for Success

by info.odysseyx@gmail.com


Teaser image created by DALL-E 3

Custom Neural Voice (CNV) is a text-to-speech feature of Azure Cognitive Services that lets you create a personalized synthetic voice for your applications. You provide human speech samples as training data, and the service produces a voice that sounds very natural for your brand or character.

Recently, while working on a project involving custom voice generation, I encountered some features and hidden issues that are not covered in the official documentation, so I would like to share some tips and tricks in this article. The theoretical aspects are well documented, so the advice here is mainly based on my personal experience. I hope you find these insights useful. Let’s get started!

Audio recording

First, you need to prepare a balanced script. It is more important to have a good mix of questions, exclamations, and statements than to ensure that the training set closely matches the target domain. In short, a good dataset should include:

  • Statements: 70-80%

  • Questions: 10-20%, with an equal number of rising and falling intonations (yes/no questions typically use a rising tone, while wh-questions very commonly use a falling tone)

  • Exclamations: 10-20%

  • Short words/phrases: 10%
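The proportions above can be checked automatically before recording. The following is only a minimal sketch: it classifies utterances by their final punctuation, which is a rough heuristic, and the names `TARGETS`, `SHORT_MAX_WORDS`, `script_balance`, and `out_of_range` are hypothetical helpers, not part of any Azure tooling. The three-word cutoff for "short" phrases is an assumption, and short phrases are counted separately because they overlap with the other categories.

```python
from collections import Counter

# Target shares from the list above, as (min, max) fractions of the script.
TARGETS = {
    "statement": (0.70, 0.80),
    "question": (0.10, 0.20),
    "exclamation": (0.10, 0.20),
}

# Assumption: an utterance of at most 3 words counts as a "short phrase".
SHORT_MAX_WORDS = 3


def classify(utterance: str) -> str:
    """Classify an utterance by its final punctuation (rough heuristic)."""
    text = utterance.strip()
    if text.endswith("?"):
        return "question"
    if text.endswith("!"):
        return "exclamation"
    return "statement"


def script_balance(utterances):
    """Return the share of each utterance type, plus the share of short phrases."""
    if not utterances:
        raise ValueError("the script is empty")
    total = len(utterances)
    counts = Counter(classify(u) for u in utterances)
    shares = {kind: counts[kind] / total for kind in TARGETS}
    shares["short"] = sum(len(u.split()) <= SHORT_MAX_WORDS for u in utterances) / total
    return shares


def out_of_range(shares):
    """List the utterance types whose share falls outside its target range."""
    return [kind for kind, (lo, hi) in TARGETS.items() if not lo <= shares[kind] <= hi]


if __name__ == "__main__":
    script = ["The weather is nice today."] * 7 + [
        "Are you coming?",
        "Where do you live?",
        "Watch out!",
    ]
    shares = script_balance(script)
    print(shares)
    print("Out of range:", out_of_range(shares))
```

Punctuation cannot tell rising from falling questions, so that part of the balance still needs a manual pass over the script.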

Sound editing software


There are several options, such as Adobe Audition or Audacity. I recommend Audacity, not only because Adobe Audition is paid, but also because Audacity’s limited feature set is ideal for our needs: we just need to select an utterance, export it, and cut it from the track. Minimalism is the key to success. Audacity also makes it easy to navigate the track and keeps unnecessary tools out of the way.

The File menu in Audacity provides commands to create, open, and save projects, and to import and export audio files. The command for exporting a selection has no keyboard shortcut assigned by default, but you can easily create one, which speeds up the process considerably. In my experience with both tools, the same amount of work took me two days in Audacity, compared to four days in Adobe Audition.

Price

Here are my project details:

  • Model type: Neural V5.2022.05

  • Engine version: 2023.01.16.0

  • Training time: 30.48 hours

  • Data size: 440 utterances

  • Price: $1584.27

Pricing may vary depending on the engine version and the number of training hours, but these figures should at least give you a reference point.
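To turn the project details above into a rough budget, you can derive the effective rate per training hour and scale it. This is a back-of-the-envelope sketch: it assumes the cost grows linearly with training hours, which may not hold for every engine version or region, and `estimate_cost` is a hypothetical helper.

```python
# Figures taken from the project details above.
training_hours = 30.48
total_price_usd = 1584.27

# Effective rate paid per training hour in this project.
effective_rate = total_price_usd / training_hours
print(f"Effective rate: ${effective_rate:.2f} per training hour")  # prints $51.98


def estimate_cost(hours: float, rate: float = effective_rate) -> float:
    """Estimate the training cost in USD, assuming linear scaling (an assumption)."""
    return round(hours * rate, 2)


if __name__ == "__main__":
    # Example: a hypothetical run that needs 20 training hours.
    print(f"Estimated cost for 20 hours: ${estimate_cost(20):.2f}")
```

Check the current Azure pricing page before committing to a budget, since the rate here comes from a single project.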

Intake form


You probably know that access is granted only after you complete the intake form, and that decisions are made based on eligibility and usage criteria. Before providing any project information, please refer to Microsoft’s Responsible AI Standard. This will allow you to tailor your description and scenario accordingly.

Prepare your audio

The process is very simple. Create a text file with all the utterances and their IDs. Select the utterances in the recording one by one, export each one, save it under its ID, and then delete that utterance from the text file. Choose a zoom level in advance and do not zoom in or out while working: you will become familiar with the timeline scale, which makes it easier to add the 100-200 milliseconds of silence you need.
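Once the clips are exported, the silence padding can be verified with a short script. The sketch below uses only Python's standard `wave` and `array` modules. It makes several assumptions: the clips are 16-bit mono PCM WAV files, the 100-200 ms of silence is wanted at both the start and the end of each clip, and the amplitude `threshold` is an arbitrary starting value to tune against your own noise floor. The function names are hypothetical.

```python
import array
import sys
import wave


def edge_silence_ms(path, threshold=500):
    """Return (leading_ms, trailing_ms) of near-silence in a 16-bit mono PCM WAV file.

    `threshold` is an assumed amplitude cutoff (maximum 32767), not an official value.
    """
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))

    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        samples.byteswap()

    # Indices of all samples louder than the threshold.
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        raise ValueError("no audio above the threshold")

    leading_ms = loud[0] * 1000 / rate
    trailing_ms = (len(samples) - 1 - loud[-1]) * 1000 / rate
    return leading_ms, trailing_ms


def has_target_padding(path, lo_ms=100, hi_ms=200, threshold=500):
    """Check that both edges have between lo_ms and hi_ms of silence."""
    leading, trailing = edge_silence_ms(path, threshold)
    return lo_ms <= leading <= hi_ms and lo_ms <= trailing <= hi_ms
```

Running `has_target_padding` over every exported file gives a quick list of clips to re-trim, although breaths and room noise near the edges can still confuse a simple amplitude threshold.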




